Issues Native Theory application
Author | Message |
---|---|
Send message Joined: 14 Jan 10 Posts: 1429 Credit: 9,541,076 RAC: 5,106 |
|
Send message Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0 |
Very well. I was just thinking that Native ATLAS works so well for me that anyone having troubles with VirtualBox should give it up and run only that. But when I did the "sudo wget https://lhcathome.cern.ch/lhcathome/download/default.local -O /etc/cvmfs/default.local" I got a bunch of error messages that I won't bore you with, and then the probe failed almost entirely. Not to worry. I rebooted and tried again, and this time everything went swimmingly well. I have downloaded my first Theory tasks and will see how they fly. Thanks. |
Send message Joined: 12 Sep 08 Posts: 6 Credit: 37,799,196 RAC: 0 |
Hello, the connection to CVMFS is OK, but I got the following error: <core_client_version>7.14.2</core_client_version> <![CDATA[ <message> process exited with code 195 (0xc3, -61)</message> <stderr_txt> 20:36:48 (96380): wrapper (7.15.26016): starting 20:36:48 (96380): wrapper (7.15.26016): starting 20:36:48 (96380): wrapper: running ../../projects/lhcathome.cern.ch_lhcathome/cranky-0.0.28 () 19:36:49 2019-03-18: cranky-0.0.28: [INFO] Detected TheoryN App 19:36:49 2019-03-18: cranky-0.0.28: [INFO] Checking CVMFS. 19:36:49 2019-03-18: cranky-0.0.28: [INFO] Checking runc. 19:37:23 2019-03-18: cranky-0.0.28: [INFO] Creating the filesystem. 19:37:23 2019-03-18: cranky-0.0.28: [INFO] Using /cvmfs/cernvm-prod.cern.ch/cvm3 19:37:24 2019-03-18: cranky-0.0.28: [INFO] Updating config.json. 19:37:25 2019-03-18: cranky-0.0.28: [INFO] Running Container 'runc'. 19:38:00 2019-03-18: cranky-0.0.28: [ERROR] Container 'runc' terminated with status code 1. 20:38:01 (96380): cranky exited; CPU time 0.196865 20:38:01 (96380): app exit status: 0xce 20:38:01 (96380): called boinc_finish(195) </stderr_txt> ]]> Thanks, Schelle |
Send message Joined: 12 Sep 08 Posts: 6 Credit: 37,799,196 RAC: 0 |
"runc" was missing... now it is installed ;) |
Send message Joined: 9 Dec 14 Posts: 202 Credit: 2,533,875 RAC: 0 |
"runc" was missing...runc is provided via CVMFS, so there should be no need for installing runc and this should not fix your problem. If it does, please report here. Could you please post the output of the commands (hopefully they work on Scientific Linux) cat /proc/sys/kernel/unprivileged_userns_cloneand cat /proc/sys/user/max_user_namespaces |
Send message Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0 |
No problems with the first five. Run times vary considerably, from about 6 minutes to 1 hour 27 minutes. https://lhcathome.cern.ch/lhcathome/results.php?hostid=10588304&offset=0&show_names=0&state=4&appid=17 It runs on two CPU cores per work unit. I don't know if that is the default, or because that is what I have set for Native ATLAS (in an app_config.xml). But on my preferences page, I have it set to Max # jobs: No limit, Max # CPUs: 8. At any rate, that is what I want, so it works for me. |
Send message Joined: 12 Sep 08 Posts: 6 Credit: 37,799,196 RAC: 0 |
Hello, here is the output: cat /proc/sys/user/max_user_namespaces 100 cat /proc/sys/kernel/unprivileged_userns_clone cat: /proc/sys/kernel/unprivileged_userns_clone: Datei oder Verzeichnis nicht gefunden (No such file or directory) Thanks, Schelle |
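The two checks requested above can be wrapped in a small helper that also copes with a kernel that does not expose one of the files, as happened on this host. This is only a sketch: the function name `read_setting` and the `PROC_ROOT` override are hypothetical and exist purely so the helper can be tried against a scratch directory.

```shell
#!/bin/sh
# Print the value of a /proc/sys entry, or "missing" when the kernel does not expose it.
# PROC_ROOT is a hypothetical override (default: /proc) used only for testing.
read_setting() {
    file="${PROC_ROOT:-/proc}/sys/$1"
    if [ -r "$file" ]; then
        cat "$file"
    else
        echo "missing"
    fi
}

# The two files asked for in this thread.
echo "kernel/unprivileged_userns_clone: $(read_setting kernel/unprivileged_userns_clone)"
echo "user/max_user_namespaces: $(read_setting user/max_user_namespaces)"
```

A "missing" result for unprivileged_userns_clone is not an error by itself; not every distribution's kernel provides that file.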
Send message Joined: 13 Apr 18 Posts: 443 Credit: 8,438,885 RAC: 0 |
I did the setup for native theory on this host which was already running native ATLAS on Ubuntu 18.10. The setup went without error, thanks Ivan for the great directions. Now I have 2 X 2-core native theory tasks running for ~10 minutes. In top I see for user boinc: 2 X agile-runmc, each at ~75% CPU 2 X rivetvm.exe, 1 at ~75% CPU, 1 at ~55% CPU Update: After ~30 minutes I see in top: 2 X rivetvm.exe, 1 at 65% CPU, 1 at ~45% CPU 2 X pythia8.exe, 1 at ~80% CPU, 1 at ~65% CPU Wahoo!! Very nice to see pythia running native but was hoping to see it using closer to 100% CPU? |
Send message Joined: 17 Sep 04 Posts: 19 Credit: 308,023 RAC: 0 |
Well, I tried the new app on my Linux VB too, but the tasks error out almost immediately. The logs all look like this: <core_client_version>7.6.31</core_client_version> <![CDATA[ <message> process exited with code 195 (0xc3, -61) </message> <stderr_txt> 21:53:07 (2676): wrapper (7.15.26016): starting 21:53:07 (2676): wrapper (7.15.26016): starting 21:53:07 (2676): wrapper: running ../../projects/lhcathome.cern.ch_lhcathome/cranky-0.0.28 () 20:53:07 2019-03-18: cranky-0.0.28: [INFO] Detected TheoryN App 20:53:07 2019-03-18: cranky-0.0.28: [INFO] Checking CVMFS. 20:53:07 2019-03-18: cranky-0.0.28: [ERROR] 'which' could not locate the command 'cvmfs_config'. 21:53:08 (2676): cranky exited; CPU time 0.004000 21:53:08 (2676): app exit status: 0xce 21:53:08 (2676): called boinc_finish(195) </stderr_txt> ]]> I don't know if this is an app problem or a problem with my host, so I'll leave it be. Hopefully a Windows app will come out as well. Edit: Oh, I see now in the other thread that I have to install CVMFS myself. I will do that and then check again; ignore this in the meantime. ;-) Life is Science, and Science rules. To the universe and beyond Proud member of BOINC@Heidelberg |
Send message Joined: 14 Jan 10 Posts: 1429 Credit: 9,541,076 RAC: 5,106 |
"The setup went without error, thanks Ivan for the great directions." The directions are actually by Laurence ;) "2 X rivetvm.exe, 1 at 65% CPU, 1 at ~45% CPU" Each job consists of several processes: one rivetvm.exe plus a generator such as pythia8, agile-runmc (= pythia6), sherpa or herwig. So you have to add each generator process to its rivetvm.exe, and you will see that together they use more than 100%, which happens when you have idle CPUs. |
Send message Joined: 12 Sep 08 Posts: 6 Credit: 37,799,196 RAC: 0 |
"Hello connect to CVMFS is ok! got the following error:" Solved by enabling user namespaces in the kernel (CentOS 7). |
Send message Joined: 22 Mar 17 Posts: 66 Credit: 14,582,079 RAC: 486 |
"The setup went without error, thanks Ivan for the great directions." "The directions are actually by Laurence ;)" The task is set to use 2 CPUs by default, but barely over 1 is used, and the reported run time is exactly equal to the CPU time, to the second, on every task. At most I see 1.5 cores when the task is really short: 6 min run time, 8 min CPU time. |
Send message Joined: 13 Apr 18 Posts: 443 Credit: 8,438,885 RAC: 0 |
"The setup went without error, thanks Ivan for the great directions." "The directions are actually by Laurence ;)" But BOINC manager shows 2 X 2-CPU tasks = 4 CPUs in use, in other words no idle CPUs. Also, the task run times are nearly equal to the task CPU times, when I would expect CPU time to be a little less than double the run time. |
Send message Joined: 14 Jan 10 Posts: 1429 Credit: 9,541,076 RAC: 5,106 |
"The task is set to use 2 CPUs by default and barely over 1 is used and the reported time has run time = exactly CPU time. To the second on every task. At most I see 1.5 cores when the task is really short. 6min run time, 8 min CPU time." Don't trust the values reported in the results, especially when they are equal. Take your task as an example: https://lhcathome.cern.ch/lhcathome/result.php?resultid=219459914 It reported 51 min 1 sec, which is exactly the CPU time reported at the end of the result: 06:18:02 (32596): cranky exited; CPU time 3061.446043. But when you subtract the job start time from the job finish time (the job should have run in one go), 06:18:02 (32596): cranky exited; CPU time 3061.446043 05:38:19 (32596): wrapper (7.15.26016): starting you'll find the elapsed time is 2383 seconds, so either 1 CPU is used at far over 100% or 2 CPUs are partially used. |
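The arithmetic in the post above can be reproduced with a few lines of shell. This is a sketch that assumes both timestamps fall on the same day; the function names `to_seconds` and `elapsed_seconds` are made up for illustration, and the three values are copied from the quoted log lines.

```shell
#!/bin/sh
# Convert HH:MM:SS to seconds since midnight.
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Wall-clock seconds between two timestamps on the same day (start, end).
elapsed_seconds() {
    echo $(( $(to_seconds "$2") - $(to_seconds "$1") ))
}

start="05:38:19"    # "wrapper ... starting" line
end="06:18:02"      # "cranky exited" line
cpu="3061.446043"   # CPU time reported on the "cranky exited" line

wall=$(elapsed_seconds "$start" "$end")
echo "elapsed: ${wall}s"
# CPU seconds per wall-clock second, i.e. the average number of busy cores.
awk -v c="$cpu" -v w="$wall" 'BEGIN { printf "average cores busy: %.2f\n", c / w }'
```

With these values the script prints an elapsed time of 2383 s and an average of about 1.28 busy cores, which matches the "barely over 1 CPU" observation.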
Send message Joined: 15 Jun 08 Posts: 2549 Credit: 255,472,638 RAC: 67,871 |
You can find the correct runtimes and CPU times in the scheduler_request when the task is reported. For some reason the server does not trust the reported runtime and sets runtime = CPU time if the CPU time is (much?) higher than the runtime. It may be a plausibility check for single-core tasks or something like that. This happens especially on systems that run below 100% load. CP already explained the "cycle stealing". If you run an overbooked system or limit the CPU usage by cgroups and kernel CPUShares/CPUQuotas, this would result in higher runtimes, and then the server trusts the reported values. |
Send message Joined: 13 Apr 18 Posts: 443 Credit: 8,438,885 RAC: 0 |
"The task is set to use 2 CPUs by default and barely over 1 is used and the reported time has run time = exactly CPU time. To the second on every task. At most I see 1.5 cores when the task is really short. 6min run time, 8 min CPU time." "Don't trust the values reported in the results, specially when they are equal." OK, it makes sense now; the numbers add up correctly. I don't like the CPUs being used only partially, but I'll ignore it if they promise there won't be any sherpa jobs. |
Send message Joined: 14 Jan 10 Posts: 1429 Credit: 9,541,076 RAC: 5,106 |
"... I don't like the CPU's being used only partially but I'll ignore it if they promise there won't be any sherpa jobs." Promise: sherpas will come. I have one running at the moment ;) |
Send message Joined: 24 Nov 06 Posts: 76 Credit: 7,953,478 RAC: 0 |
"runc" was missing... I have the same problem. <core_client_version>7.9.3</core_client_version> <![CDATA[ <message> process exited with code 195 (0xc3, -61)</message> <stderr_txt> 06:41:18 (3651): wrapper (7.15.26016): starting 06:41:18 (3651): wrapper (7.15.26016): starting 06:41:18 (3651): wrapper: running ../../projects/lhcathome.cern.ch_lhcathome/cranky-0.0.28 () 13:41:18 2019-03-19: cranky-0.0.28: [INFO] Detected TheoryN App 13:41:18 2019-03-19: cranky-0.0.28: [INFO] Checking CVMFS. 13:41:19 2019-03-19: cranky-0.0.28: [INFO] Checking runc. 13:41:19 2019-03-19: cranky-0.0.28: [ERROR] /cvmfs/grid.cern.ch/vc/containers/runc does not exist. 06:41:19 (3651): cranky exited; CPU time 0.023174 06:41:19 (3651): app exit status: 0xce 06:41:19 (3651): called boinc_finish(195) </stderr_txt> ]]> Not sure if installing runs fixed it yet, as there is no work available. Also, since I had 8 errors testing this, it looks like I am throttled to only 1 task per day. So I won't be able to see if it works until tomorrow. LHC@home 3/19/2019 7:11:34 AM This computer has finished a daily quota of 1 tasks Here in the info requested: $ cat /proc/sys/kernel/unprivileged_userns_clone 1 $ cat /proc/sys/user/max_user_namespaces 23590 |
Send message Joined: 13 Apr 18 Posts: 443 Credit: 8,438,885 RAC: 0 |
"... I don't like the CPU's being used only partially but I'll ignore it if they promise there won't be any sherpa jobs." "Promise: sherpa's will come. I've one running at the moment ;)" Damn! Guess I have to modify my watchdog script again. |
Send message Joined: 15 Jun 08 Posts: 2549 Credit: 255,472,638 RAC: 67,871 |
@all volunteers having problems with Theory native. Some errors are caused by an installed but misconfigured CVMFS. Before you request any task you may check if "cvmfs_config probe" returns OK like the following example that has configured all repositories required for ATLAS and Theory. cvmfs_config probe Probing /cvmfs/atlas.cern.ch... OK Probing /cvmfs/atlas-condb.cern.ch... OK Probing /cvmfs/grid.cern.ch... OK Probing /cvmfs/cernvm-prod.cern.ch... OK Probing /cvmfs/sft.cern.ch... OK Probing /cvmfs/alice.cern.ch... OK If any of those is missing, include it in /etc/cvmfs/default.local. If any of those fails, post a message here. |
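The probe check described above can be scripted by filtering the probe output. The sketch below assumes only the line format shown in the example ("Probing /cvmfs/<repo>... OK") and treats any other status as a failure; the function names `failed_repos` and `missing_repos` are hypothetical helpers, not part of CVMFS.

```shell
#!/bin/sh
# Read `cvmfs_config probe` output on stdin and print every repository
# whose line does not end in "OK".
failed_repos() {
    awk '/^Probing / && $NF != "OK" { sub(/\.\.\.$/, "", $2); print $2 }'
}

# Print each repository named in $1 (space-separated) that was not probed at all,
# i.e. one that is probably not listed in /etc/cvmfs/default.local.
missing_repos() {
    probed=$(awk '/^Probing / { sub(/\.\.\.$/, "", $2); print $2 }')
    for repo in $1; do
        echo "$probed" | grep -qxF "/cvmfs/$repo" || echo "$repo"
    done
}
```

On a host with CVMFS installed, it could be used as `cvmfs_config probe | failed_repos`, or as `cvmfs_config probe | missing_repos "grid.cern.ch cernvm-prod.cern.ch sft.cern.ch"` with whichever repositories from the example list you expect to see. Empty output from both means every expected repository was probed and reported OK.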