Thread 'Issues Native Theory application'

Author	Message
Crystal Pellet Volunteer moderator Volunteer tester Joined: 14 Jan 10 Posts: 1491 Credit: 9,985,934 RAC: 993	Message 38260 - Posted: 18 Mar 2019, 12:17:57 UTC Last modified: 18 Mar 2019, 13:30:26 UTC
Native Theory Application Setup (Linux only)
Please post here if there are any issues.

Jim1348 Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0	Message 38261 - Posted: 18 Mar 2019, 13:20:38 UTC - in response to Message 38259. Last modified: 18 Mar 2019, 13:21:27 UTC
Very well. I was just thinking that Native ATLAS works so well for me that anyone having trouble with VirtualBox should give it up and run only that. But when I ran
sudo wget https://lhcathome.cern.ch/lhcathome/download/default.local -O /etc/cvmfs/default.local
I got a bunch of error messages that I won't bore you with, and then the probe failed almost entirely. Not to worry. I rebooted and tried again, and this time everything went swimmingly well. I have downloaded my first Theory tasks and will see how they fly. Thanks.

schelle Joined: 12 Sep 08 Posts: 6 Credit: 37,799,196 RAC: 0	Message 38265 - Posted: 18 Mar 2019, 20:05:59 UTC
Hello, the connection to CVMFS is OK, but I got the following error:
<core_client_version>7.14.2</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
20:36:48 (96380): wrapper (7.15.26016): starting
20:36:48 (96380): wrapper (7.15.26016): starting
20:36:48 (96380): wrapper: running ../../projects/lhcathome.cern.ch_lhcathome/cranky-0.0.28 ()
19:36:49 2019-03-18: cranky-0.0.28: [INFO] Detected TheoryN App
19:36:49 2019-03-18: cranky-0.0.28: [INFO] Checking CVMFS.
19:36:49 2019-03-18: cranky-0.0.28: [INFO] Checking runc.
19:37:23 2019-03-18: cranky-0.0.28: [INFO] Creating the filesystem.
19:37:23 2019-03-18: cranky-0.0.28: [INFO] Using /cvmfs/cernvm-prod.cern.ch/cvm3
19:37:24 2019-03-18: cranky-0.0.28: [INFO] Updating config.json.
19:37:25 2019-03-18: cranky-0.0.28: [INFO] Running Container 'runc'.
19:38:00 2019-03-18: cranky-0.0.28: [ERROR] Container 'runc' terminated with status code 1.
20:38:01 (96380): cranky exited; CPU time 0.196865
20:38:01 (96380): app exit status: 0xce
20:38:01 (96380): called boinc_finish(195)
</stderr_txt>
]]>
Thanks, Schelle

schelle Joined: 12 Sep 08 Posts: 6 Credit: 37,799,196 RAC: 0	Message 38266 - Posted: 18 Mar 2019, 20:28:37 UTC - in response to Message 38265.
"runc" was missing... now it is installed ;)

gyllic Joined: 9 Dec 14 Posts: 202 Credit: 2,659,192 RAC: 246	Message 38267 - Posted: 18 Mar 2019, 20:47:59 UTC - in response to Message 38266.
Quote (schelle): "runc" was missing... now it is installed ;)
runc is provided via CVMFS, so there should be no need to install runc, and installing it should not fix your problem. If it does, please report here.
Could you please post the output of these commands (hopefully they work on Scientific Linux):
cat /proc/sys/kernel/unprivileged_userns_clone
and
cat /proc/sys/user/max_user_namespaces
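For anyone who prefers a script to the two cat commands, here is a minimal sketch that reads the same two files. The function names and the "problems" list are made up for illustration and are not part of cranky, runc or BOINC. It assumes a missing file should be reported as None rather than treated as an error, since the first file may simply not exist on some distributions. It only flags the clear-cut case of a value of 0; a clean report does not prove that runc will start.

```python
from pathlib import Path
from typing import Optional


def read_int_setting(path: Path) -> Optional[int]:
    """Return the integer stored in a /proc/sys file, or None if the file is missing."""
    try:
        return int(path.read_text().strip())
    except FileNotFoundError:
        return None


def userns_report(proc_sys: Path = Path("/proc/sys")) -> dict:
    """Collect the two user-namespace settings asked for in the post above."""
    clone = read_int_setting(proc_sys / "kernel" / "unprivileged_userns_clone")
    max_ns = read_int_setting(proc_sys / "user" / "max_user_namespaces")

    # Assumption: only a value of 0 is flagged; None (file missing) is left for a human to judge.
    problems = []
    if clone == 0:
        problems.append("unprivileged_userns_clone is 0")
    if max_ns == 0:
        problems.append("max_user_namespaces is 0")

    return {
        "unprivileged_userns_clone": clone,
        "max_user_namespaces": max_ns,
        "problems": problems,
    }


if __name__ == "__main__":
    print(userns_report())
```

The proc_sys parameter exists only so the function can be pointed at a test directory instead of the real /proc/sys.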

Jim1348 Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0	Message 38268 - Posted: 18 Mar 2019, 20:54:35 UTC
No problems with the first five. Run times vary considerably, from about 6 minutes to 1 hour 27 minutes.
https://lhcathome.cern.ch/lhcathome/results.php?hostid=10588304&offset=0&show_names=0&state=4&appid=17
It runs on two CPU cores per work unit. I don't know if that is the default, or because that is what I have set for Native ATLAS (in an app_config.xml). But on my preferences page, I have it set to
Max # jobs: No limit
Max # CPUs: 8
At any rate, that is what I want, so it works for me.

schelle Joined: 12 Sep 08 Posts: 6 Credit: 37,799,196 RAC: 0	Message 38271 - Posted: 18 Mar 2019, 21:33:00 UTC - in response to Message 38267.
Hello, here is the output:
cat /proc/sys/user/max_user_namespaces
100
cat /proc/sys/kernel/unprivileged_userns_clone
cat: /proc/sys/kernel/unprivileged_userns_clone: No such file or directory
Thanks, Schelle

bronco Joined: 13 Apr 18 Posts: 443 Credit: 8,438,885 RAC: 0	Message 38274 - Posted: 19 Mar 2019, 4:52:45 UTC Last modified: 19 Mar 2019, 4:55:22 UTC
I did the setup for native Theory on this host, which was already running native ATLAS on Ubuntu 18.10. The setup went without error, thanks Ivan for the great directions. Now I have 2 X 2-core native Theory tasks that have been running for ~10 minutes. In top I see for user boinc:
2 X agile-runmc, each at ~75% CPU
2 X rivetvm.exe, 1 at ~75% CPU, 1 at ~55% CPU
Update: After ~30 minutes I see in top:
2 X rivetvm.exe, 1 at 65% CPU, 1 at ~45% CPU
2 X pythia8.exe, 1 at ~80% CPU, 1 at ~65% CPU
Wahoo!! Very nice to see pythia running native, but I was hoping to see it using closer to 100% CPU.

DoctorNow Joined: 17 Sep 04 Posts: 19 Credit: 396,156 RAC: 0	Message 38275 - Posted: 19 Mar 2019, 5:48:52 UTC Last modified: 19 Mar 2019, 6:01:01 UTC
Well, I tried the new app on my Linux VB too, but the tasks error out almost immediately. The logs all look like this:
<core_client_version>7.6.31</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)
</message>
<stderr_txt>
21:53:07 (2676): wrapper (7.15.26016): starting
21:53:07 (2676): wrapper (7.15.26016): starting
21:53:07 (2676): wrapper: running ../../projects/lhcathome.cern.ch_lhcathome/cranky-0.0.28 ()
20:53:07 2019-03-18: cranky-0.0.28: [INFO] Detected TheoryN App
20:53:07 2019-03-18: cranky-0.0.28: [INFO] Checking CVMFS.
20:53:07 2019-03-18: cranky-0.0.28: [ERROR] 'which' could not locate the command 'cvmfs_config'.
21:53:08 (2676): cranky exited; CPU time 0.004000
21:53:08 (2676): app exit status: 0xce
21:53:08 (2676): called boinc_finish(195)
</stderr_txt>
]]>
I don't know if this is an app problem or a problem with my host, so I'll leave it be. Hopefully a Windows app will come out as well.
Edit: Oh, I see now in the other thread that I have to install CVMFS myself. I will do that and then check again; ignore this in the meantime. ;-)
Life is Science, and Science rules. To the universe and beyond
Proud member of BOINC@Heidelberg
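The cranky lines in the stderr dumps posted in this thread share one shape: a time, a date, cranky-<version>, a bracketed level, then the message. As a rough sketch, the hypothetical helper below pulls out just the messages of one level so the actual failure reason is easy to spot. The regular expression is an assumption based only on the log lines shown here and is not part of cranky or BOINC.

```python
import re

# Assumed line shape, taken from the logs in this thread:
#   20:53:07 2019-03-18: cranky-0.0.28: [ERROR] some message
CRANKY_LINE = re.compile(r"cranky-[\d.]+: \[(?P<level>[A-Z]+)\] (?P<text>.*)$")


def cranky_messages(stderr_text: str, level: str = "ERROR") -> list:
    """Return the message text of every cranky log line with the given level."""
    messages = []
    for line in stderr_text.splitlines():
        match = CRANKY_LINE.search(line)
        if match and match.group("level") == level:
            messages.append(match.group("text").strip())
    return messages


sample = """\
21:53:07 (2676): wrapper: running ../../projects/lhcathome.cern.ch_lhcathome/cranky-0.0.28 ()
20:53:07 2019-03-18: cranky-0.0.28: [INFO] Checking CVMFS.
20:53:07 2019-03-18: cranky-0.0.28: [ERROR] 'which' could not locate the command 'cvmfs_config'.
21:53:08 (2676): called boinc_finish(195)
"""

for message in cranky_messages(sample):
    print(message)
```

Passing level="INFO" instead lists the steps cranky completed before it stopped.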

Crystal Pellet Volunteer moderator Volunteer tester Joined: 14 Jan 10 Posts: 1491 Credit: 9,985,934 RAC: 993	Message 38278 - Posted: 19 Mar 2019, 7:09:51 UTC - in response to Message 38274. Last modified: 19 Mar 2019, 7:55:57 UTC
Quote (bronco): The setup went without error, thanks Ivan for the great directions.
The directions are actually by Laurence ;)
Quote (bronco): 2 X rivetvm.exe, 1 at 65% CPU, 1 at ~45% CPU 2 X pythia8.exe, 1 at ~80% CPU, 1 at ~65% CPU Wahoo!! Very nice to see pythia running native but was hoping to see it using closer to 100% CPU?
Each job consists of several processes. Each job needs one rivetvm.exe plus a generator process, e.g. pythia8, agile-runmc (=pythia6), sherpa, herwig etc. So you have to add one generator process to one rivetvm.exe, and then you see that together they use more than 100%, which is what happens when you have idle CPUs.

schelle Joined: 12 Sep 08 Posts: 6 Credit: 37,799,196 RAC: 0	Message 38279 - Posted: 19 Mar 2019, 7:36:09 UTC - in response to Message 38271. Last modified: 19 Mar 2019, 7:37:55 UTC
Hello, the connection to CVMFS is OK, but I got the following error:
<core_client_version>7.14.2</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
20:36:48 (96380): wrapper (7.15.26016): starting
20:36:48 (96380): wrapper (7.15.26016): starting
20:36:48 (96380): wrapper: running ../../projects/lhcathome.cern.ch_lhcathome/cranky-0.0.28 ()
19:36:49 2019-03-18: cranky-0.0.28: [INFO] Detected TheoryN App
19:36:49 2019-03-18: cranky-0.0.28: [INFO] Checking CVMFS.
19:36:49 2019-03-18: cranky-0.0.28: [INFO] Checking runc.
19:37:23 2019-03-18: cranky-0.0.28: [INFO] Creating the filesystem.
19:37:23 2019-03-18: cranky-0.0.28: [INFO] Using /cvmfs/cernvm-prod.cern.ch/cvm3
19:37:24 2019-03-18: cranky-0.0.28: [INFO] Updating config.json.
19:37:25 2019-03-18: cranky-0.0.28: [INFO] Running Container 'runc'.
19:38:00 2019-03-18: cranky-0.0.28: [ERROR] Container 'runc' terminated with status code 1.
20:38:01 (96380): cranky exited; CPU time 0.196865
20:38:01 (96380): app exit status: 0xce
20:38:01 (96380): called boinc_finish(195)
</stderr_txt>
]]>
Thanks, Schelle
Solved: enable user namespaces in the kernel (CentOS 7).

mmonnin Joined: 22 Mar 17 Posts: 82 Credit: 29,783,045 RAC: 11	Message 38284 - Posted: 19 Mar 2019, 10:44:50 UTC - in response to Message 38278.
Quote (bronco): The setup went without error, thanks Ivan for the great directions.
Quote (Crystal Pellet): The directions are actually by Laurence ;)
Quote (bronco): 2 X rivetvm.exe, 1 at 65% CPU, 1 at ~45% CPU 2 X pythia8.exe, 1 at ~80% CPU, 1 at ~65% CPU Wahoo!! Very nice to see pythia running native but was hoping to see it using closer to 100% CPU?
Quote (Crystal Pellet): To each job there are a lot of processes. Each job needs 1 rivetvm.exe and e.g. pythia8, agile-runmc (=pythia6), sherpa, herwig etc. So you have to sum 1 jobname-process with a rivetvm and you see they are together >100%, what happens when you have idle cpu's.
The task is set to use 2 CPUs by default, but barely more than 1 is used, and the reported run time is exactly equal to the CPU time, to the second, on every task. At most I see 1.5 cores when the task is really short: 6 min run time, 8 min CPU time.

bronco Joined: 13 Apr 18 Posts: 443 Credit: 8,438,885 RAC: 0	Message 38285 - Posted: 19 Mar 2019, 10:49:45 UTC - in response to Message 38278.
Quote (bronco): The setup went without error, thanks Ivan for the great directions.
Quote (Crystal Pellet): The directions are actually by Laurence ;)
Quote (bronco): 2 X rivetvm.exe, 1 at 65% CPU, 1 at ~45% CPU 2 X pythia8.exe, 1 at ~80% CPU, 1 at ~65% CPU Wahoo!! Very nice to see pythia running native but was hoping to see it using closer to 100% CPU?
Quote (Crystal Pellet): To each job there are a lot of processes. Each job needs 1 rivetvm.exe and e.g. pythia8, agile-runmc (=pythia6), sherpa, herwig etc. So you have to sum 1 jobname-process with a rivetvm and you see they are together >100%, what happens when you have idle cpu's.
But BOINC manager shows 2 X 2-CPU tasks = 4 CPUs in use, in other words no idle CPUs. Also, the task run times are nearly equal to the task CPU times, whereas I would expect the CPU time to be a little less than double the run time.

Crystal Pellet Volunteer moderator Volunteer tester Joined: 14 Jan 10 Posts: 1491 Credit: 9,985,934 RAC: 993	Message 38286 - Posted: 19 Mar 2019, 10:56:23 UTC - in response to Message 38284. Last modified: 19 Mar 2019, 10:59:53 UTC
Quote (mmonnin): The task is set to use 2 CPUs by default and barely over 1 is used and the reported time has run time = exactly CPU time. To the second on every task. At most I see 1.5 cores when the task is really short. 6min run time, 8 min CPU time.
Don't trust the values reported in the results, especially when they are equal. Take your task as an example: https://lhcathome.cern.ch/lhcathome/result.php?resultid=219459914
It reported 51 min 1 sec, which is exactly the CPU time reported at the end of the result:
06:18:02 (32596): cranky exited; CPU time 3061.446043
But when you calculate the job finish time minus the job start time (the job should have run in one go)
06:18:02 (32596): cranky exited; CPU time 3061.446043
05:38:19 (32596): wrapper (7.15.26016): starting
you'll find the elapsed time is 2383 seconds, so either 1 CPU is used at far over 100% or 2 CPUs are partially used.
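The arithmetic in the post above can be reproduced with a short sketch. The two function names are invented for illustration. It assumes both timestamps come from the same wrapper clock (the HH:MM:SS values in front of the process ID) and that the job ran in one go, crossing midnight at most once.

```python
from datetime import datetime, timedelta


def elapsed_seconds(start_hms: str, end_hms: str) -> int:
    """Wall-clock seconds between two HH:MM:SS wrapper timestamps."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end_hms, fmt) - datetime.strptime(start_hms, fmt)
    if delta < timedelta(0):
        # Assumption: a negative difference means the job crossed midnight once.
        delta += timedelta(days=1)
    return int(delta.total_seconds())


def average_cores(cpu_seconds: float, start_hms: str, end_hms: str) -> float:
    """Average number of cores kept busy: CPU time divided by elapsed time."""
    return cpu_seconds / elapsed_seconds(start_hms, end_hms)


# Values from the result quoted above.
wall = elapsed_seconds("05:38:19", "06:18:02")
print(wall)  # 2383
print(round(average_cores(3061.446043, "05:38:19", "06:18:02"), 2))  # 1.28
```

So this task kept roughly 1.28 cores busy on average, which fits the post: more than one full CPU, but well short of two.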

computezrmle Volunteer moderator Volunteer developer Volunteer tester Help desk expert Joined: 15 Jun 08 Posts: 2724 Credit: 299,002,782 RAC: 71,016	Message 38287 - Posted: 19 Mar 2019, 12:08:07 UTC - in response to Message 38286.
You can find the correct runtimes and CPU times in the scheduler_request when the task is reported. For some reason the server does not trust the reported runtime and sets runtime = CPU time if the CPU time is (much?) higher than the runtime. It may be a plausibility check for single-core tasks or something like that.
This happens especially on systems that run below 100% load. CP already explained the "cycle stealing". If you run an overbooked system or limit the CPU usage with cgroups and kernel CPUShares/CPUQuotas, this would result in higher runtimes, and then the server trusts the reported values.

bronco Joined: 13 Apr 18 Posts: 443 Credit: 8,438,885 RAC: 0	Message 38288 - Posted: 19 Mar 2019, 12:14:04 UTC - in response to Message 38286.
Quote (mmonnin): The task is set to use 2 CPUs by default and barely over 1 is used and the reported time has run time = exactly CPU time. To the second on every task. At most I see 1.5 cores when the task is really short. 6min run time, 8 min CPU time.
Quote (Crystal Pellet): Don't trust the values reported in the results, specially when they are equal. Example your task: https://lhcathome.cern.ch/lhcathome/result.php?resultid=219459914 It reported 51m 1 sec, that is exactly the reported cpu time at the end of the result. 06:18:02 (32596): cranky exited; CPU time 3061.446043, but when you calculate job finish time minus the job start time (the job should have ran in one flow) 06:18:02 (32596): cranky exited; CPU time 3061.446043 05:38:19 (32596): wrapper (7.15.26016): starting you'll find the elapsed time is 2383 seconds, so 1 cpu is used far over 100% or 2 cpu's are partial used.
OK, it makes sense now, in that the numbers add up correctly. I don't like the CPUs being used only partially, but I'll ignore it if they promise there won't be any sherpa jobs.

Crystal Pellet Volunteer moderator Volunteer tester Joined: 14 Jan 10 Posts: 1491 Credit: 9,985,934 RAC: 993	Message 38291 - Posted: 19 Mar 2019, 12:40:32 UTC - in response to Message 38288.
Quote (bronco): ... I don't like the CPU's being used only partially but I'll ignore it if they promise there won't be any sherpa jobs.
Promise: sherpas will come. I have one running at the moment ;)

zombie67 [MM] Joined: 24 Nov 06 Posts: 76 Credit: 10,211,769 RAC: 0	Message 38296 - Posted: 19 Mar 2019, 13:56:16 UTC - in response to Message 38266. Last modified: 19 Mar 2019, 14:18:01 UTC
Quote (schelle): "runc" was missing... now it is installed ;)
I have the same problem.
<core_client_version>7.9.3</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
06:41:18 (3651): wrapper (7.15.26016): starting
06:41:18 (3651): wrapper (7.15.26016): starting
06:41:18 (3651): wrapper: running ../../projects/lhcathome.cern.ch_lhcathome/cranky-0.0.28 ()
13:41:18 2019-03-19: cranky-0.0.28: [INFO] Detected TheoryN App
13:41:18 2019-03-19: cranky-0.0.28: [INFO] Checking CVMFS.
13:41:19 2019-03-19: cranky-0.0.28: [INFO] Checking runc.
13:41:19 2019-03-19: cranky-0.0.28: [ERROR] /cvmfs/grid.cern.ch/vc/containers/runc does not exist.
06:41:19 (3651): cranky exited; CPU time 0.023174
06:41:19 (3651): app exit status: 0xce
06:41:19 (3651): called boinc_finish(195)
</stderr_txt>
]]>
I am not sure yet whether installing runc fixed it, as there is no work available. Also, since I had 8 errors while testing this, it looks like I am throttled to only 1 task per day, so I won't be able to see if it works until tomorrow.
LHC@home 3/19/2019 7:11:34 AM This computer has finished a daily quota of 1 tasks
Here is the info requested:
$ cat /proc/sys/kernel/unprivileged_userns_clone
1
$ cat /proc/sys/user/max_user_namespaces
23590

bronco Joined: 13 Apr 18 Posts: 443 Credit: 8,438,885 RAC: 0	Message 38297 - Posted: 19 Mar 2019, 13:56:51 UTC - in response to Message 38291.
Quote (bronco): ... I don't like the CPU's being used only partially but I'll ignore it if they promise there won't be any sherpa jobs.
Quote (Crystal Pellet): Promise: sherpa's will come. I've one running at the moment ;)
Damn! Guess I have to modify my watchdog script again.

computezrmle Volunteer moderator Volunteer developer Volunteer tester Help desk expert Joined: 15 Jun 08 Posts: 2724 Credit: 299,002,782 RAC: 71,016	Message 38299 - Posted: 19 Mar 2019, 14:30:00 UTC
@all volunteers having problems with Theory native.
Some errors are caused by an installed but misconfigured CVMFS. Before you request any task, you may check whether "cvmfs_config probe" returns OK, as in the following example from a host that has configured all repositories required for ATLAS and Theory.
cvmfs_config probe
Probing /cvmfs/atlas.cern.ch... OK
Probing /cvmfs/atlas-condb.cern.ch... OK
Probing /cvmfs/grid.cern.ch... OK
Probing /cvmfs/cernvm-prod.cern.ch... OK
Probing /cvmfs/sft.cern.ch... OK
Probing /cvmfs/alice.cern.ch... OK
If any of those is missing, include it in /etc/cvmfs/default.local.
If any of those fails, post a message here.
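The check described above can be automated on saved probe output. The following is only a sketch: the function names are invented, the repository list is copied from the example in the post, and the line pattern assumes the "Probing <path>... <status>" shape shown there. Any status other than OK is treated as a failure, which is an assumption about what a failed probe prints.

```python
import re

# Repositories listed in the example output above.
EXPECTED_REPOS = [
    "atlas.cern.ch",
    "atlas-condb.cern.ch",
    "grid.cern.ch",
    "cernvm-prod.cern.ch",
    "sft.cern.ch",
    "alice.cern.ch",
]

# Assumed line shape: "Probing /cvmfs/<repo>... <status>"
PROBE_LINE = re.compile(r"^Probing /cvmfs/(?P<repo>\S+?)\.\.\. (?P<status>\S+)")


def parse_probe(output: str) -> dict:
    """Map each probed repository name to its reported status."""
    results = {}
    for line in output.splitlines():
        match = PROBE_LINE.match(line.strip())
        if match:
            results[match.group("repo")] = match.group("status")
    return results


def check_probe(output: str, expected=EXPECTED_REPOS):
    """Return (missing, failed): repos not probed at all, and repos whose status is not OK."""
    results = parse_probe(output)
    missing = [repo for repo in expected if repo not in results]
    failed = [repo for repo, status in results.items() if status != "OK"]
    return missing, failed


sample = """\
Probing /cvmfs/atlas.cern.ch... OK
Probing /cvmfs/grid.cern.ch... OK
Probing /cvmfs/sft.cern.ch... OK
"""

missing, failed = check_probe(sample)
print("missing:", missing)
print("failed:", failed)
```

Following the advice in the post, anything in missing would go into /etc/cvmfs/default.local, and anything in failed is worth reporting in this thread.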