Message boards : Theory Application : Issues Native Theory application
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,616
RAC: 2,139
Message 38260 - Posted: 18 Mar 2019, 12:17:57 UTC
Last modified: 18 Mar 2019, 13:30:26 UTC

Native Theory Application Setup (Linux only)

Please post here if there are any issues.
ID: 38260
Jim1348
Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 38261 - Posted: 18 Mar 2019, 13:20:38 UTC - in response to Message 38259.  
Last modified: 18 Mar 2019, 13:21:27 UTC

Very well. I was just thinking that Native ATLAS works so well for me that anyone having troubles with VirtualBox should give it up and run only that.

But when I did the "sudo wget https://lhcathome.cern.ch/lhcathome/download/default.local -O /etc/cvmfs/default.local" I got a bunch of error messages that I won't bore you with, and then the probe failed almost entirely.

Not to worry. I rebooted and tried again, and this time everything went swimmingly well. I have downloaded my first Theory tasks and will see how they fly.

Thanks.
ID: 38261
schelle
Joined: 12 Sep 08
Posts: 6
Credit: 37,799,196
RAC: 71
Message 38265 - Posted: 18 Mar 2019, 20:05:59 UTC

Hello, the connection to CVMFS is OK, but I got the following error:

<core_client_version>7.14.2</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
20:36:48 (96380): wrapper (7.15.26016): starting
20:36:48 (96380): wrapper (7.15.26016): starting
20:36:48 (96380): wrapper: running ../../projects/lhcathome.cern.ch_lhcathome/cranky-0.0.28 ()
19:36:49 2019-03-18: cranky-0.0.28: [INFO] Detected TheoryN App
19:36:49 2019-03-18: cranky-0.0.28: [INFO] Checking CVMFS.
19:36:49 2019-03-18: cranky-0.0.28: [INFO] Checking runc.
19:37:23 2019-03-18: cranky-0.0.28: [INFO] Creating the filesystem.
19:37:23 2019-03-18: cranky-0.0.28: [INFO] Using /cvmfs/cernvm-prod.cern.ch/cvm3
19:37:24 2019-03-18: cranky-0.0.28: [INFO] Updating config.json.
19:37:25 2019-03-18: cranky-0.0.28: [INFO] Running Container 'runc'.
19:38:00 2019-03-18: cranky-0.0.28: [ERROR] Container 'runc' terminated with status code 1.
20:38:01 (96380): cranky exited; CPU time 0.196865
20:38:01 (96380): app exit status: 0xce
20:38:01 (96380): called boinc_finish(195)

</stderr_txt>
]]>

Thanks
Schelle
ID: 38265
schelle
Joined: 12 Sep 08
Posts: 6
Credit: 37,799,196
RAC: 71
Message 38266 - Posted: 18 Mar 2019, 20:28:37 UTC - in response to Message 38265.  

"runc" was missing...
now it is installed ;)
ID: 38266
gyllic
Joined: 9 Dec 14
Posts: 202
Credit: 2,533,875
RAC: 0
Message 38267 - Posted: 18 Mar 2019, 20:47:59 UTC - in response to Message 38266.  

> "runc" was missing...
> now it is installed ;)
runc is provided via CVMFS, so there should be no need to install runc, and installing it should not fix your problem. If it does, please report it here.
Could you please post the output of the following commands (hopefully they work on Scientific Linux):

cat /proc/sys/kernel/unprivileged_userns_clone
and
cat /proc/sys/user/max_user_namespaces
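
For anyone who wants to run both checks in one go, here is a minimal sketch. It assumes a POSIX shell, and the helper name read_proc_value is made up for illustration. A missing unprivileged_userns_clone file is not necessarily an error: that switch comes from a Debian/Ubuntu kernel patch, so other kernels do not expose it.

```shell
#!/bin/sh
# Print the content of a /proc file, or "absent" if the kernel does not expose it.
# read_proc_value is a hypothetical helper name, not part of any tool.
read_proc_value() {
    if [ -r "$1" ]; then
        cat "$1"
    else
        echo "absent"
    fi
}

# The two files asked for in this post.
for f in /proc/sys/kernel/unprivileged_userns_clone \
         /proc/sys/user/max_user_namespaces; do
    echo "$f: $(read_proc_value "$f")"
done
```

A value of 0 in max_user_namespaces means no user namespaces can be created at all.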
ID: 38267
Jim1348
Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 38268 - Posted: 18 Mar 2019, 20:54:35 UTC

No problems with the first five. Run times vary considerably, from about 6 minutes to 1 hour 27 minutes.
https://lhcathome.cern.ch/lhcathome/results.php?hostid=10588304&offset=0&show_names=0&state=4&appid=17

It runs on two CPU cores per work unit. I don't know if that is the default, or because that is what I have set for Native ATLAS (in an app_config.xml).
But on my preferences page, I have it set to
Max # jobs: No limit
Max # CPUs: 8

At any rate, that is what I want, so it works for me.
ID: 38268
schelle
Joined: 12 Sep 08
Posts: 6
Credit: 37,799,196
RAC: 71
Message 38271 - Posted: 18 Mar 2019, 21:33:00 UTC - in response to Message 38267.  

Hello,

Here is the output:
cat /proc/sys/user/max_user_namespaces
100

cat /proc/sys/kernel/unprivileged_userns_clone
cat: /proc/sys/kernel/unprivileged_userns_clone: No such file or directory


thanks
Schelle
ID: 38271
bronco
Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 38274 - Posted: 19 Mar 2019, 4:52:45 UTC
Last modified: 19 Mar 2019, 4:55:22 UTC

I did the setup for native theory on this host which was already running native ATLAS on Ubuntu 18.10. The setup went without error, thanks Ivan for the great directions.
Now I have 2 X 2-core native theory tasks running for ~10 minutes. In top I see for user boinc:
2 X agile-runmc, each at ~75% CPU
2 X rivetvm.exe, 1 at ~75% CPU, 1 at ~55% CPU

Update:
After ~30 minutes I see in top:
2 X rivetvm.exe, 1 at 65% CPU, 1 at ~45% CPU
2 X pythia8.exe, 1 at ~80% CPU, 1 at ~65% CPU
Wahoo!! Very nice to see pythia running native but was hoping to see it using closer to 100% CPU?
ID: 38274
DoctorNow
Joined: 17 Sep 04
Posts: 19
Credit: 308,023
RAC: 0
Message 38275 - Posted: 19 Mar 2019, 5:48:52 UTC
Last modified: 19 Mar 2019, 6:01:01 UTC

Well, I also tried the new app on my Linux VB, but the tasks error out almost immediately. The logs all look like this:

<core_client_version>7.6.31</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)
</message>
<stderr_txt>
21:53:07 (2676): wrapper (7.15.26016): starting
21:53:07 (2676): wrapper (7.15.26016): starting
21:53:07 (2676): wrapper: running ../../projects/lhcathome.cern.ch_lhcathome/cranky-0.0.28 ()
20:53:07 2019-03-18: cranky-0.0.28: [INFO] Detected TheoryN App
20:53:07 2019-03-18: cranky-0.0.28: [INFO] Checking CVMFS.
20:53:07 2019-03-18: cranky-0.0.28: [ERROR] 'which' could not locate the command 'cvmfs_config'.
21:53:08 (2676): cranky exited; CPU time 0.004000
21:53:08 (2676): app exit status: 0xce
21:53:08 (2676): called boinc_finish(195)

</stderr_txt>
]]>

I don't know if this is an app problem or a problem with my host, so I'll leave it be.
Hopefully a Windows app will come out as well.

Edit:
Oh, I see now in the other thread that I have to install CVMFS myself. Will do that and then check it again; ignore this in the meantime. ;-)
Life is Science, and Science rules. To the universe and beyond
Proud member of BOINC@Heidelberg
ID: 38275
Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,616
RAC: 2,139
Message 38278 - Posted: 19 Mar 2019, 7:09:51 UTC - in response to Message 38274.  
Last modified: 19 Mar 2019, 7:55:57 UTC

> The setup went without error, thanks Ivan for the great directions.
The directions are actually by Laurence ;)

> 2 X rivetvm.exe, 1 at 65% CPU, 1 at ~45% CPU
> 2 X pythia8.exe, 1 at ~80% CPU, 1 at ~65% CPU
> Wahoo!! Very nice to see pythia running native but was hoping to see it using closer to 100% CPU?

Each job consists of several processes. Each job needs one rivetvm.exe plus a generator such as pythia8, agile-runmc (= pythia6), sherpa, herwig, etc.
So you have to add each generator process to its rivetvm process, and you will see that together they use more than 100%, which happens when you have idle CPUs.
ID: 38278
schelle
Joined: 12 Sep 08
Posts: 6
Credit: 37,799,196
RAC: 71
Message 38279 - Posted: 19 Mar 2019, 7:36:09 UTC - in response to Message 38271.  
Last modified: 19 Mar 2019, 7:37:55 UTC

Solved: the "Container 'runc' terminated with status code 1" error from Message 38265 went away after enabling user namespaces in the kernel (CentOS 7).
ID: 38279
mmonnin
Joined: 22 Mar 17
Posts: 55
Credit: 10,223,976
RAC: 2,477
Message 38284 - Posted: 19 Mar 2019, 10:44:50 UTC - in response to Message 38278.  

>> The setup went without error, thanks Ivan for the great directions.
> The directions are actually by Laurence ;)
>
>> 2 X rivetvm.exe, 1 at 65% CPU, 1 at ~45% CPU
>> 2 X pythia8.exe, 1 at ~80% CPU, 1 at ~65% CPU
>> Wahoo!! Very nice to see pythia running native but was hoping to see it using closer to 100% CPU?
>
> Each job consists of several processes. Each job needs one rivetvm.exe plus a generator such as pythia8, agile-runmc (= pythia6), sherpa, herwig, etc.
> So you have to add each generator process to its rivetvm process, and you will see that together they use more than 100%, which happens when you have idle CPUs.

The task is set to use 2 CPUs by default, but barely more than 1 is used, and the reported run time is exactly equal to the CPU time, to the second, on every task. At most I see 1.5 cores when the task is really short: 6 min run time, 8 min CPU time.
ID: 38284
bronco
Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 38285 - Posted: 19 Mar 2019, 10:49:45 UTC - in response to Message 38278.  

>> The setup went without error, thanks Ivan for the great directions.
> The directions are actually by Laurence ;)
>
>> 2 X rivetvm.exe, 1 at 65% CPU, 1 at ~45% CPU
>> 2 X pythia8.exe, 1 at ~80% CPU, 1 at ~65% CPU
>> Wahoo!! Very nice to see pythia running native but was hoping to see it using closer to 100% CPU?
>
> Each job consists of several processes. Each job needs one rivetvm.exe plus a generator such as pythia8, agile-runmc (= pythia6), sherpa, herwig, etc.
> So you have to add each generator process to its rivetvm process, and you will see that together they use more than 100%, which happens when you have idle CPUs.

But BOINC manager shows 2 X 2-CPU tasks = 4 CPUs in use, in other words no idle CPUs. Also, the task run times are nearly equal to the task CPU times, when I would expect CPU time to be a little less than double the run time.
ID: 38285
Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,616
RAC: 2,139
Message 38286 - Posted: 19 Mar 2019, 10:56:23 UTC - in response to Message 38284.  
Last modified: 19 Mar 2019, 10:59:53 UTC

> The task is set to use 2 CPUs by default, but barely more than 1 is used, and the reported run time is exactly equal to the CPU time, to the second, on every task. At most I see 1.5 cores when the task is really short: 6 min run time, 8 min CPU time.
Don't trust the values reported in the results, especially when they are equal.
Example, your task: https://lhcathome.cern.ch/lhcathome/result.php?resultid=219459914
It reported 51 min 1 sec, which is exactly the CPU time reported at the end of the result (06:18:02 (32596): cranky exited; CPU time 3061.446043).
But when you calculate the job finish time minus the job start time (the job should have run without interruption):
06:18:02 (32596): cranky exited; CPU time 3061.446043
05:38:19 (32596): wrapper (7.15.26016): starting

you'll find the elapsed time is 2383 seconds, so either 1 CPU was used at well over 100% or 2 CPUs were partially used.
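
The arithmetic above can be reproduced with a few lines of shell. This is only a sketch: the function names are made up for illustration, and it assumes both log timestamps fall on the same day.

```shell
#!/bin/sh
# Convert HH:MM:SS to seconds since midnight.
# awk treats "05" or "08" as decimal numbers, so leading zeros are harmless.
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Elapsed wall-clock seconds between start ($1) and finish ($2), same day assumed.
elapsed_seconds() {
    echo $(( $(to_seconds "$2") - $(to_seconds "$1") ))
}

# Average number of busy cores = CPU seconds ($1) / elapsed seconds ($2).
avg_cores() {
    awk -v cpu="$1" -v wall="$2" 'BEGIN { printf "%.2f\n", cpu / wall }'
}

# Timestamps and CPU time taken from the two log lines quoted in this post.
wall=$(elapsed_seconds 05:38:19 06:18:02)
echo "elapsed: ${wall} s"                                        # elapsed: 2383 s
echo "average cores: $(avg_cores 3061.446043 "$wall")"          # average cores: 1.28
```

So on average the task kept about 1.28 cores busy, which fits "more than 1 but well below 2".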
ID: 38286
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,903,747
RAC: 137,963
Message 38287 - Posted: 19 Mar 2019, 12:08:07 UTC - in response to Message 38286.  

You can find the correct runtimes and CPU times in the scheduler_request when the task is reported.
For some reason the server does not trust the reported runtime and sets runtime = CPU time if the CPU time is (much?) higher than the runtime.
It may be a plausibility check for single-core tasks or something like that.

This happens especially on systems that run below 100% load.
CP already explained the "cycle stealing".

If you run an overbooked system, or limit the CPU usage with cgroups and kernel CPUShares/CPUQuotas, this results in higher runtimes, and then the server trusts the reported values.
ID: 38287
bronco
Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 38288 - Posted: 19 Mar 2019, 12:14:04 UTC - in response to Message 38286.  

>> The task is set to use 2 CPUs by default, but barely more than 1 is used, and the reported run time is exactly equal to the CPU time, to the second, on every task. At most I see 1.5 cores when the task is really short: 6 min run time, 8 min CPU time.
> Don't trust the values reported in the results, especially when they are equal.
> Example, your task: https://lhcathome.cern.ch/lhcathome/result.php?resultid=219459914
> It reported 51 min 1 sec, which is exactly the CPU time reported at the end of the result (06:18:02 (32596): cranky exited; CPU time 3061.446043).
> But when you calculate the job finish time minus the job start time (the job should have run without interruption):
> 06:18:02 (32596): cranky exited; CPU time 3061.446043
> 05:38:19 (32596): wrapper (7.15.26016): starting
>
> you'll find the elapsed time is 2383 seconds, so either 1 CPU was used at well over 100% or 2 CPUs were partially used.

OK, it makes sense now with respect to the numbers adding up correctly. I don't like the CPUs being used only partially, but I'll ignore it if they promise there won't be any sherpa jobs.
ID: 38288
Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,616
RAC: 2,139
Message 38291 - Posted: 19 Mar 2019, 12:40:32 UTC - in response to Message 38288.  

> ... I don't like the CPUs being used only partially, but I'll ignore it if they promise there won't be any sherpa jobs.
Promise: sherpas will come. I have one running at the moment ;)
ID: 38291
zombie67 [MM]
Joined: 24 Nov 06
Posts: 76
Credit: 7,914,481
RAC: 27,114
Message 38296 - Posted: 19 Mar 2019, 13:56:16 UTC - in response to Message 38266.  
Last modified: 19 Mar 2019, 14:18:01 UTC

> "runc" was missing...
> now it is installed ;)

I have the same problem.

<core_client_version>7.9.3</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
06:41:18 (3651): wrapper (7.15.26016): starting
06:41:18 (3651): wrapper (7.15.26016): starting
06:41:18 (3651): wrapper: running ../../projects/lhcathome.cern.ch_lhcathome/cranky-0.0.28 ()
13:41:18 2019-03-19: cranky-0.0.28: [INFO] Detected TheoryN App
13:41:18 2019-03-19: cranky-0.0.28: [INFO] Checking CVMFS.
13:41:19 2019-03-19: cranky-0.0.28: [INFO] Checking runc.
13:41:19 2019-03-19: cranky-0.0.28: [ERROR] /cvmfs/grid.cern.ch/vc/containers/runc does not exist.
06:41:19 (3651): cranky exited; CPU time 0.023174
06:41:19 (3651): app exit status: 0xce
06:41:19 (3651): called boinc_finish(195)

</stderr_txt>
]]>

Not sure if installing runc fixed it yet, as there is no work available. Also, since I had 8 errors testing this, it looks like I am throttled to only 1 task per day, so I won't be able to see if it works until tomorrow.
LHC@home 3/19/2019 7:11:34 AM This computer has finished a daily quota of 1 tasks


Here is the info requested:

$ cat /proc/sys/kernel/unprivileged_userns_clone
1

$ cat /proc/sys/user/max_user_namespaces
23590
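
Two of the early failures in this thread (cranky not finding cvmfs_config, and the runc path under /cvmfs/grid.cern.ch not existing) can be checked by hand before using up the daily quota. A rough sketch follows; the function names are made up, and this is not how cranky itself performs its checks, which is not shown in this thread. The runc path is copied from the error message above.

```shell
#!/bin/sh
# Return 0 if the command named in $1 is on PATH.
have_command() {
    command -v "$1" >/dev/null 2>&1
}

# Return 0 if the path in $1 exists.
have_path() {
    [ -e "$1" ]
}

# Run both checks, print a hint for each failure, return non-zero if any failed.
preflight() {
    status=0
    if ! have_command cvmfs_config; then
        echo "cvmfs_config not found on PATH: CVMFS is probably not installed"
        status=1
    fi
    if ! have_path /cvmfs/grid.cern.ch/vc/containers/runc; then
        echo "/cvmfs/grid.cern.ch/vc/containers/runc does not exist: check the CVMFS configuration"
        status=1
    fi
    return "$status"
}

preflight || echo "preflight failed, fix the items above before requesting tasks"
```

If both checks pass and tasks still fail, the user namespace settings asked about earlier are the next thing to look at.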
ID: 38296
bronco
Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 38297 - Posted: 19 Mar 2019, 13:56:51 UTC - in response to Message 38291.  

>> ... I don't like the CPUs being used only partially, but I'll ignore it if they promise there won't be any sherpa jobs.
> Promise: sherpas will come. I have one running at the moment ;)

Damn! Guess I have to modify my watchdog script again.
ID: 38297
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,903,747
RAC: 137,963
Message 38299 - Posted: 19 Mar 2019, 14:30:00 UTC

@all volunteers having problems with Theory native.

Some errors are caused by an installed but misconfigured CVMFS.

Before you request any tasks, check that "cvmfs_config probe" returns OK for every repository, as in the following example from a host that has all repositories required for ATLAS and Theory configured.

cvmfs_config probe
Probing /cvmfs/atlas.cern.ch... OK
Probing /cvmfs/atlas-condb.cern.ch... OK
Probing /cvmfs/grid.cern.ch... OK
Probing /cvmfs/cernvm-prod.cern.ch... OK
Probing /cvmfs/sft.cern.ch... OK
Probing /cvmfs/alice.cern.ch... OK


If any of those is missing, include it in /etc/cvmfs/default.local.
If any of those fails, post a message here.
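
To spot problems at a glance, the probe output can be filtered for lines that do not end in OK and checked against the six repositories listed above. A small sketch, assuming the output format shown in the example; the function names are hypothetical, and since the exact text printed for a failed probe is not shown in this thread, anything other than OK is treated as a failure.

```shell
#!/bin/sh
# Read probe output on stdin and print every repository whose line does not end in "OK".
failed_probes() {
    awk '/^Probing / && $NF != "OK" { sub(/\.\.\.$/, "", $2); print $2 }'
}

# Read probe output on stdin and print every expected repository that was not probed at all.
missing_repos() {
    probe_output=$(cat)
    for repo in atlas.cern.ch atlas-condb.cern.ch grid.cern.ch \
                cernvm-prod.cern.ch sft.cern.ch alice.cern.ch; do
        case "$probe_output" in
            *"/cvmfs/$repo..."*) ;;        # repository appears in the output
            *) echo "$repo" ;;             # repository is not configured
        esac
    done
}

# Only run the real probe when CVMFS is installed.
if command -v cvmfs_config >/dev/null 2>&1; then
    cvmfs_config probe | failed_probes
fi
```

To list repositories that still need to be added to /etc/cvmfs/default.local, the same output can be piped into the second helper: cvmfs_config probe | missing_repos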
ID: 38299