Message boards : Theory Application : New Native Theory Version 1.1

m

Joined: 6 Sep 08
Posts: 116
Credit: 11,097,403
RAC: 5,330
Message 39491 - Posted: 3 Aug 2019, 19:14:20 UTC

Not sure if I'm making progress or not...

Found and installed libseccomp 2.4.1 (the regular way, from an lxc PPA).

That produced this:

18:17:01 BST +01:00 2019-08-03: cranky-0.0.29: [INFO] Updating config.json.
18:17:02 BST +01:00 2019-08-03: cranky-0.0.29: [INFO] Running Container 'runc'.
container_linux.go:336: starting container process caused "process_linux.go:293:
applying cgroup configuration for process caused \"mountpoint for devices not found\""
18:17:02 BST +01:00 2019-08-03: cranky-0.0.29: [ERROR] Container 'runc' terminated with status code 1.
18:17:02 (3846): cranky exited; CPU time 0.005645
18:17:02 (3846): app exit status: 0xce
18:17:02 (3846): called boinc_finish(195)
18:17:03 BST +01:00 2019-08-03: cranky-0.0.29: [INFO]

Found this:
https://stackoverflow.com/questions/22555264/docker-hello-world-not-working/22555932#22555932

and this: https://github.com/opencontainers/runc/issues/798
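
(For the record, the workaround in those links seems to amount to mounting the missing cgroup "devices" hierarchy by hand. A rough, untested sketch, assuming root and a cgroup-v1 host:

    sudo mount -t tmpfs cgroup_root /sys/fs/cgroup    # skip if already mounted
    sudo mkdir -p /sys/fs/cgroup/devices
    sudo mount -t cgroup -o devices devices /sys/fs/cgroup/devices
)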

So I installed cgroups-lite 1.11,

with this result:

18:34:16 BST +01:00 2019-08-03: cranky-0.0.29: [INFO] Checking runc.
18:34:16 BST +01:00 2019-08-03: cranky-0.0.29: [INFO] Creating the filesystem.
18:34:16 BST +01:00 2019-08-03: cranky-0.0.29: [INFO] Using /cvmfs/cernvm-prod.cern.ch/cvm3
18:34:16 BST +01:00 2019-08-03: cranky-0.0.29: [INFO] Updating config.json.
18:34:16 BST +01:00 2019-08-03: cranky-0.0.29: [INFO] Running Container 'runc'.
18:34:48 BST +01:00 2019-08-03: cranky-0.0.29: [INFO] ===> [runRivet] Sat Aug 3
17:34:48 UTC 2019 [boinc pp mb-inelastic 7000 - - phojet 1.12a default 100000 84]
18:34:53 BST +01:00 2019-08-03: cranky-0.0.29: [ERROR] Container 'runc' terminated with status code 1.
18:34:54 (4451): cranky exited; CPU time 0.044983
18:34:54 (4451): app exit status: 0xce
18:34:54 (4451): called boinc_finish(195)

All subsequent tasks end like this:

18:45:42 BST +01:00 2019-08-03: cranky-0.0.29: [INFO] Checking runc.
18:45:42 BST +01:00 2019-08-03: cranky-0.0.29: [INFO] Creating the filesystem.
18:45:42 BST +01:00 2019-08-03: cranky-0.0.29: [INFO] Using /cvmfs/cernvm-prod.cern.ch/cvm3
18:45:42 BST +01:00 2019-08-03: cranky-0.0.29: [INFO] Updating config.json.
18:45:42 BST +01:00 2019-08-03: cranky-0.0.29: [INFO] Running Container 'runc'.
standard_init_linux.go:203: exec user process caused "too many levels of symbolic links"
18:45:44 BST +01:00 2019-08-03: cranky-0.0.29: [ERROR] Container 'runc' terminated with status code 1.
18:45:44 (4733): cranky exited; CPU time 0.014341
18:45:44 (4733): app exit status: 0xce
18:45:44 (4733): called boinc_finish(195)
18:45:44 BST +01:00 2019-08-03: cranky-0.0.29: [INFO]

I haven't explicitly added any symlinks.
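
(A way to hunt for the loop might be namei from util-linux, which walks every component of a path and flags the symlinks along the way; a sketch, using the cvmfs path from the log above:

    namei -l /cvmfs/cernvm-prod.cern.ch/cvm3
)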

Now I'm stuck.
ID: 39491
maeax

Joined: 2 May 07
Posts: 2099
Credit: 159,815,788
RAC: 143,603
Message 39514 - Posted: 7 Aug 2019, 19:44:17 UTC

The third run will need a lot of time to finish this task:
boinc pp jets 13000 250,-,4160 - sherpa 2.2.0 default 2000 72]
The first computer canceled after 4 days, the second after 26 days:
https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=117408514
ID: 39514
maeax

Joined: 2 May 07
Posts: 2099
Credit: 159,815,788
RAC: 143,603
Message 39536 - Posted: 9 Aug 2019, 6:09:13 UTC

https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=120079734
05:14:23 CEST +02:00 2019-08-09: cranky-0.0.29: [INFO] ===> [runRivet] Fri Aug 9 03:14:22 UTC 2019 [boinc pp jets 8000 25 - pythia6 6.423 pro-q2o 100000 86]
06:07:08 CEST +02:00 2019-08-09: cranky-0.0.29: [ERROR] Container 'runc' terminated with status code 1.
Both ended with status code 1.
ID: 39536
maeax

Joined: 2 May 07
Posts: 2099
Credit: 159,815,788
RAC: 143,603
Message 39656 - Posted: 19 Aug 2019, 21:38:10 UTC - in response to Message 39536.  

https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=122039899
[boinc pp jets 8000 25 - pythia8 8.180 default 100000 90]
16:07:04 UTC +00:00 2019-08-19: cranky-0.0.29: [ERROR] Container 'runc' terminated with status code 1.
https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=121978893
ID: 39656
djoser

Joined: 30 Aug 14
Posts: 145
Credit: 10,847,070
RAC: 0
Message 39749 - Posted: 29 Aug 2019, 19:15:47 UTC

Hi everyone!

Theory Native is single core only, right?

Thanks!!
Why mine when you can research? - GRIDCOIN - Real cryptocurrency without wasting hashes! https://gridcoin.us
ID: 39749
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1280
Credit: 8,490,244
RAC: 1,961
Message 39750 - Posted: 29 Aug 2019, 19:31:47 UTC - in response to Message 39749.  

Theory Native is single core only, right?

Yes it is, but when you have idle CPU time left, it will use some extra CPU time to speed up the job.

Example: https://lhcathome.cern.ch/lhcathome/result.php?resultid=244323423
ID: 39750
Jim1348

Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 39751 - Posted: 29 Aug 2019, 19:33:08 UTC - in response to Message 39749.  
Last modified: 29 Aug 2019, 19:34:07 UTC

Theory Native is single core only, right?

It used to use two cores, but seems to be one now.
ID: 39751
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert

Joined: 15 Jun 08
Posts: 2411
Credit: 226,350,409
RAC: 132,500
Message 39752 - Posted: 29 Aug 2019, 19:37:55 UTC - in response to Message 39749.  
Last modified: 29 Aug 2019, 19:41:26 UTC

Theory Native is single core only, right?

From the perspective of your BOINC client: YES

But!
A typical Theory native pstree looks like this:
cranky-0.0.29───runc─┬─job───runRivet.sh─┬─rivetvm.exe
                     │                   ├─runRivet.sh───sleep
                     │                   ├─rungen.sh───pythia8.exe
                     │                   └─sleep
                     └─8*[{runc}]

In this example the 2 main processes are rivetvm.exe and pythia8.exe.
Their total CPU share is usually greater than 1.
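
You can get the same view on your own host with something like this (a sketch; assumes a single running Theory native task and that pstree/pgrep are installed):

    pstree -ap $(pgrep -of cranky)

-o picks the oldest matching process, i.e. the cranky wrapper itself.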
ID: 39752
Jim1348

Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 39753 - Posted: 29 Aug 2019, 19:47:55 UTC - in response to Message 39752.  
Last modified: 29 Aug 2019, 19:49:36 UTC

In this example the 2 main processes are rivetvm.exe and pythia8.exe.
Their total CPU share is usually greater than 1.

I have noticed that even more so on native ATLAS. You can set whatever you want in an app_config.xml, but it will use all your cores somehow.
That can make it difficult to run with other projects, by the way. I think it is best to devote a machine to it.
ID: 39753
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert

Joined: 15 Jun 08
Posts: 2411
Credit: 226,350,409
RAC: 132,500
Message 39754 - Posted: 29 Aug 2019, 20:00:52 UTC - in response to Message 39753.  

IIRC ATLAS native requires the nthreads option to be set, e.g.
    <avg_ncpus>2.0</avg_ncpus>
    <cmdline>--nthreads 2</cmdline>

Vbox apps do not need "nthreads" if <avg_ncpus> is already set.
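
For reference, a complete app_config.xml wrapped around those two lines might look like this (just a sketch - the app name "ATLAS" and plan class "native_mt" are assumptions; check client_state.xml for the real values):

    <app_config>
      <app_version>
        <app_name>ATLAS</app_name>
        <plan_class>native_mt</plan_class>
        <avg_ncpus>2.0</avg_ncpus>
        <cmdline>--nthreads 2</cmdline>
      </app_version>
    </app_config>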
ID: 39754
Jim1348

Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 39755 - Posted: 29 Aug 2019, 21:07:05 UTC - in response to Message 39754.  
Last modified: 29 Aug 2019, 21:08:32 UTC

IIRC ATLAS native requires the nthreads option to be set, e.g.
    <avg_ncpus>2.0</avg_ncpus>
    <cmdline>--nthreads 2</cmdline>

Vbox apps do not need "nthreads" if <avg_ncpus> is already set.

The last time I did native ATLAS before putting in an app_config, it just used all the cores available.
That was eight cores (as shown by BOINC), which maybe is all that ATLAS will use anyway?
ID: 39755
Henry Nebrensky

Joined: 13 Jul 05
Posts: 167
Credit: 14,945,019
RAC: 511
Message 39762 - Posted: 30 Aug 2019, 8:56:45 UTC - in response to Message 39755.  

The last time I did native ATLAS before putting in an app_config, it just used all the cores available.

But did you actually apply any limit?
I've never seen Atlas-native disobey the "Max # CPUs" set through the website (which I believe translates into said --nthreads, for those of us who don't mess with XML), and yes, I've tried at least values of 1, 2, 4 and "No limit"...

IIRC the Atlas limit is 12 cores, but that's from a while back. Even at 8 cores there's noticeable inefficiency from the single-threaded start-up/shutdown phases.
ID: 39762
Jim1348

Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 39769 - Posted: 30 Aug 2019, 23:41:19 UTC - in response to Message 39762.  
Last modified: 30 Aug 2019, 23:43:10 UTC

I've never seen Atlas-native disobey the "Max # CPUs" set through the website (which I believe translates into said --nthreads, for those of us that don't mess with XML), and yes I've tried at least values of 1, 2, 4 and "No limit"....

BOINC will show however many cores that you set in the app_config. But run a "top" command, and you will see that all the cores are being used regardless of what you have set (at least up to 8, the maximum I tried).

(I normally just set the "Max # CPUs" to either 8 or unlimited, so I don't know how a lesser value would affect it; I use the app_config to limit it further.)
ID: 39769
Henry Nebrensky

Joined: 13 Jul 05
Posts: 167
Credit: 14,945,019
RAC: 511
Message 39786 - Posted: 1 Sep 2019, 17:42:04 UTC - in response to Message 39769.  

But run a "top" command, and you will see that all the cores are being used regardless of what you have set (at least up to 8, the maximum I tried).

I think I misunderstood what you were trying to say (and then expressed myself badly).

Long ago, in Sixtrack-only days, "Max # CPUs" meant roughly "Max. total cores to be used for BOINCing, leaving the rest free for the user's day job." VBox and Atlas-native tasks instead interpret it to mean "Cores/task", so you're right that that setting can't be used to control overall load on the machine any more.

I presently still have "Max # CPUs" set to 4, to get the 8-core machine to refill with Atlas efficiently after running Sixtracks; top gives

 7587 boinc    39  19 2618m 1.8g  11m R 99.8 11.4 263:33.42 athena.py
 7589 boinc    39  19 2602m 1.8g  11m R 99.5 11.5 263:42.48 athena.py
32249 boinc    39  19 2597m 1.8g  44m R 99.1 11.3   1:15.13 athena.py
 7586 boinc    39  19 2601m 1.8g  12m R 98.8 11.4 263:40.89 athena.py
 7588 boinc    39  19 2602m 1.8g  11m R 97.8 11.4 263:19.11 athena.py
32250 boinc    39  19 2597m 1.8g  44m R 96.5 11.3   1:15.65 athena.py
32248 boinc    39  19 2597m 1.8g  44m R 95.2 11.3   1:16.12 athena.py
32251 boinc    39  19 2597m 1.8g  44m R 95.2 11.3   1:14.79 athena.py

i.e. there are two 4-core tasks, one just started and one nearly 5 hours in. (A couple of weeks back there'd have been one 4-core task and some Sixtracks).

So, indeed I've never seen Atlas-native (or indeed VBox) disobey the "Max # CPUs" set through the website, with the proviso that "Max # CPUs" there actually means "Cores/task", rather than what it used to/should do.

The next step would be to then limit the number of tasks running at any one time, which is probably best done through an app_config; I can't remember if I've ever tried setting "Max # jobs" to just one, and anyway Atlas-native has, er, idiosyncratic ideas about how many tasks it queues at the client for strange beancounting purposes.

(I read your
The last time I did native ATLAS before putting in an app_config, it just used all the cores available.
as implying that an Atlas-native task would grab any extra unused cores it sees at runtime irrespective of --nthreads, however that's been set, which I don't believe is true.)
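
(For the app_config route mentioned above, the relevant element would be max_concurrent; a sketch, again assuming the app name is "ATLAS":

    <app_config>
      <app>
        <name>ATLAS</name>
        <max_concurrent>1</max_concurrent>
      </app>
    </app_config>
)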
ID: 39786
Harri Liljeroos

Joined: 28 Sep 04
Posts: 675
Credit: 43,639,906
RAC: 16,049
Message 39787 - Posted: 1 Sep 2019, 18:00:01 UTC - in response to Message 39786.  
Last modified: 1 Sep 2019, 18:01:04 UTC

So, indeed I've never seen Atlas-native (or indeed VBox) disobey the "Max # CPUs" set through the website, with the proviso that "Max # CPUs" there actually means "Cores/task", rather than what it used to/should do.

This "Max # CPUs" is in the project settings. The one you are probably confusing it with is still in the BOINC settings and is called "Use at most XX % of the CPUs".
ID: 39787
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert

Joined: 15 Jun 08
Posts: 2411
Credit: 226,350,409
RAC: 132,500
Message 39788 - Posted: 1 Sep 2019, 18:27:51 UTC - in response to Message 39786.  

Could the ATLAS discussion please be continued in the ATLAS thread?


Long ago, in Sixtrack-only days, "Max # CPUs" meant roughly "Max. total cores to be used for BOINCing, leaving the rest free for the user's day job."

The option "Max # CPUs" was never intended to limit the total cores used by BOINC, since BOINC has its own option for that, which can be set in app_config.xml.
Instead, "Max # CPUs" was introduced to do exactly what it is used for today.
Unfortunately, ATLAS also uses "Max # CPUs" to limit the number of tasks that can be downloaded - a request from (unknown) accountants, as David Cameron explained a long while ago.
This results in client buffers that are only partly filled, especially on CPUs with lots of cores.
ID: 39788
Jim1348

Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 39789 - Posted: 1 Sep 2019, 23:09:47 UTC - in response to Message 39786.  

(I read your
The last time I did native ATLAS before putting in an app_config, it just used all the cores available.
as implying that an Atlas-native task would grab any extra unused cores it sees at runtime irrespective of --nthreads, however that's been set, which I don't believe is true.)

I have not used --nthreads for a while, and don't recall how it behaves, so you are probably correct.
I just use <avg_ncpus>, which uses all the (virtual) cores; i.e., threads on native ATLAS. I don't think it does that on native Theory, but I am not running it at the moment.
ID: 39789
Ray Murray
Volunteer moderator

Joined: 29 Sep 04
Posts: 281
Credit: 11,859,285
RAC: 0
Message 39807 - Posted: 2 Sep 2019, 20:27:29 UTC

A couple of months ago I got fed up with a series of blue-screen-of-death loops on an old 2-core Athlon that wasn't happy with a Windows update, so I completely reformatted it and put Linux Mint on instead.
After a few failed attempts, I have got it successfully running Theory Native. I'm not entirely convinced that it's doing exactly as it should: during setup it didn't seem to like cvmfs_config and autofs, although "probe" returned OKs, and some finished tasks look too quick to have done much (unless this Native is waaay faster than on VBox/Windows). However, it is returning MCPlots results, so it must be working OK.
It's also running an ordinary VBox Theory, where "Show VM Console" and "Show Graphics" let me see various outputs.
My Native task doesn't have those buttons, although I can see it is currently running Herwig via "top" in a terminal. Can I get to those remote-desktop live outputs (events processed, etc.) without being too Linuxy? (I'm not a convert yet. I still prefer point-and-click to the terminal.)
ID: 39807
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert

Joined: 15 Jun 08
Posts: 2411
Credit: 226,350,409
RAC: 132,500
Message 39808 - Posted: 2 Sep 2019, 21:29:46 UTC - in response to Message 39807.  

Theory native doesn't have a point-and-click interface to your BOINC client.
To monitor the progress of your running task, like in console 2 of a vbox task, you may do the following:

Open a console window (either directly on the Linux host or remotely from another computer).
In this console window run the command:
tail -Fn100 /path_to_your_boinc_client/slots/x/cernvm/shared/runRivet.log

x must be replaced by the slot number of your running task.
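
If you don't know the slot number, something like this should find it (the data directory /var/lib/boinc-client is an assumption - adjust it to your installation):

    find /var/lib/boinc-client/slots -maxdepth 4 -name runRivet.log

and then point the tail command at whatever it prints.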
ID: 39808
Ray Murray
Volunteer moderator

Joined: 29 Sep 04
Posts: 281
Credit: 11,859,285
RAC: 0
Message 39811 - Posted: 3 Sep 2019, 12:27:38 UTC - in response to Message 39808.  

OK, thanks.
I thought it might be fiddly 😵
I don't think I'll be setting up the suspend/resume stuff either. Well, not until I have the time to get it wrong a few times.
ID: 39811