Message boards : ATLAS application : No more than 3 single core ATLAS will run.
marmot
Joined: 5 Nov 15
Posts: 119
Credit: 5,250,392
RAC: 0
Message 32684 - Posted: 8 Oct 2017, 4:05:32 UTC

Using the app_config from https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4264&postid=30260
with <max_concurrent>8</...> set for each project, only ATLAS selected in the server-side preferences, the server-side limit set to 24 cores with unlimited tasks, and <project_max_concurrent>24</...>,
no more than 3 single-core ATLAS WUs will load into RAM.
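For reference, a minimal app_config.xml matching the settings described above might look like the sketch below. The app name and the values are assumptions pieced together from this thread, not a verified copy of the linked file:

```xml
<!-- Sketch of an app_config.xml matching the settings described above.
     App names and values are assumptions from this thread, not a
     verified copy of the file linked in the post. -->
<app_config>
    <app>
        <name>ATLAS</name>
        <max_concurrent>8</max_concurrent>
    </app>
    <!-- one <app> block per project application, each with max_concurrent 8 -->
    <project_max_concurrent>24</project_max_concurrent>
</app_config>
```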

BOINC is allowed to use up to 98% of RAM (whether the machine is idle or in use) and 98% of swap.

I can get BOINC to load 2 WUs each of ATLAS, Theory, LHCb and CMS into RAM at once, and at most 2x ATLAS, 2x CMS, 2x LHCb and 6x Theory; BOINC will not load more than 12 WUs.

It seems that others are running 12 ATLAS on 128GB RAM machines, so 8x 3400MB = 27,200 MB should easily fit here.
This machine has run 32 NFS large WUs at once, topping out at over 30GB of used RAM.

Any ideas what could be wrong?
ID: 32684
Erich56

Joined: 18 Dec 15
Posts: 871
Credit: 6,518,748
RAC: 11,166
Message 32687 - Posted: 8 Oct 2017, 5:06:01 UTC - in response to Message 32684.  
Last modified: 8 Oct 2017, 5:11:48 UTC

Any ideas what could be wrong?

One thing that comes to mind: what are your settings in BOINC under
Options > Computing preferences > Computing > Other

Store at least ... days of work
Store up to an additional ... days of work

I ran into this trap some time ago, when I temporarily lowered these values for another project and forgot to set them back to the original values after returning to LHC.
ID: 32687
marmot
Joined: 5 Nov 15
Posts: 119
Credit: 5,250,392
RAC: 0
Message 32688 - Posted: 8 Oct 2017, 5:35:17 UTC - in response to Message 32687.  

Any ideas what could be wrong?

One thing that comes to mind: what are your settings in BOINC under
Options > Computing preferences > Computing > Other

Store at least ... days of work
Store up to an additional ... days of work

I ran into this trap some time ago, when I temporarily lowered these values for another project and forgot to set them back to the original values after returning to LHC.


0.5/1.5 days

There were plenty of WUs in the cache. Over 60 at a glance; I didn't count exactly.
ID: 32688
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 466
Credit: 141,241,227
RAC: 213,426
Message 32697 - Posted: 8 Oct 2017, 16:21:23 UTC

If you set ATLAS server-side to 24 then it will always get 24; the BOINC work cache is ignored.

I'm running more than 3 on my computers.

I can't run a mixture of ATLAS and the other projects, otherwise the cap of 24 WUs kicks in.

By default I think ATLAS reports to BOINC that it uses 5.5GB per task, so it could be that BOINC thinks it can't run more WUs: with 12 tasks, BOINC thinks it's using 66GB.
ID: 32697
marmot
Joined: 5 Nov 15
Posts: 119
Credit: 5,250,392
RAC: 0
Message 32702 - Posted: 9 Oct 2017, 4:37:50 UTC - in response to Message 32697.  
Last modified: 9 Oct 2017, 4:53:14 UTC

If you set ATLAS server-side to 24 then it will always get 24; the BOINC work cache is ignored.

I'm running more than 3 on my computers.

I can't run a mixture of ATLAS and the other projects, otherwise the cap of 24 WUs kicks in.

By default I think ATLAS reports to BOINC that it uses 5.5GB per task, so it could be that BOINC thinks it can't run more WUs: with 12 tasks, BOINC thinks it's using 66GB.



These are 1-core WUs at 3400MB each.

My hypothesis is that BOINC goes by the reported numbers, not the actual usage.
My first guess was that, even though these are 1-core WUs, each is being counted as 8 cores, so 3 of them (3x8) hit the 24-core limit and only 3 ATLAS would run. Since you are getting more than 3, my next guess is a default RAM figure much higher than 3400MB.
If BOINC uses the default figures for the 8-core ATLAS, those report 9800MB, and 3x9800MB would limit this 32GB RAM machine to 3 tasks. If it's using your figure of 5.5GB, it should be able to fit 5 or 6 WUs into RAM, and it won't. BUT, some people have reported running a single ATLAS 1-core/3400MB task on a 4GB computer... baffling.
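The arithmetic above can be sketched as a quick check. This is a rough model assuming BOINC simply admits tasks until the sum of *reported* per-task sizes would exceed usable RAM; the figures are the ones quoted in this thread, not measured values:

```python
# Rough model: how many tasks fit if each is budgeted at the working-set
# size the server reports, rather than its actual usage.
# Figures are the ones quoted in this thread.

def max_tasks(total_ram_mb, reported_mb_per_task, usable_fraction=0.98):
    """Number of tasks that fit when each is budgeted at its reported size."""
    usable_mb = total_ram_mb * usable_fraction
    return int(usable_mb // reported_mb_per_task)

ram_mb = 32 * 1024  # the 32GB machine from this thread

# Budgeted at the 8-core default of 9800MB: only 3 tasks fit.
print(max_tasks(ram_mb, 9800))  # -> 3

# Budgeted at the 5.5GB figure: 5 tasks fit.
print(max_tasks(ram_mb, 5500))  # -> 5

# Budgeted at the actual 1-core footprint of 3400MB: 9 would fit.
print(max_tasks(ram_mb, 3400))  # -> 9
```

The 9800MB case reproduces the observed cap of 3 tasks, which supports the guess that BOINC is budgeting each single-core task at the 8-core default.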

It looks like single-core WUs might not be much of an advantage on machines with more base RAM if the actual RAM usage isn't reported to the BOINC client.
The optimal solution for my machines will probably be somewhere around 6x 4-core WUs mixed among all the projects, but I need to run the tests to confirm. The drives hit transfer limits with 12 WUs at once: they can't all be saved within 60 seconds on suspend or exit, and there is a lot of thrashing on startup.


Project developer:
So is BOINC going by the reported default of 8 cores/9800MB to calculate the number of allowed ATLAS WUs?
ID: 32702
Erich56

Joined: 18 Dec 15
Posts: 871
Credit: 6,518,748
RAC: 11,166
Message 32705 - Posted: 9 Oct 2017, 4:57:30 UTC - in response to Message 32702.  

BUT, some people have reported they got a single ATLAS 1c/3400MB to run on a 4GB Computer ...

From my own experience, and also from what other crunchers were saying, it seems fairly impossible to run ATLAS (1-core) on a 4GB RAM machine.
ID: 32705
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 466
Credit: 141,241,227
RAC: 213,426
Message 32710 - Posted: 9 Oct 2017, 6:47:19 UTC

David talked about some optimisations: it needs a lot of RAM on startup and then less for the actual tasks, so they could put in a big swap file. However, I agree that 4GB is low.

When I ran 1-core ones I could run 24 on a machine with 128GB, at 5GB/WU. I'm not sure what you see for the BOINC-calculated RAM; you can see it in the task properties as WorkingSetSize. This is not what you set in the app_config, so if it's, say, 9.8GB then BOINC will not run more tasks, as it thinks it has run out of memory. You can tweak this by using the # of cores setting on the web: 3 cores = 5.18GB. If you set the web setting to 1 core, it should bring down what BOINC thinks the RAM usage is.
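Putting together the two per-task figures quoted in this thread (9800MB for the 8-core default, 5.18GB for 3 cores), a simple linear fit gives a rough idea of how the reported working set might scale with core count. This is only an extrapolation from those two data points, not the project's actual formula:

```python
# Fit reported_mb = base + per_core * ncores through the two data points
# quoted in this thread (8 cores -> 9800MB, 3 cores -> 5180MB).
# An extrapolation for illustration, NOT the project's published formula.

(n1, m1), (n2, m2) = (8, 9800), (3, 5180)
per_core = (m1 - m2) / (n1 - n2)   # MB added per extra core
base = m1 - per_core * n1          # MB independent of core count

print(per_core)  # -> 924.0
print(base)      # -> 2408.0

# Extrapolated 1-core working set, close to the ~3400MB quoted
# for single-core WUs earlier in the thread:
print(base + per_core * 1)  # -> 3332.0
```

If the fit is roughly right, setting the web preference to 1 core should bring the reported figure down to around 3.3GB, consistent with the ~3400MB single-core footprint mentioned above.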
ID: 32710
marmot
Joined: 5 Nov 15
Posts: 119
Credit: 5,250,392
RAC: 0
Message 32814 - Posted: 13 Oct 2017, 0:26:41 UTC - in response to Message 32705.  
Last modified: 13 Oct 2017, 0:37:55 UTC

BUT, some people have reported they got a single ATLAS 1c/3400MB to run on a 4GB Computer ...

From my own experience, and also from what other crunchers were saying, it seems fairly impossible to run ATLAS (1-core) on a 4GB RAM machine.


Your OS is taking up too much RAM.

Use Tiny7 and limit services to bare minimum or get one of those tiny Linux versions like Puppy Linux.

My Dell M6500 with 4GB RAM ran WU's before the project merged with LHC@Home.
ID: 32814
marmot
Joined: 5 Nov 15
Posts: 119
Credit: 5,250,392
RAC: 0
Message 32815 - Posted: 13 Oct 2017, 0:34:17 UTC - in response to Message 32710.  

David talked about some optimisations: it needs a lot of RAM on startup and then less for the actual tasks, so they could put in a big swap file. However, I agree that 4GB is low.

When I ran 1-core ones I could run 24 on a machine with 128GB, at 5GB/WU. I'm not sure what you see for the BOINC-calculated RAM; you can see it in the task properties as WorkingSetSize. This is not what you set in the app_config, so if it's, say, 9.8GB then BOINC will not run more tasks, as it thinks it has run out of memory. You can tweak this by using the # of cores setting on the web: 3 cores = 5.18GB. If you set the web setting to 1 core, it should bring down what BOINC thinks the RAM usage is.



The setting reads "Max # of CPUs for this project", so the documentation reads as a method of limiting the workload to a maximum of 24 cores across all of one's machines, thus excluding supercomputers. It wouldn't be the first time a project used an unusual method to limit outbound work; SZTAKI has an unusual work-load balancing mechanism.

If it had read "Max # of CPUs per job", I would have understood it to be the number of cores per downloaded work unit.

It would be appreciated if they worded that parameter to match what it actually limits.

Anyway, I've adjusted the Work Preferences section to 1 core and will see tomorrow how many WUs download for each project on that machine.
ID: 32815
Erich56

Joined: 18 Dec 15
Posts: 871
Credit: 6,518,748
RAC: 11,166
Message 32818 - Posted: 13 Oct 2017, 5:03:51 UTC - in response to Message 32815.  

The setting reads "Max # of CPUs for this project"
Yes, this wording is definitely misleading.

In fact, the meaning of it is:
"Max # of CPUs per job", i.e. the number of cores per downloaded work unit.
ID: 32818
marmot
Joined: 5 Nov 15
Posts: 119
Credit: 5,250,392
RAC: 0
Message 32905 - Posted: 26 Oct 2017, 4:40:54 UTC - in response to Message 32818.  

The setting reads "Max # of CPUs for this project"
Yes, this wording is definitely misleading.

In fact, the meaning of it is:
"Max # of CPUs per job", i.e. the number of cores per downloaded work unit.


Thank you for acknowledging my frustration. :)


I can only get a single WU on both machines today.

I'm posting a new thread about that.
ID: 32905



©2018 CERN