Message boards :
ATLAS application :
No more than 3 single core ATLAS will run.
Joined: 5 Nov 15, Posts: 144, Credit: 6,301,268, RAC: 0
Using the app_config from https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4264&postid=30260, changing the <max_concurrent>8</...> for each project, choosing only ATLAS in the server-side preferences, setting the server side to 24 cores and unlimited tasks, and setting <project_max_concurrent>24</...>, no more than 3 single-core ATLAS WUs will load into RAM. BOINC is told to use as much as 98% of RAM (idle or in use) and 98% of swap.

I can get BOINC to run 2 WUs each of ATLAS, Theory, LHCb and CMS at once, and at most 2x ATLAS, 2x CMS, 2x LHCb and 6x Theory; BOINC will not load more than 12 WUs. It seems that others are running 12 ATLAS on 128 GB RAM machines, so 8 x 3400 MB = 27,200 MB should easily be accomplished. This machine ran 32 NFS large WUs at once, topping out at over 30 GB of used RAM. Any ideas what could be wrong?
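For reference, a minimal app_config.xml along the lines described above. This is a sketch, not the exact file from the linked thread; the app names shown are illustrative and must match the names the project actually uses:

```xml
<app_config>
    <!-- per-app cap: at most 8 running tasks of each app -->
    <app>
        <name>ATLAS</name>
        <max_concurrent>8</max_concurrent>
    </app>
    <app>
        <name>Theory</name>
        <max_concurrent>8</max_concurrent>
    </app>
    <app>
        <name>CMS</name>
        <max_concurrent>8</max_concurrent>
    </app>
    <app>
        <name>LHCb</name>
        <max_concurrent>8</max_concurrent>
    </app>
    <!-- project-wide cap: at most 24 running tasks in total -->
    <project_max_concurrent>24</project_max_concurrent>
</app_config>
```

Note that these caps only limit how many tasks run at once; they do not change the per-task memory figure that BOINC budgets with.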
Joined: 18 Dec 15, Posts: 1571, Credit: 68,241,401, RAC: 172,589
Any ideas what could be wrong?

One thing that comes to mind: what are your settings in BOINC under Options > Computing preferences > Computing > Other?

Store at least ... days of work
Store up to an additional ... days of work

I ran into this trap some time ago, when I temporarily lowered these values for another project and forgot to set them back to their original values after returning to LHC.
Joined: 5 Nov 15, Posts: 144, Credit: 6,301,268, RAC: 0
Any ideas what could be wrong?

0.5/1.5 days. There were plenty of WUs in the cache: over 60 at a glance, not an actual count.
Joined: 27 Sep 08, Posts: 752, Credit: 572,340,517, RAC: 154,798
If you set the ATLAS server-side preference to 24 then it will always get 24; the BOINC work cache setting is ignored. I'm running more than 3 on my computers, but I can't run a mixture of ATLAS and the other projects, otherwise the cap of 24 WUs kicks in.

By default I think ATLAS reports to BOINC that each task uses 5.5 GB, so it could be that BOINC thinks it can't run more WUs: with 12 tasks, BOINC would think it is using 66 GB.
Joined: 5 Nov 15, Posts: 144, Credit: 6,301,268, RAC: 0
If you set atlas server side to 24 then it will always get 24, the boinc work cache is ignored.

These are 1-core WUs at 3400 MB each. My hypothesis was that BOINC is going by reported numbers, not actual usage. My first guess was that, even though these are 1-core tasks, they are being counted as 8-core (x3) against the 24-core limit, so only 3 ATLAS would run. Since you are getting more than 3, my next guess is a default RAM figure much higher than 3400 MB. If BOINC uses the default data of the 8-core ATLAS, those report 9800 MB, and 3 x 9800 MB would limit this 32 GB RAM machine to 3 tasks. If it is using your figure of 5.5 GB, then it should be able to get 5 or 6 WUs into RAM, and it won't. BUT, some people have reported they got a single ATLAS 1-core/3400 MB task to run on a 4 GB computer... baffling.

It looks like single-core WUs might not be much of an advantage on machines with more base RAM if the actual RAM usage isn't reported to the BOINC client. The optimal solution for my machines will probably fall somewhere around 6x 4-core WUs mixed among all projects, but I need to run the tests to confirm. The drives are hitting transfer limits with 12 WUs at once: they can't all be saved within 60 seconds on suspend or exit, and there is a lot of thrashing on startup.

Project developer: so is BOINC going by the reported default of 8 cores/9800 MB to calculate the number of allowed ATLAS WUs?
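The hypothesis above comes down to simple arithmetic: if BOINC budgets by the reported working-set size rather than actual usage, the number of runnable tasks is capped at floor(usable RAM / reported size per task). A small sketch using the figures quoted in this thread (the 98% usable-RAM setting and the per-task sizes are taken from the posts, not from BOINC documentation):

```python
def max_tasks(ram_mb: float, usable_fraction: float, reported_mb: float) -> int:
    """Tasks that fit if BOINC budgets by the reported per-task memory figure."""
    return int(ram_mb * usable_fraction // reported_mb)

ram = 32 * 1024   # 32 GB machine, in MB
frac = 0.98       # "use at most 98% of RAM"

# If BOINC uses the 8-core default of 9800 MB per task: only 3 fit.
print(max_tasks(ram, frac, 9800))   # -> 3

# If it used the actual 1-core figure of 3400 MB: 9 would fit.
print(max_tasks(ram, frac, 3400))   # -> 9

# With the suggested 5.5 GB default: 5 fit, matching the "5 or 6" estimate.
print(max_tasks(ram, frac, 5500))   # -> 5
```

The 9800 MB case reproducing exactly 3 tasks is what makes the "BOINC is using the 8-core default" hypothesis plausible.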
Joined: 18 Dec 15, Posts: 1571, Credit: 68,241,401, RAC: 172,589
BUT, some people have reported they got a single ATLAS 1c/3400MB to run on a 4GB Computer ...

From my own experience, and also from what other crunchers have said, it seems all but impossible to run a 1-core ATLAS on a 4 GB RAM machine.
Joined: 27 Sep 08, Posts: 752, Credit: 572,340,517, RAC: 154,798
David talked about some optimisations: ATLAS needs a lot of RAM on startup and then less for the actual task, so they could put a big swap file in. However, I agree that 4 GB is low. When I ran the 1-core ones I could run 24 on a machine with 128 GB at 5 GB/WU.

I'm not sure what you see for the BOINC-calculated RAM; you can check it in the task properties as WorkingSetSize. This is not what you set in app_config, so if it is, say, 9.8 GB, then BOINC will not run more tasks because it thinks it has run out of memory. You can tweak it by using the "# of cores" setting on the web: 3 cores = 5.18 GB. If you set the web setting to 1 core, it should bring down what BOINC thinks the RAM usage is.
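Assuming the reported memory bound scales linearly with the core count (an assumption, not documented behaviour), the two figures quoted in this thread (9.8 GB at 8 cores, 5.18 GB at 3 cores) can be interpolated to estimate the base and per-core components:

```python
# Two data points quoted in the thread: (cores, reported memory bound in MB)
p1 = (8, 9800)
p2 = (3, 5180)   # 5.18 GB

# Fit memory = base + per_core * cores through the two points
per_core = (p1[1] - p2[1]) / (p1[0] - p2[0])   # MB per additional core
base = p2[1] - per_core * p2[0]

print(f"base = {base:.0f} MB, per core = {per_core:.0f} MB")
# Predicted 1-core bound: close to the 3400 MB figure seen for 1-core tasks
print(f"predicted 1-core bound = {base + per_core:.0f} MB")
```

This works out to roughly a 2.4 GB base plus ~0.9 GB per core, and the predicted 1-core value (~3.3 GB) lands near the 3400 MB figure mentioned earlier in the thread, which is at least consistent with the linear-scaling assumption.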
Joined: 5 Nov 15, Posts: 144, Credit: 6,301,268, RAC: 0
BUT, some people have reported they got a single ATLAS 1c/3400MB to run on a 4GB Computer ...

Your OS is taking up too much RAM. Use Tiny7 and limit services to the bare minimum, or get one of the tiny Linux distributions like Puppy Linux. My Dell M6500 with 4 GB RAM ran WUs before the project merged with LHC@home.
Joined: 5 Nov 15, Posts: 144, Credit: 6,301,268, RAC: 0
David talked about some optimisations, it needs lot of ram on startup then less for the actual tasks so they could put a big swap file in. However I agree that 4GB is low.

The setting reads "Max # of CPUs for this project", so the documentation reads as a method of limiting the workload to a maximum of 24 cores on all machines, thereby excluding supercomputers. It wouldn't be the first time a project used an unusual method to limit outbound work; SZTAKI has an unusual workload-balancing mechanism. If it had read "Max # of CPUs per job", I would have understood it to be the number of cores per downloaded work unit. It would be appreciated if they worded that parameter to fit what it actually controls.

Anyway, I've adjusted the Work Preferences section to 1 core and will see how many WUs download for each project on that machine tomorrow.
Joined: 18 Dec 15, Posts: 1571, Credit: 68,241,401, RAC: 172,589
The setting reads "Max # of CPUs for this project"

Yes, this wording is definitely misleading. In fact, the meaning of it is "Max # of CPUs per job": the number of cores per downloaded work unit.
Joined: 5 Nov 15, Posts: 144, Credit: 6,301,268, RAC: 0
The setting reads "Max # of CPUs for this project" ... yes, this wording is definitely misleading.

Thank you for acknowledging my frustration. :)

I can only get a single WU on both machines today. I'm posting a new thread about that.
©2023 CERN