Message boards : ATLAS application : BOINC downloads only 2 ATLAS tasks
Henry Nebrensky

Joined: 13 Jul 05
Posts: 169
Credit: 15,000,737
RAC: 9
Message 39053 - Posted: 5 Jun 2019, 16:42:22 UTC - in response to Message 35258.  

... which means that you can only download as many tasks as CPUs on the host. So setting Max # CPUs actually sets how many tasks you can download.

The reason for this setting is to try and provide an accurate picture in ATLAS monitoring and accounting of how many tasks are actually running at a given moment. For example a 4-core machine can download 4 tasks and has one task running using 4 cores. ...


For a while recently I had some of my machines attached to both SixTrack and (multicore) ATLAS: they decided that they preferred SixTrack, and so the ATLAS jobs sat there for days at a time. That is, even though there were 4 ATLAS jobs queued locally, the number of cores running (or available for running) ATLAS was ... zero.

So I'm not sure this is a good basis for accurate accounting!
ID: 39053
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 847
Credit: 692,006,372
RAC: 113,743
Message 39055 - Posted: 5 Jun 2019, 18:12:57 UTC - in response to Message 39041.  

When you view the properties of a task, the dialogue shows its working set size, e.g. I see 9.96 GB as I have unlimited in my web settings.


I'm not sure about native. On Windows you can't really see the actual RAM used by the VM, as it's in the kernel.
ID: 39055
Aurum
Joined: 12 Jun 18
Posts: 126
Credit: 53,906,164
RAC: 0
Message 39070 - Posted: 6 Jun 2019, 17:47:42 UTC
Last modified: 6 Jun 2019, 18:05:27 UTC

Native ATLAS is the only CPU project I'm downloading now. GPUs are running Asteroids which BOINC credits with 0.01 CPUs, i.e. none.

Since the thread counts of my CPUs are all divisible by four I should be able to run lots of 4C WUs.
24 = 4 x 6
28 = 4 x 7
32 = 4 x 8
36 = 4 x 9
40 = 4 x 10
44 = 4 x 11
But since the server is misconfigured I cannot. For ATLAS it takes the preference parameter Max # CPUs as both the number of CPUs per WU and the maximum number of WUs a computer may have at one time. Even on a 44-thread CPU, ATLAS will only run 4 x 4C WUs. Such a waste.

CERN should be concerned with maximizing the duty cycle, not driving it down to roughly 36% (16 of 44 threads).
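The duty-cycle figure can be checked with a short sketch. It assumes the behaviour described in this post, namely that Max # CPUs sets both the threads per task and the number of tasks a host may hold; the function name `duty_cycle` is made up for illustration and is not part of BOINC or ATLAS.

```python
def duty_cycle(max_cpus: int, host_threads: int) -> float:
    """Fraction of host threads kept busy by ATLAS.

    Assumption (taken from the post above): the server allows at most
    `max_cpus` tasks, each using `max_cpus` threads.
    """
    busy_threads = min(max_cpus * max_cpus, host_threads)
    return busy_threads / host_threads


if __name__ == "__main__":
    # Thread counts of the hosts listed in the post, all run with Max # CPUs = 4.
    for threads in (24, 28, 32, 36, 40, 44):
        print(f"{threads} threads: {duty_cycle(4, threads):.1%} busy")
```

With Max # CPUs = 4, at most 16 threads are ever busy, so the fraction shrinks as the host gets larger.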

At 3 hours per 4C WU (8 WUs per slot per day), I should be able to return 408 ATLAS WUs a day with just one computer of each CPU type. If I add it up for my 28 Linux computers it's 1768 WUs per day.
24: 6 x 8 = 48
28: 7 x 8 = 56
32: 8 x 8 = 64
36: 9 x 8 = 72
40: 10 x 8 = 80
44: 11 x 8 = 88
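The table above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the 3 hours per WU and 4 threads per WU are the assumptions stated in the post, and the names `wus_per_day` and `HOST_THREADS` are hypothetical.

```python
HOURS_PER_WU = 3      # assumed run time of one 4C WU (from the post)
THREADS_PER_WU = 4    # 4C work units
HOST_THREADS = (24, 28, 32, 36, 40, 44)  # one host of each CPU type


def wus_per_day(host_threads: int,
                threads_per_wu: int = THREADS_PER_WU,
                hours_per_wu: int = HOURS_PER_WU) -> int:
    """Whole WUs a host could return per day if every thread were usable."""
    slots = host_threads // threads_per_wu       # concurrent WUs
    return slots * (24 // hours_per_wu)          # WUs per slot per day


if __name__ == "__main__":
    for threads in HOST_THREADS:
        print(f"{threads}: {wus_per_day(threads)}")
    print("total:", sum(wus_per_day(t) for t in HOST_THREADS))
```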

"The memory allocated to the virtual machine is calculated based on the number of cores following the formula: 3GB + 0.9GB*ncores."
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4178

For a computer running only four 4C ATLAS WUs the memory requirement is:
4 x 6.6 GB = 26.4 GB
and actual use is:
aurum@Rig-06:~$ sudo inxi -m
Memory:
RAM: total: 31.31 GiB used: 14.59 GiB (46.6%)
Array-1: capacity: 256 GiB slots: 4 EC: None
Device-1: DIMM_A1 size: 16 GiB speed: 2400 MT/s
Array-2: capacity: 256 GiB slots: 4 EC: None
Device-1: DIMM_C1 size: 16 GiB speed: 2400 MT/s

For a computer running only five 8C ATLAS WUs the memory requirement is:
5 x 10.2 GB = 51 GB
and actual use is:
aurum@Rig-28:~$ sudo inxi -m
Memory:
RAM: total: 31.31 GiB used: 21.42 GiB (68.4%)
Array-1: capacity: 256 GiB slots: 4 EC: None
Device-1: DIMM_A1 size: 16 GiB speed: 2400 MT/s
Array-2: capacity: 256 GiB slots: 4 EC: None
Device-1: DIMM_C1 size: 16 GiB speed: 2400 MT/s
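The two requirement figures above follow directly from the quoted formula, 3GB + 0.9GB*ncores. A minimal sketch of that calculation, with hypothetical function names, is:

```python
def vm_ram_gb(ncores: int) -> float:
    """RAM allocated to one virtual machine: 3 GB + 0.9 GB per core."""
    return 3.0 + 0.9 * ncores


def total_ram_gb(n_tasks: int, ncores: int) -> float:
    """RAM the formula implies for n_tasks concurrent tasks."""
    return n_tasks * vm_ram_gb(ncores)


if __name__ == "__main__":
    print(f"four 4C WUs: {total_ram_gb(4, 4):.1f} GB")
    print(f"five 8C WUs: {total_ram_gb(5, 8):.1f} GB")
```

Note that the formula gives GB while inxi reports GiB, so the comparison with the measured values is only approximate.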

There is an additional delay caused by the UL-DL pregnant pause. If I'm running four 4C WUs then the ATLAS server will not give me another one even if my work queue is set to 10 days/10 days. So when a WU finishes it waits until it has fully uploaded the 111 MB results file before downloading a single replacement 369 MB WU.

With a backlog of approximately 120,000 tasks, one would think CERN would want to make the most efficient use of donor resources.
ID: 39070
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2534
Credit: 254,130,739
RAC: 54,031
Message 39071 - Posted: 6 Jun 2019, 18:42:15 UTC - in response to Message 39070.  

Yes, the way ATLAS interprets the max#cores is nonsense.
Lots of volunteers have said that it is nonsense, but nothing has changed for more than a year.
So why do you think this will change now?
OK, it's nothing but a rhetorical question.


Due to this situation you have 2 options to run ATLAS:

Option 1:
Let your BOINC client run "as is" and be happy with a 4-core setup running 4 tasks concurrently (=max. 16 cores).


Option 2:
Try to find a setup that allows your hosts to run ATLAS on more than 16 cores.


The least-effort method is to set a higher max#cores so that more tasks get downloaded.
Then adjust the number of cores each task actually runs on with an app_config.xml.
This requires a well-formed app_config.xml, which has already been suggested.
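As a rough illustration of that method, the sketch below generates an app_config.xml with Python's standard library. The element names (`app_version`, `app_name`, `plan_class`, `avg_ncpus`, `cmdline`) are standard BOINC client configuration elements, and `--nthreads` is the option mentioned elsewhere in this thread. The app name `ATLAS` and plan class `native_mt` are assumptions, as is the helper name `build_app_config`; check client_state.xml on your own host for the actual values before using the output.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, plan_class: str, nthreads: int) -> str:
    """Return app_config.xml text limiting each task to `nthreads` threads."""
    root = ET.Element("app_config")
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "plan_class").text = plan_class
    # What BOINC budgets per task when scheduling.
    ET.SubElement(version, "avg_ncpus").text = str(nthreads)
    # What the ATLAS task is told to use.
    ET.SubElement(version, "cmdline").text = f"--nthreads {nthreads}"
    if hasattr(ET, "indent"):  # pretty-print on Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # "ATLAS" and "native_mt" are assumed values, not confirmed by this thread.
    print(build_app_config("ATLAS", "native_mt", 4))
```

The file belongs in the project's directory under the BOINC data directory, and the client has to re-read its config files (or be restarted) before the change takes effect.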


Other methods have also been suggested.
Some of them several times.


BTW:
3GB + 0.9GB*ncores is the ATLAS RAM formula used by the vbox app.
ATLAS native needs much less RAM as you can see in your monitoring tools.


BTW2:
It's a matter of courtesy to spend a few seconds marking links as URLs, like this:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4178
ID: 39071
Aurum
Joined: 12 Jun 18
Posts: 126
Credit: 53,906,164
RAC: 0
Message 39088 - Posted: 9 Jun 2019, 17:47:01 UTC

I will not run Oracle's CatBox, but when I saw CERN was coming out with native apps I came back. Yes, I naively thought maybe CERN would be interested in the perspective of public donors, but then I saw this page:
https://lhcathome.cern.ch/lhcathome/apps.php
With 63 PetaFLOPs they clearly have in-house computers that render John Q Public as nothing more than the buzz of an errant fly in their ear.

There's another option that makes more efficient use of one's CPUs but involves too much babysitting.

Somewhere I reported what I observed when I tried the app_config approach to specify --nthreads: it wastes many threads, since BOINC counts them differently.
ID: 39088