BOINC downloads only 2 ATLAS tasks

Henry Nebrensky
Joined: 13 Jul 05 · Posts: 169 · Credit: 15,000,737 · RAC: 0
Message 39053 - Posted: 5 Jun 2019, 16:42:22 UTC - in response to Message 35258.

"... which means that you can only download as many tasks as CPUs on the host. So setting Max # CPUs actually sets how many tasks you can download. The reason for this setting is to try and provide an accurate picture in ATLAS monitoring and accounting of how many tasks are actually running at a given moment. For example a 4-core machine can download 4 tasks and has one task running using 4 cores. ..."

For a while recently I had some of my machines attached to both Sixtrack and (multicore) Atlas: they decided that they preferred Sixtrack, and so the Atlas jobs sat there for days at a time. That is, even though there were 4 Atlas jobs queued locally, the number of cores running (or available for running) Atlas was ... zero. So I'm not sure this is a good basis for accurate accounting!

Toby Broom (Volunteer moderator)
Joined: 27 Sep 08 · Posts: 852 · Credit: 694,188,665 · RAC: 110,726
Message 39055 - Posted: 5 Jun 2019, 18:12:57 UTC - in response to Message 39041.

When you view the properties of a task, the dialogue shows the working set size; e.g. I see 9.96GB, as I have unlimited in my web settings. I'm not sure about native. In Windows you can't really see the actual RAM used by the VM, as it's in the kernel.

Aurum
Joined: 12 Jun 18 · Posts: 126 · Credit: 53,906,164 · RAC: 0
Message 39070 - Posted: 6 Jun 2019, 17:47:42 UTC
Last modified: 6 Jun 2019, 18:05:27 UTC

Native ATLAS is the only CPU project I'm downloading now. GPUs are running Asteroids which BOINC credits with 0.01 CPUs, i.e. none. Since my CPUs are all divisible by four I should be able to run lots of 4C WUs.

24 = 4 x 6
28 = 4 x 7
32 = 4 x 8
36 = 4 x 9
40 = 4 x 10
44 = 4 x 11

But since the server is misconfigured I cannot. For ATLAS it takes the Preference parameter Max # CPUs as both the number of CPUs per WU and also the maximum number of WUs to allow a computer to have at one time. Even for a 44t CPU ATLAS will only run 4 x 4C WUs. Such a waste. CERN should be concerned with maximizing the duty cycle, not driving it down below 36%.

Assuming 3 hours per 4C WU I should be able to return 408 ATLAS WUs a day, assuming I had just one computer of each CPU type. If I add it up for 28 Linux computers it's 1768 WUs per day.

24: 6 x 8 = 48
28: 7 x 8 = 56
32: 8 x 8 = 64
36: 9 x 8 = 72
40: 10 x 8 = 80
44: 11 x 8 = 88

"The memory allocated to the virtual machine is calculated based on the number of cores following the formula: 3GB + 0.9GB*ncores."
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4178

For a computer running only four 4C ATLAS WUs the memory requirement is:
4 x 6.6 GB = 26.4 GB
and actual use is:

    aurum@Rig-06:~$ sudo inxi -m
    Memory:
      RAM: total: 31.31 GiB used: 14.59 GiB (46.6%)
      Array-1: capacity: 256 GiB slots: 4 EC: None
      Device-1: DIMM_A1 size: 16 GiB speed: 2400 MT/s
      Array-2: capacity: 256 GiB slots: 4 EC: None
      Device-1: DIMM_C1 size: 16 GiB speed: 2400 MT/s

For a computer running only five 8C ATLAS WUs the memory requirement is:
5 x 10.2 GB = 51 GB
and actual use is:

    aurum@Rig-28:~$ sudo inxi -m
    Memory:
      RAM: total: 31.31 GiB used: 21.42 GiB (68.4%)
      Array-1: capacity: 256 GiB slots: 4 EC: None
      Device-1: DIMM_A1 size: 16 GiB speed: 2400 MT/s
      Array-2: capacity: 256 GiB slots: 4 EC: None
      Device-1: DIMM_C1 size: 16 GiB speed: 2400 MT/s

There is an additional delay caused by the UL-DL pregnant pause. If I'm running four 4C WUs then the ATLAS server will not give me another one even if my work queue is set to 10 days/10 days. So when a WU finishes it waits until it has fully uploaded the 111 MB results file before downloading a single replacement 369 MB WU.

With an approximate 120,000-task backlog one would think CERN would want to make the most efficient use of donor resources.
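
The arithmetic in the post above can be reproduced with a short sketch. It assumes only what the post states: the vbox memory formula of 3 GB + 0.9 GB per core, a run time of 3 hours per 4-core WU, and a Max # CPUs setting that caps both the cores per task and the number of tasks on a host. The function names are illustrative and not part of BOINC or ATLAS.

```python
def vbox_ram_gb(ncores: int) -> float:
    """RAM for one ATLAS vbox VM, per the quoted formula: 3 GB + 0.9 GB * ncores."""
    return 3.0 + 0.9 * ncores


def busy_fraction(threads: int, max_cpus: int) -> float:
    """Share of host threads kept busy when Max # CPUs limits both the
    cores per task and the number of tasks (assumption taken from the thread)."""
    used = min(threads, max_cpus * max_cpus)
    return used / threads


def tasks_per_day(threads: int, cores_per_task: int, hours_per_task: float) -> float:
    """Tasks returned per day if every thread could be filled with tasks."""
    concurrent = threads // cores_per_task
    return concurrent * 24 / hours_per_task


if __name__ == "__main__":
    hosts = (24, 28, 32, 36, 40, 44)  # thread counts listed in the post
    for threads in hosts:
        print(threads,
              tasks_per_day(threads, 4, 3.0),
              f"{busy_fraction(threads, 4):.0%}")
    print("total per day:", sum(tasks_per_day(t, 4, 3.0) for t in hosts))
    print("RAM for four 4C vbox tasks (GB):", 4 * vbox_ram_gb(4))
    print("RAM for five 8C vbox tasks (GB):", 5 * vbox_ram_gb(8))
```

Note that the formula applies to the vbox app, so the results are an upper estimate for native ATLAS tasks.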

computezrmle (Volunteer moderator · Volunteer developer · Volunteer tester · Help desk expert)
Joined: 15 Jun 08 · Posts: 2549 · Credit: 255,241,911 · RAC: 55,498
Message 39071 - Posted: 6 Jun 2019, 18:42:15 UTC - in response to Message 39070.

Yes, the way ATLAS interprets max #cores is nonsense. Lots of volunteers have said that it is nonsense, but nothing has changed for more than a year. So why do you think this will change now? OK, it's nothing but a rhetorical question.

Given this situation you have 2 options to run ATLAS:

Option 1: Let your BOINC client run "as is" and be happy with a 4-core setup running 4 tasks concurrently (= max. 16 cores).

Option 2: Try to find a setup that allows your hosts to run ATLAS on more than 16 cores. The least-effort method is to set a higher max #cores so that more tasks are downloaded, then adjust the number of cores each task runs on with an app_config.xml. This requires a well-formed app_config.xml, which has already been suggested. Other methods have also been suggested, some of them several times.

BTW: 3GB + 0.9GB*ncores is the ATLAS RAM formula used by the vbox app. ATLAS native needs much less RAM, as you can see in your monitoring tools.

BTW2: It's a matter of courtesy to spend a few seconds marking links as URLs, like this:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4178
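
As a rough illustration of Option 2, the sketch below generates an app_config.xml that fixes the number of cores per task while a higher max #cores web preference lets more tasks download. The element names (app, max_concurrent, app_version, plan_class, avg_ncpus, cmdline) are standard BOINC app_config.xml elements, and --nthreads is the option mentioned elsewhere in this thread. The application name "ATLAS" and the plan class "native_mt" are assumptions and should be checked against the host's client_state.xml. The helper build_app_config is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, plan_class: str,
                     ncpus: int, max_concurrent: int) -> str:
    """Return app_config.xml text that runs each task on `ncpus` cores
    and allows at most `max_concurrent` tasks of this app at once."""
    root = ET.Element("app_config")

    # Limit how many tasks of this application run at the same time.
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)

    # Tell BOINC how many CPUs each task counts for, and pass the same
    # number to the application so both sides agree on the thread count.
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "plan_class").text = plan_class
    ET.SubElement(version, "avg_ncpus").text = str(ncpus)
    ET.SubElement(version, "cmdline").text = f"--nthreads {ncpus}"

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Assumed values: 4 cores per task, up to 8 tasks at once.
    print(build_app_config("ATLAS", "native_mt", ncpus=4, max_concurrent=8))
```

The generated file would be placed in the project's directory under the BOINC data directory and picked up when the client rereads its config files. Whether this removes the thread-counting mismatch reported in this thread is not something the sketch can confirm.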

Aurum
Joined: 12 Jun 18 · Posts: 126 · Credit: 53,906,164 · RAC: 0
Message 39088 - Posted: 9 Jun 2019, 17:47:01 UTC

I will not run Oracle's CatBox, but when I saw CERN was coming out with native apps I came back. Yes, I naively thought maybe CERN would be interested in the perspective of public donors, but then I saw this page:
https://lhcathome.cern.ch/lhcathome/apps.php
With 63 PetaFLOPs they clearly have in-house computers that render John Q Public as nothing more than the buzz of an errant fly in their ear.

There's another option that makes more efficient use of one's CPUs but involves too much babysitting. Somewhere I reported what I observed when I tried the app_config approach to specify --nthreads: it wastes many threads, since BOINC counts them differently.

LHC@home