Message boards : ATLAS application : Atlas task slowing right down near the end but still using all cores - continue?
Joined: 12 Aug 06 Posts: 429 Credit: 10,246,033 RAC: 16,382
This task https://lhcathome.cern.ch/lhcathome/result.php?resultid=345643716 is running on an admittedly slow 4-core CPU under Windows, but this machine can run ATLAS OK. This one is showing a slower and slower % complete, and has moved from 99.990% twelve hours ago to 99.999% now. Task Manager shows all 4 cores are still being used. Should I let it run? Is it doing anything useful?
Joined: 14 Jan 10 Posts: 1371 Credit: 9,144,180 RAC: 4,341
Don't look at BOINC's progress. Use the VM's console with Alt-F3 and Alt-F2 to see the processes/CPU usage and the event progress (200 events in total to do).
Joined: 12 Aug 06 Posts: 429 Credit: 10,246,033 RAC: 16,382
Thanks, it's showing it has done 111 of 200 so far, with a reasonable range of times (7 min to 80 min, average 29 min). I'll look again in a few hours and see if it's done any more. Four instances of athena.py are running, each getting ~98% CPU (one core each, presumably). Edit - it's moved to 112. All is fine, just an unusually long task.
Joined: 15 Jun 08 Posts: 2500 Credit: 248,488,394 RAC: 127,539
Just to mention it: the computer running the task (https://lhcathome.cern.ch/lhcathome/result.php?resultid=345643716) reports an Intel Pentium N3700 CPU:
https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10772270
https://ark.intel.com/content/www/us/en/ark/products/87261/intel-pentium-processor-n3700-2m-cache-up-to-2-40-ghz.html
The data sheet points out it is a 4C/4T CPU. Regarding multicore setups there's a clear VirtualBox recommendation here:
https://forums.virtualbox.org/viewtopic.php?f=35&t=77413
To make it short: on this computer, ATLAS VMs should not be configured to use more than 3 CPUs. This advice can be ignored at any time, at your own responsibility.
Joined: 12 Aug 06 Posts: 429 Credit: 10,246,033 RAC: 16,382
I have 7 different machines; the 6 others are more powerful than that one, with varying core counts and HT. Are you recommending I lower the number of cores for ATLAS on them all? Why doesn't BOINC / your server allocate something more sensible? Should I use the other cores for single-core non-VirtualBox BOINC projects or leave them idle? What harm do I do by using all the cores? Will it slow ATLAS down or just the host? On my main 24-thread machine I only run one 8-thread ATLAS task, limited in app_config; the other 6 machines run only BOINC and I don't care if the Windows system is sluggish. But I do care if ATLAS is not running efficiently.
Joined: 15 Jun 08 Posts: 2500 Credit: 248,488,394 RAC: 127,539
Ah, the expected kind of reply. But I was looking at exactly one computer with an N3700 CPU and I clearly wrote: "This advice can be ignored at any time ..."
Just in case you want to try it out: use an app_config.xml to run ATLAS as a 3-core VM, let this setup run for a couple of days and compare it against the 4-core one. Without this test nobody knows which setup will be more efficient on that computer.
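For anyone wanting to try the 3-core test, a minimal app_config.xml along these lines is a common starting point. This is only a sketch: the plan_class shown is an assumption (the usual one for the VirtualBox ATLAS app), so verify the exact app name and plan class in your own client_state.xml before relying on it.

```xml
<!-- Sketch of an app_config.xml for a 3-core ATLAS vbox VM.
     The plan_class below is an assumption; check client_state.xml for the
     exact values on your host. Place the file in the LHC@home project
     directory and use "Read config files" in the BOINC Manager. -->
<app_config>
  <app_version>
    <app_name>ATLAS</app_name>
    <plan_class>vbox64_mt_mcore_atlas</plan_class>
    <avg_ncpus>3</avg_ncpus>
    <cmdline>--nthreads 3</cmdline>
  </app_version>
</app_config>
```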
Joined: 12 Aug 06 Posts: 429 Credit: 10,246,033 RAC: 16,382
Hmph, you completely misunderstood me. I was simply looking for information. Presumably you've already tested what's best and should be issuing whatever is more efficient. If it's better to leave 1 core free, then ATLAS tasks should be issued accordingly: an 8-core computer should get 7-core tasks. OK, maybe BOINC makes this impossible to do, or maybe a single-core task from some other project running at the same time negates any benefit. I'm just asking! I'm not having a go, I'm trying to understand how it works and how I can make all my computers do the most work. I can run a test on any or all of my computers if you like, but this information should be held centrally and has probably already been gathered by someone? Perhaps there's a useful rule for Intel/AMD/HT that could benefit all users? Let me know what you want me to run; I'm quite willing to help. You can see my computers here: https://lhcathome.cern.ch/lhcathome/hosts_user.php
Joined: 15 Jun 08 Posts: 2500 Credit: 248,488,394 RAC: 127,539
As CP mentioned, each ATLAS task processes 200 events from a pool. Each event is processed by a thread (a real thread in this case!) and takes between a few seconds and 10..30..60.. min. When a thread has finished an event it starts the next one, until the pool is empty. The scientific app running in your VM sets up as many threads as your VM has cores. The setup phase and stage-out phase always run on 1 core; the rest of the cores remain idle but stay allocated to the VM by VirtualBox.
Long term (this is related to ATLAS only!):
- it's more efficient to run an even number of threads (avoids having 1 event left in the pool)
- it's more efficient to run fewer cores per VM but many VMs concurrently (1-4 cores per VM vs. 5-8 cores per VM)
Within each range you need to test which setup is the most efficient; expect only minor differences within the same range.
Exceptions:
Running many VMs with few cores concurrently requires lots of RAM. Leave enough spare RAM for the disk cache, the OS and all other processes. VMs that allocate all available cores may run significantly slower (see the VirtualBox advice).
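As an illustration of the pool/thread model described in the post above, here is a minimal simulation sketch. The per-event durations are an assumed uniform distribution (roughly matching the 400-4800 s spread reported earlier in the thread), so it only demonstrates the mechanism of threads draining a shared pool and idling at the tail, not real ATLAS behaviour.

```python
import random

def simulate(pool_size=200, n_threads=4, seed=0):
    """Drain a pool of events with n_threads workers.

    Each worker takes the next event from the pool as soon as it is free.
    Returns (wall_clock_seconds, tail_idle_core_seconds), where the tail
    idle time is the core time wasted between a worker finishing its last
    event and the slowest worker finishing the final event.
    """
    rng = random.Random(seed)
    # Assumed per-event durations in seconds -- not real ATLAS numbers.
    durations = [rng.uniform(400, 4800) for _ in range(pool_size)]
    free_at = [0.0] * n_threads            # time each worker becomes free
    for d in durations:
        w = min(range(n_threads), key=free_at.__getitem__)  # next free worker
        free_at[w] += d                     # that worker takes the next event
    wall = max(free_at)
    tail_idle = sum(wall - t for t in free_at)
    return wall, tail_idle

if __name__ == "__main__":
    for n in (1, 2, 3, 4, 8):
        idle = [simulate(200, n, seed=s)[1] for s in range(200)]
        print(f"{n} threads: mean tail idle ~ {sum(idle)/len(idle)/3600:.2f} core-hours")
```

Running it with different n_threads values (or a different assumed duration spread) gives a feel for how much core time the tail costs in each setup.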
Joined: 12 Aug 06 Posts: 429 Credit: 10,246,033 RAC: 16,382
"Long term (this is related to ATLAS only!)"
Thanks for the advice, just one more quick question. Can you explain this one further? If there are 200 events to do, what's the harm in doing 3 at a time? Since they all take different amounts of time, won't you always end up with 1 left no matter how many threads you run?
Joined: 15 Jun 08 Posts: 2500 Credit: 248,488,394 RAC: 127,539
It's a question of long-term averages, since nobody can predict the required processing time per event.
200 events / 2 threads => each thread processes 100 events (on average; it can also be 98/102 or 97/103 ...)
200 events / 4 threads => each thread processes 50 events (on average; it can also be <calculate yourself>)
200 events / 3 threads => each thread processes 66 events (on average), which leaves 2 events in the pool. Once the pool is empty, 1 thread remains idle until the last 2 events are processed.
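The same arithmetic as a quick check (200-event pool, as stated above):

```python
# Average events per thread and leftover events for a 200-event pool.
pool = 200
for threads in (2, 3, 4):
    per_thread, leftover = divmod(pool, threads)
    print(f"{threads} threads: ~{per_thread} events each, {leftover} left in the pool")
# 2 threads: ~100 events each, 0 left in the pool
# 3 threads: ~66 events each, 2 left in the pool
# 4 threads: ~50 events each, 0 left in the pool
```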
Joined: 12 Aug 06 Posts: 429 Credit: 10,246,033 RAC: 16,382
I disagree. Certainly on the ATLAS task I'm running on the slow computer, the time for each event varies widely, from 401 to 4825 seconds. So chances are that as you approach the end of the pool, no number of threads will make them match up nicely. It's like 200 apples in a box with 4 people taking them out, 1 at a time each, at randomly varying speeds. You'd be no more likely to have some people idle at the end with 4 people than with 3.
Joined: 15 Jun 08 Posts: 2500 Credit: 248,488,394 RAC: 127,539
I didn't expect you to understand it.
Joined: 12 Aug 06 Posts: 429 Credit: 10,246,033 RAC: 16,382
"I didn't expect you to understand it."
You're always on the defensive. I'd like you to convince me I'm wrong. This is not about computers or ATLAS at all, just stats. I cannot see how, after 200 events have occurred with widely varying times, it matters how many workers there are. I'm interested in this idea; I've asked about it in some maths forums.
Joined: 12 Aug 06 Posts: 429 Credit: 10,246,033 RAC: 16,382
"I didn't expect you to understand it."
In fact, so far that computer has done:
Worker 1: 30 events
Worker 2: 37 events
Worker 3: 34 events
Worker 4: 39 events
That's 140 in total. By the time we get near the end it could be:
Worker 1: 43 events
Worker 2: 52 events
Worker 3: 49 events
Worker 4: 55 events
This leaves 1 event. But 200 is a multiple of 4... You cannot predict when each worker will finish with such random event sizes.
Joined: 14 Jan 10 Posts: 1371 Credit: 9,144,180 RAC: 4,341
With 4 cores you will have 3 idle ones at the end (we don't know for how long); with 3 cores you'll have 2 idle cores at the end, and so on. So the most efficient task would be a single-core VM. However, when you want to run more tasks you must have enough memory (3900 MB for each task). Another point is the duration of the task: no problem if your machine runs 24/7, but ATLAS (and CMS) don't like interruptions (incl. network) for longer periods.
Joined: 2 May 07 Posts: 2189 Credit: 173,308,789 RAC: 66,579
I have changed from HT to real cores for all computers, with no separate free core, and all is running well (CMS, Theory and/or ATLAS). For ATLAS I use only two cores per task on Windows (10 Pro and 11 Pro); then the difference between those two cores on the last collisions is not so big. Yes, the beginning and end of each ATLAS task take about 8-10 minutes. The CentOS 7 and CentOS 8 VMs run ATLAS with one core per task. All computers use Squid.
Joined: 12 Aug 06 Posts: 429 Credit: 10,246,033 RAC: 16,382
"With 4 cores you will have 3 idle ones at the end (we don't know for how long); with 3 cores you'll have 2 idle cores at the end ..."
I think I'd rather spend money on more processing power than more RAM. The RAM would be a severe limitation if I ran more ATLAS VMs. Even my largest computer has 64GB and 24 threads, so not enough. The 2 dual Xeons have 32 and 40GB, but 24 threads each.
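To put rough numbers on the RAM constraint mentioned above (3900 MB per single-core ATLAS VM was quoted earlier; the headroom reserved for the OS and disk cache below is an assumption):

```python
# Rough estimate of how many ~3900 MB ATLAS VMs fit in a given amount of RAM.
VM_MB = 3900        # per-task figure quoted earlier in the thread
RESERVE_MB = 8192   # assumed headroom for OS, disk cache and other processes

for total_gb in (32, 40, 64):
    usable_mb = total_gb * 1024 - RESERVE_MB
    print(f"{total_gb} GB RAM: about {usable_mb // VM_MB} single-core VMs")
# 32 GB RAM: about 6 single-core VMs
# 40 GB RAM: about 8 single-core VMs
# 64 GB RAM: about 14 single-core VMs
```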
Joined: 13 Jul 05 Posts: 169 Credit: 14,982,010 RAC: 52
"As CP mentioned, each ATLAS task processes 200 events from a pool."
It has struck me before that changing the task's pool size to 180 or 240 events would give better divisibility.
Joined: 15 Jun 08 Posts: 2500 Credit: 248,488,394 RAC: 127,539
I would vote for 240. That would be a perfect fit for all setups between 1 and 8 cores, except 7. It would also be perfect for ATLAS native running a 12-core setup.
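A quick check of the divisibility argument for both pool sizes:

```python
# Per-VM core counts (1-12) that divide the event pool evenly.
for pool in (200, 240):
    even_fit = [n for n in range(1, 13) if pool % n == 0]
    print(f"{pool} events: evenly divided by {even_fit}")
# 200 events: evenly divided by [1, 2, 4, 5, 8, 10]
# 240 events: evenly divided by [1, 2, 3, 4, 5, 6, 8, 10, 12]
```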
Joined: 2 May 07 Posts: 2189 Credit: 173,308,789 RAC: 66,579
ATLAS (long simulation)