Message boards : ATLAS application : Atlas task slowing right down near the end but still using all cores - continue?

Previous · 1 · 2 · 3 · Next

AuthorMessage
Peter Hucker

Joined: 12 Aug 06
Posts: 294
Credit: 2,100,100
RAC: 2,891
Message 46393 - Posted: 1 Mar 2022, 17:08:18 UTC - in response to Message 46390.  

As CP mentioned each ATLAS task processes 200 events from a pool.
It has struck me before, that changing the task's pool size to 180 or 240 events would give better divisibility.
There is no point in this; the event times are random. You have no idea what will be happening at the end. The workers don't all process events at the same rate, so it doesn't matter whether the pool size is divisible. Imagine you're a foreman with several workers and 200 jobs that need doing: some take 5 minutes, some half an hour, at random. Who cares if 200 is divisible by the number of workers? That would only be important if each job took precisely the same amount of time, and they don't. At the end of a 6-core Atlas, you'll have 1 core idle for an unknown amount of time, then 2 idle, then 3, then 4, then 5. If you made two 3-core Atlases instead, each one would have 1 core idle for an unknown amount of time, then 2. So it's pretty much the same.
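The foreman argument above can be sketched as a toy simulation. The uniform 5–30 minute job times are an assumption taken from the analogy, not real ATLAS timings, and the greedy "next free worker grabs the next event" model is a simplification:

```python
import random

def tail_idle_time(workers, events, draw=lambda: random.uniform(5, 30)):
    """Greedy shared-pool model: each free worker grabs the next event.
    Returns the total worker-idle time accumulated at the end of the task."""
    busy_until = [0.0] * workers          # per-worker busy-until clocks
    for _ in range(events):
        i = min(range(workers), key=lambda w: busy_until[w])
        busy_until[i] += draw()
    finish = max(busy_until)
    # idle time = how long each worker waits after the pool runs dry
    return sum(finish - t for t in busy_until)

random.seed(1)
one_six_core = tail_idle_time(6, 200)
two_three_core = tail_idle_time(3, 200) + tail_idle_time(3, 200)
print(f"one 6-core task, tail idle:   {one_six_core:.1f} min")
print(f"two 3-core tasks, tail idle:  {two_three_core:.1f} min")
```

Both configurations accumulate some tail idle time; a single worker, by contrast, never idles.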
ID: 46393
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 2099
Credit: 161,785,714
RAC: 131,605
Message 46394 - Posted: 1 Mar 2022, 17:10:00 UTC

I revised my earlier comments and think it can better be explained by looking at the modulo (%) results:

                events % threads
threads      180    200    240
      1        0      0      0
      2        0      0      0
      3        0      2      0
      4        0      0      0
      5        0      0      0
      6        0      2      0
      7        5      4      2
      8        4      0      0
     12        0      8      0

The values show how many events are left in the pool (long term average!) when the last full series is finished.

Nonetheless, on a 4-core CPU a 3-core setup can still be more efficient if the long-term average time to process a single event is short enough.
This needs to be tested on each computer individually.
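The modulo values in the table above can be reproduced with a couple of lines of Python:

```python
pools = (180, 200, 240)

# remainder = events left in the pool once the last full round of
# `t` parallel events has been handed out (long-term average view)
for t in (1, 2, 3, 4, 5, 6, 7, 8, 12):
    print(f"{t:>2}", *(f"{p % t:>5}" for p in pools))
```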
ID: 46394
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 2099
Credit: 161,785,714
RAC: 131,605
Message 46396 - Posted: 1 Mar 2022, 17:13:03 UTC - in response to Message 46393.  

... they are random sizes.

You are looking at just 1 task, but you would have to look at the long term averages.
Really huge numbers!
ID: 46396
Peter Hucker

Joined: 12 Aug 06
Posts: 294
Credit: 2,100,100
RAC: 2,891
Message 46397 - Posted: 1 Mar 2022, 17:13:51 UTC - in response to Message 46394.  

You're bound to come close to the end with one worker having just taken the last event from the pool while the other workers are part way through at different stages. You will always get idle workers at the end; there's nothing you can do about it.
ID: 46397
Peter Hucker

Joined: 12 Aug 06
Posts: 294
Credit: 2,100,100
RAC: 2,891
Message 46398 - Posted: 1 Mar 2022, 17:15:30 UTC - in response to Message 46396.  
Last modified: 1 Mar 2022, 17:16:19 UTC

... they are random sizes.

You are looking at just 1 task, but you would have to look at the long term averages.
Really huge numbers!
Ok, if you've looked at huge amounts of stats, I guess anything could happen. I'm surprised there's a difference though, considering the wide variance in event times. Was the wide variance I saw unusual? Are events usually pretty much the same length?

Also, do you have a figure for how much time is wasted? Since it's 200 in the pool, the wasted cores at the end are probably a fraction of a percent of inefficiency.
ID: 46398
Henry Nebrensky

Joined: 13 Jul 05
Posts: 162
Credit: 14,768,010
RAC: 6
Message 46399 - Posted: 2 Mar 2022, 1:32:47 UTC - in response to Message 46398.  

... they are random sizes.
You are looking at just 1 task, but you would have to look at the long term averages.
Really huge numbers!
... I'm surprised there's a difference though, considering the wide variance in event times.
Even within a task, as the number of threads is reduced, each thread must run more events, and it's more likely - but not guaranteed - that they will average out across the threads.
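That averaging effect can be checked with a toy calculation. The uniform 5–30 minute event times are an assumption for illustration, not real ATLAS data:

```python
import random
import statistics

def rel_spread(events_per_thread, threads=8, trials=200):
    """Relative spread (max minus min, over the mean) of per-thread
    total runtime, assuming each event takes a uniform 5-30 minutes."""
    spreads = []
    for _ in range(trials):
        totals = [sum(random.uniform(5, 30) for _ in range(events_per_thread))
                  for _ in range(threads)]
        spreads.append((max(totals) - min(totals)) / statistics.mean(totals))
    return statistics.mean(spreads)

random.seed(42)
for n in (5, 25, 100):
    print(n, round(rel_spread(n), 3))
# the spread shrinks roughly as 1/sqrt(n): more events per thread,
# better averaging across the threads
```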

Was the wide variance I saw unusual?
I've no idea.

Also, do you have a figure for how much time is wasted? Since it's 200 in the pool, the wasted cores at the end are probably a fraction of a percent of inefficiency.
IIRC, back when I was running 8-core native Atlas I would generally see the active threads reduce over usually 1-2 minutes, 5 if slow, for tasks of about 4 hrs total wall-clock. (There might be numbers in some ancient post here, but the laptop's tired tonight :( )
ID: 46399
Peter Hucker

Joined: 12 Aug 06
Posts: 294
Credit: 2,100,100
RAC: 2,891
Message 46400 - Posted: 2 Mar 2022, 5:56:17 UTC - in response to Message 46399.  

... I'm surprised there's a difference though, considering the wide variance in event times.
Even within a task, as the number of threads is reduced, each thread must run more events, and it's more likely - but not guaranteed - that they will average out across the threads.
Even with the full 8 cores, that's 25 events per thread, which is more than enough to average things out.

Also, do you have a figure for how much time is wasted? Since it's 200 in the pool, the wasted cores at the end are probably a fraction of a percent of inefficiency.
IIRC, back when I was running 8-core native Atlas I would generally see the active threads reduce over usually 1-2 minutes, 5 if slow, for tasks of about 4 hrs total wall-clock. (There might be numbers in some ancient post here, but the laptop's tired tonight :( )
A few minutes in 4 hours is nothing.
ID: 46400
Henry Nebrensky

Joined: 13 Jul 05
Posts: 162
Credit: 14,768,010
RAC: 6
Message 46402 - Posted: 2 Mar 2022, 21:23:53 UTC - in response to Message 46400.  

Even within a task, as the number of threads is reduced then each thread must run more events, and it's more likely - but not guaranteed - that they will average out across the threads.
Even with the full 8 cores, that's 25 events per thread, which is more than enough to average things out.
Doesn't that depend on the variance, which I've never studied? In any case, the aim is that the averaging overcomes the variance, which is why divisibility is important for avoiding a small number of events left over.

Also, do you have a figure for how much time is wasted? Since it's 200 in the pool, the wasted cores at the end are probably a fraction of a percent of inefficiency.
IIRC, back when I was running 8-core native Atlas I would generally see the active threads reduce over usually 1-2 minutes, 5 if slow, for tasks of about 4 hrs total wall-clock. (There might be numbers in some ancient post here, but the laptop's tired tonight :( )
A few minutes in 4 hours is nothing.
Thank you - I did put some effort into setting those machines up...
ID: 46402
Henry Nebrensky

Joined: 13 Jul 05
Posts: 162
Credit: 14,768,010
RAC: 6
Message 46403 - Posted: 2 Mar 2022, 21:28:20 UTC - in response to Message 46391.  

I would vote for 240.
Actually, I'd vote for 360 - them Babylonians knew what they were doing - but if people are already struggling with compute times then it would be better to stick to low hanging fruit. :(
ID: 46403
Peter Hucker

Joined: 12 Aug 06
Posts: 294
Credit: 2,100,100
RAC: 2,891
Message 46411 - Posted: 3 Mar 2022, 14:52:57 UTC - in response to Message 46402.  

Doesn't that depend on the variance, which I've never studied?
On the one I looked at there was a factor of 10 in the times for each event.

In any case, the aim is that the averaging overcomes the variance, which is why divisibility is important for avoiding a small number of events left over.
Surely average overcoming variance would mean it's just as likely to end up with an odd number?

Thank you - I did put some effort into setting those machines up...
I can't tell if that's sarcastic.
ID: 46411
Peter Hucker

Joined: 12 Aug 06
Posts: 294
Credit: 2,100,100
RAC: 2,891
Message 46412 - Posted: 3 Mar 2022, 14:54:17 UTC - in response to Message 46403.  

I would vote for 240.
Actually, I'd vote for 360 - them Babylonians knew what they were doing - but if people are already struggling with compute times then it would be better to stick to low hanging fruit. :(
Are they the ones responsible for clocks? It might divide better, so a quarter of an hour is a whole number of minutes, but decimal is so much easier for humans to calculate in their heads, which is why we've pretty much stopped using inches, furlongs, etc.
ID: 46412
maeax

Joined: 2 May 07
Posts: 1663
Credit: 94,407,441
RAC: 316,432
Message 46413 - Posted: 3 Mar 2022, 15:09:57 UTC - in response to Message 46411.  

This is what David wrote in his wishes for this year:
2021 has been another strange and challenging year, but thanks to you all the ATLAS experiment has been able to continue to produce more groundbreaking physics results. This year you simulated a total of 3 billion events! At 200 events per WU that's 15 million WU crunched. To put this into perspective, the total events simulated by all our worldwide computing resources was around 24 billion, so the contribution through LHC@Home is a really significant part of this.
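A quick sanity check of the numbers in the quote, nothing more:

```python
total_events = 3_000_000_000        # "3 billion events"
events_per_wu = 200                 # events per ATLAS work unit
worldwide = 24_000_000_000          # "around 24 billion" across all resources

print(total_events // events_per_wu)   # work units crunched: 15,000,000
print(total_events / worldwide)        # LHC@Home share: 0.125
```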
ID: 46413
Henry Nebrensky

Joined: 13 Jul 05
Posts: 162
Credit: 14,768,010
RAC: 6
Message 46414 - Posted: 3 Mar 2022, 15:30:48 UTC - in response to Message 46412.  

I would vote for 240.
Actually, I'd vote for 360 - them Babylonians knew what they were doing ...
Are they the ones responsible for clocks?
... and angles, (which is where clock faces came from?).
But, actually computezrmle was right: 240 also gets you a division by 16, for future expansion.
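240 does indeed divide evenly by 16 where 200 leaves events over (16 threads here is the hypothetical "future expansion" case, not a current configuration):

```python
# leftover events for 200- vs 240-event pools at higher thread counts
for threads in (8, 12, 16):
    print(threads, 200 % threads, 240 % threads)
```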
ID: 46414
Peter Hucker

Joined: 12 Aug 06
Posts: 294
Credit: 2,100,100
RAC: 2,891
Message 46415 - Posted: 3 Mar 2022, 15:36:30 UTC - in response to Message 46413.  

This is what David wrote in his wishes for this year:
2021 has been another strange and challenging year, but thanks to you all the ATLAS experiment has been able to continue to produce more groundbreaking physics results. This year you simulated a total of 3 billion events! At 200 events per WU that's 15 million WU crunched. To put this into perspective, the total events simulated by all our worldwide computing resources was around 24 billion, so the contribution through LHC@Home is a really significant part of this.
Where are the other 21 billion being done?
ID: 46415
Peter Hucker

Joined: 12 Aug 06
Posts: 294
Credit: 2,100,100
RAC: 2,891
Message 46416 - Posted: 3 Mar 2022, 15:36:56 UTC - in response to Message 46414.  
Last modified: 3 Mar 2022, 15:38:36 UTC

I would vote for 240.
Actually, I'd vote for 360 - them Babylonians knew what they were doing ...
Are they the ones responsible for clocks?
... and angles, (which is where clock faces came from?).
But, actually computezrmle was right: 240 also gets you a division by 16, for future expansion.
I'm yet to be convinced it actually matters. 16 cores doing 240 events would likely still end up with half the cores waiting as some events were longer. Only single core ATLAS tasks are efficient, but the amount of RAM used and the amount of disk activity to set them up negate that.
ID: 46416
Harri Liljeroos
Joined: 28 Sep 04
Posts: 594
Credit: 35,773,385
RAC: 18,824
Message 46417 - Posted: 3 Mar 2022, 18:59:04 UTC - in response to Message 46415.  
Last modified: 3 Mar 2022, 18:59:16 UTC

This is what David wrote in his wishes for this year:
2021 has been another strange and challenging year, but thanks to you all the ATLAS experiment has been able to continue to produce more groundbreaking physics results. This year you simulated a total of 3 billion events! At 200 events per WU that's 15 million WU crunched. To put this into perspective, the total events simulated by all our worldwide computing resources was around 24 billion, so the contribution through LHC@Home is a really significant part of this.
Where are the other 21 billion being done?

See here: https://lhcathome.cern.ch/lhcathome/atlas_job.php (the lower graph, for the past month).
ID: 46417
Peter Hucker

Joined: 12 Aug 06
Posts: 294
Credit: 2,100,100
RAC: 2,891
Message 46419 - Posted: 3 Mar 2022, 21:06:52 UTC - in response to Message 46417.  

Where are the other 21 billion being done?
See here: https://lhcathome.cern.ch/lhcathome/atlas_job.php (the lower graph, for the past month).
What is Vega?
ID: 46419
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 2099
Credit: 161,785,714
RAC: 131,605
Message 46420 - Posted: 3 Mar 2022, 21:26:52 UTC - in response to Message 46419.  

ID: 46420
Peter Hucker

Joined: 12 Aug 06
Posts: 294
Credit: 2,100,100
RAC: 2,891
Message 46421 - Posted: 3 Mar 2022, 21:35:44 UTC - in response to Message 46420.  

What is Vega?
https://indico.cern.ch/event/876794/contributions/4567029/attachments/2327238/3964735/Vega%20GDB.pdf
I'd hate to see their electricity bill. Please tell me Atos isn't the same one that made disabled people in the UK commit suicide.
ID: 46421
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 2099
Credit: 161,785,714
RAC: 131,605
Message 46422 - Posted: 3 Mar 2022, 21:44:23 UTC - in response to Message 46421.  

I'd hate to see their electricity bill.

They hired some cyclists.
As a side effect SLO won the Tour de France twice in 2020/2021.
ID: 46422


©2022 CERN