Atlas task slowing right down near the end but still using all cores - continue?
Author | Message |
---|---|
Joined: 12 Aug 06 Posts: 429 Credit: 9,870,155 RAC: 39,604 |
Quoted: "As CP mentioned each ATLAS task processes 200 events from a pool. It has struck me before, that changing the task's pool size to 180 or 240 events would give better divisibility." There is no point in this: the events are random sizes, so you have no idea what will be happening at the end. The workers do not process them at the same rate either. It doesn't matter if the pool is divisible. Imagine you're a foreman with several workers. You have 200 jobs that need doing; some take 5 minutes, some half an hour, it's random. Who cares if 200 is divisible by the number of workers? That would only be important if each job took precisely the same amount of time. They don't. At the end of a 6-core Atlas, you'll have 1 core idle for an unknown amount of time, then 2, then 3, then 4, then 5. If you made two 3-core Atlases, each one would have 1 core idle for an unknown amount of time, then 2. So pretty much the same. |
Joined: 15 Jun 08 Posts: 2473 Credit: 245,701,514 RAC: 83,419 |
I revised my earlier comments and think it can be explained better by looking at the modulo (%) results. Events % threads for pool sizes of 180 / 200 / 240 events: 1 thread: 0 / 0 / 0; 2 threads: 0 / 0 / 0; 3 threads: 0 / 2 / 0; 4 threads: 0 / 0 / 0; 5 threads: 0 / 0 / 0; 6 threads: 0 / 2 / 0; 7 threads: 5 / 4 / 2; 8 threads: 4 / 0 / 0; 12 threads: 0 / 8 / 0. The values show how many events are left in the pool (long term average!) when the last full series is finished. Nonetheless, on a 4-core CPU a 3-core setup can still be more efficient if the long term averages to process a single event are short enough. This needs to be tested on each computer individually. |
Joined: 15 Jun 08 Posts: 2473 Credit: 245,701,514 RAC: 83,419 |
Quoted: "... they are random sizes." You are looking at just 1 task, but you would have to look at the long term averages. Really huge numbers! |
Joined: 12 Aug 06 Posts: 429 Credit: 9,870,155 RAC: 39,604 |
You're bound to come close to the end with one worker having just taken the last event from the pool and the other workers part way through theirs, at different stages. You will always get idle workers at the end; there's nothing you can do about this. |
Joined: 12 Aug 06 Posts: 429 Credit: 9,870,155 RAC: 39,604 |
Quoted: "You are looking on just 1 task but you would have to look at the long term averages. Really huge numbers!" Ok, if you've looked at huge amounts of stats, I guess anything could happen. I'm surprised there's a difference though, considering the wide variance in event times. Was the wide variance I saw unusual? Are events usually pretty much the same length? Also, do you have a figure for how much time is wasted? Since it's 200 in the pool, the wasted cores at the end are probably a fraction of a percent of inefficiency. |
Joined: 13 Jul 05 Posts: 169 Credit: 14,978,870 RAC: 18 |
Quoted: "... I'm surprised there's a difference though, considering the wide variance in event times." Even within a task, as the number of threads is reduced each thread must run more events, and it's more likely - but not guaranteed - that they will average out across the threads. Quoted: "Was the wide variance I saw unusual?" I've no idea. Quoted: "Also, do you have a figure for how much time is wasted? Since it's 200 in the pool, the wasted cores at the end are probably a fraction of a percent of inefficiency." IIRC, back when I was running 8-core native Atlas I would generally see the active threads reduce over usually 1-2 minutes, 5 if slow, for tasks of about 4 hrs total wall-clock. (There might be numbers in some ancient post here, but the laptop's tired tonight :( ) |
Joined: 12 Aug 06 Posts: 429 Credit: 9,870,155 RAC: 39,604 |
Quoted: "Even within a task, as the number of threads is reduced then each thread must run more events, and it's more likely - but not guaranteed - that they will average out across the threads." Even with the full 8 cores, that's 25 events per thread, which is more than enough to average things out. Quoted: "IIRC, back when I was running 8-core native Atlas I would generally see the active threads reduce over usually 1-2 minutes, 5 if slow, for tasks of about 4 hrs total wall-clock." A few minutes in 4 hours is nothing. |
Joined: 13 Jul 05 Posts: 169 Credit: 14,978,870 RAC: 18 |
Quoted: "Even with the full 8 cores, that's 25 events per thread, which is more than enough to average things out." Doesn't that depend on the variance, which I've never studied? In any case, the aim is that the averaging overcomes the variance, which is why divisibility is important for avoiding a small number of events left over. Quoted: "A few minutes in 4 hours is nothing." Thank you - I did put some effort into setting those machines up... |
Joined: 13 Jul 05 Posts: 169 Credit: 14,978,870 RAC: 18 |
Quoted: "I would vote for 240." Actually, I'd vote for 360 - them Babylonians knew what they were doing - but if people are already struggling with compute times then it would be better to stick to low hanging fruit. :( |
Joined: 12 Aug 06 Posts: 429 Credit: 9,870,155 RAC: 39,604 |
Quoted: "Doesn't that depend on the variance, which I've never studied?" On the one I looked at there was a factor of 10 in the times for each event. Quoted: "In any case, the aim is that the averaging overcomes the variance, which is why divisibility is important for avoiding a small number of events left over." Surely averaging overcoming variance would mean it's just as likely to end up with an odd number? Quoted: "Thank you - I did put some effort into setting those machines up..." I can't tell if that's sarcastic. |
Joined: 12 Aug 06 Posts: 429 Credit: 9,870,155 RAC: 39,604 |
Quoted: "Actually, I'd vote for 360 - them Babylonians knew what they were doing - but if people are already struggling with compute times then it would be better to stick to low hanging fruit. :(" Are they the ones responsible for clocks? It might divide better, so a quarter of an hour is a whole number of minutes, but decimal is so much easier for humans to calculate in their heads, which is why we've pretty much stopped with inches, furlongs, etc. |
Joined: 2 May 07 Posts: 2176 Credit: 172,365,562 RAC: 93,649 |
David wrote this in his wishes for this year: "2021 has been another strange and challenging year, but thanks to you all the ATLAS experiment has been able to continue to produce more groundbreaking physics results. This year you simulated a total of 3 billion events! At 200 events per WU that's 15 million WU crunched. To put this into perspective, the total events simulated by all our worldwide computing resources was around 24 billion, so the contribution through LHC@Home is a really significant part of this." |
Joined: 13 Jul 05 Posts: 169 Credit: 14,978,870 RAC: 18 |
Quoted: "Are they the ones responsible for clocks?" ... and angles (which is where clock faces came from?). But actually, computezrmle was right: 240 also gets you a division by 16, for future expansion. |
Joined: 12 Aug 06 Posts: 429 Credit: 9,870,155 RAC: 39,604 |
Quoted: "David wrote this in his wishes for this year:" Where are the other 21 billion being done? |
Joined: 12 Aug 06 Posts: 429 Credit: 9,870,155 RAC: 39,604 |
Quoted: "... and angles, (which is where clock faces came from?)." I'm yet to be convinced it actually matters. 16 cores doing 240 events would likely still end up with half the cores waiting because some events take longer. Only single-core ATLAS tasks are efficient, but the amount of RAM used and the amount of disk activity to set them up negate that. |
Joined: 28 Sep 04 Posts: 704 Credit: 46,807,965 RAC: 33,700 |
Quoted: "Where are the other 21 billion being done?" See the lower graph for the past month here: https://lhcathome.cern.ch/lhcathome/atlas_job.php |
Joined: 12 Aug 06 Posts: 429 Credit: 9,870,155 RAC: 39,604 |
Quoted: "See here: https://lhcathome.cern.ch/lhcathome/atlas_job.php the lower graphics for the past month." What is Vega? |
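Joined: 15 Jun 08 Posts: 2473 Credit: 245,701,514 RAC: 83,419 |
Quoted: "What is Vega?" https://indico.cern.ch/event/876794/contributions/4567029/attachments/2327238/3964735/Vega%20GDB.pdf |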
Joined: 12 Aug 06 Posts: 429 Credit: 9,870,155 RAC: 39,604 |
Quoted: "https://indico.cern.ch/event/876794/contributions/4567029/attachments/2327238/3964735/Vega%20GDB.pdf" I'd hate to see their electricity bill. Please tell me Atos isn't the same one that made disabled people in the UK commit suicide. |
Joined: 15 Jun 08 Posts: 2473 Credit: 245,701,514 RAC: 83,419 |
Quoted: "I'd hate to see their electricity bill." They hired some cyclists. As a side effect, SLO won the Tour de France twice, in 2020 and 2021. |
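
The divisibility question debated in this thread can be explored with a small simulation. The sketch below is not the ATLAS scheduler. It assumes a greedy model in which each free worker takes the next event from a shared pool, and it assumes event times drawn uniformly between 1 and 10 time units (a factor-of-10 spread, as one poster reported for a single task). The function names and parameters are hypothetical and chosen only for illustration.

```python
import heapq
import random


def leftover_events(pool_size, threads):
    """Events left in the pool after the last full round: pool_size % threads."""
    return pool_size % threads


def simulate_task(pool_size, threads, rng, min_time=1.0, max_time=10.0):
    """Simulate one task under the assumed greedy shared-pool model.

    Returns (wall_clock, idle_fraction), where idle_fraction is the share of
    total core time spent waiting for the slowest worker to finish.
    """
    # Each heap entry is the time at which one worker next becomes free.
    free_at = [0.0] * threads
    heapq.heapify(free_at)
    busy = 0.0
    for _ in range(pool_size):
        start = heapq.heappop(free_at)              # earliest free worker
        duration = rng.uniform(min_time, max_time)  # assumed event-time spread
        busy += duration
        heapq.heappush(free_at, start + duration)
    wall_clock = max(free_at)
    idle_fraction = 1.0 - busy / (wall_clock * threads)
    return wall_clock, idle_fraction


def mean_idle_fraction(pool_size, threads, runs=100, seed=0):
    """Average idle fraction over many simulated tasks (the long term view)."""
    rng = random.Random(seed)
    total = sum(simulate_task(pool_size, threads, rng)[1] for _ in range(runs))
    return total / runs


if __name__ == "__main__":
    pools = (180, 200, 240)
    print("threads  " + "  ".join(f"rem({p}) idle%({p})" for p in pools))
    for threads in (1, 2, 3, 4, 5, 6, 7, 8, 12, 16):
        cells = []
        for p in pools:
            rem = leftover_events(p, threads)
            idle = 100.0 * mean_idle_fraction(p, threads)
            cells.append(f"{rem:>8} {idle:>10.2f}")
        print(f"{threads:>7}  " + "  ".join(cells))
```

Running the script prints, for each thread count, the remainder from the modulo table alongside the simulated average idle percentage, so the two views argued in the thread can be compared on the same assumptions. Changing `min_time` and `max_time` shows how sensitive the result is to the event-time spread, which nobody in the thread had measured.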
©2024 CERN