Message boards : ATLAS application : Long event runtimes for 2000 eventers

computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2425
Credit: 227,427,836
RAC: 130,169
Message 48672 - Posted: 28 Sep 2023, 6:04:32 UTC

Got one of the rare 2000-event tasks (a resend) running in slot 2.
Compared to slots 0/1 (400 eventers), the average runtime per event is significantly higher.

slots/0/PanDA_Pilot-5972920516/log.EVNTtoHITS
(209th event for this worker) took 79.89 s. New average 172.1 +- 5.887
(191th event for this worker) took 115.8 s. New average 187.8 +- 7.3

slots/1/PanDA_Pilot-5972641056/log.EVNTtoHITS
(87th event for this worker) took 66.32 s. New average 175.2 +- 10.05
(89th event for this worker) took 133.6 s. New average 169.9 +- 8.096

slots/2/PanDA_Pilot-5961031867/log.EVNTtoHITS
(162th event for this worker) took 744.8 s. New average 706.2 +- 8.818
(160th event for this worker) took 546.7 s. New average 710.3 +- 8.53
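To compare slots like this without reading each log by eye, the per-event time and the running average can be pulled out of log.EVNTtoHITS with a regular expression. This is a minimal Python sketch, assuming the relevant lines look exactly like the excerpts above (all other log lines are simply skipped); the function name parse_event_lines is hypothetical and not part of any ATLAS tooling.

```python
import re
from statistics import mean

# Pattern derived only from the excerpts quoted above, e.g.
# (162th event for this worker) took 744.8 s. New average 706.2 +- 8.818
EVENT_RE = re.compile(
    r"\((\d+)(?:st|nd|rd|th) event for this worker\) took ([\d.]+) s\. "
    r"New average ([\d.]+) \+- ([\d.]+)"
)


def parse_event_lines(lines):
    """Return (event_no, took_s, avg_s, err_s) tuples for every matching line."""
    results = []
    for line in lines:
        m = EVENT_RE.search(line)
        if m:
            results.append((int(m.group(1)), float(m.group(2)),
                            float(m.group(3)), float(m.group(4))))
    return results


if __name__ == "__main__":
    slot2 = [
        "(162th event for this worker) took 744.8 s. New average 706.2 +- 8.818",
        "(160th event for this worker) took 546.7 s. New average 710.3 +- 8.53",
    ]
    events = parse_event_lines(slot2)
    # The last running average in the file summarises the slot best.
    print("last reported average:", events[-1][2], "s")
    print("mean of sampled 'took' values:",
          round(mean(e[1] for e in events), 2), "s")
```

In practice the lines would come from iterating over the open log file of each slot; comparing the last reported average per slot gives the same picture as the excerpts above.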


Another 2000 eventer a week ago also had averages above 700 s, but that task failed after ~3 days.
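A rough way to see why a 2000 eventer can run for days is to multiply the reported average per event by the number of events and divide by the number of threads. This Python sketch assumes each thread processes one event at a time and ignores start-up and merge overhead; the thread count of 4 is a hypothetical value, since the post does not say how many cores each slot uses, and estimated_wall_hours is a hypothetical helper.

```python
def estimated_wall_hours(n_events, avg_s_per_event, n_threads):
    """Rough wall-clock estimate in hours.

    Assumes events are spread evenly over n_threads, one event per thread
    at a time, and ignores start-up and output-merge overhead.
    """
    if n_threads < 1:
        raise ValueError("n_threads must be >= 1")
    return n_events * avg_s_per_event / n_threads / 3600.0


if __name__ == "__main__":
    # Averages taken from the logs above; 4 threads is a hypothetical setting.
    for label, events, avg in [("400 eventer, slot 0", 400, 172.1),
                               ("2000 eventer, slot 2", 2000, 706.2)]:
        hours = estimated_wall_hours(events, avg, n_threads=4)
        print(f"{label}: ~{hours:.1f} h (~{hours / 24:.1f} days)")
```

Under these assumptions the 2000 eventer in slot 2 comes out at roughly 98 hours (about four days), versus under 5 hours for a 400 eventer in slot 0.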
Erich56
Joined: 18 Dec 15
Posts: 1691
Credit: 104,605,205
RAC: 105,351
Message 48674 - Posted: 28 Sep 2023, 7:05:18 UTC

I have a question, although not directly related to the long eventers:

Regarding the amount of RAM for multi-core ATLAS tasks, a long time ago this formula was published:
3900MB for 1-core, plus 900MB for each additional core.
I think I saw a post here some time ago saying that, with the new type of ATLAS tasks, this formula is no longer relevant. Isn't there now a fixed amount of RAM per task, regardless of the number of cores?
Can anybody please enlighten me?
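The older rule is plain linear arithmetic: memory = 3900 MB + 900 MB x (cores - 1). A tiny Python helper makes the values explicit. It only reproduces the formula as quoted in this post; whether it still applies to the newer multi-threaded ATLAS versions is exactly the open question here, and legacy_atlas_memory_mb is a hypothetical name.

```python
def legacy_atlas_memory_mb(n_cores):
    """RAM in MB per the older rule: 3900 MB for 1 core + 900 MB per extra core."""
    if n_cores < 1:
        raise ValueError("n_cores must be >= 1")
    return 3900 + 900 * (n_cores - 1)


if __name__ == "__main__":
    for cores in (1, 2, 4, 8):
        print(cores, "cores ->", legacy_atlas_memory_mb(cores), "MB")
```

For example, 4 cores gives 3900 + 3 x 900 = 6600 MB under this rule.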
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2425
Credit: 227,427,836
RAC: 130,169
Message 48675 - Posted: 28 Sep 2023, 7:21:50 UTC - in response to Message 48674.  

So far, a qualified answer to both posts can only be given by somebody familiar with the internal structure of ATLAS 3.x, or with the parameters used to create the few 2000 eventers still around.

It looks like, at the moment, LRZ-LMU might be the only one able to answer this.
Toby Broom
Volunteer moderator
Joined: 27 Sep 08
Posts: 810
Credit: 654,427,844
RAC: 265,611
Message 48677 - Posted: 28 Sep 2023, 15:54:24 UTC - in response to Message 48674.  

The new ones should need less.

https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5978&postid=47977
LRZ-LMU
Joined: 4 May 17
Posts: 5
Credit: 118,785,284
RAC: 0
Message 48692 - Posted: 29 Sep 2023, 13:52:29 UTC - in response to Message 48677.  

23.0.31 (like 23.0.19) uses significantly less memory. The reason is that it is properly multi-threaded, rather than using the poor man's approach of 21.0.15. That version was multi-process: the spawned processes all shared the same copy of read-only RAM, but each still used some RAM of its own.

There should be no more 2000-event tasks, only 400-event ones. Since you chewed through the 35 Mevt task in a week, I just got another 30 Mevt task assigned. There will be a few hours without jobs while the input EVNT files are merged.
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2425
Credit: 227,427,836
RAC: 130,169
Message 48695 - Posted: 29 Sep 2023, 16:34:43 UTC - in response to Message 48692.  

@LRZ-LMU

Got some fresh(?) tasks with 1.5 GB EVNT files each.
Although some volunteers can deal with this, others can't.
Hence, it would be nice if you could limit that to the usual 200-400 MB per file we had before.
Erich56
Joined: 18 Dec 15
Posts: 1691
Credit: 104,605,205
RAC: 105,351
Message 48697 - Posted: 29 Sep 2023, 18:11:46 UTC - in response to Message 48692.  

> 23.0.31 (like 23.0.19) uses significantly less memory. The reason is that it is properly multi-threaded, rather than using the poor man's approach of 21.0.15. That version was multi-process: the spawned processes all shared the same copy of read-only RAM, but each still used some RAM of its own.

Is there still some kind of rule for how much RAM should be assigned to 1-core tasks, 2-core tasks and so on via app_config.xml?
Harri Liljeroos
Joined: 28 Sep 04
Posts: 677
Credit: 43,759,057
RAC: 14,785
Message 48699 - Posted: 29 Sep 2023, 22:03:03 UTC - in response to Message 48697.  

> Is there still some kind of rule for how much RAM should be assigned to 1-core tasks, 2-core tasks and so on via app_config.xml?

See this message: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5978&postid=47977#47977
Harri Liljeroos
Joined: 28 Sep 04
Posts: 677
Credit: 43,759,057
RAC: 14,785
Message 48700 - Posted: 29 Sep 2023, 22:05:04 UTC

There are quite a lot of re-sends of the failed 2000 eventers in circulation, so be aware.
Erich56
Joined: 18 Dec 15
Posts: 1691
Credit: 104,605,205
RAC: 105,351
Message 48703 - Posted: 30 Sep 2023, 3:53:06 UTC - in response to Message 48700.  

> There are quite a lot of re-sends of the failed 2000 eventers in circulation, so be aware.

Yesterday, one of these was aborted by the server on my machine after about 36 hours of CPU time :-(
Erich56
Joined: 18 Dec 15
Posts: 1691
Credit: 104,605,205
RAC: 105,351
Message 48704 - Posted: 30 Sep 2023, 6:00:13 UTC - in response to Message 48699.  

> See this message: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5978&postid=47977#47977

Well, this message states exactly what I was referring to in my message above. But obviously it is no longer valid, is it?
Harri Liljeroos
Joined: 28 Sep 04
Posts: 677
Credit: 43,759,057
RAC: 14,785
Message 48708 - Posted: 30 Sep 2023, 9:12:57 UTC - in response to Message 48704.  

Well, I'm running both the 400- and 2000-event tasks with --nthreads 4 --memory_size_mb 4400 without a problem.
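For readers wondering how such settings end up in app_config.xml, here is a Python sketch that builds one with the standard library. The app_config, app_version, app_name, plan_class, avg_ncpus and cmdline elements come from the BOINC client configuration format; the app name ATLAS and the plan class vbox64_mt_mcore_atlas are assumptions to check against your own client, build_app_config is a hypothetical helper, and the 4 threads / 4400 MB values are simply the settings reported in this post, not an official recommendation.

```python
import xml.etree.ElementTree as ET


def build_app_config(n_threads, memory_mb,
                     app_name="ATLAS",
                     plan_class="vbox64_mt_mcore_atlas"):
    """Build a BOINC app_config.xml tree for a multi-core ATLAS app version.

    The app_name and plan_class defaults are assumptions; copy the exact
    values your client shows for the ATLAS tasks it has downloaded.
    """
    root = ET.Element("app_config")
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "plan_class").text = plan_class
    # Number of CPUs the BOINC client budgets for each task.
    ET.SubElement(version, "avg_ncpus").text = str(n_threads)
    # Options handed to the application: the settings reported in this post.
    ET.SubElement(version, "cmdline").text = (
        f"--nthreads {n_threads} --memory_size_mb {memory_mb}"
    )
    return ET.ElementTree(root)


if __name__ == "__main__":
    tree = build_app_config(n_threads=4, memory_mb=4400)
    ET.indent(tree)  # pretty-print; requires Python 3.9+
    print(ET.tostring(tree.getroot(), encoding="unicode"))
```

The printed XML would be saved as app_config.xml in the LHC@home project directory, after which Options > Read config files in BOINC Manager (or a client restart) makes the client pick it up.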
maeax
Joined: 2 May 07
Posts: 2120
Credit: 159,926,969
RAC: 70,085
Message 48709 - Posted: 30 Sep 2023, 9:31:34 UTC - in response to Message 48708.  

+1
Toby Broom
Volunteer moderator
Joined: 27 Sep 08
Posts: 810
Credit: 654,427,844
RAC: 265,611
Message 48710 - Posted: 30 Sep 2023, 11:37:38 UTC

David said:

> On the development system we tested the new application with much lower memory, and it worked fine even with 3000 MB of RAM, but here it is set to 4000 MB for safety.

©2024 CERN