Message boards : ATLAS application : Extreme event processing times

Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1332
Credit: 8,781,879
RAC: 5,205
Message 49705 - Posted: 5 Mar 2024, 11:06:15 UTC

This morning I have several tasks running with the 'normal' 400 events,
but after some tasks with normal runtimes, I now have tasks with processing times of up to 6700 seconds for a single event.
Since the logging on ALT-F2 is still stuck, I have no idea what the average event runtime is.
ID: 49705
maeax

Joined: 2 May 07
Posts: 2159
Credit: 163,623,440
RAC: 157,116
Message 49706 - Posted: 5 Mar 2024, 11:27:35 UTC - in response to Message 49705.  

Crystal,
what do you see under Properties in BOINC Manager for this task?
ID: 49706
hadron

Joined: 4 Sep 22
Posts: 74
Credit: 9,835,021
RAC: 21,332
Message 49707 - Posted: 5 Mar 2024, 11:37:58 UTC - in response to Message 49705.  

This morning I have several tasks running with the 'normal' 400 events,
but after some tasks with normal runtimes, I now have tasks with processing times of up to 6700 seconds for a single event.
Since the logging on ALT-F2 is still stuck, I have no idea what the average event runtime is.

I can't access ALT-F2 on my system either. However, there may be a way to bypass that and still get all the info you need.
Are you running Linux on your system? If so, is BOINC running as a system service? If it is, I can give you instructions so you can access the same information using the VirtualBox Manager.
ID: 49707
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1332
Credit: 8,781,879
RAC: 5,205
Message 49710 - Posted: 5 Mar 2024, 12:50:11 UTC - in response to Message 49706.  

Crystal,
what do you see under Properties in BOINC Manager for this task?


Application: ATLAS Simulation 3.01 (vbox64_mt_mcore_atlas)
Name: wPiKDmjlS14nsSi4apGgGQJmABFKDmABFKDm8QvSDm4luLDmYCOiSn
State: Running
Received: 3/5/2024 7:09:36 AM
Report deadline: 3/12/2024 7:09:37 AM
Resources: 8 CPUs
Estimated computation size: 43,200 GFLOPs
CPU time: 1d 05:57:55
CPU time since checkpoint: 00:05:59
Elapsed time: 04:39:27
Estimated time remaining: 03:18:08
Fraction done: 58.512%
Virtual memory size: 116.62 MB
Working set size: 4.69 GB
Directory: slots/0
Process ID: 2116
Progress rate: 12.600% per hour
Executable: vboxwrapper_26206_windows_x86_64.exe
ID: 49710
maeax

Joined: 2 May 07
Posts: 2159
Credit: 163,623,440
RAC: 157,116
Message 49711 - Posted: 5 Mar 2024, 13:03:43 UTC - in response to Message 49710.  

Very good,
just wait for it to finish. You will get a few cobblestones ;-))
ID: 49711
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1332
Credit: 8,781,879
RAC: 5,205
Message 49713 - Posted: 5 Mar 2024, 13:10:27 UTC - in response to Message 49711.  

Two of them have finished: 111,000 and 112,000 CPU seconds for 400 events.
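
For a rough idea of the average event runtime, these totals can be divided by the event count. A minimal sketch in Python, assuming all 400 events were processed and the reported CPU seconds are spread over them; the function name is only illustrative:

```python
def avg_cpu_seconds_per_event(total_cpu_seconds: float, events: int = 400) -> float:
    """Average CPU seconds spent per event (assumes every event was processed)."""
    return total_cpu_seconds / events


# The two finished tasks, rounded to thousands of CPU seconds as reported above.
for total in (111_000, 112_000):
    average = avg_cpu_seconds_per_event(total)
    print(f"{total} CPU s / 400 events = {average:.1f} CPU s per event")
```

That works out to roughly 280 CPU seconds per event on average, far below the 6700-second peak, so only a few events appear to be the extreme ones.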
ID: 49713
maeax

Joined: 2 May 07
Posts: 2159
Credit: 163,623,440
RAC: 157,116
Message 49714 - Posted: 5 Mar 2024, 13:21:00 UTC - in response to Message 49713.  
Last modified: 5 Mar 2024, 13:22:32 UTC

The Threadripper uses 6 CPUs per task: these heavy ATLAS tasks run for 6 hours and use 36 hours of CPU time.
Something for a Cray.
ID: 49714
S@NL - John van Gorsel

Joined: 8 Aug 11
Posts: 5
Credit: 2,602,155
RAC: 314
Message 50318 - Posted: 2 Jun 2024, 6:47:44 UTC

I have two tasks that have been running for a very long time as well:

Application: ATLAS Simulation 3.01 (vbox64_mt_mcore_atlas)
Name: t3iKDmxuLV5nsSi4ap6QjLDmwznN0nGgGQJmpmIZDmPtFKDmAK1NGm
State: Running
Received: Sun 26 May 2024 13:28:26 CEST
Report deadline: Sun 02 Jun 2024 13:28:26 CEST
Resources: 4 CPUs
Estimated computation size: 43,200 GFLOPs
CPU time: 8d 00:17:57
CPU time since checkpoint: 00:00:34
Elapsed time: 2d 10:29:09
Estimated time remaining: 00:27:09
Fraction done: 99.232%
Virtual memory size: 5.01 GB
Working set size: 4.30 GB
Directory: slots/2
Process ID: 2574
Progress rate: 1.800% per hour
Executable: vboxwrapper_26206_x86_64-pc-linux-gnu

My question is whether these tasks will ever reach 100%. The progress slowed down significantly and it now takes 50 seconds for a 0.01% step. Without further slow-down it would take another 13 hours to complete and the deadline is in about 5 hours...
ID: 50318
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1332
Credit: 8,781,879
RAC: 5,205
Message 50319 - Posted: 2 Jun 2024, 7:44:12 UTC - in response to Message 50318.  
Last modified: 2 Jun 2024, 7:50:39 UTC

My question is whether these tasks will ever reach 100%. The progress slowed down significantly and it now takes 50 seconds for a 0.01% step. Without further slow-down it would take another 13 hours to complete and the deadline is in about 5 hours...
The progress shown by BOINC Manager is worthless. When you highlight the running ATLAS task in BOINC Manager, there is a button on the left: "Show VM Console".
If the button is greyed out, you first have to install the VirtualBox Extension Pack to be able to use the Console.
When you get a Console popup, you can use the keystrokes ALT-F3 to show the output of the Linux 'top' command and ALT-F2 to see the ATLAS Event Progress Monitoring.
The latter gets a bit garbled every minute, but it can give an impression of the progress.
Since you're running a 4-core multicore task, the CPU should show almost 400% when it is running OK.
You have 4 workers, but only 1 is shown in the monitoring. With 400 events, each worker handles about 100 of them, so when it tells you e.g. 80th event for this worker, the real progress is about 80%. Most ATLAS tasks have 400 events, but maybe you got an extreme task.
ID: 50319
S@NL - John van Gorsel

Joined: 8 Aug 11
Posts: 5
Credit: 2,602,155
RAC: 314
Message 50320 - Posted: 2 Jun 2024, 10:46:24 UTC - in response to Message 50319.  

The progress shown by BOINC Manager is worthless. When you highlight the running ATLAS task in BOINC Manager, there is a button on the left: "Show VM Console".
If the button is greyed out, you first have to install the VirtualBox Extension Pack to be able to use the Console.
When you get a Console popup, you can use the keystrokes ALT-F3 to show the output of the Linux 'top' command and ALT-F2 to see the ATLAS Event Progress Monitoring.
The latter gets a bit garbled every minute, but it can give an impression of the progress.
Since you're running a 4-core multicore task, the CPU should show almost 400% when it is running OK.
You have 4 workers, but only 1 is shown in the monitoring. With 400 events, each worker handles about 100 of them, so when it tells you e.g. 80th event for this worker, the real progress is about 80%. Most ATLAS tasks have 400 events, but maybe you got an extreme task.


Both tasks reached 100%, about an hour before the deadline:
First task:
Run time 2 days 13 hours 25 min 41 sec
CPU time 8 days 10 hours 13 min 37 sec
Second task:
Run time 2 days 5 hours 29 min 14 sec
CPU time 7 days 4 hours 31 min 29 sec

Both tasks validated with a nice credit as well.

I have VirtualBox installed with the Extension Pack, but I do not have the "Show VM Console" button. I installed the Extension Pack after the tasks had already started (and restarted the PC afterwards), so that might be the reason.
ID: 50320
Erich56

Joined: 18 Dec 15
Posts: 1722
Credit: 107,924,646
RAC: 80,760
Message 50321 - Posted: 2 Jun 2024, 12:05:31 UTC - in response to Message 50319.  

Crystal Pellet wrote:
The progress shown by BOINC Manager is worthless. When you highlight the running ATLAS task in BOINC Manager, there is a button on the left: "Show VM Console".
If the button is greyed out, you first have to install the VirtualBox Extension Pack to be able to use the Console.
When you get a Console popup, you can use the keystrokes ALT-F3 to show the output of the Linux 'top' command and ALT-F2 to see the ATLAS Event Progress Monitoring.
The latter gets a bit garbled every minute, but it can give an impression of the progress.
Since you're running a 4-core multicore task, the CPU should show almost 400% when it is running OK.
You have 4 workers, but only 1 is shown in the monitoring. With 400 events, each worker handles about 100 of them, so when it tells you e.g. 80th event for this worker, the real progress is about 80%. Most ATLAS tasks have 400 events, but maybe you got an extreme task.
I just did this for testing purposes - it shows "175th event" - which, according to what you say above, would mean that the real progress is about 175% ???
BTW, this is a 2-core task, and in console F3 the CPU correctly shows about 199%.
ID: 50321
maeax

Joined: 2 May 07
Posts: 2159
Credit: 163,623,440
RAC: 157,116
Message 50322 - Posted: 2 Jun 2024, 12:58:26 UTC

With this app_config, 10 ATLAS tasks really do run well on a Threadripper with 64 cores:
<app_config>
  <app>
    <name>ATLAS</name>
    <max_concurrent>10</max_concurrent>
  </app>
  <app_version>
    <app_name>ATLAS</app_name>
    <avg_ncpus>6</avg_ncpus>
    <plan_class>vbox64_mt_mcore_atlas</plan_class>
    <cmdline>--memory_size_mb 4250</cmdline>
  </app_version>
</app_config>
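
A quick sanity check of what such a configuration asks of the host, sketched in Python. It assumes that each running task occupies avg_ncpus cores and that --memory_size_mb is the memory given to each VM; the helper name is made up for illustration.

```python
def host_footprint(max_concurrent: int, avg_ncpus: int, memory_size_mb: int):
    """Return (cores in use, VM memory in GiB) when max_concurrent tasks run at once."""
    cores = max_concurrent * avg_ncpus
    memory_gib = max_concurrent * memory_size_mb / 1024
    return cores, memory_gib


# Values taken from the app_config above.
cores, memory_gib = host_footprint(max_concurrent=10, avg_ncpus=6, memory_size_mb=4250)
print(f"{cores} cores, {memory_gib:.1f} GiB of VM memory")
```

With the values above this comes to 60 of the 64 cores and about 41.5 GiB for the VMs, so the host needs at least that much free RAM.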
ID: 50322
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1332
Credit: 8,781,879
RAC: 5,205
Message 50323 - Posted: 2 Jun 2024, 14:12:38 UTC - in response to Message 50321.  

I just did this for testing purposes - it shows "175th event" - which, according to what you say above, would mean that the real progress is about 175% ???
BTW, this is a 2-core task, and in console F3 the CPU correctly shows about 199%.
Of course, when you do not have a 4-core VM, the figures are different. With 2 cores and 400 events, 200 events for 1 worker is 100%.
2 cores = 2 workers, and 175 of 200 (200 events for each worker) means about 87% progress, more or less.
The workers do not necessarily complete the same number of events, so the progress is only an estimate.
I did some 7-core tasks, where about 57 events for 1 worker is 100% progress when the total is 400 events.
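
The rule of thumb above can be written down as a small Python sketch. It assumes the events are split evenly over the workers, which, as noted, is only approximately true; the function name is just for illustration.

```python
def estimate_progress(events_done_by_worker: int, cores: int, total_events: int = 400) -> float:
    """Rough task progress in percent, from the event count of the one worker shown on ALT-F2."""
    events_per_worker = total_events / cores  # even split is an assumption
    return min(100.0, 100.0 * events_done_by_worker / events_per_worker)


print(round(estimate_progress(175, cores=2), 1))  # the 2-core case discussed above
print(round(estimate_progress(80, cores=4), 1))   # the 4-core case
print(round(400 / 7))                             # events per worker on a 7-core task
```

The result is capped at 100% because one worker can end up doing more than its even share.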
ID: 50323
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1332
Credit: 8,781,879
RAC: 5,205
Message 50324 - Posted: 2 Jun 2024, 14:31:12 UTC - in response to Message 50320.  
Last modified: 2 Jun 2024, 14:34:57 UTC

Both tasks reached 100%, about an hour before the deadline:
First task:
Run time 2 days 13 hours 25 min 41 sec
CPU time 8 days 10 hours 13 min 37 sec
Looking at the process IDs, both tasks had several restarts. The tasks probably started again from the very beginning at times.
2024-05-31 21:56:57 (2914): VM state change detected. (old = 'running', new = 'paused')
2024-06-01 09:07:49 (2630): Detected: vboxwrapper 26206
2024-06-01 12:25:52 (2630): Status Report: CPU Time: '472779.010000'
2024-06-01 14:33:47 (2583): Detected: vboxwrapper 26206
2024-06-01 22:33:03 (2583): VM state change detected. (old = 'paused', new = 'running')
2024-06-01 22:35:44 (2574): Detected: vboxwrapper 26206


Second task:
Run time 2 days 5 hours 29 min 14 sec
CPU time 7 days 4 hours 31 min 29 sec

2024-05-31 21:56:57 (2916): VM state change detected. (old = 'running', new = 'paused')
2024-06-01 09:07:49 (2631): Detected: vboxwrapper 26206
2024-06-01 12:25:52 (2631): Status Report: CPU Time: '449457.020000'
2024-06-01 14:33:47 (2584): Detected: vboxwrapper 26206
2024-06-01 17:26:38 (2584): VM state change detected. (old = 'running', new = 'paused')
2024-06-01 22:36:03 (4574): Detected: vboxwrapper 26206

The first interruption lasted over 11 hours. ATLAS (and CMS) need an uninterrupted internet connection, so long suspensions can kill a task.
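
The restarts can also be picked out of the wrapper log mechanically. A rough sketch in Python, assuming every log line has the form "timestamp (PID): message" as in the excerpts above, and treating a change of PID between consecutive lines as a restart; the function name is hypothetical.

```python
import re
from datetime import datetime

# Timestamp, wrapper PID in parentheses, then the message text.
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \((\d+)\): (.*)$")

# Log excerpt of the first task, copied from above.
LOG = """\
2024-05-31 21:56:57 (2914): VM state change detected. (old = 'running', new = 'paused')
2024-06-01 09:07:49 (2630): Detected: vboxwrapper 26206
2024-06-01 12:25:52 (2630): Status Report: CPU Time: '472779.010000'
2024-06-01 14:33:47 (2583): Detected: vboxwrapper 26206
2024-06-01 22:33:03 (2583): VM state change detected. (old = 'paused', new = 'running')
2024-06-01 22:35:44 (2574): Detected: vboxwrapper 26206
"""


def find_restarts(log_text):
    """Return (gap, old_pid, new_pid) for each PID change between consecutive log lines."""
    restarts = []
    previous = None  # (timestamp, pid) of the previous parsed line
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip anything that is not a wrapper log line
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        pid = int(match.group(2))
        if previous is not None and pid != previous[1]:
            restarts.append((stamp - previous[0], previous[1], pid))
        previous = (stamp, pid)
    return restarts


for gap, old_pid, new_pid in find_restarts(LOG):
    print(f"PID {old_pid} -> {new_pid} after a gap of {gap}")
```

On the first task's excerpt this finds three PID changes, the first one after a gap of a little over 11 hours.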
ID: 50324
Erich56

Joined: 18 Dec 15
Posts: 1722
Credit: 107,924,646
RAC: 80,760
Message 50325 - Posted: 2 Jun 2024, 15:36:52 UTC - in response to Message 50323.  

Of course when you do not have a 4-core VM, the figures are different...
Okay, I've got it now :-) Thanks for the clarification!
ID: 50325



©2024 CERN