Extreme event processing times
Author | Message |
---|---|
Joined: 14 Jan 10 Posts: 1440 Credit: 9,657,607 RAC: 1,253 |
This morning I have several tasks running with the 'normal' 400 events, but after some normal runtimes, I now have tasks with processing times of up to 6700 seconds for each separate event. Since the logging from ALT-F2 is still stuck, I have no idea of the average event runtime. |
Joined: 2 May 07 Posts: 2260 Credit: 175,581,097 RAC: 15,522 |
Crystal, what do you see under Properties in BOINC Manager for this task? |
Joined: 4 Sep 22 Posts: 95 Credit: 16,660,760 RAC: 2,981 |
"This morning I've several tasks running with the 'normal' 400 events, ..." I can't access ALT-F2 on my system either. However, there may be a way to bypass that and still get all the info you need. Are you running Linux on your system? If so, is BOINC running as a system service? If it is, I can give you instructions for accessing the same information using the VirtualBox Manager. |
Joined: 14 Jan 10 Posts: 1440 Credit: 9,657,607 RAC: 1,253 |
"Crystal, ..." Application: ATLAS Simulation 3.01 (vbox64_mt_mcore_atlas); Name: wPiKDmjlS14nsSi4apGgGQJmABFKDmABFKDm8QvSDm4luLDmYCOiSn; State: Running; Received: 3/5/2024 7:09:36 AM; Report deadline: 3/12/2024 7:09:37 AM; Resources: 8 CPUs; Estimated computation size: 43,200 GFLOPs; CPU time: 1d 05:57:55; CPU time since checkpoint: 00:05:59; Elapsed time: 04:39:27; Estimated time remaining: 03:18:08; Fraction done: 58.512%; Virtual memory size: 116.62 MB; Working set size: 4.69 GB; Directory: slots/0; Process ID: 2116; Progress rate: 12.600% per hour; Executable: vboxwrapper_26206_windows_x86_64.exe |
Joined: 2 May 07 Posts: 2260 Credit: 175,581,097 RAC: 15,522 |
Very good, just wait for it to finish. You'll get a few cobblestones ;-)) |
Joined: 14 Jan 10 Posts: 1440 Credit: 9,657,607 RAC: 1,253 |
Two of them have finished: 111 and 112 thousand CPU seconds for 400 events. |
Joined: 2 May 07 Posts: 2260 Credit: 175,581,097 RAC: 15,522 |
The Threadripper uses 6 CPUs and runs these heavy ATLAS tasks in 6 hours, with 36 hours of CPU time. Something for a Cray. |
Joined: 8 Aug 11 Posts: 5 Credit: 2,612,858 RAC: 0 |
I have two tasks that have been running for a very long time as well: Application: ATLAS Simulation 3.01 (vbox64_mt_mcore_atlas); Name: t3iKDmxuLV5nsSi4ap6QjLDmwznN0nGgGQJmpmIZDmPtFKDmAK1NGm; State: Running; Received: Sun 26 May 2024 13:28:26 CEST; Report deadline: Sun 02 Jun 2024 13:28:26 CEST; Resources: 4 CPUs; Estimated computation size: 43.200 GFLOPs; CPU time: 8d 00:17:57; CPU time since checkpoint: 00:00:34; Elapsed time: 2d 10:29:09; Estimated time remaining: 00:27:09; Fraction done: 99,232%; Virtual memory size: 5,01 GB; Working set size: 4,30 GB; Directory: slots/2; Process ID: 2574; Progress rate: 1,800% per hour; Executable: vboxwrapper_26206_x86_64-pc-linux-gnu. My question is whether these tasks will ever reach 100%. The progress slowed down significantly and it now takes 50 seconds for a 0.01% step. Without further slow-down it would take another 13 hours to complete and the deadline is in about 5 hours... |
Joined: 14 Jan 10 Posts: 1440 Credit: 9,657,607 RAC: 1,253 |
"My question is whether these tasks will ever reach 100%. The progress slowed down significantly and it now takes 50 seconds for a 0.01% step. Without further slow-down it would take another 13 hours to complete and the deadline is in about 5 hours..." The progress shown by BOINC Manager is worthless. When you highlight the running ATLAS task in BOINC Manager, there is a button on the left: "Show VM Console". When the button is greyed out, you first have to install the VirtualBox Extension Pack to be able to use the console. When you get a console popup, you can use ALT-F3 to show the output of the Linux 'top' command and ALT-F2 to see the ATLAS Event Progress Monitoring. The latter gets a bit garbled every minute, but can give an impression of the progress. Since you're running a 4-core multicore task, the CPU usage should show almost 400% when it is running OK. You have 4 workers, but only 1 is shown in the monitoring, so when it tells you e.g. 80th event for this worker, the real progress is about 80%. Most ATLAS tasks have 400 events, but maybe you got an extreme task. |
Joined: 8 Aug 11 Posts: 5 Credit: 2,612,858 RAC: 0 |
"The progress shown by BOINC Manager is worthless. When you highlight the running ATLAS-task in BOINC Manager there is button on the left: 'Show VM Console'." Both tasks reached 100%, about an hour before the deadline. First task: run time 2 days 13 hours 25 min 41 sec, CPU time 8 days 10 hours 13 min 37 sec. Second task: run time 2 days 5 hours 29 min 14 sec, CPU time 7 days 4 hours 31 min 29 sec. Both tasks validated with a nice credit as well. I have VirtualBox installed with the Extension Pack, but I do not have the option/button "Show VM Console". I installed the Extension Pack after the tasks had already started (and restarted the PC after that), so that might be the reason. |
Joined: 18 Dec 15 Posts: 1840 Credit: 126,171,519 RAC: 123,164 |
Crystal Pellet wrote: "The progress shown by BOINC Manager is worthless. When you highlight the running ATLAS-task in BOINC Manager there is button on the left: 'Show VM Console'." I just did this for testing purposes. It shows "175th event", which, according to what you say above, would mean that the real progress is about 175%??? BTW, this is a 2-core task, and in the ALT-F3 console the CPU correctly shows about 199%. |
Joined: 2 May 07 Posts: 2260 Credit: 175,581,097 RAC: 15,522 |
With this app_config, 10 ATLAS tasks really do run well on a Threadripper with 64 cores: <app_config> <app> <name>ATLAS</name> <max_concurrent>10</max_concurrent> </app> <app_version> <app_name>ATLAS</app_name> <avg_ncpus>6</avg_ncpus> <plan_class>vbox64_mt_mcore_atlas</plan_class> <cmdline>--memory_size_mb 4250</cmdline> </app_version> </app_config> |
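As a rough sanity check of the numbers in this app_config, the sketch below multiplies them out. It assumes that `max_concurrent` caps the number of simultaneous ATLAS tasks, that `avg_ncpus` is the number of cores each task uses, and that `--memory_size_mb` is the memory given to each VM. The function name `resource_budget` is hypothetical, and the host's RAM size is not stated in the thread, so only the core count is compared.

```python
def resource_budget(max_concurrent: int, cpus_per_task: int, memory_mb_per_task: int):
    """Return (total cores, total memory in MB) needed when all slots are busy."""
    total_cores = max_concurrent * cpus_per_task
    total_memory_mb = max_concurrent * memory_mb_per_task
    return total_cores, total_memory_mb


# Values taken from the app_config above; 64 is the core count mentioned in the post.
HOST_CORES = 64
cores, memory_mb = resource_budget(max_concurrent=10, cpus_per_task=6, memory_mb_per_task=4250)

print(f"cores needed: {cores} of {HOST_CORES}")
print(f"VM memory needed: {memory_mb} MB")
```

Under these assumptions the ten tasks ask for 60 of the 64 cores and 42,500 MB of VM memory in total.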
Joined: 14 Jan 10 Posts: 1440 Credit: 9,657,607 RAC: 1,253 |
"I just did this for testing purposes - it shows '175th event' - which, according to what you say above, would mean that the real progress is about 175% ???" Of course, when you do not have a 4-core VM, the figures are different. With 2 cores and 400 events, 200 events for 1 worker is 100%. 2 cores = 2 workers, so 175 of 200 (200 events for each worker) means about 87% progress, more or less. The workers don't all have to have finished the same number of events, so the progress is an estimate. I did some 7-core tasks, where about 57 events for 1 worker is 100% progress when the total is 400 events. |
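The rule of thumb from these posts can be written down as a small sketch. It assumes one worker per core and an even split of events between workers, as described above; the function name `estimate_progress` is hypothetical, and the result is only an estimate because workers do not advance in lockstep.

```python
def estimate_progress(events_shown: int, total_events: int = 400, workers: int = 4) -> float:
    """Estimate task progress (0.0 to 1.0) from the event number of the single
    worker visible on the ALT-F2 console, assuming events are split evenly."""
    if total_events <= 0 or workers <= 0:
        raise ValueError("total_events and workers must be positive")
    events_per_worker = total_events / workers
    # Cap at 1.0: the visible worker may run ahead of the others.
    return min(events_shown / events_per_worker, 1.0)


# Examples from the thread:
print(estimate_progress(80, total_events=400, workers=4))   # 4-core task, 80th event
print(estimate_progress(175, total_events=400, workers=2))  # 2-core task, 175th event
print(round(400 / 7))                                       # events per worker on 7 cores
```

For the 2-core case this gives 0.875, which matches the "about 87%" quoted above.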
Joined: 14 Jan 10 Posts: 1440 Credit: 9,657,607 RAC: 1,253 |
"Both tasks reached 100%, about an hour before the deadline:" Both tasks had several restarts, judging by the process IDs. Probably the tasks sometimes started from the very beginning. First task: 2024-05-31 21:56:57 (2914): VM state change detected. (old = 'running', new = 'paused'); 2024-06-01 09:07:49 (2630): Detected: vboxwrapper 26206; 2024-06-01 12:25:52 (2630): Status Report: CPU Time: '472779.010000'; 2024-06-01 14:33:47 (2583): Detected: vboxwrapper 26206; 2024-06-01 22:33:03 (2583): VM state change detected. (old = 'paused', new = 'running'); 2024-06-01 22:35:44 (2574): Detected: vboxwrapper 26206. Second task: 2024-05-31 21:56:57 (2916): VM state change detected. (old = 'running', new = 'paused'); 2024-06-01 09:07:49 (2631): Detected: vboxwrapper 26206; 2024-06-01 12:25:52 (2631): Status Report: CPU Time: '449457.020000'; 2024-06-01 14:33:47 (2584): Detected: vboxwrapper 26206; 2024-06-01 17:26:38 (2584): VM state change detected. (old = 'running', new = 'paused'); 2024-06-01 22:36:03 (4574): Detected: vboxwrapper 26206. The first interruption was over 11 hours. ATLAS (and CMS) need an uninterrupted internet connection, so long suspensions are fatal for them. |
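The check described here (spotting restarts from changing process IDs and measuring the interruption) can be sketched as below. It assumes every log line has the form `YYYY-MM-DD HH:MM:SS (pid): message`, as in the excerpt; the helper names are hypothetical, and a PID change only shows that the wrapper process was restarted, not how much work was lost.

```python
import re
from datetime import datetime, timedelta

# Assumed line layout: "2024-06-01 09:07:49 (2630): Detected: vboxwrapper 26206"
LOG_LINE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \((\d+)\): (.*)$")


def parse_log(lines):
    """Return a list of (timestamp, pid, message); lines that do not match are skipped."""
    entries = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            entries.append((stamp, int(match.group(2)), match.group(3)))
    return entries


def count_pid_changes(entries):
    """Count how often the PID differs between consecutive log entries."""
    return sum(1 for prev, cur in zip(entries, entries[1:]) if prev[1] != cur[1])


def longest_gap(entries):
    """Return the longest time span between two consecutive log entries."""
    return max((cur[0] - prev[0] for prev, cur in zip(entries, entries[1:])),
               default=timedelta(0))


# Log lines of the first task, copied from the post above.
FIRST_TASK_LOG = [
    "2024-05-31 21:56:57 (2914): VM state change detected. (old = 'running', new = 'paused')",
    "2024-06-01 09:07:49 (2630): Detected: vboxwrapper 26206",
    "2024-06-01 12:25:52 (2630): Status Report: CPU Time: '472779.010000'",
    "2024-06-01 14:33:47 (2583): Detected: vboxwrapper 26206",
    "2024-06-01 22:33:03 (2583): VM state change detected. (old = 'paused', new = 'running')",
    "2024-06-01 22:35:44 (2574): Detected: vboxwrapper 26206",
]

entries = parse_log(FIRST_TASK_LOG)
print("PID changes:", count_pid_changes(entries))
print("longest gap:", longest_gap(entries))
```

On this excerpt the sketch reports three PID changes and a longest gap of 11:10:52, consistent with the interruption of over 11 hours.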
Joined: 18 Dec 15 Posts: 1840 Credit: 126,171,519 RAC: 123,164 |
"Of course when you do not have a 4-core VM, the figures are different..." Okay, I've got it now :-) Thanks for the clarification! |
Joined: 28 May 16 Posts: 1 Credit: 4,467,705 RAC: 3,105 |
Today I aborted three never-ending ATLAS Simulation tasks. After some sporadic Internet connection outages in my area, these tasks reached 100% progress, but they didn't finish on three different Linux hosts after 381,715.33, 452,779.59, and 807,403.00 seconds respectively. I don't know if an automatic end is expected to happen after a certain period (?) I thought that 807,403.00 seconds (more than 9 days and 8 hours) was enough time to give them a chance... |
Joined: 14 Jan 10 Posts: 1440 Credit: 9,657,607 RAC: 1,253 |
"I don't know if an automatic end is expected to happen after a certain period (?)" No automatic end is set for ATLAS, but using only 54 min 33 sec of CPU time in over 9 days is sign enough that the task is not doing well. |
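The "CPU time versus elapsed time" check can be expressed as a simple ratio. This is a minimal sketch with a hypothetical helper name; the core count of the aborted task is not given in the thread, so `cores=1` is assumed there, which overstates the utilisation if more cores were assigned.

```python
def cpu_utilisation(cpu_seconds: float, elapsed_seconds: float, cores: int = 1) -> float:
    """Fraction of the available core time that was actually spent computing."""
    if elapsed_seconds <= 0 or cores <= 0:
        raise ValueError("elapsed_seconds and cores must be positive")
    return cpu_seconds / (elapsed_seconds * cores)


# Aborted task: 54 min 33 sec of CPU time in 807,403 seconds elapsed.
stalled = cpu_utilisation(54 * 60 + 33, 807_403)

# Finished 4-core task from earlier in the thread:
# run time 2 d 13 h 25 min 41 s, CPU time 8 d 10 h 13 min 37 s.
run_s = 2 * 86400 + 13 * 3600 + 25 * 60 + 41
cpu_s = 8 * 86400 + 10 * 3600 + 13 * 60 + 37
healthy = cpu_utilisation(cpu_s, run_s, cores=4)

print(f"stalled task: {stalled:.2%}")
print(f"finished task: {healthy:.2%}")
```

The stalled task comes out well under 1%, while the slow but finished 4-core task is above 80%, which is the difference the post points at.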
©2025 CERN