Message boards : ATLAS application : Long running tasks

Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 553
Credit: 336,027,745
RAC: 646,271
Message 39528 - Posted: 8 Aug 2019, 16:03:27 UTC

It seems like ATLAS has changed to longer-running tasks?
newman

Joined: 16 May 08
Posts: 4
Credit: 879,649
RAC: 999
Message 39531 - Posted: 8 Aug 2019, 18:09:09 UTC - in response to Message 39528.  
Last modified: 8 Aug 2019, 18:33:04 UTC

Yeah, I also have one that has already been running for more than 24 h. Is that normal?

Greetings
Marcus
computezrmle

Joined: 15 Jun 08
Posts: 1107
Credit: 53,188,771
RAC: 129,663
Message 39533 - Posted: 8 Aug 2019, 19:14:08 UTC - in response to Message 39531.  

... Is that normal?

Yes.
At least for some parameter sets.

It looks like there are different types of tasks in the queue.
Some with shorter runtimes, others with longer runtimes.

Everything is fine as long as the tasks finish successfully.
Just let them run.
David Cameron
Project administrator
Project developer
Project scientist

Joined: 13 May 14
Posts: 279
Credit: 8,712,351
RAC: 6,221
Message 39539 - Posted: 9 Aug 2019, 10:22:55 UTC - in response to Message 39533.  

Indeed, there are longer tasks in the system now. In the image in this post you can see the different bunches of tasks: the larger ones at the bottom are the older, shorter tasks and the smaller ones are the new, longer tasks.

The reason is that the kind of physics being simulated is different, with more complicated particle interactions in the new tasks which require more CPU time to process. In terms of the physics, the previous tasks consisted of simulating leptons (electrons and muons) which make nice clean tracks through the detector. The new tasks simulate hadrons (particles made up of quarks) and when they interact with the detector they produce "jets" of particles which are much more complex to simulate.
maeax

Joined: 2 May 07
Posts: 715
Credit: 26,462,343
RAC: 27,189
Message 39540 - Posted: 9 Aug 2019, 11:22:10 UTC

Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 553
Credit: 336,027,745
RAC: 646,271
Message 39543 - Posted: 9 Aug 2019, 16:07:28 UTC

Thanks for the details; I was worried when I saw one running for more than 24 hrs.

Hadrons make a mess ;)
newman

Joined: 16 May 08
Posts: 4
Credit: 879,649
RAC: 999
Message 39555 - Posted: 10 Aug 2019, 7:01:12 UTC - in response to Message 39543.  

:(

https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=120012872

Validate error after 252,537.20 s of CPU time. Too many errors (may have bug)
maeax

Joined: 2 May 07
Posts: 715
Credit: 26,462,343
RAC: 27,189
Message 39557 - Posted: 10 Aug 2019, 7:58:44 UTC - in response to Message 39555.  
Last modified: 10 Aug 2019, 7:59:47 UTC

Your ATLAS task was started and stopped a few times.
ATLAS begins again from the start after the task is stopped.
You can see how quickly (2 hours with 12 CPUs) another user finished the task.
With ATLAS it is normally very useful to run the task from beginning to end without interruption.
computezrmle

Joined: 15 Jun 08
Posts: 1107
Credit: 53,188,771
RAC: 129,663
Message 39563 - Posted: 10 Aug 2019, 13:30:54 UTC - in response to Message 39555.  

That's bad luck.
At least partly, because 3 (!) of your wingmen misconfigured their computers (missing CVMFS), which caused their tasks to fail.

In addition, as maeax already pointed out, your own task was suspended a couple of times for several hours each.
Unlike ATLAS native, ATLAS vbox should not restart from scratch if the VM has written a snapshot, but you might have hit a maximum runtime limit.
2019-08-04 21:14:16 (2824): vboxwrapper (7.7.26196): starting

2019-08-05 00:24:51 (2824): Successfully stopped VM.
2019-08-05 16:35:47 (4544): vboxwrapper (7.7.26196): starting

2019-08-06 00:43:38 (4544): Successfully stopped VM.
2019-08-06 07:08:23 (9120): vboxwrapper (7.7.26196): starting

2019-08-06 07:12:47 (9120): Guest Log: Starting ATLAS job. (PandaID=4437421466 taskID=18722495)
2019-08-06 18:51:47 (9876): vboxwrapper (7.7.26196): starting

2019-08-07 00:18:56 (9876): Successfully stopped VM.
2019-08-07 07:00:19 (9332): vboxwrapper (7.7.26196): starting

2019-08-07 07:40:16 (9332): Stopping VM.
2019-08-07 07:40:16 (9332): Error 0x80070005 in vbox51::VBOX_VM::stop (c:\src\boinc\boinc\samples\vboxwrapper\vbox_mscom_impl.cpp:1449)
2019-08-07 07:40:16 (9332): Error Source     : SessionMachine
2019-08-07 07:40:16 (9332): Error Description: The object is not ready
2019-08-07 17:50:25 (6120): vboxwrapper (7.7.26196): starting

2019-08-08 00:20:30 (6120): Successfully stopped VM.
2019-08-08 06:28:17 (5324): vboxwrapper (7.7.26196): starting

2019-08-08 07:01:44 (5324): Successfully stopped VM.
2019-08-08 18:21:13 (10020): vboxwrapper (7.7.26196): starting

2019-08-09 00:05:55 (10020): Successfully stopped VM.
2019-08-09 07:04:36 (7756): vboxwrapper (7.7.26196): starting

2019-08-09 07:31:37 (7756): Successfully stopped VM.
2019-08-09 14:48:36 (5900): vboxwrapper (7.7.26196): starting

2019-08-10 01:46:32 (5900): VM Completion File Detected.

01:46:44 (5900): called boinc_finish(0)
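For anyone who wants to check their own slot logs for interruptions like these, here is a minimal Python sketch that pairs each vboxwrapper "starting" line with the next stop line from the same process ID and reports how long each VM session ran. It assumes the "date time (PID): message" layout shown in the excerpt above; the function name vm_sessions and the set of stop messages are my own choices, not part of vboxwrapper. The blank lines in the excerpt mark omitted log lines, so a session with no matching stop line is reported as open rather than guessed. SAMPLE_LOG is a subset of the lines quoted above.

```python
import re
from datetime import datetime

# Assumed line layout, taken from the excerpt above:
# "2019-08-04 21:14:16 (2824): vboxwrapper (7.7.26196): starting"
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \((\d+)\): (.*)$")

# Hypothetical choice of messages that end a VM session.
STOP_MARKERS = {"Successfully stopped VM.", "Stopping VM.", "VM Completion File Detected."}


def vm_sessions(log_text):
    """Return (pid, start, end) tuples; end is None if no stop line was found."""
    sessions = []
    current = None  # (pid, start_time) of the session in progress
    for raw in log_text.splitlines():
        m = LINE_RE.match(raw.strip())
        if not m:
            continue  # blank lines and lines without a full date are skipped
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        pid, msg = int(m.group(2)), m.group(3)
        if msg.startswith("vboxwrapper") and msg.endswith("starting"):
            if current is not None:
                # A new start without a stop: the previous session stays open.
                sessions.append((current[0], current[1], None))
            current = (pid, ts)
        elif current is not None and pid == current[0] and msg in STOP_MARKERS:
            sessions.append((current[0], current[1], ts))
            current = None
    if current is not None:
        sessions.append((current[0], current[1], None))
    return sessions


SAMPLE_LOG = """\
2019-08-04 21:14:16 (2824): vboxwrapper (7.7.26196): starting

2019-08-05 00:24:51 (2824): Successfully stopped VM.
2019-08-05 16:35:47 (4544): vboxwrapper (7.7.26196): starting

2019-08-06 00:43:38 (4544): Successfully stopped VM.
2019-08-06 07:08:23 (9120): vboxwrapper (7.7.26196): starting

2019-08-06 07:12:47 (9120): Guest Log: Starting ATLAS job. (PandaID=4437421466 taskID=18722495)
2019-08-07 07:00:19 (9332): vboxwrapper (7.7.26196): starting

2019-08-07 07:40:16 (9332): Stopping VM.
2019-08-07 07:40:16 (9332): Error Source     : SessionMachine
2019-08-09 14:48:36 (5900): vboxwrapper (7.7.26196): starting

2019-08-10 01:46:32 (5900): VM Completion File Detected.

01:46:44 (5900): called boinc_finish(0)
"""

if __name__ == "__main__":
    for pid, start, end in vm_sessions(SAMPLE_LOG):
        if end is None:
            print(f"{pid}: started {start}, no stop line in excerpt")
        else:
            print(f"{pid}: ran {end - start} from {start}")
```

Only one session is tracked at a time because each restart in this log gets a new wrapper PID after the previous one has stopped; that is an assumption drawn from this excerpt, not a documented guarantee.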
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 553
Credit: 336,027,745
RAC: 646,271
Message 39630 - Posted: 17 Aug 2019, 21:28:39 UTC

Some of my tasks went past the deadline, so I aborted them. They were at 7-8 days.



©2019 CERN