Message boards : ATLAS application : Pilot has decided to kill looping job from restored VM
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,616
RAC: 2,139
Message 32675 - Posted: 7 Oct 2017, 14:13:47 UTC

I suspended an ATLAS-VM overnight and restored it this morning.
After the restore the task ended very quickly. BOINC validated it OK, but no valid ATLAS HITS file was uploaded.

https://lhcathome.cern.ch/lhcathome/result.php?resultid=158750724

It seems that the server connected to the VM decided to kill the job inside the VM: "Pilot has decided to kill looping job 3637921913 at 2017-10-07T07:23:41+-100"

Why??
Erich56

Joined: 18 Dec 15
Posts: 1686
Credit: 100,442,784
RAC: 103,083
Message 32686 - Posted: 8 Oct 2017, 4:59:42 UTC - in response to Message 32675.  

"Pilot has decided to kill looping job 3637921913 at 2017-10-07T07:23:41+-100"

The interesting part of this is that the notice talks about a "looping job".

So the question may be: is a suspended task considered to be running in a loop and therefore terminated after some time?
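
To make the question concrete, here is a minimal sketch of how a wall-clock based "looping" check could misfire after a suspend. It is only an illustration under assumptions: the function name `is_looping` and the two-hour limit are hypothetical, and nothing in this thread confirms that the pilot actually works this way.

```python
# Hypothetical threshold; the real limit used by the pilot is not stated in this thread.
LOOPING_LIMIT_SECONDS = 2 * 60 * 60


def is_looping(last_progress_time, now, limit=LOOPING_LIMIT_SECONDS):
    """Return True if no progress has been seen for longer than `limit` seconds.

    Both times are wall-clock seconds. A check like this cannot tell a job
    that is stuck from a job whose VM was simply suspended.
    """
    return (now - last_progress_time) > limit


# The job last showed progress just before the VM was suspended.
last_progress = 0.0

# Ten minutes later, while still running: not flagged.
print(is_looping(last_progress, last_progress + 10 * 60))

# After an assumed 8-hour overnight suspension the gap exceeds the limit,
# so the job is flagged on the first check after the restore.
print(is_looping(last_progress, last_progress + 8 * 60 * 60))
```

If the check were based on CPU time consumed by the job rather than wall-clock time, a suspension would not count against it. Which of the two the pilot uses is exactly the open question.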