Message boards : Number crunching : Day Light Saving time ended and all tasks that were running got aborted by client
Erich56

Joined: 18 Dec 15
Posts: 1686
Credit: 100,410,368
RAC: 102,457
Message 37103 - Posted: 28 Oct 2018, 12:47:46 UTC

Other crunchers besides me have evidently noticed that all tasks that were running when the clock was set back from 3 o'clock to 2 o'clock last night were terminated by the client.
The stderr says:
2018-10-28 02:02:02 (4912): VM Heartbeat file specified, but missing heartbeat.

I had several Theory and several LHCb tasks running, some of them for many hours. So it was too bad that they stopped.
Why did that happen? Was there no way to avoid this mishap?
bronco

Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 37106 - Posted: 28 Oct 2018, 16:00:42 UTC - in response to Message 37103.  

I didn't lose any tasks, because on my hosts the system clock is configured to follow network time rather than local time. I bet your hosts all have the system clock configured to follow local time. You can avoid such losses in the future by configuring your hosts' system clocks to follow network time.
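
To illustrate the point, here is a minimal Python sketch of why a check based on local wall-clock time can misbehave when the clock is set back, while the same check based on UTC does not. This is not the client's or the wrapper's actual code. The CEST/CET offsets, the switch instant, and the helper name `local_wall` are assumptions chosen for a central European host on 28 Oct 2018.

```python
from datetime import datetime, timedelta, timezone

# Assumed zone offsets for a central European host (illustrative only).
CEST = timezone(timedelta(hours=2))  # summer time, UTC+2
CET = timezone(timedelta(hours=1))   # standard time, UTC+1

# Assumed instant of the change: 01:00 UTC, when 03:00 CEST becomes 02:00 CET.
SWITCH = datetime(2018, 10, 28, 1, 0, tzinfo=timezone.utc)


def local_wall(instant):
    """Hypothetical helper: naive local wall-clock reading for a UTC instant."""
    tz = CEST if instant < SWITCH else CET
    return instant.astimezone(tz).replace(tzinfo=None)


# Two real instants 60 seconds apart, straddling the change.
before = datetime(2018, 10, 28, 0, 59, 30, tzinfo=timezone.utc)
after = before + timedelta(seconds=60)

# Elapsed time measured in UTC is the true 60 seconds.
utc_elapsed = (after - before).total_seconds()

# Elapsed time measured from naive local readings jumps backwards by almost an hour.
local_elapsed = (local_wall(after) - local_wall(before)).total_seconds()

print(local_wall(before), "->", local_wall(after))
print("UTC elapsed:", utc_elapsed, "local elapsed:", local_elapsed)
```

Any timeout or heartbeat test that compares naive local timestamps would see this discontinuity, which is one plausible way a clock change could look like a missing heartbeat.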
Richie_unstable

Joined: 26 Oct 18
Posts: 90
Credit: 4,188,598
RAC: 0
Message 37109 - Posted: 28 Oct 2018, 19:32:10 UTC

I had one LHCb task running last night and it ended with "error while computing". I thought, okay, probably just bad luck. Heh, I looked at the stderr now and found that same error message. It's good to know the computer's time settings were the reason.

©2024 CERN