Message boards : Theory Application : Day Light Saving time ended and all Theory tasks that were running got aborted by client

Harri Liljeroos
Joined: 28 Sep 04
Posts: 675
Credit: 43,636,002
RAC: 16,005
Message 37101 - Posted: 28 Oct 2018, 11:43:47 UTC
Last modified: 28 Oct 2018, 11:44:59 UTC

During the night, daylight saving time ended and clocks were set back 1 hour at 4:00. All Theory tasks that were active at the time were aborted by the client. The section of stderr below shows what happened.
2018-10-28 03:06:07 (6268): Guest Log: [INFO] New Job Starting in slot1

2018-10-28 03:06:07 (6268): Guest Log: [INFO] Condor JobID:  477556.291 in slot1

2018-10-28 03:06:12 (6268): Guest Log: [INFO] MCPlots JobID: 46980559 in slot1

2018-10-28 03:55:25 (6268): Guest Log: [INFO] Job finished in slot1 with 0.

2018-10-28 03:56:02 (6268): Guest Log: [INFO] New Job Starting in slot1

2018-10-28 03:56:02 (6268): Guest Log: [INFO] Condor JobID:  477846.199 in slot1

2018-10-28 03:56:08 (6268): Guest Log: [INFO] MCPlots JobID: 47001352 in slot1

2018-10-28 03:07:59 (6268): VM Heartbeat file specified, but missing heartbeat.
2018-10-28 03:07:59 (6268): Capturing screenshot.
2018-10-28 03:08:00 (6268): Screenshot completed.
2018-10-28 03:08:00 (6268): Powering off VM.
2018-10-28 03:13:01 (6268): VM did not power off when requested.
2018-10-28 03:13:01 (6268): VM was successfully terminated.
2018-10-28 03:13:01 (6268): Deregistering VM. (boinc_fbe6af95e6090fad, slot#3)
2018-10-28 03:13:01 (6268): Removing network bandwidth throttle group from VM.
2018-10-28 03:13:01 (6268): Removing VM from VirtualBox.


[edit] ATLAS tasks were not affected.
ID: 37101
Erich56

Joined: 18 Dec 15
Posts: 1688
Credit: 103,738,042
RAC: 122,037
Message 37102 - Posted: 28 Oct 2018, 12:42:55 UTC - in response to Message 37101.  

Not only Theory tasks were terminated.

I also had quite a number of LHCb tasks running, and they were terminated, too :-(

(It's high time to end this nonsense of changing the time twice a year.)
ID: 37102
maeax

Joined: 2 May 07
Posts: 2099
Credit: 159,815,788
RAC: 143,603
Message 37104 - Posted: 28 Oct 2018, 14:34:47 UTC
Last modified: 28 Oct 2018, 14:35:26 UTC

I voted against this summer-time arrangement. More than 4 million EU citizens voted.
Hopefully it will be history by next year.
Yes, ATLAS had no problems.
ID: 37104
bronco

Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 37105 - Posted: 28 Oct 2018, 15:47:04 UTC

I didn't lose any tasks last night, neither Theory nor ATLAS. As for LHCb... well... I no longer waste resources on LHCb.

Why was I immune? I bet it has something to do with the system clocks on my hosts being set to network time rather than local time. When local time changed last night, systems using the local clock shifted back 1 hour, so the heartbeat appeared to be 1 hour late, and BOINC terminated the tasks.

Network time does NOT change twice a year, so last night the heartbeat on my systems did NOT appear to be 1 hour late so BOINC did NOT terminate tasks.
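
The arithmetic behind that guess can be sketched with two timestamps from the stderr excerpt in the first post. This is only an illustration of wall-clock versus offset-aware subtraction, not BOINC or vboxwrapper code, and nothing here shows how the wrapper actually checks the heartbeat. The UTC+3 and UTC+2 offsets and the function names are assumptions made for the example; the thread does not state the host's time zone.

```python
from datetime import datetime, timedelta, timezone

# Assumed offsets for illustration: UTC+3 before the change, UTC+2 after it.
SUMMER = timezone(timedelta(hours=3))
WINTER = timezone(timedelta(hours=2))

FMT = "%Y-%m-%d %H:%M:%S"


def naive_elapsed(earlier_local: str, later_local: str) -> timedelta:
    """Elapsed time from local wall-clock strings, ignoring the UTC offset."""
    return datetime.strptime(later_local, FMT) - datetime.strptime(earlier_local, FMT)


def aware_elapsed(earlier: datetime, later: datetime) -> timedelta:
    """Elapsed time between offset-aware datetimes (compared as instants)."""
    return later - earlier


# Two timestamps taken from the stderr excerpt in the first post.
last_guest_log = "2018-10-28 03:56:08"   # written before the clock went back
heartbeat_check = "2018-10-28 03:07:59"  # written after the clock went back

naive = naive_elapsed(last_guest_log, heartbeat_check)
aware = aware_elapsed(
    datetime(2018, 10, 28, 3, 56, 8, tzinfo=SUMMER),
    datetime(2018, 10, 28, 3, 7, 59, tzinfo=WINTER),
)

print("wall-clock difference:  ", naive)
print("offset-aware difference:", aware)
print("discrepancy:            ", aware - naive)
```

Under these assumptions the two results differ by exactly one hour, which is the size of jump any code comparing local wall-clock times would see across the change.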

Seem to recall Windoze systems are by default configured with system clock set to local time with option to set to network time. Linux systems by default configure system clock to follow network time. All my hosts are Linux.

Nothing wrong with daylight savings when ya know how to work a clock :)

Prognostication: When local clocks leap ahead in the spring any host with system clock set to follow local clock will again have tasks cancelled because BOINC will freak when it sees a heartbeat from a task that appears to have travelled 1 hour back in time. Those who don't learn from the past are doomed to repeat their mistakes in the future.

Why did Harri's ATLAS tasks not get terminated? Because he runs them on a host that has system clock configured to follow network time? His theory tasks failed because they run on a host with system clock set to follow local clock?
ID: 37105
Harri Liljeroos
Joined: 28 Sep 04
Posts: 675
Credit: 43,636,002
RAC: 16,005
Message 37112 - Posted: 28 Oct 2018, 21:26:42 UTC - in response to Message 37105.  

Both my machines at home did change their clocks (win7 & win10), so no network time in use here. I don't see network time setting available, at least not with that name.
ID: 37112
Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1127
Credit: 49,749,878
RAC: 9,669
Message 37113 - Posted: 28 Oct 2018, 21:29:38 UTC
Last modified: 28 Oct 2018, 21:31:29 UTC

ID: 37113
bronco

Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 37116 - Posted: 28 Oct 2018, 22:40:09 UTC - in response to Message 37112.  

Both my machines at home did change their clocks (win7 & win10), so no network time in use here. I don't see network time setting available, at least not with that name.


Both clocks changed but ATLAS tasks didn't fail... interesting. Perhaps another cause than the one I have suggested.

I am probably using the wrong name. I wish I could help you find the setting but I don't have any Win machines here to explore, just Linux. It's been several years since I've done much with Windoze. As I recall the setting is buried ~20 clicks deep to discourage users from playing with it.
ID: 37116
bronco

Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 37117 - Posted: 28 Oct 2018, 22:48:07 UTC - in response to Message 37113.  
Last modified: 28 Oct 2018, 23:02:31 UTC

Well then what's going on? I just got off the phone with Toddler Tyrant, he claims he has proof it's because Killary is messing with the Euro clocks.

<edit>
He just called me back. Says he didn't say that. Says he knows a guy who knows a guy who says his dog has the proof and he insists it's not the same guy who photoshopped the pics of his inauguration crowd.
ID: 37117
Richie_unstable

Joined: 26 Oct 18
Posts: 91
Credit: 4,188,598
RAC: 0
Message 37118 - Posted: 28 Oct 2018, 23:18:35 UTC

On Windows 10 (17134.376) Internet time setting can be found here:

1. Right-click clock on the Taskbar
2. Click 'Adjust date/time'
3. Click 'Additional date, time & regional settings'
4. Click 'Set the time and date'
5. Choose tab 'Internet Time'

My computer says:
"This computer is set to automatically synchronize with 'time.windows.com'."
and
"This computer is set to automatically synchronize on a scheduled basis."

If I click 'Change settings' there's only an option to check the box 'Synchronize with an Internet time server' and set time server to time.windows.com or time.nist.gov. There's also an option to click 'Update now'.

I don't know how often this synchronization is done in the background on Windows 10.

On Windows 7 that setting can be found the same way.
Windows 7 info says: "Your clock is typically updated once a week and needs to be connected to the Internet for the synchronization to occur."

I had one LHCb task running and it errored out during the daylight saving time change.
ID: 37118
bronco

Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 37119 - Posted: 29 Oct 2018, 1:01:55 UTC - in response to Message 37118.  

@ Richie_unstable
Thanks for that.

@ Magic
Thanks for the wiki article. It clears up some confusion. I assumed clocks in N. America changed this weekend along with European clocks. Now I see ours don't change until November 4.

OK. Here's the wager. I bet 1 barrel of Kokanee (the finest beer on the planet) that when our clocks change on November 4 my hosts don't fail any tasks with the heartbeat related error or any error other than errors attributable to project infrastructure failure. Any takers? Here's your opportunity, computezrme.
ID: 37119
Erich56

Joined: 18 Dec 15
Posts: 1688
Credit: 103,738,042
RAC: 122,037
Message 37121 - Posted: 29 Oct 2018, 6:09:12 UTC - in response to Message 37105.  

heartbeat
This heartbeat thing is a big headache anyway.
Some time ago, someone here gave a thorough technical explanation of how it works. And so it became clear to me why every so often my VM tasks fail on the two notebooks which are connected via WLAN, once the WLAN connection gets interrupted for a second or two.
As a consequence, again and again VM tasks which have run for many hours and are almost finished suddenly stop due to "heartbeat missing".
Damned thing.
ID: 37121
maeax

Joined: 2 May 07
Posts: 2099
Credit: 159,815,788
RAC: 143,603
Message 37123 - Posted: 29 Oct 2018, 10:57:28 UTC - in response to Message 37104.  

I voted against this summer-time arrangement. More than 4 million EU citizens voted.
Hopefully it will be history by next year.
Yes, ATLAS had no problems.

This WU started during the window of the DST change, ran for 3 days of CPU time,
and ended with a confirmation error:
https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=102386219
RDP showed successful processing of 50 collisions on every CPU. The task had 4 CPUs working.
So, there must be something else going wrong.
ID: 37123
Erich56

Joined: 18 Dec 15
Posts: 1688
Credit: 103,738,042
RAC: 122,037
Message 37124 - Posted: 29 Oct 2018, 11:37:36 UTC - in response to Message 37123.  

So, there must be something else going wrong.
I would say that as long as the LHC people don't tell us what the problem really was, we can only guess.

Further, I don't remember having had this problem in previous years on the dates of the time change. Maybe the technical configuration of the tasks was different then, who knows.
ID: 37124
Harri Liljeroos
Joined: 28 Sep 04
Posts: 675
Credit: 43,636,002
RAC: 16,005
Message 37132 - Posted: 29 Oct 2018, 18:56:29 UTC - in response to Message 37118.  

That's the way I have both my win7 and win10 computers set, to update the time from the internet. But this setting doesn't mean that the time would not change when DST starts or ends. If you choose to change your time zone there, you can find a setting for whether the computer should follow DST automatically or not. Anyway, I prefer the computer to show the actual time.
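
A small sketch of that distinction, for what it's worth: internet synchronization keeps the underlying instant correct, while the time zone and DST setting only decide how that instant is displayed. The UTC+3 and UTC+2 offsets are assumptions chosen for the example (the thread does not state the time zone), and the variable names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# One instant: 01:00:00 UTC on 28 Oct 2018, when EU summer time ended.
instant = datetime(2018, 10, 28, 1, 0, 0, tzinfo=timezone.utc)

# Assumed display offsets for illustration: UTC+3 (summer) and UTC+2 (winter).
with_dst = instant.astimezone(timezone(timedelta(hours=3)))
without_dst = instant.astimezone(timezone(timedelta(hours=2)))

# Same instant, two different wall-clock readings.
print(with_dst.strftime("%H:%M"), without_dst.strftime("%H:%M"))
print("same instant:", with_dst == without_dst)
```

The two readings differ by an hour even though both describe the same moment, so a perfectly synchronized clock still shows the jump.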
ID: 37132
bronco

Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 37133 - Posted: 29 Oct 2018, 22:08:43 UTC - in response to Message 37132.  

But this setting doesn't mean that the time would not change when DST starts or ends.

True. And now I see that the explanation I proposed earlier is incorrect. That explanation was based on a discussion I read years ago on Stack Overflow regarding why the system clock gets messed up on dual-boot (Windows <-> Linux) systems when switching between OS's. I recalled the facts incorrectly and so came up with a partially incorrect explanation for why your tasks failed.

The part that is wrong is where I claimed Linux is immune to the problem. I won't be surprised if my Linux hosts lose tasks when the time changes here in N. America on Nov. 4 and I am forced to give up a barrel of beer :-(
ID: 37133
