Message boards : ATLAS application : Task stops at 3 hours and hibernates
greg_be

Joined: 28 Dec 08
Posts: 341
Credit: 4,865,275
RAC: 87
Message 50386 - Posted: 10 Jun 2024, 22:38:21 UTC
Last modified: 10 Jun 2024, 22:38:56 UTC

What's going on now?
I set my "switch between tasks" interval to 8 hours to cover the 6+ hour ATLAS run time.
I came to check on things and watched my current task go into pause without any action on my part.

I grabbed some lines from the start and end of the VBox.log tab in the VM:
0:00:35.688252 VMMDev: Guest Log: *** Starting ATLAS job. (PandaID=6225421949 taskID=38921007) ***
00:10:08.970575 Display::i_handleDisplayResize: uScreenId=0 pvVRAM=00000248d94e0000 w=800 h=600 bpp=0 cbLine=0xC80 flags=0x5 origin=0,0
00:18:33.716129 VMMDev: vmmDevHeartbeatFlatlinedTimer: Guest seems to be unresponsive. Last heartbeat received 4 seconds ago
00:18:34.132685 VMMDev: GuestHeartBeat: Guest is alive (gone 4 417 029 721 ns)
00:20:43.292363 TM: Giving up catch-up attempt at a 60 000 310 325 ns lag; new total: 60 000 310 325 ns
00:22:27.160213 TM: Giving up catch-up attempt at a 60 001 117 696 ns lag; new total: 120 001 428 021 ns
00:25:17.664849 TM: Giving up catch-up attempt at a 60 000 051 304 ns lag; new total: 180 001 479 325 ns
00:29:09.053594 TM: Giving up catch-up attempt at a 60 000 021 982 ns lag; new total: 240 001 501 307 ns
00:33:00.568812 TM: Giving up catch-up attempt at a 60 000 377 087 ns lag; new total: 300 001 878 394 ns
00:36:55.876067 TM: Giving up catch-up attempt at a 60 000 099 862 ns lag; new total: 360 001 978 256 ns
00:40:54.343213 TM: Giving up catch-up attempt at a 60 000 137 235 ns lag; new total: 420 002 115 491 ns
00:44:52.521410 TM: Giving up catch-up attempt at a 60 000 051 240 ns lag; new total: 480 002 166 731 ns
00:48:42.127924 TM: Giving up catch-up attempt at a 60 000 045 005 ns lag; new total: 540 002 211 736 ns
00:52:27.709940 TM: Giving up catch-up attempt at a 60 000 068 108 ns lag; new total: 600 002 279 844 ns
00:56:13.937077 TM: Giving up catch-up attempt at a 60 000 228 106 ns lag; new total: 660 002 507 950 ns
01:00:06.595391 TM: Giving up catch-up attempt at a 60 000 092 409 ns lag; new total: 720 002 600 359 ns
01:04:00.453791 TM: Giving up catch-up attempt at a 60 000 320 087 ns lag; new total: 780 002 920 446 ns
01:07:47.644990 TM: Giving up catch-up attempt at a 60 000 035 849 ns lag; new total: 840 002 956 295 ns
01:11:42.157205 TM: Giving up catch-up attempt at a 60 000 325 195 ns lag; new total: 900 003 281 490 ns
01:15:31.381817 TM: Giving up catch-up attempt at a 60 000 114 222 ns lag; new total: 960 003 395 712 ns
01:19:21.428748 TM: Giving up catch-up attempt at a 60 000 186 713 ns lag; new total: 1 020 003 582 425 ns
01:23:12.729553 TM: Giving up catch-up attempt at a 60 000 393 694 ns lag; new total: 1 080 003 976 119 ns
01:27:05.529949 TM: Giving up catch-up attempt at a 60 000 372 597 ns lag; new total: 1 140 004 348 716 ns
01:31:00.503464 TM: Giving up catch-up attempt at a 60 000 188 989 ns lag; new total: 1 200 004 537 705 ns
01:34:51.639890 TM: Giving up catch-up attempt at a 60 000 079 972 ns lag; new total: 1 260 004 617 677 ns
01:38:41.261596 TM: Giving up catch-up attempt at a 60 000 111 147 ns lag; new total: 1 320 004 728 824 ns
01:42:35.334766 TM: Giving up catch-up attempt at a 60 000 119 826 ns lag; new total: 1 380 004 848 650 ns
01:46:28.093218 TM: Giving up catch-up attempt at a 60 000 360 348 ns lag; new total: 1 440 005 208 998 ns
01:50:21.360854 TM: Giving up catch-up attempt at a 60 000 347 506 ns lag; new total: 1 500 005 556 504 ns
01:54:10.581630 TM: Giving up catch-up attempt at a 60 000 001 383 ns lag; new total: 1 560 005 557 887 ns
01:58:01.499575 TM: Giving up catch-up attempt at a 60 001 763 385 ns lag; new total: 1 620 007 321 272 ns
02:01:49.956978 TM: Giving up catch-up attempt at a 60 000 055 005 ns lag; new total: 1 680 007 376 277 ns
02:05:40.186213 TM: Giving up catch-up attempt at a 60 000 157 699 ns lag; new total: 1 740 007 533 976 ns
02:09:33.665734 TM: Giving up catch-up attempt at a 60 000 007 206 ns lag; new total: 1 800 007 541 182 ns
02:13:25.396855 TM: Giving up catch-up attempt at a 60 000 275 852 ns lag; new total: 1 860 007 817 034 ns
02:17:18.078033 TM: Giving up catch-up attempt at a 60 000 161 503 ns lag; new total: 1 920 007 978 537 ns
02:21:08.048514 TM: Giving up catch-up attempt at a 60 000 129 763 ns lag; new total: 1 980 008 108 300 ns
02:24:58.407331 TM: Giving up catch-up attempt at a 60 000 219 960 ns lag; new total: 2 040 008 328 260 ns
02:28:49.606826 TM: Giving up catch-up attempt at a 60 000 249 691 ns lag; new total: 2 100 008 577 951 ns
02:32:40.930542 TM: Giving up catch-up attempt at a 60 000 326 541 ns lag; new total: 2 160 008 904 492 ns
02:36:28.546479 TM: Giving up catch-up attempt at a 60 003 713 843 ns lag; new total: 2 220 012 618 335 ns
02:40:18.233923 TM: Giving up catch-up attempt at a 60 000 006 544 ns lag; new total: 2 280 012 624 879 ns
02:44:09.004363 TM: Giving up catch-up attempt at a 60 000 087 193 ns lag; new total: 2 340 012 712 072 ns
02:47:56.779220 TM: Giving up catch-up attempt at a 60 000 036 922 ns lag; new total: 2 400 012 748 994 ns
02:51:39.146567 TM: Giving up catch-up attempt at a 60 000 162 664 ns lag; new total: 2 460 012 911 658 ns
02:55:24.012988 TM: Giving up catch-up attempt at a 60 000 042 729 ns lag; new total: 2 520 012 954 387 ns
02:57:57.527759 Changing the VM state from 'RUNNING' to 'SUSPENDING'
02:57:57.537393 AIOMgr: Endpoint for file 'D:\data\slots\21\boinc_11c65b7ed1529441\Snapshots\{024d2459-b86f-4377-9de4-8371de5c028a}.vdi' (flags 000c0781) created successfully
02:57:57.540530 PDMR3Suspend: 12 722 010 ns run time
02:57:57.540551 Changing the VM state from 'SUSPENDING' to 'SUSPENDED'
02:57:57.540564 Console: Machine state changed to 'Paused'
02:57:57.541050 Console: Machine state changed to 'Saving'
02:57:57.541629 Changing the VM state from 'SUSPENDED' to 'SAVING'

02:58:13.461524 GIM: KVM: Resetting MSRs
02:58:13.465094 vmmR3LogFlusher: Terminating (VERR_OBJECT_DESTROYED)
02:58:13.465191 Changing the VM state from 'DESTROYING' to 'TERMINATED'
02:58:13.466846 Console: Machine state changed to 'Saved'
02:58:13.466979 VBoxHeadless: processEventQueue: VERR_INTERRUPTED, termination requested
02:58:14.173212 End of log file - Log started 2024-06-10T19:31:36.962548400Z

In the past, events like this led to the error I described in the other thread. Again, I did nothing to this task to cause this, so I don't understand why it stopped when it is free to run for 8 hours.

The current elapsed time, as you can see here, is 2:58:10 with approximately 3:02:42 remaining; the task is 49.37% finished.
Emmanuel Mar
Joined: 9 Feb 09
Posts: 25
Credit: 2,449,651
RAC: 2,546
Message 50417 - Posted: 17 Jun 2024, 23:14:04 UTC
Last modified: 17 Jun 2024, 23:16:37 UTC

That may be the power plan you use in Windows 10. Hibernation is an option within the power plan, and you can change it.
Settings > System > Power & sleep > Additional power settings > Choose or customize a power plan. Pick one of the three plans (Power saver / Balanced / High performance), then:
Change plan settings > Put the computer to sleep: Never
Change advanced power settings > Sleep > Sleep after: Never / Allow hybrid sleep: Off / Hibernate after: Never
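
If you would rather not click through the dialogs, the same sleep and hibernate timeouts can probably be set from a script. The sketch below uses the Windows `powercfg /change` command, where the value is a timeout in minutes and 0 means never. It is only an illustration: the helper names `build_commands` and `apply` are invented here, the script defaults to a dry run that just prints the commands, and it does not cover the hybrid sleep setting, which `powercfg /change` has no shortcut for.

```python
import subprocess
import sys

# powercfg /change <setting> <minutes>; 0 minutes means "never".
# "-ac" applies on mains power, "-dc" on battery.
SETTINGS = [
    "standby-timeout-ac",
    "standby-timeout-dc",
    "hibernate-timeout-ac",
    "hibernate-timeout-dc",
]


def build_commands(minutes=0):
    """Return one powercfg command (as an argument list) per setting."""
    return [["powercfg", "/change", name, str(minutes)] for name in SETTINGS]


def apply(commands, dry_run=True):
    """Print the commands, or run them when dry_run is False on Windows."""
    for cmd in commands:
        if dry_run or sys.platform != "win32":
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


commands = build_commands()
apply(commands, dry_run=True)
```

Call `apply(commands, dry_run=False)` from an elevated prompt on the Windows machine only once you are happy with the printed commands. The changes apply to the currently active power plan.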
Emmanuel Mar
Joined: 9 Feb 09
Posts: 25
Credit: 2,449,651
RAC: 2,546
Message 50421 - Posted: 18 Jun 2024, 22:56:02 UTC - in response to Message 50417.  

Choose High performance.

Alongside Sleep in the advanced settings there are more options for hard disks, the graphics card, Ethernet, Wi-Fi, USB, and so on. Check them all and verify that they are set to Never or Disabled. There are also settings that let idle CPU cores drop into C-states (power-saving states) when they have no work. How far you go is up to you.

©2024 CERN