Task stops at 3 hours and hibernates
Author | Message |
---|---|
Joined: 28 Dec 08 Posts: 334 Credit: 4,835,205 RAC: 1,680 |
What's going on now? I set the switch-between-tasks interval to 8 hours to cover ATLAS's 6+ hour run time. I came to check on things and watched my current task go into pause without any action on my part. I grabbed some lines from the start and end of the VBox.log tab in the VM:
0:00:35.688252 VMMDev: Guest Log: *** Starting ATLAS job. (PandaID=6225421949 taskID=38921007) ***
00:10:08.970575 Display::i_handleDisplayResize: uScreenId=0 pvVRAM=00000248d94e0000 w=800 h=600 bpp=0 cbLine=0xC80 flags=0x5 origin=0,0
00:18:33.716129 VMMDev: vmmDevHeartbeatFlatlinedTimer: Guest seems to be unresponsive. Last heartbeat received 4 seconds ago
00:18:34.132685 VMMDev: GuestHeartBeat: Guest is alive (gone 4 417 029 721 ns)
00:20:43.292363 TM: Giving up catch-up attempt at a 60 000 310 325 ns lag; new total: 60 000 310 325 ns
00:22:27.160213 TM: Giving up catch-up attempt at a 60 001 117 696 ns lag; new total: 120 001 428 021 ns
00:25:17.664849 TM: Giving up catch-up attempt at a 60 000 051 304 ns lag; new total: 180 001 479 325 ns
00:29:09.053594 TM: Giving up catch-up attempt at a 60 000 021 982 ns lag; new total: 240 001 501 307 ns
00:33:00.568812 TM: Giving up catch-up attempt at a 60 000 377 087 ns lag; new total: 300 001 878 394 ns
00:36:55.876067 TM: Giving up catch-up attempt at a 60 000 099 862 ns lag; new total: 360 001 978 256 ns
00:40:54.343213 TM: Giving up catch-up attempt at a 60 000 137 235 ns lag; new total: 420 002 115 491 ns
00:44:52.521410 TM: Giving up catch-up attempt at a 60 000 051 240 ns lag; new total: 480 002 166 731 ns
00:48:42.127924 TM: Giving up catch-up attempt at a 60 000 045 005 ns lag; new total: 540 002 211 736 ns
00:52:27.709940 TM: Giving up catch-up attempt at a 60 000 068 108 ns lag; new total: 600 002 279 844 ns
00:56:13.937077 TM: Giving up catch-up attempt at a 60 000 228 106 ns lag; new total: 660 002 507 950 ns
01:00:06.595391 TM: Giving up catch-up attempt at a 60 000 092 409 ns lag; new total: 720 002 600 359 ns
01:04:00.453791 TM: Giving up catch-up attempt at a 60 000 320 087 ns lag; new total: 780 002 920 446 ns
01:07:47.644990 TM: Giving up catch-up attempt at a 60 000 035 849 ns lag; new total: 840 002 956 295 ns
01:11:42.157205 TM: Giving up catch-up attempt at a 60 000 325 195 ns lag; new total: 900 003 281 490 ns
01:15:31.381817 TM: Giving up catch-up attempt at a 60 000 114 222 ns lag; new total: 960 003 395 712 ns
01:19:21.428748 TM: Giving up catch-up attempt at a 60 000 186 713 ns lag; new total: 1 020 003 582 425 ns
01:23:12.729553 TM: Giving up catch-up attempt at a 60 000 393 694 ns lag; new total: 1 080 003 976 119 ns
01:27:05.529949 TM: Giving up catch-up attempt at a 60 000 372 597 ns lag; new total: 1 140 004 348 716 ns
01:31:00.503464 TM: Giving up catch-up attempt at a 60 000 188 989 ns lag; new total: 1 200 004 537 705 ns
01:34:51.639890 TM: Giving up catch-up attempt at a 60 000 079 972 ns lag; new total: 1 260 004 617 677 ns
01:38:41.261596 TM: Giving up catch-up attempt at a 60 000 111 147 ns lag; new total: 1 320 004 728 824 ns
01:42:35.334766 TM: Giving up catch-up attempt at a 60 000 119 826 ns lag; new total: 1 380 004 848 650 ns
01:46:28.093218 TM: Giving up catch-up attempt at a 60 000 360 348 ns lag; new total: 1 440 005 208 998 ns
01:50:21.360854 TM: Giving up catch-up attempt at a 60 000 347 506 ns lag; new total: 1 500 005 556 504 ns
01:54:10.581630 TM: Giving up catch-up attempt at a 60 000 001 383 ns lag; new total: 1 560 005 557 887 ns
01:58:01.499575 TM: Giving up catch-up attempt at a 60 001 763 385 ns lag; new total: 1 620 007 321 272 ns
02:01:49.956978 TM: Giving up catch-up attempt at a 60 000 055 005 ns lag; new total: 1 680 007 376 277 ns
02:05:40.186213 TM: Giving up catch-up attempt at a 60 000 157 699 ns lag; new total: 1 740 007 533 976 ns
02:09:33.665734 TM: Giving up catch-up attempt at a 60 000 007 206 ns lag; new total: 1 800 007 541 182 ns
02:13:25.396855 TM: Giving up catch-up attempt at a 60 000 275 852 ns lag; new total: 1 860 007 817 034 ns
02:17:18.078033 TM: Giving up catch-up attempt at a 60 000 161 503 ns lag; new total: 1 920 007 978 537 ns
02:21:08.048514 TM: Giving up catch-up attempt at a 60 000 129 763 ns lag; new total: 1 980 008 108 300 ns
02:24:58.407331 TM: Giving up catch-up attempt at a 60 000 219 960 ns lag; new total: 2 040 008 328 260 ns
02:28:49.606826 TM: Giving up catch-up attempt at a 60 000 249 691 ns lag; new total: 2 100 008 577 951 ns
02:32:40.930542 TM: Giving up catch-up attempt at a 60 000 326 541 ns lag; new total: 2 160 008 904 492 ns
02:36:28.546479 TM: Giving up catch-up attempt at a 60 003 713 843 ns lag; new total: 2 220 012 618 335 ns
02:40:18.233923 TM: Giving up catch-up attempt at a 60 000 006 544 ns lag; new total: 2 280 012 624 879 ns
02:44:09.004363 TM: Giving up catch-up attempt at a 60 000 087 193 ns lag; new total: 2 340 012 712 072 ns
02:47:56.779220 TM: Giving up catch-up attempt at a 60 000 036 922 ns lag; new total: 2 400 012 748 994 ns
02:51:39.146567 TM: Giving up catch-up attempt at a 60 000 162 664 ns lag; new total: 2 460 012 911 658 ns
02:55:24.012988 TM: Giving up catch-up attempt at a 60 000 042 729 ns lag; new total: 2 520 012 954 387 ns
02:57:57.527759 Changing the VM state from 'RUNNING' to 'SUSPENDING'
02:57:57.537393 AIOMgr: Endpoint for file 'D:\data\slots\21\boinc_11c65b7ed1529441\Snapshots\{024d2459-b86f-4377-9de4-8371de5c028a}.vdi' (flags 000c0781) created successfully
02:57:57.540530 PDMR3Suspend: 12 722 010 ns run time
02:57:57.540551 Changing the VM state from 'SUSPENDING' to 'SUSPENDED'
02:57:57.540564 Console: Machine state changed to 'Paused'
02:57:57.541050 Console: Machine state changed to 'Saving'
02:57:57.541629 Changing the VM state from 'SUSPENDED' to 'SAVING'
02:58:13.461524 GIM: KVM: Resetting MSRs
02:58:13.465094 vmmR3LogFlusher: Terminating (VERR_OBJECT_DESTROYED)
02:58:13.465191 Changing the VM state from 'DESTROYING' to 'TERMINATED'
02:58:13.466846 Console: Machine state changed to 'Saved'
02:58:13.466979 VBoxHeadless: processEventQueue: VERR_INTERRUPTED, termination requested
02:58:14.173212 End of log file - Log started 2024-06-10T19:31:36.962548400Z
In the past, events like this led to the error I described in the other thread. Again, I did nothing to this task to make it do this, so I don't understand why it stopped when it is free to run for 8 hours. The task currently shows 2:58:10 elapsed with approx. 3:02:42 left, 49.37% finished. |
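The timer lines in the log above can be totalled with a short script. This is a minimal sketch in Python, assuming the `TM: Giving up catch-up attempt` lines keep the format shown in the post; `LAG_RE` and `summarize_lag` are hypothetical names chosen for illustration. For the posted log, the last such line reports a cumulative total of 2 520 012 954 387 ns, which is about 42 minutes of timer lag in under 3 hours of VM uptime.

```python
import re

# Matches VirtualBox timer lines in the format quoted in the post, e.g.
# "00:20:43.292363 TM: Giving up catch-up attempt at a 60 000 310 325 ns lag; new total: 60 000 310 325 ns"
LAG_RE = re.compile(
    r"(\d+):(\d{2}):(\d{2})\.\d+ TM: Giving up catch-up attempt at a "
    r"([\d ]+) ns lag; new total: ([\d ]+) ns"
)

def summarize_lag(log_text):
    """Return (event_count, total_lag_seconds, last_event_uptime_seconds)."""
    count = 0
    total_ns = 0
    last_uptime = 0
    for m in LAG_RE.finditer(log_text):
        hours, minutes, seconds = int(m.group(1)), int(m.group(2)), int(m.group(3))
        count += 1
        # The "new total" field is cumulative, so the last match holds the overall lag.
        total_ns = int(m.group(5).replace(" ", ""))
        last_uptime = hours * 3600 + minutes * 60 + seconds
    return count, total_ns / 1e9, last_uptime

# Three lines copied from the posted log (first two and last catch-up entries).
sample = (
    "00:20:43.292363 TM: Giving up catch-up attempt at a 60 000 310 325 ns lag; new total: 60 000 310 325 ns\n"
    "00:22:27.160213 TM: Giving up catch-up attempt at a 60 001 117 696 ns lag; new total: 120 001 428 021 ns\n"
    "02:55:24.012988 TM: Giving up catch-up attempt at a 60 000 042 729 ns lag; new total: 2 520 012 954 387 ns\n"
)
count, lag_s, uptime_s = summarize_lag(sample)
print(f"{count} events, {lag_s / 60:.1f} min lag after {uptime_s / 60:.1f} min uptime")
```

Pasting the whole VBox.log text into `summarize_lag` works the same way, since the pattern does not depend on line breaks. The lag figure alone does not say why the VM was suspended; it only shows how far the guest clock fell behind the host.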
Joined: 9 Feb 09 Posts: 25 Credit: 2,284,388 RAC: 6,561 |
That may be the power plan you use in Windows 10. Hibernation is an option within the power plan, and you can change it: Settings > System > Power & sleep > Additional power settings > Choose or customize a power plan (three options: Power saver / Balanced / High performance) > Change plan settings > Put the computer to sleep: Never > Change advanced power settings > Sleep > Sleep after: Never / Allow hybrid sleep: Off / Hibernate after: Never |
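The same settings can also be applied from a script. This is a rough sketch, assuming the standard Windows `powercfg` tool and that the task host runs on AC power; the mapping of the menu items to `powercfg` calls and the name `apply_power_settings` are illustrative assumptions, not something stated in the thread. It defaults to a dry run that only lists the commands.

```python
import platform
import subprocess

# Assumed mapping of the GUI settings to powercfg calls; a timeout of 0 means "never".
COMMANDS = [
    ["powercfg", "/setactive", "SCHEME_MIN"],              # SCHEME_MIN is powercfg's alias for High performance
    ["powercfg", "/change", "standby-timeout-ac", "0"],    # Sleep after: Never (on AC power)
    ["powercfg", "/change", "hibernate-timeout-ac", "0"],  # Hibernate after: Never (on AC power)
    ["powercfg", "/change", "disk-timeout-ac", "0"],       # Turn off hard disk after: Never (on AC power)
]

def apply_power_settings(dry_run=True):
    """Return the commands as strings; run them only on Windows when dry_run is False."""
    rendered = [" ".join(cmd) for cmd in COMMANDS]
    if not dry_run and platform.system() == "Windows":
        for cmd in COMMANDS:
            # check=True raises CalledProcessError if powercfg reports a failure.
            subprocess.run(cmd, check=True)
    return rendered

# Dry run: print what would be executed without changing anything.
for line in apply_power_settings(dry_run=True):
    print(line)
```

Running with `dry_run=False` may need an elevated prompt, and the High performance plan is not present on every machine, so the first command can fail there.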
Joined: 9 Feb 09 Posts: 25 Credit: 2,284,388 RAC: 6,561 |
That may be the power plan you use in Windows 10. Hibernation is an option within the power plan, and you can change it. Choose High performance. Besides sleep, there are more options for hard drives, graphics card, Ethernet, Wi-Fi, USB... Check them all and verify that they say Never or Disabled. There are also settings that shut down CPU cores (C-states) when there are no tasks. It's up to you. |
©2024 CERN