Thread 'Failed tasks not cleaning up and exiting in reasonable time'

Author	Message
Glohr
Joined: 13 Jan 24 · Posts: 48 · Credit: 9,528,655 · RAC: 18,214

Message 51915 - Posted: 28 May 2025, 16:48:25 UTC

Today I found two VirtualBox ATLAS tasks in a sort of zombie state, with stderr containing:

2025-05-26 22:19:39 (7140): Guest Log: [INFO] Probing /cvmfs/atlas.cern.ch... OK
2025-05-26 22:19:39 (7140): Guest Log: [INFO] Detected branch: prod
2025-05-26 22:21:58 (7140): Guest Log: [DEBUG] Failed to copy ATLASJobWrapper-prod.sh
2025-05-26 22:21:58 (7140): Guest Log: [DEBUG] VM early shutdown initiated due to previous errors.
2025-05-26 22:21:58 (7140): Guest Log: [DEBUG] Cleanup will take a few minutes...
2025-05-26 23:46:54 (7140): Status Report: Elapsed Time: '6000.000000'
2025-05-26 23:46:54 (7140): Status Report: CPU Time: '31.187500'
[...]
2025-05-28 07:44:48 (7140): Status Report: Elapsed Time: '114000.000000'
2025-05-28 07:44:48 (7140): Status Report: CPU Time: '343.546875'

The other log is similar. Cleanup seems to have failed, so I will abort both.

https://lhcathome.cern.ch/lhcathome/result.php?resultid=422888662
https://lhcathome.cern.ch/lhcathome/result.php?resultid=422887835

Other ATLAS tasks have been completing successfully on the same system. Does anyone have an explanation for this behavior?
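
For anyone who wants to spot tasks like these without reading every log by hand, here is a rough Python sketch. It only parses the "Status Report" lines in the format quoted above and flags a task whose CPU time is a tiny fraction of its elapsed time (in the last report above, 343.5 s of CPU over 114000 s elapsed, roughly 0.3%). The function names and both thresholds (`min_elapsed`, `max_cpu_ratio`) are my own assumptions, not anything defined by BOINC or vboxwrapper, so tune them before relying on the result.

```python
import re

# Matches status lines in the format quoted in the post, e.g.
#   2025-05-28 07:44:48 (7140): Status Report: Elapsed Time: '114000.000000'
STATUS_RE = re.compile(r"Status Report: (Elapsed Time|CPU Time): '([0-9.]+)'")

# Excerpt of the last status report from the post.
SAMPLE = (
    "2025-05-28 07:44:48 (7140): Status Report: Elapsed Time: '114000.000000'\n"
    "2025-05-28 07:44:48 (7140): Status Report: CPU Time: '343.546875'\n"
)


def last_status(stderr_text):
    """Return the most recent (elapsed_seconds, cpu_seconds) in the log, or None."""
    elapsed = cpu = None
    for label, value in STATUS_RE.findall(stderr_text):
        if label == "Elapsed Time":
            elapsed = float(value)
        else:
            cpu = float(value)
    if elapsed is None or cpu is None:
        return None
    return elapsed, cpu


def looks_stalled(stderr_text, min_elapsed=3600.0, max_cpu_ratio=0.05):
    """Heuristic: the task has run for a long time but used almost no CPU.

    Both thresholds are hypothetical defaults chosen for illustration.
    """
    status = last_status(stderr_text)
    if status is None:
        return False
    elapsed, cpu = status
    return elapsed >= min_elapsed and cpu / elapsed <= max_cpu_ratio
```

Calling `looks_stalled(SAMPLE)` flags the excerpt from my log. A task flagged this way still needs a manual check before aborting, since a low CPU ratio alone does not prove the VM cleanup has failed.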