Message boards : CMS Application : Reminder: CMS@Home virtual machines do not like being shut down abruptly
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1165
Credit: 12,130,929
RAC: 9,235
Message 53529 - Posted: 5 May 2026, 15:30:26 UTC
Last modified: 5 May 2026, 15:31:07 UTC

Recently we have been seeing an increasing number of one particular error, in which jobs fail with "failed waiting for external process".
One machine in particular has had a number of these failures, and its task log shows that its VM is frequently stopped and then restarted some time later. I'd like to remind users that our VM doesn't like being stopped, especially in an uncontrolled way (for example, when an external application is started and causes the BOINC manager to stop tasks in order to free up resources). For short stoppages, we recommend using the BOINC Manager or boinccmd to pause the task, and waiting for the VM to save its state before stopping BOINC.
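
As a rough illustration of the boinccmd route mentioned above, here is a minimal Python sketch that builds the per-task suspend and resume commands. It assumes boinccmd is on the PATH and can talk to the local client. The project URL and the task name are placeholders, not values from this thread: take the real ones from the output of `boinccmd --get_tasks`. The sketch does not check that the VM has finished saving its state, so that wait is still up to you.

```python
import shlex
import subprocess

# Assumption: the project URL exactly as it is registered in your BOINC client.
# Check the "project URL" field in the output of `boinccmd --get_tasks`.
PROJECT_URL = "https://lhcathome.cern.ch/lhcathome/"


def task_command(task_name, op, project_url=PROJECT_URL):
    """Build the boinccmd argument list for a per-task suspend or resume."""
    if op not in ("suspend", "resume"):
        raise ValueError("op must be 'suspend' or 'resume'")
    return ["boinccmd", "--task", project_url, task_name, op]


def run(argv, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(shlex.join(argv))
    if not dry_run:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # "EXAMPLE_TASK_NAME" is a hypothetical placeholder, not a real task.
    run(task_command("EXAMPLE_TASK_NAME", "suspend"))
```

With dry_run left at its default the script only prints the command, so it can be checked before anything is sent to the client. Give the VM time to save its state after suspending, and only then stop BOINC.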
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1165
Credit: 12,130,929
RAC: 9,235
Message 53541 - Posted: 7 May 2026, 10:43:15 UTC - in response to Message 53529.  

I've just been doing some counting: Out of 210 jobs with this error, 182 were from the same host! That's 2.1% of all jobs in this workflow so far...
One user will be getting a PM very soon.
Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1324
Credit: 101,837,433
RAC: 149,280
Message 53555 - Posted: 7 May 2026, 22:51:17 UTC

I do hope that when we come back to life here, they return to working as well as they did for the last month

(version 61.25 vbox64_mt_mcore_cms is working perfectly right now)
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1165
Credit: 12,130,929
RAC: 9,235
Message 53611 - Posted: 11 May 2026, 14:24:15 UTC - in response to Message 53555.  

In reply to Magic Quantum Mechanic's message of 7 May 2026:
I do hope that when we come back to life here, they return to working as well as they did for the last month

(version 61.25 vbox64_mt_mcore_cms is working perfectly right now)

I'm not sure why we had the dip in running jobs last week. People saving energy because of the Iran crisis? Or shutting down machines because of a heat wave? I know there was a power outage at FNAL recently, but I don't think they give us substantial resources.

©2026 CERN