Message boards : Number crunching : VM environment needed to be cleaned up.
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1001
Credit: 6,266,460
RAC: 113
Message 46512 - Posted: 22 Mar 2022, 15:43:44 UTC

Hi, I'm running two CMS tasks on a Windows 10 box. Unfortunately, it's a "managed" machine and I have very limited admin rights on it. Over the weekend, my organisation installed patches and the machine rebooted. I doubt that BOINC was shut down cleanly.
When I logged in on Monday, one VM restarted, got a CMS job, and closed the BOINC task when the job finished. The other got into some sort of state, with messages such as this:
    22/03/2022 15:07:35 | LHC@home | Task CMS_2534121_1647957079.423644_0 postponed for 86400 seconds: VM environment needed to be cleaned up.

I've tried several times to get the second task to run, including waiting for the running task to finish, closing BOINC and removing the BOINC VMs from the VBox manager before restarting. However, the second task always ends up with the above message and shows as "Powered Off" in the VBox manager.
Any ideas as to what to try next? I'm considering detaching from LHC@Home and re-attaching after cleaning off the current CMS VM image.


computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2401
Credit: 225,181,806
RAC: 123,896
Message 46514 - Posted: 22 Mar 2022, 16:27:29 UTC - in response to Message 46512.  

1. Set BOINC to "No new tasks" (NNT) and suspend all tasks that have not started yet
2. Let BOINC finish all running tasks, then shut down BOINC
3. Remove all .../slots/x/ (but don't remove /slots/ itself)
4. Open VirtualBox Manager (use the BOINC user account) and remove "suspect" VM entries
5. From the VirtualBox Manager menu, start the Virtual Media Manager and remove "suspect" vdi entries
6. Restart BOINC, resume suspended tasks (staggered!) and allow fresh work
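
For anyone who would rather script part of this, here is a minimal Python sketch of step 3, plus the command lines that roughly correspond to steps 1, 2, 4, 5 and 6. It is an illustration under assumptions, not a tested procedure: it uses the `boinccmd` and `VBoxManage` command-line tools instead of the GUI described above, assumes both are on the PATH of the account BOINC runs under, and the helper names `clear_slots` and `step_commands` are made up for this example. Deciding which VM or vdi entries are "suspect" is still left to you, so the sketch only lists them.

```python
import shutil
from pathlib import Path


def clear_slots(slots_dir: Path) -> list[str]:
    """Step 3: remove every subdirectory of slots_dir, keeping slots_dir itself.

    Returns the names of the removed subdirectories, sorted.
    Only call this after BOINC has been shut down (step 2).
    """
    removed = []
    for entry in sorted(slots_dir.iterdir()):
        # Only real directories are slot directories; skip stray files/links.
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
            removed.append(entry.name)
    return removed


def step_commands(project_url: str) -> dict[str, list[str]]:
    """Command lines for the other steps (built here, not executed).

    project_url is whatever URL your BOINC client shows for the project.
    """
    return {
        "1_no_new_tasks": ["boinccmd", "--project", project_url, "nomorework"],
        "2_shut_down_boinc": ["boinccmd", "--quit"],
        "4_list_vms": ["VBoxManage", "list", "vms"],
        "5_list_disks": ["VBoxManage", "list", "hdds"],
        "6_allow_new_tasks": ["boinccmd", "--project", project_url, "allowmorework"],
    }
```

Pass `clear_slots` the path to your own slots directory inside the BOINC data directory. On Windows, `shutil.rmtree` can fail on read-only or locked files, which is another reason to make sure BOINC and all VMs are really stopped first.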
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1001
Credit: 6,266,460
RAC: 113
Message 46515 - Posted: 22 Mar 2022, 17:01:51 UTC - in response to Message 46514.  

Thanks! I'll try that tomorrow when my current task has finished.
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1001
Credit: 6,266,460
RAC: 113
Message 46527 - Posted: 23 Mar 2022, 11:12:28 UTC - in response to Message 46514.  

OK, that seems to have fixed it. Thanks again.
maeax

Joined: 2 May 07
Posts: 2090
Credit: 158,562,261
RAC: 129,138
Message 46547 - Posted: 29 Mar 2022, 10:02:12 UTC - in response to Message 46514.  
Last modified: 29 Mar 2022, 10:04:55 UTC

Quoting computezrmle (Message 46514):
1. Set BOINC to "No new tasks" (NNT) and suspend all tasks that have not started yet
2. Let BOINC finish all running tasks, then shut down BOINC
3. Remove all .../slots/x/ (but don't remove /slots/ itself)
4. Open VirtualBox Manager (use the BOINC user account) and remove "suspect" VM entries
5. From the VirtualBox Manager menu, start the Virtual Media Manager and remove "suspect" vdi entries
6. Restart BOINC, resume suspended tasks (staggered!) and allow fresh work


On point two: suspend all CMS tasks ONE after the OTHER, a few seconds apart, and wait each time until the VM shows as suspended in VirtualBox.
Once they are all suspended, you can close BOINC; all VMs will have gone to the "Saved" state.
After the reboot (for example, for Windows updates) you can resume them all in the same way (ONE after the OTHER, a few seconds apart).
Eight CMS tasks are running again without problems!
https://lhcathome.cern.ch/lhcathome/results.php?hostid=10593998
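
The one-after-the-other approach described above could be scripted roughly as follows. This is only a sketch under assumptions: it drives the `boinccmd --task <project_url> <task_name> suspend|resume` command line, the 10 second default delay is a guess at "a few seconds", and the function names are invented for the example. It does not check in VirtualBox that each VM has actually reached the saved state, so that part still needs a manual look.

```python
import subprocess
import time
from typing import Callable, Sequence


def task_command(project_url: str, task_name: str, op: str) -> list[str]:
    """Build the boinccmd call that suspends or resumes one task."""
    if op not in ("suspend", "resume"):
        raise ValueError(f"unsupported operation: {op}")
    return ["boinccmd", "--task", project_url, task_name, op]


def staggered(
    task_names: Sequence[str],
    project_url: str,
    op: str,
    delay_s: float = 10.0,  # assumed gap between tasks
    run: Callable = subprocess.run,
    sleep: Callable = time.sleep,
) -> list[list[str]]:
    """Apply op to each task in turn, pausing delay_s between tasks.

    Returns the commands that were issued, in order.
    """
    issued = []
    for index, name in enumerate(task_names):
        if index > 0:
            sleep(delay_s)  # never touch two VMs at the same moment
        cmd = task_command(project_url, name, op)
        run(cmd, check=True)  # raises if boinccmd reports an error
        issued.append(cmd)
    return issued
```

The `run` and `sleep` parameters exist only so the loop can be tried out without a BOINC client; in normal use the defaults are kept and just the task names, project URL and operation are passed.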
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2401
Credit: 225,181,806
RAC: 123,896
Message 46549 - Posted: 29 Mar 2022, 13:54:05 UTC - in response to Message 46547.  

The question here was what to do to clean up the VBox environment.
Steps 1 and 2 are the most reliable way to avoid dealing with a mix of clean and unclean slots in step 3.

Step 3 removes the slots without mercy: clean ones (which should be empty by now) as well as those with unclean content.
If some tasks are merely suspended, they will fail immediately when BOINC restarts, since their slot directories are gone.
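
Before deleting anything, it may help to see which slots are already empty and which still hold content. The Python sketch below does only that and removes nothing. It treats "clean" as "empty directory", following the remark above that clean slots should be empty by now; the function name `split_slots` is made up for this example.

```python
from pathlib import Path


def split_slots(slots_dir: Path) -> tuple[list[str], list[str]]:
    """Return (empty_slots, slots_with_content) as sorted lists of names.

    Read-only: nothing is deleted or modified.
    """
    empty, with_content = [], []
    for entry in sorted(slots_dir.iterdir()):
        if not entry.is_dir():
            continue  # ignore stray files next to the slot directories
        if any(entry.iterdir()):
            with_content.append(entry.name)
        else:
            empty.append(entry.name)
    return empty, with_content
```

If the second list is not empty while tasks are only suspended, those are the tasks that would be lost by removing the slots.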



©2024 CERN