Message boards : ATLAS application : Vbox headless crash - VM environment needs to be cleaned error

greg_be

Joined: 28 Dec 08
Posts: 341
Credit: 5,075,204
RAC: 2,379
Message 50362 - Posted: 8 Jun 2024, 16:17:43 UTC

First of all, what does this error mean?
I clean up the VM environment and then reset the project, because the stuck task cannot report and will not run, unless I am supposed to just wait it out and let BOINC figure out when to run it.

But also, is this a resource error or a project task error? I have been running two tasks on 4 cores and 99% of the time that works.
For now I pulled back to 1 task - 4 cores.

A previous task had this same kind of issue: https://lhcathome.cern.ch/lhcathome/result.php?resultid=411540199
Maybe that can help you tell me what is going on.
ID: 50362
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1444
Credit: 9,704,984
RAC: 918
Message 50363 - Posted: 8 Jun 2024, 16:47:11 UTC - in response to Message 50362.  

Both error tasks restarted the Virtual Machine 6 or 7 times. ATLAS (and CMS) tasks don't like interruptions. It's best to let them run through in one go.

If an interruption is needed (only for a short period), make sure that the VM task is suspended (with BOINC's LAIM, "leave app in memory", off) and that the VM is properly saved to disk.
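
One way to confirm that a VM really was saved to disk is to read its state from VirtualBox. The following is a minimal sketch, assuming `VBoxManage` is on the PATH and that the VM is visible to the account running the script (a BOINC service install may keep its VMs under a different user). The helper names are hypothetical, and the VM name has to be taken from `VBoxManage list vms`.

```python
import subprocess
from typing import Optional


def parse_vm_state(machinereadable_output: str) -> Optional[str]:
    """Extract the VMState value from `showvminfo --machinereadable` text."""
    for line in machinereadable_output.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "VMState":
            # Values are quoted in the machine-readable output.
            return value.strip().strip('"')
    return None


def query_vm_state(vm_name: str) -> Optional[str]:
    """Ask VirtualBox for the current state of one VM (name or UUID)."""
    result = subprocess.run(
        ["VBoxManage", "showvminfo", vm_name, "--machinereadable"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_vm_state(result.stdout)


def is_saved_to_disk(state: Optional[str]) -> bool:
    """True only when VirtualBox reports the VM as saved."""
    return state == "saved"
```

Calling `is_saved_to_disk(query_vm_state(name))` after suspending the task should return `True` if the VM was written to disk; a state such as `running` or `paused` would suggest it is still held in memory.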
ID: 50363
greg_be

Message 50364 - Posted: 8 Jun 2024, 17:13:59 UTC
Last modified: 8 Jun 2024, 17:18:03 UTC

Hmm... leave-in-memory is off.
However, I run multiple projects. So I guess I have to extend the "switch between tasks" time to accommodate ATLAS, because I guess BOINC wants to take those 4 cores for other projects.
ID: 50364
greg_be

Message 50384 - Posted: 10 Jun 2024, 19:38:08 UTC
Last modified: 10 Jun 2024, 19:39:17 UTC

Crystal,

I had a few more crashes like this, but what was odd is that the latest one doesn't show up in my tasks list here.

The other one ran for 3-something hours and locked up; it showed 19 hours of clock time and was frozen.

There were some strange status settings, not the normal suspend, but you can't see that in the STDERR output.
That was something I saw in the VM log.
What I see in my latest error task is that the VM never turned on.

Is there some sort of problem with the new Vbox?
I just upgraded to the latest and this is when the errors started popping up.


Also, I suppose there are commands to make ATLAS run longer than all the other projects?
Do I just have to set my run time to longer?

The task I am talking about: https://lhcathome.cern.ch/lhcathome/result.php?resultid=411732711
ID: 50384
Crystal Pellet
Volunteer moderator
Volunteer tester

Message 50387 - Posted: 11 Jun 2024, 7:47:31 UTC - in response to Message 50384.  

> Is there some sort of problem with the new Vbox?
> I just upgraded to the latest and this is when the errors started popping up.
I'm using VBox 7.0.18 too and have no issues.

> Also, I suppose there are commands to make ATLAS run longer than all the other projects?
> I just have to set my run time to longer?
To avoid switching between BOINC tasks (different projects), you may set "Switch between tasks every .... minutes" to 14400 (ten days)
in BOINC's Computing preferences and also tick "Leave non-GPU tasks in memory while suspended".
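
For anyone who prefers a local override file to the Manager dialog, the two settings can be sketched as below. This assumes the override file is `global_prefs_override.xml` in the BOINC data directory and that the tags are `cpu_scheduling_period_minutes` and `leave_apps_in_memory`; both should be checked against the BOINC documentation before use. The function names are hypothetical. The arithmetic shows where 14400 comes from: 10 days × 24 hours × 60 minutes.

```python
import xml.etree.ElementTree as ET

MINUTES_PER_DAY = 24 * 60


def days_to_minutes(days: float) -> int:
    """Convert a number of days to whole minutes."""
    return int(days * MINUTES_PER_DAY)


def build_override(switch_days: float, leave_in_memory: bool) -> str:
    """Build the XML text for a local preferences override.

    Tag names are assumed, not taken from the thread.
    """
    root = ET.Element("global_preferences")
    period = ET.SubElement(root, "cpu_scheduling_period_minutes")
    period.text = str(days_to_minutes(switch_days))
    laim = ET.SubElement(root, "leave_apps_in_memory")
    laim.text = "1" if leave_in_memory else "0"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Ten days between task switches, tasks kept in memory while suspended.
    print(build_override(10, True))
```

The client has to re-read its local preferences (or be restarted) before an override like this takes effect.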
ID: 50387
greg_be

Message 50391 - Posted: 11 Jun 2024, 18:02:13 UTC - in response to Message 50387.  

I think that's my error: "Leave non-GPU tasks in memory while suspended".

Thanks for the reminder.
ID: 50391

©2025 CERN