Message boards : Number crunching : VM Hypervisor

adrianxw
Joined: 29 Sep 04
Posts: 170
Credit: 648,569
RAC: 1,393
Message 32456 - Posted: 18 Sep 2017, 6:20:17 UTC
Last modified: 18 Sep 2017, 6:24:55 UTC

I have a job on this system that is, and has been for at least the last 24 hours, in a state that is unusual to me. Its status is shown as...

Postponed: VM Hypervisor failed to enter an online state in a timely fashion.

... okay, so is this something that will, at some point, enter an online state, or is the job just hung?

The work unit is...

CMS_24868_1505498727.617335_0

... of application...

CMS Simulation 47.60 (vbox64)

... running here under Windows 8.1 x64. It has had 2:33:05 of CPU time and is showing 12.666% complete.

Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.

Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 441
Credit: 3,224,547
RAC: 2,350
Message 32459 - Posted: 18 Sep 2017, 10:43:57 UTC - in response to Message 32456.  

2 options:

- wait 86400 seconds
- stop BOINC-client (not only the Manager) and restart BOINC
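The 86400 seconds in the first option is exactly 24 hours, which matches the "at least 24 hours" the job has been sitting there. A minimal Python sketch of that arithmetic, assuming the delay is counted from the moment the task was postponed; `earliest_retry` is a hypothetical helper, and the timestamp is illustrative only because the thread does not say when the task was postponed:

```python
from datetime import datetime, timedelta

# Delay from the reply above, in seconds.
POSTPONE_SECONDS = 86400


def earliest_retry(postponed_at, delay_s=POSTPONE_SECONDS):
    """Hypothetical helper: earliest time a postponed task would be retried,
    assuming the delay is counted from the moment it was postponed."""
    return postponed_at + timedelta(seconds=delay_s)


print(POSTPONE_SECONDS / 3600)  # 24.0 hours
# Illustrative timestamp only; the thread does not say when the task was postponed.
print(earliest_retry(datetime(2017, 9, 17, 8, 0, 0)))  # 2017-09-18 08:00:00
```

The second option avoids the wait entirely, since restarting the client starts the task again straight away, as the next post describes.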
adrianxw
Joined: 29 Sep 04
Posts: 170
Credit: 648,569
RAC: 1,393
Message 32462 - Posted: 18 Sep 2017, 13:23:46 UTC
Last modified: 18 Sep 2017, 14:17:44 UTC

I first noticed this yesterday morning, so at least 24 hours have already passed since whatever happened, happened. It was quiet, so I rebooted the system, which of course stops and starts BOINC. On restart the job started running again, but no new progress was shown (it is still at 12.666%), and after about 10 minutes it went back into that strange state, where it is currently sitting. The elapsed time dropped back a bit when the checkpoint was loaded, but it has advanced to 2:33:06, almost identical to the point where this happened before. Suggested "fixes" have involved some chmod and chown commands, but I am not running Linux.

It has a quorum and replication of 1, so I cannot see whether a wingman has got anywhere with it.

I've not seen that status before.

<edit>
A net search has shown this same status showing up at other projects, Cosmology and RNA for example. Suggested fixes include updating VirtualBox (to 5.1.28), which I and others have tried, but it has not helped yet.

adrianxw
Joined: 29 Sep 04
Posts: 170
Credit: 648,569
RAC: 1,393
Message 32463 - Posted: 18 Sep 2017, 14:40:02 UTC - in response to Message 32462.  
Last modified: 18 Sep 2017, 14:46:05 UTC

... but it has. I was fiddling with this to see what or where the problem was, which inevitably had me stop and start the system a few times. Now, I can see that the job IS running again, and has advanced to 13.207%, so perhaps the VirtualBox update, followed by a restart, is a fix.

Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 441
Credit: 3,224,547
RAC: 2,350
Message 32464 - Posted: 18 Sep 2017, 14:52:47 UTC - in response to Message 32462.  

I've seen that on the forums of most BOINC projects that use VirtualBox.

IMO the problem is caused by busy systems combined with the difference in process priority between VBoxSVC.exe and BOINC's vboxwrapper:

- VBoxSVC.exe has the priority 'normal' (the usual user priority);
- vboxwrapper has the lowest priority.

On my systems both processes get the priority 'below normal', and I never see this problem.
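A minimal Python sketch of the priority adjustment described above, offered as an illustration rather than a tested tool. `plan_priority_changes` and `apply_on_windows` are hypothetical helpers; matching vboxwrapper by name prefix is an assumption (the executable name differs between projects and versions); and the Windows part relies on the third-party psutil package, whose `Process.nice()` accepts priority-class constants such as `BELOW_NORMAL_PRIORITY_CLASS` on Windows.

```python
import sys

# Process names mentioned in the post. The vboxwrapper executable name
# differs between projects and versions, so a prefix match is assumed.
TARGET_PREFIXES = ("vboxsvc", "vboxwrapper")


def plan_priority_changes(processes, wanted="below normal"):
    """Return the names from (name, priority) pairs that belong to
    VBoxSVC or vboxwrapper and are not already at `wanted` priority."""
    return [
        name
        for name, priority in processes
        if name.lower().startswith(TARGET_PREFIXES) and priority != wanted
    ]


def apply_on_windows():
    """Hypothetical helper: set matching processes to 'below normal'
    using the third-party psutil package (Windows only)."""
    if sys.platform != "win32":
        raise OSError("priority classes are a Windows concept")
    import psutil  # pip install psutil

    for proc in psutil.process_iter(["name"]):
        name = (proc.info["name"] or "").lower()
        if name.startswith(TARGET_PREFIXES):
            try:
                proc.nice(psutil.BELOW_NORMAL_PRIORITY_CLASS)
            except (psutil.AccessDenied, psutil.NoSuchProcess):
                # The process may have exited, or may need an elevated prompt.
                pass


# Illustrative snapshot; the wrapper file name is made up.
snapshot = [
    ("VBoxSVC.exe", "normal"),
    ("vboxwrapper_example.exe", "idle"),
    ("boinc.exe", "normal"),
]
print(plan_priority_changes(snapshot))  # ['VBoxSVC.exe', 'vboxwrapper_example.exe']
```

A priority set this way belongs to the running process, so it would need reapplying after VirtualBox or BOINC restarts those processes.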
adrianxw
Joined: 29 Sep 04
Posts: 170
Credit: 648,569
RAC: 1,393
Message 32465 - Posted: 18 Sep 2017, 16:52:52 UTC

It is the first time I have seen it, but looking around, others have reported it at other projects. The updated VirtualBox APPEARS to have fixed it for me: the task that was stopped is running and is currently showing 25.402% done.

ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 477
Credit: 3,294,736
RAC: 7,127
Message 32466 - Posted: 18 Sep 2017, 17:00:22 UTC - in response to Message 32465.  

My Windows box at work has always been like that, except with the oldest versions of VBox. Maybe it's time to try the latest again.
adrianxw
Joined: 29 Sep 04
Posts: 170
Credit: 648,569
RAC: 1,393
Message 32468 - Posted: 18 Sep 2017, 17:53:05 UTC

The job that was stuck has continued to run here after the fix and is now 30.444% complete. The update APPEARS to have corrected the issue, but I wouldn't guarantee it. Try it again with an updated VirtualBox, but keep an eye on it. It may also be an idea to keep the older version's installer so you can go back if necessary. I have not updated my other machine here yet, but then I have not seen the problem on it, even though it is almost identical to this one in hardware and software.

adrianxw
Joined: 29 Sep 04
Posts: 170
Credit: 648,569
RAC: 1,393
Message 32471 - Posted: 19 Sep 2017, 6:31:33 UTC - in response to Message 32468.  

The job completed and uploaded overnight, and it has been flagged as valid and credited.

©2018 CERN