Questions and Answers : Windows : reboot caused by LHC? cause TBD + exotic problems
vigilian

Joined: 23 Jul 05
Posts: 53
Credit: 2,707,793
RAC: 0
Message 43556 - Posted: 2 Nov 2020, 15:11:23 UTC

Hi,

I have finally finished my side-project host, which is a Ryzen 7 3700X with 32 GB of RAM running Windows. Since I needed VirtualBox for the side projects on this host, I wanted to reinstate my resource donation to your project.
But since I activated BOINC on it, the host has been rebooting suddenly.
Strangely, it does not happen while I'm using it, only when I'm away from it. So I'm still not sure of the cause.

What I can say is that I only receive CMS tasks, although every project is selected in my preferences (so I don't really understand).
There is also apparently a problem with the VBoxNetLwf driver, at least at each startup. So one of my other VMs could also be causing a crash because of this driver.
The other BOINC project on this host is GPUGRID, which could perhaps cause a crash on my graphics card; that is also a possibility.
Some jobs finish; others do not and end with an error. I haven't compared their times with the crashes yet.

What I'm going to do first is check whether my preferences in your database are corrupted, by deactivating the other projects, such as LHC or ATLAS, and reactivating them to see if it makes a difference.

VirtualBox 6.1.16
Windows 10 Pro, version 1909
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,989,970
RAC: 136,507
Message 43557 - Posted: 2 Nov 2020, 15:39:41 UTC - in response to Message 43556.  

Regarding some recent error messages of this kind:
207 (0x000000CF) EXIT_NO_SUB_TASKS
This is not your fault.
CMS tasks are "envelopes" starting subtasks inside a virtualbox VM.
The queue distributing the subtasks occasionally runs dry and then you get that error.
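
As a small illustration (not part of the original reply), the error line above is simply the same exit code shown in decimal and in hexadecimal, followed by its name. A minimal Python sketch, assuming a hypothetical lookup table `KNOWN_EXIT_CODES` that contains only the one code quoted in this thread:

```python
# Hypothetical lookup table: only the exit code quoted in this thread is
# listed. Any further entries would be assumptions.
KNOWN_EXIT_CODES = {207: "EXIT_NO_SUB_TASKS"}


def describe_exit_code(code):
    """Format an exit code as decimal, 8-digit hex, and name (if known)."""
    name = KNOWN_EXIT_CODES.get(code, "UNKNOWN")
    return f"{code} (0x{code:08X}) {name}"


print(describe_exit_code(207))
```

For 207 this reproduces the line quoted above, which makes it easy to confirm that a failed task hit the empty-queue case rather than some other error.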


Regarding your crashes:
Your logfile shows this
2020-11-01 21:53:15 (32760): Setting CPU throttle for VM. (65%)

This might cause timing problems.
Try to set CPU throttle to 100%.
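
To check a whole logfile for lines like the one above, something along these lines might help. This is only a sketch: the regular expression is modelled on the single log line quoted here, and the names `THROTTLE_RE` and `find_throttled` are made up for the example. Other log formats are not covered.

```python
import re

# Pattern modelled on the one log line quoted in this thread.
THROTTLE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \((?P<pid>\d+)\): "
    r"Setting CPU throttle for VM\. \((?P<pct>\d+)%\)"
)


def find_throttled(lines):
    """Return (timestamp, percent) for every throttle line below 100%."""
    hits = []
    for line in lines:
        match = THROTTLE_RE.match(line.strip())
        if match and int(match.group("pct")) < 100:
            hits.append((match.group("ts"), int(match.group("pct"))))
    return hits


sample = ["2020-11-01 21:53:15 (32760): Setting CPU throttle for VM. (65%)"]
print(find_throttled(sample))
```

An empty result would mean that no throttle below 100% was logged in the lines that were scanned.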
vigilian

Message 43558 - Posted: 2 Nov 2020, 18:11:32 UTC - in response to Message 43557.  
Last modified: 2 Nov 2020, 18:12:54 UTC

Regarding some recent error messages of this kind:
207 (0x000000CF) EXIT_NO_SUB_TASKS
This is not your fault.
CMS tasks are "envelopes" starting subtasks inside a virtualbox VM.
The queue distributing the subtasks occasionally runs dry and then you get that error.


Regarding your crashes:
Your logfile shows this
2020-11-01 21:53:15 (32760): Setting CPU throttle for VM. (65%)

This might cause timing problems.
Try to set CPU throttle to 100%.


I can't do what you advise in the long term, but I may try it in the coming days when I have a little time to spare to observe what happens.

I obviously don't know what your compute load is, but mine is pretty heavy. I tend to leave hosts running, and I prefer to let them run as long as they can without rebooting, for several practical reasons.
As I'm sure you know, we are not living in a truly multi-threaded world; it is more a matter of space/time management. As you can see, I've been contributing to BOINC projects for over a decade now, and I've observed that if I don't leave the hardware some time to process things other than BOINC projects, systems run slower and slower over time. For me, 65% is a good compromise on CPU time. Of course this is managed much better under Linux, so the problem isn't the same there. Plus, as you can perhaps see in my preferences, I've set the CPU demand very low, so the VM has to stop many times anyway, which leads to the same problem as the 65% setting.

I get your point, but as I said, I don't have crashes while I'm using this host, so I'm not sure whether this is an inherent problem of VirtualBox itself.
What I can affirm is that it's only linked to VirtualBox and LHC, because I've run some tests and it doesn't happen with other projects running, but of course those other projects are running virtualizations.

I'm going to check whether it's my Whonix gateway VM that causes the problems after all, since there is this driver problem at boot, which could maybe lead to a BSOD (I don't actually know what happens when it crashes, since it's a remote host).
vigilian

Message 43563 - Posted: 3 Nov 2020, 15:20:51 UTC - in response to Message 43558.  

So, I've tested a bit more.
I don't think it's a problem with LHC itself but rather a VirtualBox-internal problem with memory addressing or something similar.
I've updated the drivers again with the latest available AMD platform package, and uninstalled VirtualBox, rebooted and reinstalled it, as suggested on the VirtualBox forum for various cases,
and now the situation has evolved a little.
First, I witnessed one of the crashes, since I happened to be physically near this remote host at that moment. Apparently it's not a BSOD; it's just a crash, plain and simple. Everything shuts down and the host restarts on its own. Note: the drive continues to work for a few seconds after the screen goes black, which suggests it's something progressive.
Second, I now have another problem: all VMs stop working after a few seconds, and VirtualBox manages to show me a memory-addressing error message like this: https://share.riseup.net/#PxiN93tK4an6L004tijKZg

I have no certainty, of course, but I hope it's the same problem as before; this specific error suggests a platform-specific problem.
vigilian

Message 43585 - Posted: 8 Nov 2020, 13:05:00 UTC

There were apparently two problems.
The first was most probably related to my graphics card. I don't think it is dying, since it can sustain a stress test, but maybe there is a chipset incompatibility, I don't know. Anyway, I've since replaced it, and the host seems to work okay without any crash or reboot.
The only problem now is that I still get these error messages, and they come directly from the LHC@home VM.
So what does this mean?
VBoxManage.exe: error: The VM session was aborted
VBoxManage.exe: error: Details: code E_FAIL (0x80004005), component SessionMachine, interface ISession


The only thing I can think of is that I've installed Docker, but in WSL, and I don't see how that would interfere with a VirtualBox VM. So if anyone has any suggestions, that would be great, since it apparently gives an error for all tasks in LHC@home.
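
For comparing failed tasks, the `Details:` line quoted above can be split into its fields. A rough sketch, assuming the error always has the shape of the two lines pasted in this post; the names `DETAILS_RE` and `parse_details` are hypothetical, not part of VirtualBox or BOINC:

```python
import re

# Pattern modelled on the "Details:" line quoted in this post.
DETAILS_RE = re.compile(
    r"VBoxManage\.exe: error: Details: code (?P<name>\w+) "
    r"\((?P<hex>0x[0-9A-Fa-f]+)\), "
    r"component (?P<component>\w+), interface (?P<interface>\w+)"
)


def parse_details(text):
    """Return one dict per 'Details:' error line found in the text."""
    results = []
    for match in DETAILS_RE.finditer(text):
        results.append({
            "name": match.group("name"),
            "code": int(match.group("hex"), 16),
            "component": match.group("component"),
            "interface": match.group("interface"),
        })
    return results


sample = (
    "VBoxManage.exe: error: The VM session was aborted\n"
    "VBoxManage.exe: error: Details: code E_FAIL (0x80004005), "
    "component SessionMachine, interface ISession\n"
)
for entry in parse_details(sample):
    print(entry["name"], hex(entry["code"]), entry["component"], entry["interface"])
```

Grouping the parsed entries by component and interface across several task logs would show whether every failure is the same aborted-session error or whether different errors are mixed together.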
vigilian

Message 43670 - Posted: 22 Nov 2020, 17:22:05 UTC

Okay, so fortunately it was probably because of the GPU, most probably an incompatibility with the motherboard or the CPU.
Thanks for your help
vigilian

Message 43689 - Posted: 23 Nov 2020, 15:38:19 UTC

Okay, so I wasn't aware that the Virtual Machine Platform feature also uses the Hyper-V API.
It should maybe be added to Yeti's thread, because it's unclear at first. WSL is not normally associated with running a VM, at least among all the IT people I know, and so it is not usually considered a hypervisor.

So is there a way to make it work with CMS, or is it a known VirtualBox bug? It was working before, so I don't see why it wouldn't now.



©2024 CERN