Thread 'reboot caused by LHC? cause TBD + exotic problems'

Author	Message
vigilian Joined: 23 Jul 05 Posts: 53 Credit: 2,707,793 RAC: 0	Message 43556 - Posted: 2 Nov 2020, 15:11:23 UTC hi, So I have finally finished my side-project host, which is a Ryzen 7 3700X with 32 GB and Windows. Since I needed VirtualBox for the side projects on this host, I wanted to reinstate my resource donation to your project. But since I activated BOINC on it, this host has had sudden reboots. Strangely, it doesn't happen while I'm using it, only when I'm away from it. So I'm still not sure of the cause. What I can say is that I only receive CMS tasks, while in my preferences every project is selected (so I don't really understand). Also, there is apparently a problem with the VBoxNetLwf driver, at least at each startup. So it could also be one of my other VMs causing a crash because of this driver. The one other BOINC project is GPUGRID, which could maybe cause a crash on my graphics card; that's also a possibility. What I can say is that some jobs finish, while others don't and end with an error. I haven't compared the times with the crashes yet. What I'm going to do first is check whether my preferences file in your database is corrupted, by deactivating the other projects like LHC or Atlas and reactivating them to see if there is a difference. VirtualBox 6.1.16, Windows 1909 Pro

computezrmle Volunteer moderator Volunteer developer Volunteer tester Help desk expert Joined: 15 Jun 08 Posts: 2753 Credit: 303,429,319 RAC: 101,640	Message 43557 - Posted: 2 Nov 2020, 15:39:41 UTC - in response to Message 43556. Regarding some recent error messages of this kind: 207 (0x000000CF) EXIT_NO_SUB_TASKS This is not your fault. CMS tasks are "envelopes" starting subtasks inside a VirtualBox VM. The queue distributing the subtasks occasionally runs dry and then you get that error. Regarding your crashes: Your logfile shows this [pre]2020-11-01 21:53:15 (32760): Setting CPU throttle for VM. (65%)[/pre] This might cause timing problems. Try to set CPU throttle to 100%.
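
For anyone who wants to check their own logfile for the same throttle setting, here is a minimal sketch in Python. It assumes the log line has exactly the format of the single example quoted above; the function name `find_throttle` and the sample string are made up for illustration and are not part of BOINC or VirtualBox.

```python
import re

# Pattern for the log line quoted in the post above.
# The format is assumed from that single example and may differ between versions.
THROTTLE_RE = re.compile(r"Setting CPU throttle for VM\. \((\d+)%\)")


def find_throttle(log_text):
    """Return the last CPU throttle percentage found in log_text, or None."""
    matches = THROTTLE_RE.findall(log_text)
    return int(matches[-1]) if matches else None


# Sample line copied from the post; a real check would read the task's log instead.
sample = "2020-11-01 21:53:15 (32760): Setting CPU throttle for VM. (65%)"

throttle = find_throttle(sample)
if throttle is not None and throttle < 100:
    print(f"VM is throttled to {throttle}% CPU")

# The exit code is quoted in decimal and in hex; both are the same number.
assert 207 == 0x000000CF
```

A value below 100 only shows that the VM is being throttled; whether that is what causes the crashes is still an open question in this thread.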

vigilian Joined: 23 Jul 05 Posts: 53 Credit: 2,707,793 RAC: 0	Message 43558 - Posted: 2 Nov 2020, 18:11:32 UTC - in response to Message 43557. Last modified: 2 Nov 2020, 18:12:54 UTC [quote]Regarding some recent error messages of this kind: 207 (0x000000CF) EXIT_NO_SUB_TASKS This is not your fault. CMS tasks are "envelopes" starting subtasks inside a virtualbox VM. The queue distributing the subtasks occasionally runs dry and then you get that error. Regarding your crashes: Your logfile shows this [pre]2020-11-01 21:53:15 (32760): Setting CPU throttle for VM. (65%)[/pre] This might cause timing problems. Try to set CPU throttle to 100%.[/quote] I can't do what you are advising in the long term. But I will maybe try it in the coming days, when I have a little time to spare to observe what's happening. I don't know what your compute load is, obviously, but mine is pretty heavy; I tend to leave hosts running and prefer to let them run as long as they can without rebooting, for several practical reasons. And as I'm sure you know, we are not living in a real multi-threading world but more one of space/time management. As you can see, I've been contributing to BOINC projects for over a decade now, and I've observed that, over time, if I don't give hardware components time to recover and actually process things other than BOINC projects, systems run slower and slower. For me, 65% is a good compromise on CPU time. Of course this is managed much better under Linux, so the problem isn't the same there. Plus, as you can maybe see in my preferences, I've set the CPU demand very low, so the VM has to stop many times anyway, which means the same problem that occurs with the 65%. I get your point of view, but as I said, I don't have crashes while I'm using this host. So I actually wonder whether it isn't an inherent problem of VirtualBox itself.
What I can affirm is that it's only linked to VirtualBox and LHC, because I've run some tests and it doesn't happen with other projects running, but of course those other projects are running virtualization. I'm going to check whether it's my Whonix gateway VM that causes the problems after all, since there is this driver problem at boot, which could maybe lead to a BSOD (I actually don't know what happens when it crashes, since it's a remote host).

vigilian Joined: 23 Jul 05 Posts: 53 Credit: 2,707,793 RAC: 0	Message 43563 - Posted: 3 Nov 2020, 15:20:51 UTC - in response to Message 43558. So, I've tested a bit more. I don't think it's a problem in LHC itself but rather a VirtualBox-internal problem with memory addressing or something similar. I've updated the drivers again to the latest AMD platform package available, etc., and uninstalled / rebooted / reinstalled VirtualBox as suggested on the VirtualBox forum in various cases, and now the situation has evolved a little. First, I witnessed one of the crashes, since I happened to be physically near this remote host at that moment. Apparently it's not a BSOD, just a crash, plain and simple. Everything shuts down and the host restarts on its own. Remark: the drive continues to work for a few seconds after the screen goes black, which suggests it's something progressive. Second, I now have another problem: all VMs stop working after a few seconds, and VirtualBox manages to give me a memory addressing error message like this https://share.riseup.net/#PxiN93tK4an6L004tijKZg I don't have any certainty, of course, but I hope it's the same problem as before, and that specific error suggests a platform-specific problem.

vigilian Joined: 23 Jul 05 Posts: 53 Credit: 2,707,793 RAC: 0	Message 43585 - Posted: 8 Nov 2020, 13:05:00 UTC There were apparently two problems. The first one was most probably related to my graphics card. I don't think it is dying, since it can sustain a stress test, but maybe there is a chipset incompatibility, I don't know. Anyway, I've since changed it, and it seems to work okay without any crash or reboot. The only problem now is that I still get these error messages, and they come directly from the LHC@home VM. So what does it mean? VBoxManage.exe: error: The VM session was aborted VBoxManage.exe: error: Details: code E_FAIL (0x80004005), component SessionMachine, interface ISession The only thing I can think of is that I've installed Docker, but in WSL, and I don't see how it would interfere with a VirtualBox VM. So if anyone has any suggestions, that would be great, since it apparently gives an error on all tasks in LHC@Home.

vigilian Joined: 23 Jul 05 Posts: 53 Credit: 2,707,793 RAC: 0	Message 43670 - Posted: 22 Nov 2020, 17:22:05 UTC Okay, so fortunately it was probably because of the GPU, most probably an incompatibility with the motherboard or the CPU. Thanks for your help

vigilian Joined: 23 Jul 05 Posts: 53 Credit: 2,707,793 RAC: 0	Message 43689 - Posted: 23 Nov 2020, 15:38:19 UTC Okay, so I wasn't aware that the Virtual Machine Platform feature also uses the Hyper-V API. It should maybe be added to Yeti's thread, because it's unclear at first. WSL is not normally associated with running a VM, at least among all the IT people I know, so it is not usually considered a hypervisor. So is there a way to make it work with CMS, or is it a known VirtualBox bug? Because it was working before, so I don't see why it wouldn't now.