ATLAS vbox v2.02
Author | Message |
---|---|
Send message Joined: 13 May 14 Posts: 387 Credit: 15,314,184 RAC: 0 |
Hi all, We have just released version 2.02 of ATLAS vbox. It comes with the same new multiattach feature that was in v2.01 but contains an updated vboxwrapper (a pre-release of v26205), which should fix some of the problems seen with v2.01. This version is available for Windows and Linux; a Mac version will follow once there is an official release of the new vboxwrapper. Please let us know of any issues. |
Send message Joined: 15 Jun 08 Posts: 2549 Credit: 255,419,496 RAC: 65,143 |
1st task is up (on Linux) and processing events. So far everything looks fine. |
Send message Joined: 14 Jan 10 Posts: 1429 Credit: 9,535,270 RAC: 4,696 |
I woke up my PC to add some warmth to the tropical heat. After updating Windows OS, VirtualBox to 6.1.36 and BOINC to 7.20.2, I'm ready to test the advantage of multi-attached virtual disks. I started 5 tasks (4-core VMs) one after another with ~1 minute interval. https://lhcathome.cern.ch/lhcathome/results.php?hostid=10690380 |
Send message Joined: 17 Sep 04 Posts: 105 Credit: 32,855,188 RAC: 2,341 |
Error while computing on 3 work units, Windows 11. https://lhcathome.cern.ch/lhcathome/result.php?resultid=361920060 Regards, Bob P. |
Send message Joined: 14 Jan 10 Posts: 1429 Credit: 9,535,270 RAC: 4,696 |
"I started 5 tasks (4-core VMs) one after another with ~1 minute interval." I returned those 5 without issues. Cooling down now. Peak disk usage was 1.2 GB - 1.3 GB. The previous version used to need 3.8 GB/task. |
Send message Joined: 15 Jun 08 Posts: 2549 Credit: 255,419,496 RAC: 65,143 |
This is from one of your logs: Waiting for VM "boinc_956a3b554f908959" to power on... VBoxManage.exe: error: The virtual machine 'boinc_956a3b554f908959' has terminated unexpectedly during startup with exit code 1 (0x1). More details may be available in 'C:\ProgramData\BOINC\slots\9\boinc_956a3b554f908959\Logs\VBoxHardening.log' VBoxManage.exe: error: Details: code E_FAIL (0x80004005), component MachineWrap, interface IMachine. This is mostly caused by active AV software and is sometimes related to Hyper-V. Please try running the VMs with AV/Hyper-V disabled. You may need to clean the VirtualBox media registry before the next try. |
Send message Joined: 15 Jun 08 Posts: 2549 Credit: 255,419,496 RAC: 65,143 |
Found a host that has >100 failed ATLAS 2.02 tasks: https://lhcathome.cern.ch/lhcathome/results.php?hostid=10692002&offset=0&show_names=0&state=0&appid=14 The reason is "AMD-V is disabled in the BIOS (or by the host OS) (VERR_SVM_DISABLED)". This needs user intervention! |
Send message Joined: 2 May 07 Posts: 2245 Credit: 174,025,522 RAC: 9,726 |
When I see this, I send the user a friendly PM. Stopping the BOINC VM now takes ONE second instead of minutes, so more tasks are possible over the day. Thank you, David and team, great solution! |
Send message Joined: 2 May 07 Posts: 2245 Credit: 174,025,522 RAC: 9,726 |
|
Send message Joined: 14 Jan 10 Posts: 1429 Credit: 9,535,270 RAC: 4,696 |
"Found a host that has >100 failed ATLAS 2.02 tasks:" The daily quota for that machine for the Theory app is already down to 1, and for ATLAS 2.02 it is down from 350 yesterday to 148 now. I suppose Windows Hyper-V is not disabled. |
Send message Joined: 14 Jan 10 Posts: 1429 Credit: 9,535,270 RAC: 4,696 |
https://lhcathome.cern.ch/lhcathome/result.php?resultid=361937924 Is there a limit on how many children you can have? You already had five. Or was the machine just too busy with all the VBoxManage commands? It's always a good idea to start/resume VMs with a time interval between them. |
Send message Joined: 15 Jun 08 Posts: 2549 Credit: 255,419,496 RAC: 65,143 |
From here: https://www.virtualbox.org/wiki/Changelog-6.1 "VirtualBox 6.1.0 (released December 10 2019) . . . Runtime: Works now on hosts with many CPUs (limit now 1024)" Regarding the children a multiattach disk can have: I didn't find an official limit. My own tests ran fine with up to 14 per BOINC client and 2 clients per host (different usernames), hence 28 per host. According to stderr.txt, more details might have been in VBoxHardening.log, but that file was removed during VM cleanup: "More details may be available in 'D:\ProgramData\BOINC\slots\6\boinc_094c3868416fb020\Logs\VBoxHardening.log'" Stderr.txt shows that the task passed the new code without an error and failed a couple of steps later. It did not even need to go through the new workarounds since the parent disk was already a 'multiattach'. |
Send message Joined: 2 May 07 Posts: 2245 Credit: 174,025,522 RAC: 9,726 |
https://lhcathome.cern.ch/lhcathome/result.php?resultid=361937924 There is now a second task with the same fault: https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=193088580 For me this is the only multiattach task with this error, but have a handful with low CPU < 1 min. and runtime 1, 2 or 5 hours again. |
Send message Joined: 14 Jan 10 Posts: 1429 Credit: 9,535,270 RAC: 4,696 |
"... but have a handful with low CPU < 1 min. and runtime 1, 2 or 5 hours again." If they are still running, what do you see when you show the display from VirtualBox Manager? |
Send message Joined: 15 Jun 08 Posts: 2549 Credit: 255,419,496 RAC: 65,143 |
Within the stderr.txt there's again the hint to look through the hardening log. See: https://forums.virtualbox.org/viewtopic.php?f=25&t=82106 |
Send message Joined: 2 May 07 Posts: 2245 Credit: 174,025,522 RAC: 9,726 |
"... but have a handful with low CPU < 1 min. and runtime 1, 2 or 5 hours again." https://lhcathome.cern.ch/lhcathome/result.php?resultid=361967876 Windows-xxxx shows the following. ALT+F1: "CentOS Linux 7 (Core) Kernel 3.10.0-957.27.2.el7.x86_64 on an x86_64 localhost login:". ALT+F2: ATLAS Event Progress Monitoring with start info N/A. ALT+F3: only the Linux process list (PID, User, ...) with VBoxService, top, systemd. I am waiting for a new task with this problem, because I had trouble copying and pasting vbox.log and hardening.log. |
Send message Joined: 2 May 07 Posts: 2245 Credit: 174,025,522 RAC: 9,726 |
We have Windows PCs that have problems starting VirtualBox normally. I have taken a deeper look at wingmen. Did something go wrong with multiattach and older VirtualBox versions? Here is one example out of many with the same error: https://lhcathome.cern.ch/lhcathome/result.php?resultid=361978132 Output: VBoxManage.exe: error: Failed to create the VirtualBox object! VBoxManage.exe: error: The VBoxSDS windows service is disabled. VBoxManage.exe: error: Reinstall VirtualBox to fix it. Alternatively try reenable the service by setting it to 'Manual' startup type in the Windows Service management console, or by runing 'sc config VBoxSDS start=demand' on the command line. VBoxManage.exe: error: Details: code ERROR_SERVICE_DISABLED 0x80070422 (0x80070422), component VirtualBoxClientWrap, interface IVirtualBoxClient |
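The error signatures quoted so far in this thread can be collected into a small lookup, which may help when scanning many stderr.txt files. The Python sketch below is only an illustration: the name `classify_stderr` and the hint texts are assumptions that paraphrase the advice given in the posts above; they are not part of BOINC, vboxwrapper or VirtualBox.

```python
# Substrings copied from the error output quoted in this thread, each paired
# with a hint that paraphrases the advice given by the posters (assumption).
SIGNATURES = [
    ("VERR_SVM_DISABLED",
     "AMD-V is disabled in the BIOS or by the host OS; needs user intervention."),
    ("ERROR_SERVICE_DISABLED",
     "VBoxSDS service is disabled; reinstall VirtualBox or set it to 'Manual'."),
    ("has terminated unexpectedly during startup",
     "Often caused by active AV software or Hyper-V; see VBoxHardening.log."),
]


def classify_stderr(text):
    """Return the (signature, hint) pairs whose signature occurs in text."""
    return [(sig, hint) for sig, hint in SIGNATURES if sig in text]
```

For example, passing the output quoted in the previous post would yield the `ERROR_SERVICE_DISABLED` entry, while a log without any of these substrings returns an empty list.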
Send message Joined: 7 May 08 Posts: 222 Credit: 1,575,053 RAC: 6 |
All ATLAS tasks end in error (e.g. 361999109). This is the message: 2022-08-05 07:06:06 (6740): Starting VM using VBoxManage interface. (boinc_644bdea39c427f8e, slot#0) That's not true!! I'm correctly running other projects with virtual machines (for example LHC-Dev, Rosetta, etc.) |
Send message Joined: 14 Jan 10 Posts: 1429 Credit: 9,535,270 RAC: 4,696 |
https://lhcathome.cern.ch/lhcathome/result.php?resultid=361967876 The virtual machine is created and booted (that's fine), but in all your aborted tasks with low CPU usage, "CVMFS is ok" never appears after "Checking CVMFS...". Without a connection to CVMFS the job will not start. |
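The check described in this post can be sketched as a few lines of Python that look for the two quoted messages in a task's output. This is a minimal illustration, assuming the output contains the literal strings "Checking CVMFS..." and "CVMFS is ok" as quoted above; the function name `cvmfs_status` and its return labels are made up for this example.

```python
def cvmfs_status(stderr_text):
    """Classify a task log by the two CVMFS messages quoted in this thread.

    Returns "not-reached" if the check never started, "ok" if the
    confirmation follows the check, and "no-confirmation" otherwise.
    """
    start = stderr_text.find("Checking CVMFS...")
    if start == -1:
        return "not-reached"
    # Only text at or after the check message counts as a confirmation.
    if "CVMFS is ok" in stderr_text[start:]:
        return "ok"
    return "no-confirmation"
```

A result of "no-confirmation" would match the tasks described here, which boot the VM but then sit idle with almost no CPU time.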
Send message Joined: 15 Jun 08 Posts: 2549 Credit: 255,419,496 RAC: 65,143 |
On the same computer, using the same VirtualBox instance and the same user account? On Windows it's most likely Hyper-V or AV software that crashes VirtualBox and/or makes the BOINC client think VT-x/AMD-V is disabled. |
©2025 CERN