Message boards : ATLAS application : ATLAS vbox v2.02

David Cameron
Project administrator
Project developer
Project scientist

Joined: 13 May 14
Posts: 378
Credit: 14,817,984
RAC: 9,598
Message 47061 - Posted: 3 Aug 2022, 10:22:54 UTC

Hi all,

We have just released version 2.02 of ATLAS vbox. This comes with the same new multiattach feature that was in v2.01 but contains an updated vboxwrapper (a pre-release of v26205) which should fix some of the problems seen with v2.01.

This version is available for Windows and Linux. A Mac version will follow once there is an official release of the new vboxwrapper.

Please let us know of any issues.
ID: 47061
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 2121
Credit: 169,341,741
RAC: 113,956
Message 47062 - Posted: 3 Aug 2022, 10:47:36 UTC - in response to Message 47061.  

1st task is up (on Linux) and processing events.
So far everything looks fine.
ID: 47062
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1149
Credit: 7,035,906
RAC: 669
Message 47063 - Posted: 3 Aug 2022, 12:00:46 UTC

I woke up my PC to add some warmth to the tropical heat.
After updating Windows OS, VirtualBox to 6.1.36 and BOINC to 7.20.2, I'm ready to test the advantage of multi-attached virtual disks.
I started 5 tasks (4-core VMs) one after another at ~1 minute intervals.
https://lhcathome.cern.ch/lhcathome/results.php?hostid=10690380
ID: 47063
rbpeake

Joined: 17 Sep 04
Posts: 87
Credit: 27,492,815
RAC: 13,291
Message 47064 - Posted: 3 Aug 2022, 14:25:54 UTC

Error while computing on 3 work units, Windows 11.

https://lhcathome.cern.ch/lhcathome/result.php?resultid=361920060
Regards,
Bob P.
ID: 47064
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1149
Credit: 7,035,906
RAC: 669
Message 47065 - Posted: 3 Aug 2022, 14:49:55 UTC - in response to Message 47063.  

I started 5 tasks (4-core VMs) one after another at ~1 minute intervals.
https://lhcathome.cern.ch/lhcathome/results.php?hostid=10690380
I returned those 5 without issues. Cooling down now.
Peak disk usage was 1.2 GB - 1.3 GB. The previous version used to need 3.8 GB/task.
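
As a rough worked example of what these numbers mean for 5 concurrent tasks, here is a small sketch. It assumes the 1.2 GB - 1.3 GB peak is a per-task figure and that all tasks reach their peak at the same time; neither is stated explicitly above.

```python
# Rough estimate of combined peak disk usage for concurrent ATLAS vbox tasks.
# The per-task figures come from the post; treating the 1.2-1.3 GB peak as a
# per-task value is an assumption.

def total_peak_gb(tasks: int, per_task_gb: float) -> float:
    """Return the combined peak disk usage in GB for `tasks` concurrent tasks."""
    return tasks * per_task_gb


TASKS = 5
old_total = total_peak_gb(TASKS, 3.8)   # previous version
new_low = total_peak_gb(TASKS, 1.2)     # v2.02, lower bound
new_high = total_peak_gb(TASKS, 1.3)    # v2.02, upper bound

print(f"previous version: {old_total:.1f} GB")
print(f"v2.02: {new_low:.1f} - {new_high:.1f} GB")
print(f"saved: {old_total - new_high:.1f} - {old_total - new_low:.1f} GB")
```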
ID: 47065
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 2121
Credit: 169,341,741
RAC: 113,956
Message 47066 - Posted: 3 Aug 2022, 14:55:58 UTC - in response to Message 47064.  

This is from one of your logs:
Waiting for VM "boinc_956a3b554f908959" to power on...
VBoxManage.exe: error: The virtual machine 'boinc_956a3b554f908959' has terminated unexpectedly during startup with exit code 1 (0x1).  More details may be available in 'C:\ProgramData\BOINC\slots\9\boinc_956a3b554f908959\Logs\VBoxHardening.log'
VBoxManage.exe: error: Details: code E_FAIL (0x80004005), component MachineWrap, interface IMachine


It's mostly caused by active AV software, sometimes related to Hyper-V.
Please check this by running the VMs with AV/Hyper-V disabled.

You may need to clean the VirtualBox media registry before the next try.
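
A minimal sketch of how leftover entries could be spotted, assuming "cleaning" here means unregistering disks that VirtualBox reports as inaccessible. It parses text in the layout that `VBoxManage list hdds` typically prints (blocks of `Key: value` lines separated by blank lines). The sample text, UUIDs and paths below are hand-written and hypothetical, not taken from a real host.

```python
import re

# Hand-written, abbreviated sample in the style of `VBoxManage list hdds`.
# UUIDs and paths are hypothetical.
SAMPLE = r"""
UUID:           11111111-1111-1111-1111-111111111111
State:          created
Location:       C:\ProgramData\BOINC\projects\example\parent.vdi

UUID:           22222222-2222-2222-2222-222222222222
State:          inaccessible
Location:       C:\ProgramData\BOINC\slots\9\child.vdi
"""


def parse_hdds(listing: str) -> list[dict[str, str]]:
    """Split the listing into one dict of fields per registered disk."""
    disks = []
    for block in re.split(r"\n\s*\n", listing.strip()):
        entry = {}
        for line in block.splitlines():
            # Split on the first colon only, so Windows paths stay intact.
            key, sep, value = line.partition(":")
            if sep:
                entry[key.strip()] = value.strip()
        if entry:
            disks.append(entry)
    return disks


def inaccessible_uuids(disks: list[dict[str, str]]) -> list[str]:
    """Return the UUIDs of disks whose State field is 'inaccessible'."""
    return [d["UUID"] for d in disks
            if d.get("State") == "inaccessible" and "UUID" in d]


for uuid in inaccessible_uuids(parse_hdds(SAMPLE)):
    # Candidate for `VBoxManage closemedium disk <uuid>`; review before running.
    print(uuid)
```

The script only lists candidates. Whether an entry should really be removed is a decision for the user, ideally while no BOINC VM is running.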
ID: 47066
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 2121
Credit: 169,341,741
RAC: 113,956
Message 47067 - Posted: 3 Aug 2022, 17:58:34 UTC

Found a host that has >100 failed ATLAS 2.02 tasks:
https://lhcathome.cern.ch/lhcathome/results.php?hostid=10692002&offset=0&show_names=0&state=0&appid=14

The reason is "AMD-V is disabled in the BIOS (or by the host OS) (VERR_SVM_DISABLED)".
This needs user intervention!
ID: 47067
maeax

Joined: 2 May 07
Posts: 1692
Credit: 113,243,381
RAC: 309,756
Message 47068 - Posted: 3 Aug 2022, 18:06:15 UTC - in response to Message 47067.  
Last modified: 3 Aug 2022, 18:22:01 UTC

When I see this, I send the user a friendly PM.

Stopping the BOINC VM now takes ONE second instead of minutes.
So more tasks are possible over the day.
Thank You David and your Team, great solution!
ID: 47068
maeax

Joined: 2 May 07
Posts: 1692
Credit: 113,243,381
RAC: 309,756
Message 47070 - Posted: 4 Aug 2022, 4:14:41 UTC - in response to Message 47068.  

ID: 47070
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1149
Credit: 7,035,906
RAC: 669
Message 47072 - Posted: 4 Aug 2022, 7:22:26 UTC - in response to Message 47067.  

Found a host that has >100 failed ATLAS 2.02 tasks:
https://lhcathome.cern.ch/lhcathome/results.php?hostid=10692002&offset=0&show_names=0&state=0&appid=14

The reason is "AMD-V is disabled in the BIOS (or by the host OS) (VERR_SVM_DISABLED)".
This needs user intervention!

The daily quota for that machine for the Theory app is already down to 1 and for ATLAS 2.02 down from 350 yesterday to 148 now.
I suppose Windows Hyper-V is not disabled.
ID: 47072
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1149
Credit: 7,035,906
RAC: 669
Message 47073 - Posted: 4 Aug 2022, 7:31:02 UTC - in response to Message 47070.  

https://lhcathome.cern.ch/lhcathome/result.php?resultid=361937924
Is there a limit on how many children you can have? You already had five.
Or was the machine just too busy with all the VBoxManage commands?
It's always a good idea to start/resume VMs with a time interval between them.
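
The idea of starting VMs with a time interval can be sketched as follows. This is an illustration only: `start` is a hypothetical callable supplied by the user (for example, something that resumes a single task), and the 60-second interval simply mirrors the ~1 minute spacing mentioned earlier in the thread.

```python
import time
from typing import Callable, Iterable


def start_staggered(names: Iterable[str],
                    interval_s: float,
                    start: Callable[[str], None],
                    sleep: Callable[[float], None] = time.sleep) -> None:
    """Call `start` for each name, pausing `interval_s` seconds between calls.

    No pause happens before the first start or after the last one.
    `sleep` is injectable so the function can be tried without waiting.
    """
    for index, name in enumerate(names):
        if index > 0:
            sleep(interval_s)
        start(name)


if __name__ == "__main__":
    # Dry run: print the names instead of starting anything, and skip the waits.
    start_staggered(["task-1", "task-2", "task-3"], 60.0,
                    start=print, sleep=lambda seconds: None)
```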
ID: 47073
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 2121
Credit: 169,341,741
RAC: 113,956
Message 47074 - Posted: 4 Aug 2022, 8:03:25 UTC - in response to Message 47073.  

From here:
https://www.virtualbox.org/wiki/Changelog-6.1

"VirtualBox 6.1.0 (released December 10 2019)
.
.
.
Runtime: Works now on hosts with many CPUs (limit now 1024)"


Regarding the children a multiattach disk can have:
I didn't find an official limit.
My own tests ran fine with up to 14 per BOINC client and 2 clients per host (different usernames), hence 28 per host.



According to stderr.txt more details might have been in VBoxHardening.log but that file has been removed during VM cleanup:
"More details may be available in 'D:\ProgramData\BOINC\slots\6\boinc_094c3868416fb020\Logs\VBoxHardening.log'"


Stderr.txt shows that the task passed the new code without an error and failed a couple of steps later.
It did not even need to go through the new workarounds since the parent disk was already a 'multiattach'.
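
Since the hardening log is removed during VM cleanup, it can help to know its path early. The sketch below pulls that path out of stderr text using the hint wording quoted above; the function name is made up for this example, and copying the file in time is left to the user.

```python
import re

# Matches the hint text VBoxManage prints, as quoted in this thread.
HARDENING_HINT = re.compile(
    r"More details may be available in '([^']+VBoxHardening\.log)'"
)


def hardening_log_paths(stderr_text: str) -> list[str]:
    """Return each distinct VBoxHardening.log path mentioned, in order of appearance."""
    seen: list[str] = []
    for path in HARDENING_HINT.findall(stderr_text):
        if path not in seen:
            seen.append(path)
    return seen


# Example line taken from the stderr.txt excerpt quoted in this thread.
example = (r"More details may be available in "
           r"'D:\ProgramData\BOINC\slots\6\boinc_094c3868416fb020\Logs\VBoxHardening.log'")
for path in hardening_log_paths(example):
    print(path)
```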
ID: 47074
maeax

Joined: 2 May 07
Posts: 1692
Credit: 113,243,381
RAC: 309,756
Message 47075 - Posted: 4 Aug 2022, 12:22:18 UTC - in response to Message 47070.  

https://lhcathome.cern.ch/lhcathome/result.php?resultid=361937924

There is now a second task with the same fault:
https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=193088580
For me this is the only multiattach task with this error, but
I again have a handful with low CPU time (< 1 min.) and a runtime of 1, 2 or 5 hours.
ID: 47075
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1149
Credit: 7,035,906
RAC: 669
Message 47076 - Posted: 4 Aug 2022, 13:02:04 UTC - in response to Message 47075.  

... but I again have a handful with low CPU time (< 1 min.) and a runtime of 1, 2 or 5 hours.

If they are still running, what do you see when you show the display from VirtualBox Manager?
ID: 47076
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 2121
Credit: 169,341,741
RAC: 113,956
Message 47077 - Posted: 4 Aug 2022, 14:01:41 UTC - in response to Message 47075.  

Within the stderr.txt there's again the hint to look through the hardening log.
See:
https://forums.virtualbox.org/viewtopic.php?f=25&t=82106
ID: 47077
maeax

Joined: 2 May 07
Posts: 1692
Credit: 113,243,381
RAC: 309,756
Message 47078 - Posted: 4 Aug 2022, 20:28:56 UTC - in response to Message 47076.  
Last modified: 4 Aug 2022, 21:23:50 UTC

... but I again have a handful with low CPU time (< 1 min.) and a runtime of 1, 2 or 5 hours.

If they are still running, what do you see when you show the display from VirtualBox Manager?


https://lhcathome.cern.ch/lhcathome/result.php?resultid=361967876
On Windows-xxxx, the VM display shows the following.
ALT+F1:
CentOS Linux 7 (Core)
Kernel 3.10.0-957.27.2.el7.x86_64 on an x86_64
localhost login:
ALT+F2:
ATLAS Event Progress Monitoring with start info N/A

ALT+F3:
Only the Linux process list (PID, User, ...) with VBoxService, top, systemd, ...

I am waiting for a new task with this problem, because copying and pasting vbox.log and hardening.log caused some trouble.
ID: 47078
maeax

Joined: 2 May 07
Posts: 1692
Credit: 113,243,381
RAC: 309,756
Message 47079 - Posted: 4 Aug 2022, 22:40:45 UTC
Last modified: 4 Aug 2022, 22:43:34 UTC

We have Windows PCs that have problems starting VirtualBox normally.
I have taken a deeper look at my wingmen.
Did something go wrong with multiattach and older VirtualBox versions?
Here is one example of many with the same error:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=361978132
Output:
VBoxManage.exe: error: Failed to create the VirtualBox object!
VBoxManage.exe: error: The VBoxSDS windows service is disabled.
VBoxManage.exe: error: Reinstall VirtualBox to fix it. Alternatively try reenable the service by setting it to 'Manual' startup type in the Windows Service management console, or by runing 'sc config VBoxSDS start=demand' on the command line.
VBoxManage.exe: error: Details: code ERROR_SERVICE_DISABLED 0x80070422 (0x80070422), component VirtualBoxClientWrap, interface IVirtualBoxClient
ID: 47079
[VENETO] boboviz
Joined: 7 May 08
Posts: 150
Credit: 1,436,568
RAC: 185
Message 47080 - Posted: 5 Aug 2022, 6:52:22 UTC

All ATLAS tasks end with an error (e.g. 361999109).

This is the message:
2022-08-05 07:06:06 (6740): Starting VM using VBoxManage interface. (boinc_644bdea39c427f8e, slot#0)
2022-08-05 07:06:09 (6740): Error in start VM for VM: -2147467259
Command:
VBoxManage -q startvm "boinc_644bdea39c427f8e" --type headless
Output:
Waiting for VM "boinc_644bdea39c427f8e" to power on...
VBoxManage.exe: error: Not in a hypervisor partition (HVP=0) (VERR_NEM_NOT_AVAILABLE).
VBoxManage.exe: error: AMD-V is disabled in the BIOS (or by the host OS) (VERR_SVM_DISABLED)
VBoxManage.exe: error: Details: code E_FAIL (0x80004005), component ConsoleWrap, interface IConsole

2022-08-05 07:06:09 (6740): VM failed to start.
2022-08-05 07:06:09 (6740): Could not start
2022-08-05 07:06:09 (6740): ERROR: VM failed to start


That's not true!! I'm correctly running other projects with virtual machines (for example LHC-Dev, Rosetta, etc.)
ID: 47080
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1149
Credit: 7,035,906
RAC: 669
Message 47081 - Posted: 5 Aug 2022, 7:06:09 UTC - in response to Message 47078.  

https://lhcathome.cern.ch/lhcathome/result.php?resultid=361967876
On Windows-xxxx, the VM display shows the following.
ALT+F1:
CentOS Linux 7 (Core)
Kernel 3.10.0-957.27.2.el7.x86_64 on an x86_64
localhost login:
ALT+F2:
ATLAS Event Progress Monitoring with start info N/A

ALT+F3:
Only the Linux process list (PID, User, ...) with VBoxService, top, systemd, ...

The virtual machine is created and booted (that's fine), but in all your aborted tasks with low CPU usage,
"CVMFS is ok" never appears after "Checking CVMFS...". Without a connection to CVMFS the job will not start.
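
A small sketch of that check, assuming the two messages appear verbatim in whatever text is being inspected (for example a task's stderr output). The function name and the three status strings are invented for this example.

```python
def cvmfs_status(log_text: str) -> str:
    """Classify a log by whether the CVMFS check was started and confirmed.

    Returns "not checked" if "Checking CVMFS..." never appears,
    "ok" if "CVMFS is ok" follows it, and "no confirmation" otherwise.
    """
    check_at = log_text.find("Checking CVMFS...")
    if check_at == -1:
        return "not checked"
    # Only look at text after the check started.
    if "CVMFS is ok" in log_text[check_at:]:
        return "ok"
    return "no confirmation"


# Hand-written example lines; real logs carry timestamps and more detail.
good = "Checking CVMFS...\nCVMFS is ok\n"
stuck = "Checking CVMFS...\n"
print(cvmfs_status(good))
print(cvmfs_status(stuck))
```

A "no confirmation" result on a task that has been running for hours with almost no CPU time would match the pattern described above.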
ID: 47081
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 2121
Credit: 169,341,741
RAC: 113,956
Message 47082 - Posted: 5 Aug 2022, 7:07:42 UTC - in response to Message 47080.  

On the same computer using the same VirtualBox instance and the same user account?

On Windows it's most likely Hyper-V or AV software that crashes VirtualBox and/or makes the BOINC client think VT-x/AMD-V is disabled.
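
To summarise the error patterns seen so far in this thread, here is a sketch that maps markers found in stderr text to the hints given by posters. The marker list is not exhaustive, the hints are paraphrased from the thread rather than from VirtualBox documentation, and the function name is made up for this example.

```python
# (marker, hint) pairs; each marker is a substring of an error quoted in this thread.
HINTS = [
    ("VERR_SVM_DISABLED",
     "AMD-V reported as disabled: check the BIOS setting, Hyper-V and AV software"),
    ("terminated unexpectedly during startup",
     "mostly active AV software, sometimes Hyper-V: see VBoxHardening.log"),
    ("ERROR_SERVICE_DISABLED",
     "VBoxSDS service disabled: reinstall VirtualBox or set the service to 'Manual'"),
]


def diagnose(stderr_text: str) -> list[tuple[str, str]]:
    """Return the (marker, hint) pairs whose marker occurs in the text."""
    return [(marker, hint) for marker, hint in HINTS if marker in stderr_text]


# Error line quoted earlier in this thread.
sample = ("VBoxManage.exe: error: AMD-V is disabled in the BIOS "
          "(or by the host OS) (VERR_SVM_DISABLED)")
for marker, hint in diagnose(sample):
    print(f"{marker}: {hint}")
```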
ID: 47082