1) Message boards : ATLAS application : ATLAS vbox v2.02 (Message 47138)
Posted 10 Aug 2022 by entity
Post:
It was an orphaned snapshot file located under the parent ATLAS vdi file. It had its own UUID assigned to it and was marked as inaccessible. I ran the vboxmanage closemedium <UUID> command and it disappeared. Now the only thing left is the parent ATLAS vdi file. Should that parent be removed as well?

Update: I had to remove the parent too, as the snapshot came back after the closemedium command was issued. Once the parent was closed using the closemedium command, the Media Registry in the VirtualBox.xml file disappeared as well. Hopefully ATLAS is clean now.
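
For anyone retracing this cleanup, here is a minimal Python sketch of the order of operations: read the child UUIDs from VBoxManage showhdinfo output, then close the children before the parent. It assumes the output looks like the showhdinfo listing quoted in this thread (a "Child UUIDs:" line, with any further children on continuation lines, which is an assumption). The function names child_uuids and close_commands are made up for illustration and are not part of VirtualBox or BOINC.

```python
import re

# Standard 8-4-4-4-12 hexadecimal UUID pattern.
UUID_RE = re.compile(r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}")


def child_uuids(showhdinfo_output):
    """Return the UUIDs listed under 'Child UUIDs:' in showhdinfo output."""
    uuids = []
    in_children = False
    for line in showhdinfo_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("Child UUIDs:"):
            in_children = True
            uuids.extend(UUID_RE.findall(stripped))
        elif in_children and UUID_RE.fullmatch(stripped):
            # Assumed layout: additional children on their own lines.
            uuids.append(stripped)
        else:
            in_children = False
    return uuids


def close_commands(parent_path, showhdinfo_output):
    """Build closemedium commands: every child first, the parent last."""
    cmds = [["VBoxManage", "closemedium", u]
            for u in child_uuids(showhdinfo_output)]
    cmds.append(["VBoxManage", "closemedium", parent_path])
    return cmds
```

The commands are only built, not executed, so they can be reviewed before being passed to subprocess.run one at a time.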
2) Message boards : ATLAS application : ATLAS vbox v2.02 (Message 47133)
Posted 10 Aug 2022 by entity
Post:
Only 5 were trying to start as that is what I have set in app_config as the max concurrent.

Unfortunately, I can't provide the link, as that task was run under an account that I can't log on to. No ATLAS tasks have been attempted since that one, so it would show up under the ce6931730 ID as the last ATLAS task returned.

Is Vbox Media Manager a GUI tool? If so, it isn't available to me on this server, as there is no GUI (no desktop) installed. Is there a CLI tool available that does the same thing? We may be thinking the same thing: that there might be something amiss in the Vbox config. That was the reason for the reboot yesterday.

I think I may have found the problem. Looking at the VirtualBox config xml files, I can see an ATLAS medium entry with a filename pointing to a slot that doesn't exist. I think I may be able to fix this with the vboxmanage CLI.
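
On a server without a desktop, VBoxManage list hdds is the command-line counterpart of the Media Manager listing. The check described above (a registered medium whose file no longer exists) can also be sketched in Python. This is only a sketch under assumptions: it assumes the media registry stores disks as HardDisk elements with uuid and location attributes, and the function name missing_media is made up for illustration.

```python
import os
import xml.etree.ElementTree as ET


def missing_media(xml_text, base_dir=".", exists=os.path.exists):
    """List (uuid, path) for registered hard disks whose file is missing.

    xml_text: contents of a VirtualBox config file (str or bytes).
    base_dir: directory used to resolve relative locations (assumption).
    exists:   injectable for testing; defaults to os.path.exists.
    """
    root = ET.fromstring(xml_text)
    missing = []
    for el in root.iter():
        # Compare the tag name without its XML namespace prefix.
        if el.tag.rsplit("}", 1)[-1] != "HardDisk":
            continue
        location = el.get("location")
        if not location:
            continue
        if os.path.isabs(location):
            path = location
        else:
            path = os.path.join(base_dir, location)
        if not exists(path):
            missing.append((el.get("uuid"), path))
    return missing
```

Usage would be along the lines of missing_media(open(path, "rb").read()), where path points at the VirtualBox.xml of the account that runs BOINC. The script only reports entries; removing them is still a job for vboxmanage closemedium.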
3) Message boards : ATLAS application : ATLAS vbox v2.02 (Message 47131)
Posted 10 Aug 2022 by entity
Post:
Only 5 were trying to start as that is what I have set in app_config as the max concurrent.

Unfortunately, I can't provide the link, as that task was run under an account that I can't log on to. No ATLAS tasks have been attempted since that one, so it would show up under the ce6931730 ID as the last ATLAS task returned.

Is Vbox Media Manager a GUI tool? If so, it isn't available to me on this server, as there is no GUI (no desktop) installed. Is there a CLI tool available that does the same thing? We may be thinking the same thing: that there might be something amiss in the Vbox config. That was the reason for the reboot yesterday.
4) Message boards : ATLAS application : ATLAS vbox v2.02 (Message 47129)
Posted 10 Aug 2022 by entity
Post:
entity wrote:
These are not running under the "entity" ID. I am using an account manager that creates an account that I cannot log on to. These are running under ID ce6931730.
Is this one of your error tasks?

https://lhcathome.cern.ch/lhcathome/result.php?resultid=362788360

https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10802615

On computers like that a race condition may happen if many vbox tasks start concurrently.
This is caused by a double workaround required to solve a vbox issue and (very likely) a vbox bug on top of that issue.
See:
https://lhcathomedev.cern.ch/lhcathome-dev/forum_thread.php?id=578&postid=7708

The vbox developers have refused to correct the issue for years:
"... we would therefore possibly need to bump the global config version. We don't want to do that though because that might make downgrading to pre-4.0 impossible."



What to do?

Option 1:
The computer in question is running Linux.
Hence, ATLAS native may be used instead of ATLAS vbox.


Option 2:
If ATLAS vbox is a must, ensure that at least the 1st ATLAS task of a fresh series starts a few seconds before all others.
This task will prepare the disk entry in vbox for all other tasks.
BOINC does not support such a staggered startup sequence out of the box.
Hence, this has to be ensured by a self-made script.
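
A minimal Python sketch of such a self-made script, under stated assumptions: the ATLAS task names are already known (for example from boinccmd --get_tasks), every task except the first is suspended with boinccmd --task <project URL> <task name> suspend, and the rest are resumed after a head start. The function names and the 30 second default are illustrative choices, not part of BOINC.

```python
import subprocess
import time


def stagger_plan(project_url, task_names):
    """Return (suspend_cmds, resume_cmds) for every task except the first."""
    others = task_names[1:]
    suspend = [["boinccmd", "--task", project_url, name, "suspend"]
               for name in others]
    resume = [["boinccmd", "--task", project_url, name, "resume"]
              for name in others]
    return suspend, resume


def run_staggered(project_url, task_names, head_start_s=30,
                  run=subprocess.run, sleep=time.sleep):
    """Hold back all but the first task, wait, then release the rest."""
    suspend, resume = stagger_plan(project_url, task_names)
    for cmd in suspend:
        run(cmd, check=True)
    # Head start for the first task to register the disk in vbox.
    sleep(head_start_s)
    for cmd in resume:
        run(cmd, check=True)
```

The run and sleep parameters exist only so the sequence can be dry-run without touching a live BOINC client.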

To provide answers to posted questions:

1. Yes, that looks like one of the error tasks.

2. When the problem first occurred, at least 5 ATLAS tasks were trying to start at the same time. This hasn't been a problem in the past, but I will try to prevent it in the future. BTW, I rebooted the machine and then tried to start a single ATLAS task: same error. In case it makes a difference, there were 30 Theory tasks, 8 CMS tasks, and about 20 SixTrack tasks running at the same time. Would CMS or Theory have any bearing on this problem?

3. I have considered native, but I'm in a temporarily reduced computing state at the moment, before moving to a new location. After the move, I may try the native approach. Until then, I'm kind of stuck with VBox.

Thanks for the responses
5) Message boards : ATLAS application : ATLAS vbox v2.02 (Message 47117)
Posted 9 Aug 2022 by entity
Post:
These are not running under the "entity" ID. I am using an account manager that creates an account that I cannot log on to. These are running under ID ce6931730.
6) Message boards : ATLAS application : ATLAS vbox v2.02 (Message 47115)
Posted 9 Aug 2022 by entity
Post:
Started getting immediate errors with ATLAS multi-attach tasks. Prior to today they ran fine. A snippet of the output is included.

2022-08-09 12:45:41 (3632470):
Command: VBoxManage -q showhdinfo "/var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/ATLAS_vbox_2.02_image.vdi"
Exit Code: 0
Output:
UUID: 6f08958e-7bfd-4804-8dd7-c7b4408cb126
Parent UUID: base
State: created
Type: multiattach
Location: /var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/ATLAS_vbox_2.02_image.vdi
Storage format: VDI
Format variant: dynamic default
Capacity: 20480 MBytes
Size on disk: 2645 MBytes
Encryption: disabled
Property: AllocationBlockSize=1048576
Child UUIDs: 80093e87-fcef-479f-801f-dd2cc020d954

2022-08-09 12:45:42 (3632470):
Command: VBoxManage -q storageattach "boinc_a50fb3ffb7fe0a5b" --storagectl "Hard Disk Controller" --port 0 --device 0 --type hdd --mtype multiattach --medium "/var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/ATLAS_vbox_2.02_image.vdi"
Exit Code: -2135228409
Output:
VBoxManage: error: Cannot attach medium '/var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/ATLAS_vbox_2.02_image.vdi': the media type 'MultiAttach' can only be attached to machines that were created with VirtualBox 4.0 or later
VBoxManage: error: Details: code VBOX_E_INVALID_OBJECT_STATE (0x80bb0007), component SessionMachine, interface IMachine, callee nsISupports
VBoxManage: error: Context: "AttachDevice(Bstr(pszCtl).raw(), port, device, DeviceType_HardDisk, pMedium2Mount)" at line 772 of file VBoxManageStorageController.cpp

2022-08-09 12:45:42 (3632470):
Command: VBoxManage -q closemedium "/var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/ATLAS_vbox_2.02_image.vdi"
Exit Code: -2135228404
Output:
VBoxManage: error: Cannot close medium '/var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/ATLAS_vbox_2.02_image.vdi' because it has 1 child media
VBoxManage: error: Details: code VBOX_E_OBJECT_IN_USE (0x80bb000c), component MediumWrap, interface IMedium, callee nsISupports
VBoxManage: error: Context: "Close()" at line 1736 of file VBoxManageDisk.cpp

2022-08-09 12:45:43 (3632470):
Command: VBoxManage -q closemedium "/var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/ATLAS_vbox_2.02_image.vdi"
Exit Code: -2135228404
Output:
VBoxManage: error: Cannot close medium '/var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/ATLAS_vbox_2.02_image.vdi' because it has 1 child media
VBoxManage: error: Details: code VBOX_E_OBJECT_IN_USE (0x80bb000c), component MediumWrap, interface IMedium, callee nsISupports
VBoxManage: error: Context: "Close()" at line 1736 of file VBoxManageDisk.cpp

2022-08-09 12:45:45 (3632470):
Command: VBoxManage -q closemedium "/var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/ATLAS_vbox_2.02_image.vdi"
Exit Code: -2135228404
Output:
VBoxManage: error: Cannot close medium '/var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/ATLAS_vbox_2.02_image.vdi' because it has 1 child media
VBoxManage: error: Details: code VBOX_E_OBJECT_IN_USE (0x80bb000c), component MediumWrap, interface IMedium, callee nsISupports
VBoxManage: error: Context: "Close()" at line 1736 of file VBoxManageDisk.cpp

2022-08-09 12:45:46 (3632470):
Command: VBoxManage -q closemedium "/var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/ATLAS_vbox_2.02_image.vdi"
Exit Code: -2135228404
Output:
VBoxManage: error: Cannot close medium '/var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/ATLAS_vbox_2.02_image.vdi' because it has 1 child media
VBoxManage: error: Details: code VBOX_E_OBJECT_IN_USE (0x80bb000c), component MediumWrap, interface IMedium, callee nsISupports
VBoxManage: error: Context: "Close()" at line 1736 of file VBoxManageDisk.cpp

2022-08-09 12:45:47 (3632470):
Command: VBoxManage -q closemedium "/var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/ATLAS_vbox_2.02_image.vdi"
Exit Code: -2135228404
Output:
VBoxManage: error: Cannot close medium '/var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/ATLAS_vbox_2.02_image.vdi' because it has 1 child media
VBoxManage: error: Details: code VBOX_E_OBJECT_IN_USE (0x80bb000c), component MediumWrap, interface IMedium, callee nsISupports
VBoxManage: error: Context: "Close()" at line 1736 of file VBoxManageDisk.cpp

2022-08-09 12:45:49 (3632470):
Command: VBoxManage -q closemedium "/var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/ATLAS_vbox_2.02_image.vdi"
Exit Code: -2135228404
Output:
VBoxManage: error: Cannot close medium '/var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/ATLAS_vbox_2.02_image.vdi' because it has 1 child media
VBoxManage: error: Details: code VBOX_E_OBJECT_IN_USE (0x80bb000c), component MediumWrap, interface IMedium, callee nsISupports
VBoxManage: error: Context: "Close()" at line 1736 of file VBoxManageDisk.cpp
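
A side note on reading the log above: the negative exit codes are the signed 32-bit form of the result codes shown in the "Details" lines. A small Python sketch of the conversion (the function name is made up for illustration):

```python
def exit_code_to_hresult(exit_code):
    """Reinterpret a signed 32-bit exit code as an unsigned hex result code."""
    return f"0x{exit_code & 0xFFFFFFFF:08x}"


print(exit_code_to_hresult(-2135228409))  # 0x80bb0007 (VBOX_E_INVALID_OBJECT_STATE)
print(exit_code_to_hresult(-2135228404))  # 0x80bb000c (VBOX_E_OBJECT_IN_USE)
```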


