41) Message boards : ATLAS application : error on Atlas native: 195 (0x000000C3) EXIT_CHILD_FAILED (Message 41367)
Posted 27 Jan 2020 by wujj123456
Post:

A bit of research on the stderr error message may be significant.
"container creation failed: mount ->/var error: can't remount /var: operation not permitted"
https://lhcathome.cern.ch/lhcathome/result.php?resultid=256777262
It seems to have something to do with how the local storage is mounted.
https://github.com/sylabs/singularity/issues/2282

I am running into the same error. Is this the /var on the host filesystem? I probably don't want Singularity to remount /var on my host system, but if it is trying to remount because some mount flags are missing, I can check what those flags do and add them so that the remount becomes a no-op and succeeds.
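To see which options /var is currently mounted with, something like the following rough sketch could help. It only parses the standard /proc/self/mounts format; the function name find_mount is my own, and which options Singularity actually expects is not something this checks.

```python
def find_mount(path, mounts_text):
    """Return (mount_point, options) for the mount that contains `path`.

    `mounts_text` is text in the /proc/self/mounts format:
    device, mount point, fs type, comma-separated options, dump, pass.
    Returns None if no line matches.
    """
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point = fields[1]
        options = fields[3].split(",")
        # Match the mount point itself or any directory below it.
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # Prefer the longest mount point; on ties the later line wins,
            # since later mounts shadow earlier ones.
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, options)
    return best
```

On the host this would be called with the contents of /proc/self/mounts, for example find_mount("/var", open("/proc/self/mounts").read()). If /var is not a separate mount, the result is the entry for the filesystem that contains it.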

https://lhcathome.cern.ch/lhcathome/result.php?resultid=260003929

If I can't resolve this, is there a way to disable native ATLAS while still allowing native Theory, without refusing ATLAS work entirely?
42) Message boards : Theory Application : Unable to start VM on some WUs (Message 41258)
Posted 14 Jan 2020 by wujj123456
Post:
I configured automatic upgrades, but not automatic reboots. I thought the new kernel and components would only take effect after a reboot. Let me turn automatic updates off so that vboxdrv always stays in sync with the running kernel, and see if the results improve.
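A quick way to confirm the state after an upgrade is to check whether the vboxdrv module is loaded in the running kernel. This is only a sketch under the assumption that /proc/modules is readable on the host; the helper names module_loaded and report are made up for illustration.

```python
import platform


def module_loaded(name, modules_text):
    """Return True if `name` appears as a module name in /proc/modules text."""
    for line in modules_text.splitlines():
        fields = line.split()
        # The first field of each /proc/modules line is the module name.
        if fields and fields[0] == name:
            return True
    return False


def report(modules_path="/proc/modules"):
    """Summarize the running kernel release and whether vboxdrv is loaded."""
    with open(modules_path) as f:
        loaded = module_loaded("vboxdrv", f.read())
    state = "loaded" if loaded else "NOT loaded"
    return f"kernel {platform.release()}: vboxdrv {state}"
```

Running print(report()) before and after an unattended upgrade would show whether the module disappeared without a reboot.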
43) Message boards : Theory Application : Unable to start VM on some WUs (Message 41245)
Posted 14 Jan 2020 by wujj123456
Post:
I checked a few failed tasks and they all failed with messages like this.

2020-01-12 15:31:37 (2138):
Command: VBoxManage -q startvm "boinc_74513d880c5d6ae6" --type headless
Exit Code: 1
Output:
WARNING: The character device /dev/vboxdrv does not exist.
Please install the virtualbox-dkms package and the appropriate
headers, most likely linux-headers-generic.

You will not be able to start VMs until this problem is fixed.
VBoxManage: error: The virtual machine 'boinc_74513d880c5d6ae6' has terminated unexpectedly during startup with exit code 1 (0x1)
VBoxManage: error: Details: code NS_ERROR_FAILURE (0x80004005), component MachineWrap, interface IMachine
Waiting for VM "boinc_74513d880c5d6ae6" to power on...


Example failures:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=259134333
https://lhcathome.cern.ch/lhcathome/result.php?resultid=259134354

However, /dev/vboxdrv exists and the mentioned packages are installed on the host. The host also has valid results for the same application: https://lhcathome.cern.ch/lhcathome/results.php?hostid=10595991&offset=0&show_names=0&state=0&appid=13

I couldn't find any smoking gun as to why it fails on some WUs but not others. Could /dev/vboxdrv temporarily become inaccessible for some reason I should be aware of?
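To catch the moment the device goes away, one option is a small watcher that logs every change in availability, so the timestamps can be compared against failed tasks. This is a minimal sketch, not anything BOINC or VirtualBox provides; the names is_char_device and watch, and the polling interval, are my own assumptions.

```python
import os
import stat
import time


def is_char_device(path):
    """Return True if `path` exists and is a character device."""
    try:
        return stat.S_ISCHR(os.stat(path).st_mode)
    except OSError:
        return False  # missing or inaccessible


def watch(path, interval=5.0, checks=None, sleep=time.sleep):
    """Poll `path` and print a timestamped line whenever its state changes.

    Runs forever when `checks` is None, otherwise stops after that many
    polls. Returns the list of observed states (True means available).
    """
    changes = []
    last = None
    done = 0
    while checks is None or done < checks:
        available = is_char_device(path)
        if available != last:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print(stamp, path, "available" if available else "MISSING")
            changes.append(available)
            last = available
        done += 1
        if checks is None or done < checks:
            sleep(interval)
    return changes
```

Leaving watch("/dev/vboxdrv") running in a terminal would show whether the device node vanishes around the time of the failures above.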




©2024 CERN