1) Message boards : ATLAS application : Atlas tasks "Postponed: VM job unmanageable, restarting later." (Message 39065)
Posted 6 Jun 2019 by Gary
Post:
Posting this again to this thread:

There seems to be a specific problem with the ATLAS vboxwrapper executable: it periodically loses control of the VM. On my machine, 5 of 7 such tasks eventually missed their deadline because of this (of the remaining two, 1 failed validation and 1 passed). Since this was just wasting slots and time, I disabled ATLAS tasks. I now have a Theory Simulation VM task that has not yet been aborted by the "VM unmanageable" issue, and it seems to be progressing and checkpointing (I hibernate the machine when it is not in use, and progress is not being reset). Hopefully it will finish.

The only scheduling/policy change I made was blocking ATLAS tasks, and this new VM task actually seems to work and progress normally. This is with VirtualBox 6.0.8 (I was previously on version 5 and had the same issue).

From the task properties and from looking at the files in C:\ProgramData\BOINC\projects\lhcathome.cern.ch_lhcathome:
Theory is using vboxwrapper_26198ab7_windows_x86_64.exe
ATLAS appears to have been using vboxwrapper_26196_windows_x86_64.exe
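For anyone who wants to compare their own setup, the folder check above can be scripted. This is a minimal sketch, assuming the default Windows BOINC data directory quoted above; the helper name `list_wrappers` is hypothetical. It only lists which vboxwrapper executables are present; which application uses which wrapper still has to be read from the task properties.

```python
from pathlib import Path

# Project folder quoted in the post; adjust if your BOINC data directory differs (assumption).
PROJECT_DIR = Path(r"C:\ProgramData\BOINC\projects\lhcathome.cern.ch_lhcathome")


def list_wrappers(project_dir: Path) -> list[str]:
    """Return the sorted names of all vboxwrapper executables in project_dir (hypothetical helper)."""
    if not project_dir.is_dir():
        return []  # folder missing: nothing to report
    return sorted(p.name for p in project_dir.glob("vboxwrapper_*.exe"))


if __name__ == "__main__":
    for name in list_wrappers(PROJECT_DIR):
        print(name)
```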

I assume the other major difference between the tasks is the .vdi image each one uses (I can see both in the same folder), so this issue could possibly be fixed (very easily?) by the developers updating the vboxwrapper for ATLAS.

I think this has been noted before (though maybe not in this thread): is it really that simple to fix?
2) Message boards : ATLAS application : What does it mean? (Message 39064)
Posted 6 Jun 2019 by Gary
Post:
I don't think this is true. BOINC seems to checkpoint tasks periodically (I'm not sure whether VirtualBox tasks use VM snapshots or checkpoints stored within the image's filesystem), so even after an abrupt shutdown a task will resume from its previous checkpoint.
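The general idea of periodic checkpointing can be illustrated with a generic sketch. This is not how BOINC or vboxwrapper actually implement it; the function names and the JSON checkpoint file are hypothetical. Progress is saved every few steps by writing a temporary file and renaming it over the old checkpoint, so a restart after an abrupt stop resumes from the last saved step and only redoes the work done since then.

```python
import json
import os
from pathlib import Path
from typing import Optional


def load_checkpoint(path: Path) -> int:
    """Return the last saved step, or 0 if no checkpoint exists yet."""
    try:
        return json.loads(path.read_text())["step"]
    except FileNotFoundError:
        return 0


def save_checkpoint(path: Path, step: int) -> None:
    """Write to a temp file, then rename it over the old checkpoint (never half-written)."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps({"step": step}))
    os.replace(tmp, path)


def run(path: Path, total_steps: int, checkpoint_every: int,
        stop_at: Optional[int] = None) -> int:
    """Resume from the last checkpoint; stop_at simulates an abrupt shutdown."""
    step = load_checkpoint(path)
    while step < total_steps:
        if stop_at is not None and step == stop_at:
            return step  # abrupt stop: work since the last checkpoint is lost
        step += 1  # one unit of (hypothetical) work
        if step % checkpoint_every == 0:
            save_checkpoint(path, step)
    save_checkpoint(path, step)
    return step
```

For example, `run(cp, 100, 10, stop_at=25)` stops after step 25 with step 20 as the last checkpoint; calling `run(cp, 100, 10)` again resumes from 20 rather than from 0.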

There seems to be a specific problem with the ATLAS vboxwrapper executable: it periodically loses control of the VM. On my machine, 5 of 7 such tasks eventually missed their deadline because of this (of the remaining two, 1 failed validation and 1 passed). Since this was just wasting slots and time, I disabled ATLAS tasks. I now have a Theory Simulation VM task that has not yet been aborted by the "VM unmanageable" issue, and it seems to be progressing and checkpointing (I hibernate the machine when it is not in use, and progress is not being reset). Hopefully it will finish.

The only scheduling/policy change I made was blocking ATLAS tasks, and this new VM task actually seems to work and progress normally. This is with VirtualBox 6.0.8 (I was previously on version 5 and had the same issue).

From the task properties and from looking at the files in C:\ProgramData\BOINC\projects\lhcathome.cern.ch_lhcathome:
Theory is using vboxwrapper_26198ab7_windows_x86_64.exe
ATLAS appears to have been using vboxwrapper_26196_windows_x86_64.exe

I assume the other major difference between the tasks is the .vdi image each one uses (I can see both in the same folder), so this issue could possibly be fixed (very easily?) by the developers updating the vboxwrapper for ATLAS.



©2022 CERN