Message boards : ATLAS application : What does it mean?
Bernard

Joined: 10 Apr 12
Posts: 39
Credit: 193,853
RAC: 0
Message 36635 - Posted: 5 Sep 2018, 9:22:43 UTC

Application: ATLAS Simulation 1.01 (vbox64_mt_mcore_atlas)
Name: 21DMDmiiKFtnlyackoJh5iwnABFKDmABFKDmucqWDmABFKDmSFCL4m
State: Postponed: VM job unmanageable, restarting later.
Received: 30 Aug 2018, 08:53:36
Report deadline: 6 Sep 2018, 08:53:30
Resources: 8 CPUs
Estimated computation size: 43,200 GFLOPs
CPU time: 11:48:41
CPU time since checkpoint: 00:13:41
Elapsed time: 02:22:37
Estimated time remaining: 00:51:52
Fraction done: 73.326%
Virtual memory size: 0 bytes
Working set size: 9.96 GB
Directory: slots/7
Process ID: 9688
Progress rate: 30.960% per hour
Executable: vboxwrapper_26196_windows_x86_64.exe
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1268
Credit: 8,433,416
RAC: 3,056
Message 36636 - Posted: 5 Sep 2018, 10:20:15 UTC - in response to Message 36635.  

It means that vboxwrapper (BOINC) and VBoxService (VirtualBox) could not communicate in time, for whatever reason (system too busy?).
BOINC will retry after 86400 seconds (1 day). Restarting the BOINC client makes it retry the task immediately.
Bernard

Joined: 10 Apr 12
Posts: 39
Credit: 193,853
RAC: 0
Message 36637 - Posted: 5 Sep 2018, 11:31:44 UTC - in response to Message 36636.  

Thanks
Bernard

Joined: 10 Apr 12
Posts: 39
Credit: 193,853
RAC: 0
Message 36641 - Posted: 6 Sep 2018, 6:51:47 UTC - in response to Message 36636.  

In fact, each time I stop and restart the connected client, the computation resumes, but only for a few minutes?
maeax

Joined: 2 May 07
Posts: 2071
Credit: 156,193,023
RAC: 103,440
Message 36642 - Posted: 6 Sep 2018, 7:43:20 UTC
Last modified: 6 Sep 2018, 7:44:53 UTC

Bernard,
if your computer is not running ATLAS 24/7, sorry...
The best option for you is to take the other LHC tasks (Theory, for example, or SixTrack, which is not always available).
You are also asking this question in the Number crunching forum.
So, check which work your computer can actually do!
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1268
Credit: 8,433,416
RAC: 3,056
Message 36643 - Posted: 6 Sep 2018, 7:46:30 UTC - in response to Message 36641.  

Could you try a single ATLAS task by setting Max # jobs to 1 and Max # CPUs to 4
in your preferences, then request new work after you have aborted all tasks?
Erich56

Joined: 18 Dec 15
Posts: 1686
Credit: 100,485,377
RAC: 104,406
Message 36647 - Posted: 6 Sep 2018, 10:51:46 UTC - in response to Message 36642.  

Bernard,
if your computer is not running ATLAS 24/7, sorry...
in that case, you should not run any VM task (ATLAS, LHCb, Theory, and CMS [should it ever work again]).
Each time you shut down your PC, you first need to close any running VM task properly: close the BOINC client, then wait a few minutes to give the VM a chance to close. Only then shut down your PC.
Otherwise, a VM that was not closed properly will not resume from exactly the point where it stopped, as it is supposed to. In fact, ATLAS tasks, for example, will start from the beginning (regardless of how long they had been running before).
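
The shutdown order described above (quit the BOINC client, wait for the VM to close, only then power off) could be scripted roughly as follows. This is only a sketch under assumptions: it assumes `boinccmd` and `VBoxManage` are on the PATH, that the script runs as the same user that owns the BOINC VMs, and that no other VirtualBox VMs are running (the check lists all running VMs, not only BOINC ones). The function names, the 600-second timeout and the 10-second polling interval are made up for illustration.

```python
import subprocess
import time


def running_vms(run=subprocess.run):
    """Return the non-empty lines printed by `VBoxManage list runningvms`."""
    result = run(["VBoxManage", "list", "runningvms"],
                 capture_output=True, text=True)
    return [line for line in result.stdout.splitlines() if line.strip()]


def quit_boinc_and_wait(run=subprocess.run, sleep=time.sleep,
                        timeout_s=600, poll_s=10):
    """Ask the BOINC client to exit, then wait until no VM is running.

    Returns True if all VMs stopped within timeout_s seconds, else False.
    The timeout and polling interval are arbitrary example values.
    """
    run(["boinccmd", "--quit"])  # tell the local BOINC client to quit
    waited = 0
    while running_vms(run):
        if waited >= timeout_s:
            return False  # a VM is still running; do not power off yet
        sleep(poll_s)
        waited += poll_s
    return True
```

If `quit_boinc_and_wait()` returns True, the PC can be shut down as usual; if it returns False, a VM is still running and it is safer to wait longer.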
Gary

Joined: 17 Apr 19
Posts: 2
Credit: 76,142
RAC: 0
Message 39064 - Posted: 6 Jun 2019, 7:08:33 UTC - in response to Message 36647.  
Last modified: 6 Jun 2019, 7:27:18 UTC

I don't think this is true. BOINC seems to checkpoint tasks periodically (I am not sure whether VirtualBox tasks use VM snapshots or checkpoints stored within the image's filesystem), so even if the shutdown is abrupt, the task will resume from the previous checkpoint.

There seems to be a specific problem with ATLAS's vboxwrapper executable where it periodically loses control of the VM: 5 of 7 such tasks eventually missed their deadline because of this on my machine (1 failed validation, 1 passed). Since this was just wasting slots and time for me, I disabled ATLAS tasks. Now I have a Theory Simulation VM task, and it has not yet aborted due to the "VM job unmanageable" issue and seems to be progressing and checkpointing (I hibernate the machine when not in use, and the progress is not resetting). Hopefully it will finish.

There have not been any schedule/policy changes other than blocking ATLAS tasks: now this new VM task seems to actually work and progress normally. This is with VirtualBox VM 6.0.8 (was using a version 5 before with the same issue).

From the task properties and from looking at the files in C:\ProgramData\BOINC\projects\lhcathome.cern.ch_lhcathome:
Theory is using vboxwrapper_26198ab7_windows_x86_64.exe
ATLAS appears to have been using vboxwrapper_26196_windows_x86_64.exe

I assume that the major difference between the tasks is the .vdi used for each (I can see those in the same folder), so possibly this issue can be fixed (very easily?) by the developer updating the vboxwrapper for ATLAS.
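
For anyone who wants to repeat the file check described above, here is a minimal sketch that lists the vboxwrapper executables and .vdi images in the project folder. The folder path is the one quoted in this post and may differ on other installations; the function name `list_wrappers` is made up for illustration. The listing only shows which files are present; which application uses which wrapper still has to be read from the task properties.

```python
from pathlib import Path

# Path quoted in the post; adjust it if BOINC's data directory is elsewhere.
PROJECT_DIR = Path(r"C:\ProgramData\BOINC\projects\lhcathome.cern.ch_lhcathome")


def list_wrappers(project_dir=PROJECT_DIR):
    """Return (wrapper executables, .vdi images) in project_dir, sorted by name."""
    project_dir = Path(project_dir)
    wrappers = sorted(p.name for p in project_dir.glob("vboxwrapper_*.exe"))
    images = sorted(p.name for p in project_dir.glob("*.vdi"))
    return wrappers, images
```

Calling `list_wrappers()` and printing both lists gives a quick way to compare wrapper versions between machines.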
