Thread 'What does it mean?'

Author	Message
Bernard Joined: 10 Apr 12 Posts: 39 Credit: 193,853 RAC: 0	Message 36635 - Posted: 5 Sep 2018, 9:22:43 UTC
Application: ATLAS Simulation 1.01 (vbox64_mt_mcore_atlas)
Name: 21DMDmiiKFtnlyackoJh5iwnABFKDmABFKDmucqWDmABFKDmSFCL4m
State: Postponed: VM job unmanageable, restarting later.
Received: 30/08/2018 08:53:36
Report deadline: 06/09/2018 08:53:30
Resources: 8 CPUs
Estimated computation size: 43,200 GFLOPs
CPU time: 11:48:41
CPU time since checkpoint: 00:13:41
Elapsed time: 02:22:37
Estimated time remaining: 00:51:52
Fraction done: 73.326%
Virtual memory size: 0 bytes
Working set size: 9.96 GB
Directory: slots/7
Process ID: 9688
Progress rate: 30.960% per hour
Executable: vboxwrapper_26196_windows_x86_64.exe

Crystal Pellet Volunteer moderator Volunteer tester Joined: 14 Jan 10 Posts: 1561 Credit: 10,123,004 RAC: 1,294	Message 36636 - Posted: 5 Sep 2018, 10:20:15 UTC - in response to Message 36635. It means that vboxwrapper (BOINC) and VBoxService (VirtualBox) could not communicate in time, for whatever reason (system too busy?). BOINC will retry after 86400 seconds (1 day). Restarting the BOINC client makes it retry the task immediately.

Bernard Joined: 10 Apr 12 Posts: 39 Credit: 193,853 RAC: 0	Message 36637 - Posted: 5 Sep 2018, 11:31:44 UTC - in response to Message 36636. Thanks

Bernard Joined: 10 Apr 12 Posts: 39 Credit: 193,853 RAC: 0	Message 36641 - Posted: 6 Sep 2018, 6:51:47 UTC - in response to Message 36636. In fact, each time I quit and restart the client, computing resumes, but only for a few minutes?

maeax Joined: 2 May 07 Posts: 2305 Credit: 179,727,092 RAC: 2,365	Message 36642 - Posted: 6 Sep 2018, 7:43:20 UTC Last modified: 6 Sep 2018, 7:44:53 UTC Bernard, if your computer is not running ATLAS 24/7, sorry... The best option for you is to take the other LHC tasks (Theory, for example, or sixtrack, which is not always available). You are also asking these questions in the Number crunching forum. So, check what work your computer can actually do!

Crystal Pellet Volunteer moderator Volunteer tester Joined: 14 Jan 10 Posts: 1561 Credit: 10,123,004 RAC: 1,294	Message 36643 - Posted: 6 Sep 2018, 7:46:30 UTC - in response to Message 36641. Could you try a single ATLAS task by setting Max # jobs to 1 and Max # CPUs to 4 in your preferences, then abort all current tasks and request new work?

Erich56 Joined: 18 Dec 15 Posts: 1994 Credit: 164,363,115 RAC: 110,707	Message 36647 - Posted: 6 Sep 2018, 10:51:46 UTC - in response to Message 36642. "Bernard, if your computer is not running ATLAS 24/7, sorry..." In that case, you should not run any VM task (ATLAS, LHCb, Theory, or CMS, should it ever work again). Each time you shut down your PC, you first need to close any running VM task properly: close the BOINC client, then wait a few minutes to give the VM a chance to close, and only then shut down your PC. Otherwise, a VM that was not closed properly will not resume from the point where it stopped. In fact, ATLAS tasks, for example, will start again from the beginning, regardless of how long they had been running.
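Editor's sketch: the shutdown order described in the post above (quit the BOINC client, wait a few minutes, then power off) could be scripted roughly as follows. This is only an illustration under assumptions, not a procedure from the thread: it assumes `boinccmd` is on the PATH and can authenticate to the local client, it uses the Windows `shutdown /s /t 0` command, and the 180-second wait is a guess at "a few minutes". The names `shutdown_plan` and `execute` are made up for this example.

```python
import subprocess
import time


def shutdown_plan(wait_seconds=180):
    """Return the ordered steps: quit BOINC, wait, then power off.

    wait_seconds is an assumed value for "a few minutes".
    """
    return [
        ("run", ["boinccmd", "--quit"]),         # ask the BOINC client to exit
        ("wait", wait_seconds),                   # give the VM time to save and close
        ("run", ["shutdown", "/s", "/t", "0"]),  # Windows: shut down immediately
    ]


def execute(plan, run=subprocess.run, sleep=time.sleep):
    """Carry out the plan; run and sleep are injectable for dry runs."""
    for kind, arg in plan:
        if kind == "run":
            run(arg, check=True)  # stop if a command fails
        else:
            sleep(arg)
```

Calling `execute(shutdown_plan())` would really shut the machine down, so a dry run that passes `run=print`-style stand-ins is the safer way to try it first.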

Gary Joined: 17 Apr 19 Posts: 2 Credit: 76,142 RAC: 0	Message 39064 - Posted: 6 Jun 2019, 7:08:33 UTC - in response to Message 36647. Last modified: 6 Jun 2019, 7:27:18 UTC I don't think this is true. BOINC seems to checkpoint tasks periodically (I am not sure whether VirtualBox tasks use VM snapshots or checkpoints stored within the image's filesystem), so even if the shutdown is abrupt, a task will resume from the previous checkpoint. There seems to be a specific problem with ATLAS's vboxwrapper executable, where it periodically loses control of the VM: 5/7 such tasks eventually missed their deadline due to this on my machine (1 failed validation, 1 passed). Since this was just wasting slots and time for me, I disabled ATLAS tasks. Now I have a Theory Simulation VM task. It has not yet been postponed due to the "VM job unmanageable" issue, and it seems to be progressing and checkpointing (I hibernate the machine when not in use and the progress is not resetting). Hopefully it will finish. There have been no schedule or policy changes other than blocking ATLAS tasks, and this new VM task now seems to work and progress normally. This is with VirtualBox 6.0.8 (I was using a version 5 release before, with the same issue). From the task properties and from looking at the files in C:\ProgramData\BOINC\projects\lhcathome.cern.ch_lhcathome: Theory is using vboxwrapper_26198ab7_windows_x86_64.exe; ATLAS appears to have been using vboxwrapper_26196_windows_x86_64.exe. I assume that the major difference between the tasks is the .vdi used for each (I can see those in the same folder), so possibly this issue can be fixed (very easily?) by the developer updating the vboxwrapper for ATLAS.
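Editor's sketch: the comparison of wrapper executables described in the post above can be repeated by listing the project folder and pulling the build tag out of each `vboxwrapper_*` file name. This is a minimal illustration, assuming the Windows naming pattern seen in this thread (`vboxwrapper_<tag>_windows_x86_64.exe`); the function names `wrapper_builds` and `wrapper_builds_in` are made up for this example, and the tag is treated as an opaque string rather than a real version number.

```python
import re
from pathlib import Path

# Assumed pattern, taken from the two file names quoted in the thread.
WRAPPER_RE = re.compile(r"^vboxwrapper_([0-9A-Za-z]+)_windows_x86_64\.exe$")


def wrapper_builds(filenames):
    """Return the sorted build tags of all vboxwrapper executables in filenames."""
    builds = []
    for name in filenames:
        match = WRAPPER_RE.match(name)
        if match:
            builds.append(match.group(1))
    return sorted(builds)


def wrapper_builds_in(project_dir):
    """Scan a BOINC project directory for vboxwrapper executables."""
    return wrapper_builds(p.name for p in Path(project_dir).iterdir())
```

For the two executables named in the post, `wrapper_builds` returns `['26196', '26198ab7']`; pointing `wrapper_builds_in` at the project folder quoted above would list whatever wrappers are present on a given machine.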