LHCb/other tasks failing after putting computer into hibernation state?
Author | Message |
---|---|
Joined: 17 Aug 17 Posts: 15 Credit: 179,253 RAC: 0 |
After 22 days of uptime with relatively few problems running LHCb or other tasks (just a few errors here and there), today I found that after waking my machine from hibernation and starting 3 LHCb tasks and 1 CMS task, they all promptly failed. I also had one LHCb task in progress from the previous night, over 50% complete, and it failed as well when I resumed it. The "Exit status" error varied from task to task, but the log of every one of them contained "2017-10-09 10:36:00 (3196): Guest Log: 10/09/17 10:26:04 HibernationSupportedStates invalid '' in ad from hibernation plugin /usr/libexec/condor/condor_power_state". I don't know much about what the log messages mean or whether this one is relevant at all. The only setting I changed between last night and this morning (besides putting my machine into hibernation overnight) was CPU time, which I raised from 50% to 60%. Thanks for any help. Edit: it seems all of the tasks I attempt are failing almost immediately. I should probably stop trying to run these for now, yeah? |
Joined: 18 Dec 15 Posts: 1814 Credit: 118,498,673 RAC: 30,777 |
"... edit: it seems all of the tasks I attempt are failing almost immediately. I should probably stop trying to run these for now, yeah?" Well, right now no tasks other than ATLAS seem to be available anyway, at least judging from the Project Status Page: https://lhcathome.cern.ch/lhcathome/server_status.php |
Joined: 17 Aug 17 Posts: 15 Credit: 179,253 RAC: 0 |
"... edit: it seems all of the tasks I attempt are failing almost immediately. I should probably stop trying to run these for now, yeah?" Oh, weird; when I checked, it was all up and running. Would this explain my issue? |
Joined: 14 May 17 Posts: 3 Credit: 1,004,936 RAC: 0 |
Sorry for resurrecting this thread, but I had the same issue: https://lhcathome.cern.ch/lhcathome/result.php?resultid=204084499 I paused a number of WUs to finish some tasks from a different project, and when I returned, the WUs failed, discarding the work already done. Is there a "correct" way to hold VM-based WUs? If not, next time I can simply dump the WU directly whenever my client needs to change its processing sequence. |
Joined: 24 Oct 04 Posts: 1173 Credit: 54,831,893 RAC: 16,131 |
"Sorry for resurrection; but I had the same issue" You just got the typical error that VB tasks hit once in a while (how often depends on how many you are running): "Guest Log: [ERROR] Condor exited after 111913s without running a job." The main thing is that you can check the VB Manager to make sure the tasks are saved and suspended before switching to other tasks (the same applies if you have to reboot for any reason). The one you lost had been running for less than an hour, so it's no big deal; just get new tasks and start over if you want to run LHCb. I also hope to have the multi-core version moved over here soon. |
Joined: 14 May 17 Posts: 3 Credit: 1,004,936 RAC: 0 |
Thanks for the quick response. Any suggestion on the sequence of steps? 1) First suspend the VM, then suspend the WU in BOINC Manager; 2) first suspend the WU, then suspend the VM; 3) it doesn’t matter, just suspend both the WU and its VM within a short timeframe. TIA |
Joined: 17 Aug 17 Posts: 15 Credit: 179,253 RAC: 0 |
Perhaps a dumb question, but how does one check whether tasks are "saved"? And how do you suspend the VM separately from the WU? I've been away from LHC@home and other distributed computing projects for several months now, and I've forgotten a lot about this stuff. |
Joined: 24 Oct 04 Posts: 1173 Credit: 54,831,893 RAC: 16,131 |
To make sure the VB tasks are saved, just bring up your VB Manager and you will see whether they are saved, running, or paused. As for "Thanks for the quick response; any suggestion on the sequence of steps?": just suspend the WU, and soon after you will see that it has also been suspended/saved in the VB Manager. |
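For anyone who prefers a command line to the VB Manager window, the same read-only status check can be done with VirtualBox's `VBoxManage showvminfo <vm> --machinereadable`, which prints a `VMState="..."` line. The Python below is a minimal sketch under that assumption; it only reads the state and never suspends or resumes anything. The VM name you pass in is up to you (`VBoxManage list vms` lists the real names on your machine).

```python
import subprocess
from typing import Optional


def parse_vm_state(machinereadable: str) -> Optional[str]:
    """Extract the VMState value (e.g. "saved", "running", "paused")
    from `VBoxManage showvminfo <vm> --machinereadable` output."""
    for line in machinereadable.splitlines():
        key, sep, value = line.partition("=")
        # Keys and values may be wrapped in double quotes; strip them.
        if sep and key.strip().strip('"') == "VMState":
            return value.strip().strip('"')
    return None  # no VMState line found


def vm_state(vm_name: str) -> Optional[str]:
    """Read-only query of one VM's state; it does not change the VM."""
    result = subprocess.run(
        ["VBoxManage", "showvminfo", vm_name, "--machinereadable"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if VBoxManage fails
    )
    return parse_vm_state(result.stdout)
```

After suspending a WU in BOINC, `vm_state(...)` on the corresponding VM would be expected to report `"saved"` (or possibly `"paused"`), matching what the post above says you will see in the VB Manager.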
Joined: 15 Jun 08 Posts: 2534 Credit: 253,874,046 RAC: 38,809 |
There's a communication chain between different software layers: BOINC client <--> vboxwrapper <--> VirtualBox Hypervisor <--> LHC VM. The only correct way to suspend/resume a VM is via the BOINC client. You may use your VirtualBox Manager for status checks, but never use it to suspend a VM. |
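Besides the Suspend/Resume buttons in BOINC Manager, the BOINC client can also be driven from the command line with the `boinccmd` tool that ships with BOINC, using `boinccmd --task URL task_name suspend` (or `resume`). The Python below is a minimal sketch of that route, so the request enters the chain at the BOINC client rather than at VirtualBox. Assumptions: `boinccmd` is on your PATH and can reach the local client (depending on your installation you may need to run it from the BOINC data directory or add `--passwd`), and the project URL matches what your client shows; `boinccmd --get_tasks` should list each task's name and project URL.

```python
import subprocess
from typing import List

# Assumed LHC@home project URL; check `boinccmd --get_tasks` for the exact
# URL your client has registered.
LHC_PROJECT_URL = "https://lhcathome.cern.ch/lhcathome/"


def boinc_task_cmd(project_url: str, task_name: str, op: str) -> List[str]:
    """Build a boinccmd call that suspends or resumes one task through the
    BOINC client (BOINC client -> vboxwrapper -> VirtualBox -> VM)."""
    if op not in ("suspend", "resume"):
        raise ValueError(f"unsupported operation: {op!r}")
    return ["boinccmd", "--task", project_url, task_name, op]


def run_boinc_task_op(task_name: str, op: str,
                      project_url: str = LHC_PROJECT_URL) -> None:
    """Run the command; raises CalledProcessError if boinccmd exits non-zero."""
    subprocess.run(boinc_task_cmd(project_url, task_name, op), check=True)
```

For example, `run_boinc_task_op("example_task_name", "suspend")` uses a hypothetical task name; substitute a real one from `--get_tasks`. The helper deliberately accepts only suspend and resume, since those are the operations this thread is about.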
©2024 CERN