Message boards :
ATLAS application :
Handle Vbox resumes
Author | Message |
---|---|
Send message Joined: 1 Feb 06 Posts: 66 Credit: 9,723 RAC: 0 |
Hi team, I am running Atlas on Windows with a real VBox. I can see the monitoring screen (Ctrl-Alt-F2) and it progresses as it should. However, I have no clue how to handle the situation when I need to restart my system. I need to do it "often", but I don't want to lose already computed events. Which is the safest way? I have tried "suspend" from BOINC Manager, but when it comes back I see the events restarted (or at least the "already finished" count starts from 0). Thanks in advance! Javi F |
Send message Joined: 30 Aug 14 Posts: 145 Credit: 10,847,070 RAC: 0 |
It is not possible to pause ATLAS tasks. One task runs 200 events. If it is interrupted and resumed it will restart at zero. Why mine when you can research? - GRIDCOIN - Real cryptocurrency without wasting hashes! https://gridcoin.us |
Send message Joined: 28 Sep 04 Posts: 620 Credit: 38,048,830 RAC: 13,243 |
My experience is that you can resume Atlas tasks after a computer reboot, at least on Windows. First I suspend all tasks that are not yet running, so that they are not started while I am suspending the running tasks. Then I suspend the running tasks. I have selected 'Leave non-GPU tasks in memory while suspended' in preferences, so suspending them does not save them to disk. You can view the task status in VBox Manager: after suspending, the running tasks will show 'Paused'. Now you can exit and stop Boinc. When all tasks that were running are in the 'Saved' state you should be safe. On my faster computer 'ProgramData/Boinc' is on an SSD, so it is fast enough to save the tasks before Boinc actually shuts down (it may take up to a minute to save them all). After that, when I am ready to reboot the computer, I manually kill the 'Virtualbox Interface' process from Task Manager, as it seems to hang and prevent the restart even though I have closed VBox Manager. It may close by itself if you wait long enough. Then I just restart the computer. After the restart my Boinc autostarts, but all my tasks are suspended. Now I resume them one by one, starting with the one that had progressed the furthest. I monitor disk activity and VBox Manager to see when a task is safely running again before resuming another one. When all tasks that were running before the restart are safely running, I resume the rest of the tasks and everything is back to normal. I followed the above just this week when Win 10 got an update, and all tasks (Atlas, Theory and even CMS) seem to have survived. But if for some reason they do restart from the beginning, remember that you get credit according to task runtime, and increased credit if the task finishes successfully. 
Here is an example of a CMS task that was restarted after about 9000 seconds of runtime: https://lhcathome.cern.ch/lhcathome/result.php?resultid=292888006 and here is one Atlas task: https://lhcathome.cern.ch/lhcathome/result.php?resultid=294511025 |
Send message Joined: 1 Feb 06 Posts: 66 Credit: 9,723 RAC: 0 |
Thanks for this useful and constructive comment. I will definitely try again, doing as you suggested, and check the results. My PC has a slow HD... anyway, let's see. It is not a matter of credits but of wasted time and helplessness: I had to reboot my machine once a day for 3 days in a row, so I could not reach 200 events on any day... and the job started from scratch the day after :( Meanwhile, I am crunching Theory, which seems to be shorter, so if resume does not work, not that many hours are wasted. Have a nice day! |
Send message Joined: 28 Sep 04 Posts: 620 Credit: 38,048,830 RAC: 13,243 |
You can make your Atlas tasks run faster if you give them more CPU cores. It is a multicore application. First you can try with 2 CPUs and then see how it goes. Note that it needs more RAM to run, but less than running two single-core tasks simultaneously. |
Send message Joined: 1 Feb 06 Posts: 66 Credit: 9,723 RAC: 0 |
I have only 2 CPUs assigned. Normally that is not enough for Atlas to finish in a reasonable time :) Not sure if I can assign one more CPU once a task has started. Will try |
Send message Joined: 28 Sep 04 Posts: 620 Credit: 38,048,830 RAC: 13,243 |
The formula for the required memory is 3000 MB + n * 900 MB, where n is the number of CPU cores. So a single-core task requires 3900 MB, a 2-core task requires 4800 MB, a 3-core task requires 5700 MB, etc. I don't think you can change the number of cores after a task has started. |
Send message Joined: 14 Jan 10 Posts: 1168 Credit: 7,217,203 RAC: 2,072 |
"I will definitely try again, doing as you suggested, and check the results. My PC has a slow HD... anyway, let's see." Harri already gave a good explanation of how you can save your work when you have to reboot. Since you have a slow disk and maybe several VMs running, before stopping BOINC you could suspend the VMs one by one with 'Leave in memory' off, and watch in VirtualBox Manager as each job is saved to disk. Of course, first suspend all tasks that have not yet started. Suspending ATLAS overnight is not a good idea. I think the maximum interruption for ATLAS and CMS is about 1 hour. They need a network connection almost all the time. For Theory it is not a problem to suspend the task for longer periods. |
Send message Joined: 1 Feb 06 Posts: 66 Credit: 9,723 RAC: 0 |
Hi, I did as suggested. I simply "suspend" the task via BOINC Manager, go to VBox and watch the running image change its status to "saved", then let it be for a couple of minutes before I do the same with the next task, monitoring disk I/O to be sure it is done. When I turn my machine back on I resume the tasks one by one, following the same approach and always giving them an extra minute, just in case. It worked perfectly with Atlas :) Thanks! Javi |
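
For anyone who wants to script the checks described in this thread, here is a minimal Python sketch. It is not an official LHC@home or BOINC tool. It covers two points from the posts above: the memory rule (3000 MB + n * 900 MB) and the advice to wait until every VM shows 'Saved' before rebooting. The function names are made up for illustration. The parser assumes the text comes from `VBoxManage showvminfo <vm> --machinereadable` and contains a line of the form `VMState="saved"`; please verify that against your own VirtualBox version.

```python
import re
from typing import Iterable, Optional


def required_memory_mb(cores: int) -> int:
    """Memory rule quoted in the thread: 3000 MB + n * 900 MB."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return 3000 + cores * 900


def vm_state(showvminfo_output: str) -> Optional[str]:
    """Return the VMState value from machine-readable showvminfo text.

    Assumption: the text contains a line such as VMState="saved".
    Returns None if no such line is found.
    """
    match = re.search(r'^VMState="([^"]*)"', showvminfo_output, flags=re.MULTILINE)
    return match.group(1) if match else None


def safe_to_reboot(states: Iterable[Optional[str]]) -> bool:
    """True only if every VM reports 'saved', as recommended in the thread."""
    return all(state == "saved" for state in states)


if __name__ == "__main__":
    # Hypothetical sample text; real input would come from VBoxManage.
    sample = 'name="boinc_example"\nVMState="saved"\n'
    print(required_memory_mb(2), "MB needed for a 2-core task")
    print("safe to reboot:", safe_to_reboot([vm_state(sample)]))
```

The parsing is kept separate from running VirtualBox, so you can first try it on text copied by hand from a terminal. To feed it live data you would capture the command's output yourself, for example with `subprocess.run`.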
©2023 CERN