New version v262.80 for Windows 64bit
Author | Message |
---|---|
Joined: 20 Jun 14 Posts: 407 Credit: 238,712 RAC: 0 |
This new version is for 64bit Windows only. It provides a rebuilt vboxwrapper which uses the VBoxManage command to control the VMs rather than the API. Please let us know if you see any issues with this build. |
Joined: 29 Sep 04 Posts: 281 Credit: 11,888,115 RAC: 831 |
My Avast Antivirus (free) didn't like the new wrapper, so I lost 2 tasks to an access denied error at startup. I already had everything BOINC-related excluded, but after a recent AV update one host kept those exclusions while the other cleared them, so I have had to put them back in again. The errors left ghosts in VBox, so I had to abort a 3rd task to clear them all out. 3 tasks are now running on 2 hosts, all running jobs fine. None of them is old enough to return a job yet, but it is looking good so far after the initial (solved) start-up problem. Later: all 3 running 262.80 VMs have completed and returned jobs and got new ones. The standard BOINC over-estimate of runtime will settle down as it "learns" about the new wrapper. I'm assuming we are sticking to the 12 hour + complete-job-in-progress cut-off. |
Joined: 27 Sep 08 Posts: 880 Credit: 746,962,684 RAC: 326,151 |
I had a ton of tasks that said "VM job unmanageable, restarting later". It seems these got stuck and blocked more work from being fetched, so computers were idle. I find this to be a regression in reliability. |
Joined: 24 Jul 16 Posts: 88 Credit: 239,917 RAC: 0 |
I tried to run a 2-core WU for Theory and it succeeded: one Sherpa process and one Pythia process. I noticed that the time left shown within the VM for the Sherpa job, "Event 20500 ( 15h 23m 21s elapsed / 2h 37m 38s left )", was longer than the time remaining for the WU in the BOINC client. So not all of the 24000 events set by default were processed (only 20500). When a work unit ends in the BOINC client while the job inside is not finished, is the result of the partial job saved for the project? Will the 3500 events not done be sent to another volunteer? Why didn't the credits earned take into account that 2 cores were used instead of one? Are Sherpa jobs only possible when several cores are used? I never saw them with one core... |
Joined: 14 Jan 10 Posts: 1461 Credit: 9,859,193 RAC: 2,531 |
"I tried to run a 2-core wu for theory and it succeeded." No, the processed events are lost when the VM is shut down due to the 18hr limit. "Why the credits earned didn't take in account the fact 2 cores have been used instead of one?" If you're here for the credits, run only single-core tasks. If you have enough memory, that is even more efficient because there is less idle CPU time. Only ATLAS will run faster when using more cores. "Are only the sherpa jobs possible when several cores are used ?" No, Sherpa jobs will also appear on single-core VMs. |
Joined: 24 Jul 16 Posts: 88 Credit: 239,917 RAC: 0 |
Thanks for the answers. |
Joined: 24 Oct 04 Posts: 1234 Credit: 79,794,835 RAC: 76,093 |
"I had a ton of tasks that said VM job unmanageable, restarting later." Yeah Toby, I have had a few of these, and I even tried to force one to restart, but I could tell it wasn't going to restart, so I aborted that one and the others that did the same even before starting. Volunteer Mad Scientist For Life |
Joined: 24 Oct 04 Posts: 1234 Credit: 79,794,835 RAC: 76,093 |
https://lhcathome.cern.ch/lhcathome/results.php?hostid=10337530 It happens both after a task has started and before it starts. One thing it claims is that there is not enough memory, but I watched as it happened and it had plenty of memory left, and previously it could run all 8 cores with no problems. |
Joined: 14 Jan 10 Posts: 1461 Credit: 9,859,193 RAC: 2,531 |
This version suffers much more from "[LHC@home] task postponed 86400.000000 sec: Communication with VM Hypervisor failed.", probably because VBoxSVC is too busy to respond immediately to a VBoxManage request. You have to wait a whole day (or restart BOINC). |
Joined: 21 Jan 17 Posts: 3 Credit: 2,797,466 RAC: 0 |
I presently have nine Theory Simulation 262.80 (vbox64) tasks that have postponed themselves with the same message given by others in this thread: Postponed: VM job unmanageable, restarting later. The run time prior to postponement varies wildly. The shortest was 14 seconds; the longest was 5 hours and 36 minutes. All of them have an estimated completion time of at least 10 hours. |
Joined: 27 Sep 08 Posts: 880 Credit: 746,962,684 RAC: 326,151 |
Magic how did you force the restart? Quit Boinc and restart? |
Joined: 2 Sep 04 Posts: 468 Credit: 214,945,248 RAC: 46,614 |
"I presently have nine Theory Simulation 262.80 (vbox64) tasks that have postponed themselves with the same message given by others in this thread:" You can find some information about postponed VMs in my Checklist V3 for Atlas at No. 6 and No. 16d. The checklist was designed for Atlas, but regarding postponed tasks it will help for VMs. Supporting BOINC, a great concept ! |
Joined: 24 Oct 04 Posts: 1234 Credit: 79,794,835 RAC: 76,093 |
"Magic how did you force the restart? Quit Boinc and restart?" Well, I do a few things, like going to the VirtualBox Manager and telling it to *start*, or I just pause everything and do a reboot. The one I did during my previous post was the reboot version, and that worked: it restarted and finished a few hours later. I'm still not sure why it is saying it is a memory problem now. It has no problem running four of the X2 LHC-dev Theory tasks. It still has 2.7GB memory available, and the times I watched the tasks here crash it said I still had 5GB available. |
Joined: 27 Sep 08 Posts: 880 Credit: 746,962,684 RAC: 326,151 |
Thanks, same as me. I don't see any RAM problems, just a lack of communication. I wrote a script that restarts BOINC if more than 50% of the tasks are stuck in pending. I run it every hour to fix the pending ones. |
Joined: 27 Sep 08 Posts: 880 Credit: 746,962,684 RAC: 326,151 |
I think I will just set it to no new work until there is a new version. |
Joined: 29 Sep 04 Posts: 281 Credit: 11,888,115 RAC: 831 |
Another couple of issues with 262.80: VMs don't always abide by the 18 hour cut-off; I have 3 VMs at 23, 24 and 25 hours. 2 were reset a short time ago as they both had loopers (that I forgot to take details of, doh!) and have started new jobs. I'll see if they terminate after those. (Nope.) If not, I'll try to end them "gracefully". The 25hr one finished its last job 9 hrs ago but has given up after a few condor write failures. A reset got it a new job. 3 successful graceful task terminations. [The shutdown signal doesn't always properly get to VBox, although that might be my fault as I'm running BOINC 7.7.2, which hasn't been fully cleared for release yet.] Often the progress % bears no relation to elapsed time, e.g. 7hrs 39%; 24hrs 6%; 25hrs 65%; 90 mins 8.3%; 110 mins 7.4%. [3.01 multis running single core appear immune to these issues, although it looks like they use the same wrapper.] Let's see what 262.90 has to offer. Got ...80s just like others have reported in the ...90 thread. |
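
One poster in this thread mentions an hourly script that restarts BOINC when more than 50% of the tasks are stuck. That script was not posted, so the following is only a minimal sketch of the decision step, not the poster's actual code. The names `POSTPONED_MARKER`, `stuck_fraction` and `should_restart` are hypothetical, and the marker text is simply the "VM job unmanageable" wording quoted above. How the task status strings are collected and how BOINC is restarted on a given host are deliberately left out.

```python
from typing import Iterable

# Assumption: a task counts as stuck when its status text contains this
# marker. The wording is taken from the message quoted in this thread.
POSTPONED_MARKER = "VM job unmanageable"


def stuck_fraction(statuses: Iterable[str], marker: str = POSTPONED_MARKER) -> float:
    """Return the fraction of status strings that contain the marker."""
    statuses = list(statuses)
    if not statuses:
        # No tasks at all: nothing is stuck.
        return 0.0
    stuck = sum(1 for s in statuses if marker.lower() in s.lower())
    return stuck / len(statuses)


def should_restart(statuses: Iterable[str], threshold: float = 0.5) -> bool:
    """True only when strictly more than `threshold` of the tasks are stuck."""
    return stuck_fraction(statuses) > threshold
```

A scheduled job could call `should_restart` once an hour with whatever status text it has gathered, and trigger a restart only when it returns `True`. The comparison is strict, so a host with exactly half of its tasks postponed is left alone.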
©2025 CERN