New version v262.80 for Windows 64bit

Author	Message
Laurence Project administrator Project developer Joined: 20 Jun 14 Posts: 381 Credit: 238,712 RAC: 0	Message 29978 - Posted: 20 Apr 2017, 13:21:18 UTC This new version is for 64bit Windows only. It provides a rebuilt vboxwrapper which uses the VBoxManage command to control the VMs rather than the API. Please let us know if you see any issues with this build. ID: 29978

Ray Murray Volunteer moderator Joined: 29 Sep 04 Posts: 281 Credit: 11,866,264 RAC: 0	Message 29994 - Posted: 21 Apr 2017, 18:28:17 UTC Last modified: 21 Apr 2017, 21:28:35 UTC My Avast Antivirus (free) didn't like the new wrapper, so I lost 2 tasks to an access denied error at startup. I already had everything-Boinc excluded, but with a recent AV update 1 host kept those exclusions while the other cleared them, so I have had to put them back in again. The errors left ghosts in VBox, so I had to abort a 3rd task to clear them all out. 3 tasks running on 2 hosts now, all running jobs fine. None of them is old enough to RETURN a job yet, but it is looking good so far after the initial (solved) start-up problem. Later: All 3 running 262.80 VMs have completed and returned jobs and got new ones. Standard Boinc over-estimate of runtime, which will settle down as it "learns" about the new wrapper. I'm assuming we are sticking to the 12 hour + complete-job-in-progress cut-off. ID: 29994

Toby Broom Volunteer moderator Joined: 27 Sep 08 Posts: 853 Credit: 695,748,745 RAC: 148,231	Message 29998 - Posted: 22 Apr 2017, 6:48:08 UTC Last modified: 22 Apr 2017, 9:25:51 UTC I had a ton of tasks that said "VM job unmanageable, restarting later". Seems like these got stuck and blocked more work being fetched, so computers were idle. I find this to be a regression in reliability. ID: 29998

PHILIPPE Joined: 24 Jul 16 Posts: 88 Credit: 239,917 RAC: 0	Message 30001 - Posted: 22 Apr 2017, 9:51:18 UTC - in response to Message 29998. I tried to run a 2-core wu for Theory and it succeeded: one process sherpa, one process pythia. I noticed that the time left within the VM for ending the sherpa job, Event 20500 ( 15h 23m 21s elapsed / 2h 37m 38s left ), was higher than the time remaining for the wu in the Boinc client. So not all of the 24000 events set by default have been processed (only 20500). When a work unit in the Boinc client ends while the job inside is not finished, is the result of the partial job saved for the project? Will the 3500 events not done be sent to another volunteer? Why didn't the credits earned take into account the fact that 2 cores were used instead of one? Are sherpa jobs only possible when several cores are used? I never saw them with one core... ID: 30001

Crystal Pellet Volunteer moderator Volunteer tester Joined: 14 Jan 10 Posts: 1432 Credit: 9,594,942 RAC: 6,465	Message 30007 - Posted: 22 Apr 2017, 16:17:34 UTC - in response to Message 30001.
> I tried to run a 2-core wu for Theory and it succeeded: one process sherpa, one process pythia. I noticed that the time left within the VM for ending the sherpa job, Event 20500 ( 15h 23m 21s elapsed / 2h 37m 38s left ), was higher than the time remaining for the wu in the Boinc client. So not all of the 24000 events set by default have been processed (only 20500). When a work unit in the Boinc client ends while the job inside is not finished, is the result of the partial job saved for the project? Will the 3500 events not done be sent to another volunteer?
No, the processed events are lost when the VM is shut down due to the 18hr limit.
> Why didn't the credits earned take into account the fact that 2 cores were used instead of one?
When you're here for the credits, run only single core tasks. When you have enough memory, it's even more efficient because there is less idle CPU time. Only ATLAS will run faster when using more cores.
> Are sherpa jobs only possible when several cores are used? I never saw them with one core...
No, sherpas will also appear on single core VMs. ID: 30007

PHILIPPE Joined: 24 Jul 16 Posts: 88 Credit: 239,917 RAC: 0	Message 30008 - Posted: 22 Apr 2017, 16:25:16 UTC - in response to Message 30007. Last modified: 22 Apr 2017, 16:25:34 UTC Thanks for the answers. ID: 30008

Magic Quantum Mechanic Joined: 24 Oct 04 Posts: 1183 Credit: 56,048,330 RAC: 57,125	Message 30012 - Posted: 22 Apr 2017, 23:29:43 UTC - in response to Message 29998.
> I had a ton of tasks that said "VM job unmanageable, restarting later". Seems like these got stuck and blocked more work being fetched, so computers were idle. I find this to be a regression in reliability.
Yeah Toby, I have had a few of these and even tried to force one to restart, but I could tell it wasn't going to restart, so I aborted that one and the others that did that even before starting.
Volunteer Mad Scientist For Life ID: 30012

Magic Quantum Mechanic Joined: 24 Oct 04 Posts: 1183 Credit: 56,048,330 RAC: 57,125	Message 30013 - Posted: 23 Apr 2017, 1:10:37 UTC https://lhcathome.cern.ch/lhcathome/results.php?hostid=10337530 It does it both after a task has started and before it has started. One thing it claims is that there is not enough memory, but I watched as it happened and it had plenty of memory left, and before this version it could run all 8 cores with no problems. ID: 30013

Crystal Pellet Volunteer moderator Volunteer tester Joined: 14 Jan 10 Posts: 1432 Credit: 9,594,942 RAC: 6,465	Message 30015 - Posted: 23 Apr 2017, 7:10:14 UTC This version suffers much more from "[LHC@home] task postponed 86400.000000 sec: Communication with VM Hypervisor failed.", probably due to VBoxSVC being too busy to respond immediately to a VBoxManage request. You have to wait a whole day (or restart BOINC). ID: 30015

BuckeyeChuck Joined: 21 Jan 17 Posts: 3 Credit: 2,797,466 RAC: 0	Message 30018 - Posted: 23 Apr 2017, 12:25:51 UTC I presently have nine Theory Simulation 262.80 (vbox64) tasks that have postponed themselves with the same message given by others in this thread: "Postponed: VM job unmanageable, restarting later." The run time prior to postponement varies wildly. The shortest was 14 seconds; the longest was 5 hours and 36 minutes. All of them have an estimated completion time of at least 10 hours. ID: 30018

Toby Broom Volunteer moderator Joined: 27 Sep 08 Posts: 853 Credit: 695,748,745 RAC: 148,231	Message 30019 - Posted: 23 Apr 2017, 12:33:16 UTC Magic, how did you force the restart? Quit Boinc and restart? ID: 30019

Yeti Volunteer moderator Joined: 2 Sep 04 Posts: 455 Credit: 201,859,107 RAC: 40,341	Message 30020 - Posted: 23 Apr 2017, 12:57:03 UTC - in response to Message 30018.
> I presently have nine Theory Simulation 262.80 (vbox64) tasks that have postponed themselves with the same message given by others in this thread: "Postponed: VM job unmanageable, restarting later."
You can find some information about postponed VMs in my Checklist V3 for Atlas at No. 6 and No. 16d. The checklist was designed for Atlas, but regarding postponed tasks it will help for VMs.
Supporting BOINC, a great concept! ID: 30020

Magic Quantum Mechanic Joined: 24 Oct 04 Posts: 1183 Credit: 56,048,330 RAC: 57,125	Message 30023 - Posted: 23 Apr 2017, 17:54:35 UTC - in response to Message 30019. Last modified: 23 Apr 2017, 17:58:53 UTC
> Magic, how did you force the restart? Quit Boinc and restart?
Well, I do a few things, like going to the VB manager and telling it to start, or I just pause everything and do a reboot. The one I did during my previous post was the reboot version; that worked, and the task restarted and finished a few hours later. Still not sure why it is saying it is a memory problem now. It has no problem running four of the X2 LHC-dev Theory tasks. It still has 2.7GB memory available, and the times I watched the tasks here crash it said I still had 5GB available. ID: 30023

Toby Broom Volunteer moderator Joined: 27 Sep 08 Posts: 853 Credit: 695,748,745 RAC: 148,231	Message 30025 - Posted: 23 Apr 2017, 20:01:12 UTC Last modified: 23 Apr 2017, 22:25:30 UTC Thanks, same as me. I don't see any RAM problems, just lack of communication. I wrote a script that restarts BOINC if more than 50% of the tasks are stuck in pending. I run it every hour to fix the pending ones. ID: 30025
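
The script itself is not shown in the thread. A minimal sketch of the decision it describes might look like the following, assuming the task states are already available as plain strings and that a stuck task can be recognised by the word "Postponed" in its status text (as in the message quoted earlier in the thread). The function names, the `restart` hook, and the way statuses are collected are all hypothetical, not part of BOINC:

```python
from typing import Callable, Iterable


def fraction_postponed(statuses: Iterable[str]) -> float:
    """Return the share of tasks whose status text mentions 'postponed'."""
    statuses = list(statuses)
    if not statuses:
        return 0.0
    stuck = sum(1 for s in statuses if "postponed" in s.lower())
    return stuck / len(statuses)


def restart_if_stuck(
    statuses: Iterable[str],
    restart: Callable[[], None],
    threshold: float = 0.5,
) -> bool:
    """Call restart() when more than `threshold` of the tasks are postponed.

    `restart` is a hypothetical hook: how BOINC is actually stopped and
    started depends on the installation and is not described in the thread.
    """
    if fraction_postponed(statuses) > threshold:
        restart()
        return True
    return False
```

Run from an hourly scheduled job, a check like this only restarts the client when the majority of tasks are stuck, so a single postponed task does not interrupt the others that are still running.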

Toby Broom Volunteer moderator Joined: 27 Sep 08 Posts: 853 Credit: 695,748,745 RAC: 148,231	Message 30032 - Posted: 24 Apr 2017, 19:17:35 UTC I think I will just set it to no new work until there is a new version. ID: 30032

Ray Murray Volunteer moderator Joined: 29 Sep 04 Posts: 281 Credit: 11,866,264 RAC: 0	Message 30050 - Posted: 25 Apr 2017, 19:10:04 UTC Last modified: 25 Apr 2017, 19:22:48 UTC Another couple of issues with 262.80: VMs don't always abide by the 18 hour cut-off; I have 3 VMs at 23, 24 and 25 hours. 2 were reset a short time ago as they both had loopers (that I forgot to take details of, doh!) and have started new jobs. I'll see if they terminate after those. (nope) If not, I'll try to end them "gracefully". The 25hr one finished its last job 9 hrs ago but has given up after a few condor write failures. A reset got it a new job. 3 successful graceful task terminations. [Shutdown signal doesn't always properly get to VBox, although that might be my fault as I'm running Boinc 7.7.2, which hasn't been fully cleared for release yet] Often the progress % bears no relation to elapsed time, e.g. 7hrs 39%, 24hrs 6%, 25hrs 65%, 90 mins 8.3%, 110 mins 7.4% [3.01 multis running single core appear immune to these issues, although it looks like they use the same wrapper] Let's see what 262.90 has to offer. Got ...80s just like others have reported in the ...90 thread. ID: 30050
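
One way to see how inconsistent those progress figures are is to extrapolate a total runtime from each one. The sketch below assumes that each elapsed time in the post pairs with the percentage that follows it and that progress would grow linearly with time; both are assumptions made for illustration, and the function name is hypothetical:

```python
def implied_total_hours(elapsed_hours: float, progress_percent: float) -> float:
    """Extrapolate total runtime assuming progress grows linearly with time."""
    if progress_percent <= 0:
        raise ValueError("progress must be positive")
    return elapsed_hours * 100.0 / progress_percent


# (elapsed hours, progress %) pairs as read from the post above.
samples = [(7, 39), (24, 6), (25, 65), (1.5, 8.3), (110 / 60, 7.4)]

for elapsed, progress in samples:
    total = implied_total_hours(elapsed, progress)
    print(f"{elapsed:5.2f} h at {progress:4.1f}% -> about {total:.1f} h total")
```

Under those assumptions the implied totals range from roughly 18 hours to 400 hours, which illustrates why the reported percentage cannot be used to predict when a task will end.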

LHC@home