ATLAS vbox version 2.00

Author	Message
David Cameron Project administrator Project developer Project scientist Joined: 13 May 14 Posts: 387 Credit: 15,314,184 RAC: 0	Message 40094 - Posted: 9 Oct 2019, 8:40:01 UTC Last modified: 9 Oct 2019, 12:34:09 UTC After two and a half years of version 1.01, we are happy to announce a new version of ATLAS vbox! This version updates the image from SLC6 to CentOS 7, and the bootstrapping scripts have been improved to give faster startup times and better logging. The image is a bit larger than the old one, 2.7GB (1.1GB compressed download), so the disk space limits have been increased to 10GB per task. This new version has been extensively tested on LHC-dev; many thanks to all the volunteers who helped there! However, since this is such a major update, we have made it a beta version, so you will only get it if you have selected "Run test applications" in your preferences. Please try it out and give your feedback! EDIT: the server was not distributing any WUs with the new version, so I removed the beta flag. Now everyone should get it.

Crystal Pellet Volunteer moderator Volunteer tester Joined: 14 Jan 10 Posts: 1461 Credit: 9,851,536 RAC: 3,668	Message 40095 - Posted: 9 Oct 2019, 9:24:49 UTC - in response to Message 40094. I suppose these new versions (native and vbox) will be sent once all older unsent WUs have been distributed - 3264 tasks at the moment.

David Cameron Project administrator Project developer Project scientist Joined: 13 May 14 Posts: 387 Credit: 15,314,184 RAC: 0	Message 40101 - Posted: 9 Oct 2019, 12:21:51 UTC - in response to Message 40095. No, it should be effective immediately. The decision on which version to use is taken when the client asks the server for new tasks. However, I see that no WUs using the new versions have been sent yet... BOINC's choice of app version has always been a mystery to me, and usually I deprecate previous versions to force the new ones to be used. But first I will take the new versions out of beta to see if that helps.

Toby Broom Volunteer moderator Joined: 27 Sep 08 Posts: 878 Credit: 744,696,290 RAC: 303,952	Message 40102 - Posted: 9 Oct 2019, 18:36:49 UTC - in response to Message 40101. I got one; the task size estimate is still kind of messed up, as it was with 1.01. I have an 8-day task at 43 GFLOPS and a new one that reaches 99.999% in 5 hr or less, even though both can be at 43 GFLOPS on the same computer.

computezrmle Volunteer moderator Volunteer developer Volunteer tester Help desk expert Joined: 15 Jun 08 Posts: 2679 Credit: 286,750,404 RAC: 84,966	Message 40103 - Posted: 9 Oct 2019, 19:28:24 UTC On my computers all ATLAS tasks fail since x86_64-centos7.img has been used as the singularity image. A few tasks running v2.72 native are running fine with the image x86_64-slc6.img.

David Cameron Project administrator Project developer Project scientist Joined: 13 May 14 Posts: 387 Credit: 15,314,184 RAC: 0	Message 40104 - Posted: 9 Oct 2019, 20:15:31 UTC - in response to Message 40103.
computezrmle wrote: On my computers all ATLAS tasks fail since x86_64-centos7.img is used as singularity image. A few tasks running v2.72 native are running fine with image x86_64-slc6.img.
Not sure I understand - 2.72 uses the centos7 image and all previous versions used the slc6 one. Could you give some examples? You have a lot of hosts and it's a lot of clicking to find the ones which run ATLAS :) And maybe this would be better discussed on the thread for the new native version.

computezrmle Volunteer moderator Volunteer developer Volunteer tester Help desk expert Joined: 15 Jun 08 Posts: 2679 Credit: 286,750,404 RAC: 84,966	Message 40111 - Posted: 10 Oct 2019, 7:38:35 UTC - in response to Message 40104.
David Cameron wrote: And maybe this would be better discussed on the thread for the new native version
You are right. Sorry. Meanwhile the problem seems to be solved: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5161&postid=40110

Toby Broom Volunteer moderator Joined: 27 Sep 08 Posts: 878 Credit: 744,696,290 RAC: 303,952	Message 40119 - Posted: 11 Oct 2019, 5:36:23 UTC Do you have an idea when they might finish? So far they reached 100% in 1-2 hrs, but the FLOPS estimate must be way off, since they are still running after a day or so. Do you have a way to edit the FLOPS estimate?

[VENETO] boboviz Joined: 7 May 08 Posts: 248 Credit: 1,845,074 RAC: 10,598	Message 40120 - Posted: 11 Oct 2019, 8:10:32 UTC - in response to Message 40119.
Toby Broom wrote: Do you have an idea when they might finish? So far they reached 100% in 1-2hrs but the FLOPS estimate must be way wrong since they are still going after a day or so.
Same here. The WU quickly reached 99%, then slowed down, and now, after 22 hours, it is at 100% but still crunching. Is this normal?

maeax Joined: 2 May 07 Posts: 2276 Credit: 178,139,252 RAC: 113,022	Message 40121 - Posted: 11 Oct 2019, 8:51:55 UTC - in response to Message 40120. I have 1.01 and 2.00 running. Up to 250k seconds with 5 or 6 CPUs. Up to 20 minutes per collision (200 collisions).
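maeax's figures can be cross-checked with simple arithmetic: 200 events at up to 20 minutes each comes to 240,000 s, which matches the "up to 250k seconds" if that figure is CPU time. The sketch below assumes each event costs a fixed number of CPU-minutes and that the events are spread evenly over the CPUs of a multicore task; both are simplifications, since event runtimes vary even within one task.

```python
def cpu_seconds(events: int, cpu_minutes_per_event: float) -> float:
    """Total CPU time if every event costs the same number of CPU-minutes."""
    return events * cpu_minutes_per_event * 60


def wall_seconds(events: int, cpu_minutes_per_event: float, cpus: int) -> float:
    """Wall-clock time, assuming events are spread evenly over `cpus` cores."""
    return cpu_seconds(events, cpu_minutes_per_event) / cpus


# maeax's worst case: 200 events at up to 20 minutes each, on 6 CPUs.
total = cpu_seconds(200, 20)
print(f"CPU time:  {total:.0f} s ({total / 3600:.1f} h)")
print(f"Wall time: {wall_seconds(200, 20, 6) / 3600:.1f} h on 6 CPUs")
```

Under these assumptions a 6-core task would need roughly 11 hours of wall-clock time, while reporting about 67 hours of CPU time.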

computezrmle Volunteer moderator Volunteer developer Volunteer tester Help desk expert Joined: 15 Jun 08 Posts: 2679 Credit: 286,750,404 RAC: 84,966	Message 40123 - Posted: 11 Oct 2019, 10:00:46 UTC
Toby Broom wrote: Do you have an idea when they might finish? So far they reached 100% in 1-2hrs but the FLOPS estimate must be way wrong since they are still going after a day or so.
boboviz wrote: Same here. Wu arrived quickly to 99%, then slowed down and now, after 22hs is at 100%, but still crunching. Is it normal?
This is the usual behaviour whenever a new app version is introduced.
BOINC uses 2 main factors to calculate estimated runtimes (as well as credits):
- estimated GFLOPS for a task
- peak GFLOPS of a computer
Although BOINC recommends estimating the task's GFLOPS as accurately as possible before the server sends it to a client, ATLAS always uses a fixed value. Based on that value together with the returned runtime/CPU time, the server calculates an average processing rate for each host and stores it in the DB. The server returns that value to the client when fresh work is sent.
A new app version usually starts with the benchmarked processing rates and needs some days or even weeks to collect enough returned results.
As an example, you may compare the average processing rates for ATLAS 1.01/2.00 on this host: https://lhcathome.cern.ch/lhcathome/host_app_versions.php?hostid=10567798
Toby Broom wrote: Do you have a way to edit the FLOPS estimate?
That wouldn't make much sense:
- The adaptive algorithm solves the issue
- ATLAS events have variable runtimes even inside a task
It might be possible to estimate an average for a given ATLAS batch, but there are usually a couple of batches in the queue and tasks are sent out in random order.
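The adaptive mechanism described above can be sketched as follows: a fixed per-task operation estimate, a host rate that starts at the benchmark, and a rate that switches to an average of observed results once enough have been returned. This is a simplified illustration, not BOINC's server code. The class name, the 10-result threshold, the plain mean and the 4.3e15 operation estimate are all hypothetical; only the 43 GFLOPS benchmark comes from this thread.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HostAppVersion:
    """Hypothetical per-host, per-app-version record; not BOINC's real schema."""

    benchmark_flops: float  # starting point: the host's benchmarked FLOPS
    min_samples: int = 10  # assumed threshold before trusting returned results
    samples: List[float] = field(default_factory=list)

    def record_result(self, task_fpops_est: float, elapsed_s: float) -> None:
        # Effective processing rate implied by one returned task:
        # the fixed operation estimate divided by the time it really took.
        self.samples.append(task_fpops_est / elapsed_s)

    def processing_rate(self) -> float:
        # A new app version has no history yet, so fall back to the benchmark.
        if len(self.samples) < self.min_samples:
            return self.benchmark_flops
        # Plain mean for illustration; BOINC uses its own averaging.
        return sum(self.samples) / len(self.samples)

    def estimated_runtime(self, task_fpops_est: float) -> float:
        return task_fpops_est / self.processing_rate()


FPOPS_EST = 4.3e15  # hypothetical fixed per-task estimate
host = HostAppVersion(benchmark_flops=43e9)
print(f"before: {host.estimated_runtime(FPOPS_EST):.0f} s")  # benchmark only
for _ in range(10):
    host.record_result(FPOPS_EST, elapsed_s=3e5)  # tasks really take ~83 h
print(f"after:  {host.estimated_runtime(FPOPS_EST):.0f} s")  # learned rate
```

With the benchmark alone the estimate is about 100,000 s; after ten results that each took 300,000 s the estimate adapts to about 300,000 s. This is why a new app version shows badly wrong progress at first and then settles down.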

Toby Broom Volunteer moderator Joined: 27 Sep 08 Posts: 878 Credit: 744,696,290 RAC: 303,952	Message 40128 - Posted: 11 Oct 2019, 16:01:39 UTC - in response to Message 40123.
computezrmle wrote: - ATLAS events have variable runtimes even inside a task
So does SixTrack, but it doesn't leave tasks running for 6 days at 99.999%, so it is possible to configure the tasks better.

[VENETO] boboviz Joined: 7 May 08 Posts: 248 Credit: 1,845,074 RAC: 10,598	Message 40132 - Posted: 12 Oct 2019, 8:53:19 UTC - in response to Message 40120.
boboviz wrote: Wu arrived quickly to 99%, then slowed down and now, after 22hs is at 100%, but still crunching.
Finished!! (after 34 hours)

Jonathan Joined: 25 Sep 17 Posts: 99 Credit: 3,425,566 RAC: 0	Message 40141 - Posted: 13 Oct 2019, 5:03:07 UTC - in response to Message 40094. Last modified: 13 Oct 2019, 5:08:39 UTC Are we supposed to get any info on the second terminal (Alt + F2) within the running virtual machine? I see top-related info on Alt + F3, but just login screens on the first and second terminals, while I have activity on all eight assigned processors within the VM (athena.py). Never mind, I started to see info. It just took a while.

Magic Quantum Mechanic Joined: 24 Oct 04 Posts: 1234 Credit: 79,553,747 RAC: 110,926	Message 40143 - Posted: 13 Oct 2019, 8:07:10 UTC - in response to Message 40132.
boboviz wrote: Wu arrived quickly to 99%, then slowed down and now, after 22hs is at 100%, but still crunching. Finished!! (after 34hs)
And 3,523.68 credits

computezrmle Volunteer moderator Volunteer developer Volunteer tester Help desk expert Joined: 15 Jun 08 Posts: 2679 Credit: 286,750,404 RAC: 84,966	Message 40144 - Posted: 13 Oct 2019, 8:58:11 UTC - in response to Message 40132.
boboviz wrote: Wu arrived quickly to 99%, then slowed down and now, after 22hs is at 100%, but still crunching. Finished!! (after 34hs)
The task finished and got credits - that's BOINC's point of view.
But be aware that from the scientific point of view the task didn't succeed:
2019-10-12 05:27:54 (12056): Guest Log: No HITS file was produced
Your logfile shows that the task was started/paused several times and couldn't successfully write its snapshot:
2019-10-10 07:23:49 (13064): Stopping VM.
2019-10-10 07:23:58 (13064): Error in stop VM for VM: -108
Command: VBoxManage -q controlvm "boinc_0c99027d1c3c10c0" savestate
Output: 0%...10%...20%...30%...40%...
2019-10-10 07:23:58 (13064): VM did not stop when requested.
2019-10-10 07:23:58 (13064): VM was successfully terminated.
. . .
2019-10-10 18:46:27 (6124): Stopping VM.
2019-10-10 18:46:29 (6124): Error in stop VM for VM: -108
Command: VBoxManage -q controlvm "boinc_0c99027d1c3c10c0" savestate
Output: 0%...10%...20%...
2019-10-10 18:46:29 (6124): VM did not stop when requested.
2019-10-10 18:46:29 (6124): VM was NOT successfully terminated.
. . .
2019-10-11 07:51:09 (3136): Stopping VM.
2019-10-11 07:51:19 (3136): Error in stop VM for VM: -108
Command: VBoxManage -q controlvm "boinc_0c99027d1c3c10c0" savestate
Output: 0%...10%...20%...30%...40%...
2019-10-11 07:51:19 (3136): VM did not stop when requested.
2019-10-11 07:51:19 (3136): VM was NOT successfully terminated.
In addition, the following line shows that the vbox extensions are not installed on your host:
2019-10-10 07:17:41 (13064): Required extension pack not installed, remote desktop not enabled.
If you are interested in better task monitoring, those extensions must be installed.
See Yeti's checklist: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4161&postid=29359
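The warning signs pointed out above can be spotted by scanning a task's log for the exact messages quoted in this post. The helper below is a hypothetical sketch: it only knows the four strings that appear in this thread, and a real log may contain other failure messages it does not recognise.

```python
# Marker strings copied verbatim from the task log quoted above.
STOP_FAILED = "VM did not stop when requested."
NOT_TERMINATED = "VM was NOT successfully terminated."
NO_HITS = "Guest Log: No HITS file was produced"
NO_EXTPACK = "Required extension pack not installed"


def summarize_log(text: str) -> dict:
    """Count the warning signs discussed in this thread in a task log."""
    lines = text.splitlines()
    return {
        "failed_stops": sum(STOP_FAILED in line for line in lines),
        "unclean_terminations": sum(NOT_TERMINATED in line for line in lines),
        "hits_missing": any(NO_HITS in line for line in lines),
        "extpack_missing": any(NO_EXTPACK in line for line in lines),
    }


SAMPLE = """\
2019-10-10 07:17:41 (13064): Required extension pack not installed, remote desktop not enabled.
2019-10-10 07:23:58 (13064): VM did not stop when requested.
2019-10-10 07:23:58 (13064): VM was successfully terminated.
2019-10-10 18:46:29 (6124): VM did not stop when requested.
2019-10-10 18:46:29 (6124): VM was NOT successfully terminated.
2019-10-12 05:27:54 (12056): Guest Log: No HITS file was produced
"""

print(summarize_log(SAMPLE))
```

A task that reports credit but shows `hits_missing` did not deliver a usable result, which is the distinction drawn above between BOINC's view and the scientific one.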

Toby Broom Volunteer moderator Joined: 27 Sep 08 Posts: 878 Credit: 744,696,290 RAC: 303,952	Message 40145 - Posted: 13 Oct 2019, 14:37:28 UTC I have:
- 2 at 67 hrs
- 4 at 85 hrs
- 1 at 97 hrs
- 7 at 60 hrs
Still 4 days until the deadline; hopefully they can finish. Looks like about 35 min per event on my computer, there at 130-160.
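Whether such tasks can still make the deadline can be estimated from the per-event time. The sketch below makes two labelled assumptions: that "130-160" means the number of events already processed, and that a task holds 200 events as maeax reported above. The function name and defaults are hypothetical.

```python
def hours_remaining(events_done: int, total_events: int = 200,
                    minutes_per_event: float = 35.0) -> float:
    """Hypothetical helper: wall-clock hours left for one task.

    Defaults are assumptions taken from this thread: 200 events per task
    (maeax's figure) and ~35 minutes per event (Toby Broom's observation).
    """
    remaining = max(total_events - events_done, 0)
    return remaining * minutes_per_event / 60


DEADLINE_H = 4 * 24  # "4 days until the deadline"
for done in (130, 160):
    left = hours_remaining(done)
    verdict = "fits" if left <= DEADLINE_H else "misses"
    print(f"{done} events done: ~{left:.1f} h left, {verdict} the deadline")
```

Under these assumptions the remaining work is roughly 23 to 41 hours, well inside the 96 hours left.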

[VENETO] boboviz Joined: 7 May 08 Posts: 248 Credit: 1,845,074 RAC: 10,598	Message 40146 - Posted: 13 Oct 2019, 17:23:51 UTC - in response to Message 40144. Last modified: 13 Oct 2019, 17:25:55 UTC
computezrmle wrote: The task finished and got credits - that's BOINC's point of view. But you may be aware that from the scientific point of view the task didn't succeed: 2019-10-12 05:27:54 (12056): Guest Log: No HITS file was produced
Nooo, I participate for science, not for credits...
computezrmle wrote: Your logfile shows that the task started/paused several times and couldn't successfully write it's snapshot
I crunch on my notebook, so I turn it off sometimes. But why no snapshot if we use VirtualBox??

[VENETO] boboviz Joined: 7 May 08 Posts: 248 Credit: 1,845,074 RAC: 10,598	Message 40147 - Posted: 13 Oct 2019, 17:37:06 UTC - in response to Message 40144.
computezrmle wrote: If you are interested in a better task monitoring those extensions must be installed. See Yeti's checklist: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4161&postid=29359
OK, I've installed the latest version of VirtualBox and the latest extensions. But this WU still uses 0% of the CPU. The previous WU used 50% of the CPU.

tullio Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0	Message 40148 - Posted: 14 Oct 2019, 17:19:33 UTC Last modified: 14 Oct 2019, 17:22:42 UTC I finished my first 2.0 task in 4 h 44 min 5 s, with more than 1 d 10 h 10 min of CPU time, but no HITS file was produced. Tullio

LHC@home