Message boards : ATLAS application : ATLAS vbox version 2.00

David Cameron
Project administrator
Project developer
Project scientist

Joined: 13 May 14
Posts: 387
Credit: 15,314,184
RAC: 0
Message 40094 - Posted: 9 Oct 2019, 8:40:01 UTC
Last modified: 9 Oct 2019, 12:34:09 UTC

After two and a half years of version 1.01 we are happy to announce a new version of ATLAS vbox!

This version updates the image from SLC6 to CentOS 7, and the bootstrapping scripts have been improved to give faster startup times and better logging.

The image is somewhat larger than the old one: 2.7 GB (1.1 GB compressed download), so the disk space limit has been increased to 10 GB per task.

This new version has been extensively tested on LHC-dev; many thanks to all the volunteers who helped there! However, since this is such a major update, we have made it a beta version, so you will only get it if you have selected "Run test applications" in your preferences.

Please try it out and give your feedback!

EDIT: the server was not distributing any WUs with the new version, so I removed the beta flag. Now everyone should get it.
ID: 40094
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1413
Credit: 9,435,474
RAC: 8,120
Message 40095 - Posted: 9 Oct 2019, 9:24:49 UTC - in response to Message 40094.  

I suppose these new versions (native and vbox) will only be sent once all older unsent WUs have been distributed: 3264 tasks at the moment.
ID: 40095
David Cameron
Project administrator
Project developer
Project scientist

Joined: 13 May 14
Posts: 387
Credit: 15,314,184
RAC: 0
Message 40101 - Posted: 9 Oct 2019, 12:21:51 UTC - in response to Message 40095.  

No, it should be effective immediately. The decision on which version to use is taken when the client asks the server for new tasks. However, I see that no WUs using the new versions have been sent yet... BOINC's choice of app version has always been a mystery to me, and usually I deprecate previous versions to force the new versions to be used. But first I will take the new versions out of beta to see if that helps.
ID: 40101
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 831
Credit: 688,871,932
RAC: 126,472
Message 40102 - Posted: 9 Oct 2019, 18:36:49 UTC - in response to Message 40101.  

I got one. The task size is still kind of messed up, as it was with 1.01: I have an 8-day task at 43 GFLOPS, and the new one reached 99.999% in 5 hrs or less, yet both show 43 GFLOPS on the same computer.
ID: 40102
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2521
Credit: 252,581,430
RAC: 137,194
Message 40103 - Posted: 9 Oct 2019, 19:28:24 UTC

On my computers all ATLAS tasks fail since x86_64-centos7.img is used as singularity image.
A few tasks running v2.72 native are running fine with image x86_64-slc6.img.
ID: 40103
David Cameron
Project administrator
Project developer
Project scientist

Joined: 13 May 14
Posts: 387
Credit: 15,314,184
RAC: 0
Message 40104 - Posted: 9 Oct 2019, 20:15:31 UTC - in response to Message 40103.  

On my computers all ATLAS tasks fail since x86_64-centos7.img is used as singularity image.
A few tasks running v2.72 native are running fine with image x86_64-slc6.img.


Not sure I understand - 2.72 uses the centos7 image and all previous versions used the slc6 one. Could you give some examples? You have a lot of hosts and it's a lot of clicking to find the ones which run ATLAS :)

And maybe this would be better discussed on the thread for the new native version
ID: 40104
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2521
Credit: 252,581,430
RAC: 137,194
Message 40111 - Posted: 10 Oct 2019, 7:38:35 UTC - in response to Message 40104.  

And maybe this would be better discussed on the thread for the new native version

You are right.
Sorry.

Meanwhile, the problem seems to have been solved:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5161&postid=40110
ID: 40111
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 831
Credit: 688,871,932
RAC: 126,472
Message 40119 - Posted: 11 Oct 2019, 5:36:23 UTC

Do you have an idea when they might finish?

So far they reached 100% in 1-2 hrs, but the FLOPS estimate must be way off, since they are still running after a day or so.

Do you have a way to edit the FLOPS estimate?
ID: 40119
[VENETO] boboviz
Joined: 7 May 08
Posts: 205
Credit: 1,564,905
RAC: 1,765
Message 40120 - Posted: 11 Oct 2019, 8:10:32 UTC - in response to Message 40119.  

Do you have an idea when they might finish?
So far they reached 100% in 1-2 hrs, but the FLOPS estimate must be way off, since they are still running after a day or so.


Same here.
The WU quickly reached 99%, then slowed down, and now, after 22 hrs, it is at 100% but still crunching.
Is this normal?
ID: 40120
maeax

Joined: 2 May 07
Posts: 2230
Credit: 173,863,695
RAC: 17,252
Message 40121 - Posted: 11 Oct 2019, 8:51:55 UTC - in response to Message 40120.  

I have both 1.01 and 2.00 running: up to 250k seconds with 5 or 6 CPUs.
Up to 20 minutes per collision (200 collisions).
ID: 40121
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2521
Credit: 252,581,430
RAC: 137,194
Message 40123 - Posted: 11 Oct 2019, 10:00:46 UTC

Toby Broom wrote:
Do you have an idea when they might finish?

So far they reached 100% in 1-2 hrs, but the FLOPS estimate must be way off, since they are still running after a day or so.

boboviz wrote:
Same here.
The WU quickly reached 99%, then slowed down, and now, after 22 hrs, it is at 100% but still crunching.
Is this normal?

This is the usual behaviour whenever a new app version is introduced.

BOINC uses 2 main factors to calculate estimated runtimes (as well as credits):
- estimated GFLOPS for a task
- peak GFLOPS of a computer

Although BOINC recommends estimating a task's GFLOPS as accurately as possible before the server sends it to a client, ATLAS always uses a fixed value.

Based on that value, together with the returned runtime/CPU time, the server calculates an average processing rate for each host and stores it in the DB.
The server returns that value to the client when fresh work is sent.

A new app version usually starts with the benchmarked processing rates and needs several days or even weeks to collect enough returned results.

As an example you may compare the average processing rates for ATLAS 1.01/2.00 on this host:
https://lhcathome.cern.ch/lhcathome/host_app_versions.php?hostid=10567798
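The estimate described above can be sketched in a few lines of Python. This is a simplified model, not BOINC's actual server code: the class and function names are hypothetical, the task size is a made-up fixed value, and the plain running mean stands in for whatever averaging and thresholds the real server applies. It only illustrates why a fixed per-task estimate combined with an adaptive per-host rate is badly off at first and improves as results come back.

```python
# Simplified model of the runtime estimate described above (illustration only).
# Names, numbers and the plain running mean are assumptions, not BOINC code.


def estimated_runtime_s(task_gflop: float, rate_gflops: float) -> float:
    """Estimated runtime in seconds = task size (GFLOP) / processing rate (GFLOPS)."""
    return task_gflop / rate_gflops


class HostAppVersion:
    """Average processing rate of one app version on one host (hypothetical)."""

    def __init__(self, benchmark_gflops: float) -> None:
        # A new app version has no returned results yet, so the benchmark is used.
        self.benchmark_gflops = benchmark_gflops
        self.avg_s_per_gflop = 0.0
        self.n_results = 0

    def record_result(self, task_gflop: float, elapsed_s: float) -> None:
        # Seconds per GFLOP implied by one returned result, computed from
        # the same fixed task estimate the server sent out.
        sample = elapsed_s / task_gflop
        self.n_results += 1
        # Incremental mean over all returned results.
        self.avg_s_per_gflop += (sample - self.avg_s_per_gflop) / self.n_results

    @property
    def rate_gflops(self) -> float:
        if self.n_results == 0:
            return self.benchmark_gflops
        return 1.0 / self.avg_s_per_gflop


TASK_GFLOP = 43_000.0  # hypothetical fixed estimate used for every task

host = HostAppVersion(benchmark_gflops=4.0)
print(estimated_runtime_s(TASK_GFLOP, host.rate_gflops))  # 10750.0 (benchmark only)

# The returned tasks actually ran much longer than the benchmark predicted.
for elapsed in (100_000.0, 50_000.0):
    host.record_result(TASK_GFLOP, elapsed)
print(round(estimated_runtime_s(TASK_GFLOP, host.rate_gflops)))  # 75000
```

With a fixed task size the adapted estimate moves toward the host's average runtime for that app version, but any single task can still be far from that average, since ATLAS event runtimes vary even inside a task.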


Toby Broom wrote:
Do you have a way to edit the FLOPS estimate?

Wouldn't make much sense:
- The adaptive algorithm solves the issue
- ATLAS events have variable runtimes even inside a task
It might be possible to estimate an average for a given ATLAS batch but there are usually a couple of batches in the queue and tasks are sent out in random order.
ID: 40123
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 831
Credit: 688,871,932
RAC: 126,472
Message 40128 - Posted: 11 Oct 2019, 16:01:39 UTC - in response to Message 40123.  

- ATLAS events have variable runtimes even inside a task

So does SixTrack, but it doesn't leave tasks running for 6 days at 99.999%, so it should be possible to configure the tasks better.
ID: 40128
[VENETO] boboviz
Joined: 7 May 08
Posts: 205
Credit: 1,564,905
RAC: 1,765
Message 40132 - Posted: 12 Oct 2019, 8:53:19 UTC - in response to Message 40120.  

The WU quickly reached 99%, then slowed down, and now, after 22 hrs, it is at 100% but still crunching.


Finished!! (after 34 hrs)
ID: 40132
Jonathan

Joined: 25 Sep 17
Posts: 99
Credit: 3,425,566
RAC: 0
Message 40141 - Posted: 13 Oct 2019, 5:03:07 UTC - in response to Message 40094.  
Last modified: 13 Oct 2019, 5:08:39 UTC

Are we supposed to get any info in the second terminal (Alt + F2) within the running virtual machine?
I see top-related info on Alt + F3, but just login screens on the first and second terminals, while all eight assigned processors within the VM show activity (athena.py).

Never mind, I started to see info. It just took a while.
ID: 40141
Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1169
Credit: 54,428,146
RAC: 61,850
Message 40143 - Posted: 13 Oct 2019, 8:07:10 UTC - in response to Message 40132.  

The WU quickly reached 99%, then slowed down, and now, after 22 hrs, it is at 100% but still crunching.


Finished!! (after 34 hrs)


And 3,523.68 credits.
ID: 40143
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2521
Credit: 252,581,430
RAC: 137,194
Message 40144 - Posted: 13 Oct 2019, 8:58:11 UTC - in response to Message 40132.  

The WU quickly reached 99%, then slowed down, and now, after 22 hrs, it is at 100% but still crunching.


Finished!! (after 34 hrs)

The task finished and got credits - that's BOINC's point of view.
But please be aware that, from the scientific point of view, the task didn't succeed:
2019-10-12 05:27:54 (12056): Guest Log: No HITS file was produced


Your log file shows that the task was started and paused several times and couldn't successfully write its snapshot:
2019-10-10 07:23:49 (13064): Stopping VM.
2019-10-10 07:23:58 (13064): Error in stop VM for VM: -108
Command:
VBoxManage -q controlvm "boinc_0c99027d1c3c10c0" savestate
Output:
0%...10%...20%...30%...40%...
2019-10-10 07:23:58 (13064): VM did not stop when requested.
2019-10-10 07:23:58 (13064): VM was successfully terminated.
.
.
.
2019-10-10 18:46:27 (6124): Stopping VM.
2019-10-10 18:46:29 (6124): Error in stop VM for VM: -108
Command:
VBoxManage -q controlvm "boinc_0c99027d1c3c10c0" savestate
Output:
0%...10%...20%...
2019-10-10 18:46:29 (6124): VM did not stop when requested.
2019-10-10 18:46:29 (6124): VM was NOT successfully terminated.
.
.
.
2019-10-11 07:51:09 (3136): Stopping VM.
2019-10-11 07:51:19 (3136): Error in stop VM for VM: -108
Command:
VBoxManage -q controlvm "boinc_0c99027d1c3c10c0" savestate
Output:
0%...10%...20%...30%...40%...
2019-10-11 07:51:19 (3136): VM did not stop when requested.
2019-10-11 07:51:19 (3136): VM was NOT successfully terminated.



In addition, the following line shows that the VirtualBox extension pack is not installed on your host:
2019-10-10 07:17:41 (13064): Required extension pack not installed, remote desktop not enabled.

If you are interested in better task monitoring, the extension pack must be installed.
See Yeti's checklist:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4161&postid=29359
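A quick way to spot these problems in a task's log is to scan its stderr output for the message texts shown above. The helper below is a hypothetical sketch, not part of BOINC or vboxwrapper; it matches only the exact strings quoted in this post, and other wrapper versions may log them differently.

```python
import re

# Message texts copied from the log excerpts in this post. Other vboxwrapper
# versions may word them differently (assumption).
STOP_FAILED = "VM did not stop when requested."
NOT_TERMINATED = "VM was NOT successfully terminated."
NO_EXTPACK = "Required extension pack not installed"
NO_HITS = "No HITS file was produced"

# "2019-10-10 07:23:58 (13064): message" -> timestamp, pid, message
LOG_LINE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \((\d+)\): (.*)$")


def scan_task_log(text: str) -> dict:
    """Summarise suspend/snapshot problems found in a task's stderr log."""
    report = {
        "failed_stops": [],      # timestamps of "VM did not stop" messages
        "not_terminated": 0,     # count of "VM was NOT successfully terminated"
        "extpack_missing": False,
        "no_hits_file": False,
    }
    for line in text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # command echoes, "Output:" and progress lines
        stamp, _pid, message = match.groups()
        if message == STOP_FAILED:
            report["failed_stops"].append(stamp)
        elif message == NOT_TERMINATED:
            report["not_terminated"] += 1
        elif NO_EXTPACK in message:
            report["extpack_missing"] = True
        elif NO_HITS in message:
            report["no_hits_file"] = True
    return report


# A few lines taken from the log quoted in this post.
SAMPLE_LOG = "\n".join([
    "2019-10-10 07:17:41 (13064): Required extension pack not installed, remote desktop not enabled.",
    "2019-10-10 07:23:58 (13064): VM did not stop when requested.",
    "2019-10-10 07:23:58 (13064): VM was successfully terminated.",
    "0%...10%...20%...",
    "2019-10-10 18:46:29 (6124): VM did not stop when requested.",
    "2019-10-10 18:46:29 (6124): VM was NOT successfully terminated.",
    "2019-10-12 05:27:54 (12056): Guest Log: No HITS file was produced",
])

print(scan_task_log(SAMPLE_LOG))
```

Matching whole message texts after the timestamp keeps "VM was successfully terminated." from being counted as a failure.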
ID: 40144
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 831
Credit: 688,871,932
RAC: 126,472
Message 40145 - Posted: 13 Oct 2019, 14:37:28 UTC

I have:

2 at 67hrs
4 at 85hrs
1 at 97hrs
7 at 60hrs

Still have 4 days until the deadline; hopefully they can finish.

Looks like about 35 min per event on my computer; they are at around 130-160 events.
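As a rough check of whether such tasks can make the deadline, the remaining time follows from the time per event. The sketch assumes 200 events per task (the figure maeax quotes for his tasks; other batches may differ) and a constant 35 minutes per event; the function name is hypothetical.

```python
def hours_remaining(events_done: int, total_events: int = 200,
                    minutes_per_event: float = 35.0) -> float:
    """Wall-clock hours still needed, assuming a constant time per event."""
    remaining = max(total_events - events_done, 0)
    return remaining * minutes_per_event / 60.0


DEADLINE_H = 4 * 24  # "4 days until the deadline"

for done in (130, 160):
    need = hours_remaining(done)
    print(f"{done} events done: ~{need:.1f} h left, fits: {need <= DEADLINE_H}")
```

Since event runtimes vary, this only gives an order of magnitude, not a guarantee.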
ID: 40145
[VENETO] boboviz
Joined: 7 May 08
Posts: 205
Credit: 1,564,905
RAC: 1,765
Message 40146 - Posted: 13 Oct 2019, 17:23:51 UTC - in response to Message 40144.  
Last modified: 13 Oct 2019, 17:25:55 UTC

The task finished and got credits - that's BOINC's point of view.
But please be aware that, from the scientific point of view, the task didn't succeed:
2019-10-12 05:27:54 (12056): Guest Log: No HITS file was produced


Nooo, I participate for science, not for credits...

Your log file shows that the task was started and paused several times and couldn't successfully write its snapshot

I crunch on my notebook, so I turn it off sometimes.
But why is there no snapshot if we use VirtualBox??
ID: 40146
[VENETO] boboviz
Joined: 7 May 08
Posts: 205
Credit: 1,564,905
RAC: 1,765
Message 40147 - Posted: 13 Oct 2019, 17:37:06 UTC - in response to Message 40144.  

If you are interested in better task monitoring, the extension pack must be installed.
See Yeti's checklist:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4161&postid=29359


OK, I've installed the latest version of VirtualBox and the latest extension pack.
But this WU still uses 0% of the CPU.
The previous WU used 50% of the CPU.
ID: 40147
tullio

Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 40148 - Posted: 14 Oct 2019, 17:19:33 UTC
Last modified: 14 Oct 2019, 17:22:42 UTC

I finished my first 2.0 task in 4 h 44 min 5 s, with more than 1 d 10 h 10 min of CPU time, but no HITS file was produced.
Tullio
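The CPU time can exceed the run time because a multi-core task adds up the CPU time of every core it uses. Dividing one by the other gives the average number of cores kept busy; a quick check with the figures from this post (a lower bound, since the CPU time was "more than" the stated value):

```python
from datetime import timedelta

run_time = timedelta(hours=4, minutes=44, seconds=5)
cpu_time = timedelta(days=1, hours=10, minutes=10)  # "more than" this

# Average number of cores busy over the task's run time.
cores_busy = cpu_time / run_time  # timedelta / timedelta -> float
print(f"{cores_busy:.2f}")  # 7.22
```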
ID: 40148


©2024 CERN