Message boards : CMS Application : New Version 50.00

Laurence
Project administrator
Project developer

Joined: 20 Jun 14
Posts: 337
Credit: 237,918
RAC: 0
Message 41853 - Posted: 9 Mar 2020, 8:17:25 UTC

This new CMS version updates the configuration of CVMFS and refreshes the cached files.
ID: 41853
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1504
Credit: 82,852,795
RAC: 78,870
Message 41854 - Posted: 9 Mar 2020, 8:38:04 UTC

Has the server been restarted?
The BOINC client downloads the old CMS_2019_03_25.vdi.
ID: 41854
maeax

Joined: 2 May 07
Posts: 981
Credit: 34,595,899
RAC: 18,746
Message 41855 - Posted: 9 Mar 2020, 9:42:11 UTC - in response to Message 41853.  
Last modified: 9 Mar 2020, 9:55:20 UTC

We had no finished CMS task in -dev for version 50.00.
Is it possible to upgrade Condor to CentOS 7?
ID: 41855
dduggan47

Joined: 1 Sep 04
Posts: 47
Credit: 4,751,479
RAC: 0
Message 41864 - Posted: 9 Mar 2020, 21:44:29 UTC

My v49 CMS tasks were consistently crashing recently: not 100% of them, but a large majority. When I read here that v50 was available, I aborted all the v49 tasks in my queue, but they were replaced by more of the same. I then changed my preferences to exclude CMS tasks (temporarily), aborted the new v49 tasks I'd been sent, and updated again. What I got was more CMS v49 tasks.

3 questions:

1) How long does it take for preference changes to take effect on the server?

2) When can I expect to be able to download v50 tasks to see if that helps?

3) ... or is there something else stupid I'm doing wrong?

Thanks.
ID: 41864
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1504
Credit: 82,852,795
RAC: 78,870
Message 41865 - Posted: 9 Mar 2020, 21:57:39 UTC - in response to Message 41864.  

There's still no "GO" from Ivan:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5334&postid=41826


In addition, something went wrong with this version update, but in any case it would only be the envelope.
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5339&postid=41854
ID: 41865
dduggan47

Joined: 1 Sep 04
Posts: 47
Credit: 4,751,479
RAC: 0
Message 41873 - Posted: 10 Mar 2020, 18:18:00 UTC - in response to Message 41865.  

There's still no "GO" from Ivan:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5334&postid=41826


In addition, something went wrong with this version update, but in any case it would only be the envelope.
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5339&postid=41854


Thank you, hadn't seen that.

(The other problem (downloading CMS) was my error.)
ID: 41873
Erich56

Joined: 18 Dec 15
Posts: 1291
Credit: 23,308,362
RAC: 4,102
Message 41932 - Posted: 17 Mar 2020, 12:20:48 UTC

Any idea when CMS will be up and running again?
ID: 41932
Erich56

Joined: 18 Dec 15
Posts: 1291
Credit: 23,308,362
RAC: 4,102
Message 41961 - Posted: 20 Mar 2020, 9:58:37 UTC - in response to Message 41932.  

On March 17, I wrote:
any idea when CMS will be up and running again?
In the past few days, the server status page showed zero tasks available; today the queue was refilled.
What does this mean? There has been no "go ahead" from Ivan so far, so I guess one would crunch CMS tasks at one's own risk only, right?
ID: 41961
maeax

Joined: 2 May 07
Posts: 981
Credit: 34,595,899
RAC: 18,746
Message 41962 - Posted: 20 Mar 2020, 10:08:18 UTC - in response to Message 41961.  

If CERN IT is testing, they also need data from the BOINC server for CMS.
So be patient and wait.
ID: 41962
Jim1348

Joined: 15 Nov 14
Posts: 454
Credit: 12,309,720
RAC: 3,723
Message 41968 - Posted: 20 Mar 2020, 17:24:14 UTC - in response to Message 41961.  

I am running a CMS task now, with two more in the buffer.
YMMV
ID: 41968
Jim1348

Joined: 15 Nov 14
Posts: 454
Credit: 12,309,720
RAC: 3,723
Message 41969 - Posted: 20 Mar 2020, 22:33:26 UTC - in response to Message 41968.  

The next ten were empty, so I am back to native ATLAS only.
ID: 41969
Erich56

Joined: 18 Dec 15
Posts: 1291
Credit: 23,308,362
RAC: 4,102
Message 41971 - Posted: 21 Mar 2020, 6:17:05 UTC

Thanks, guys, for the information.

So no CMS at this point in time :-(
ID: 41971
Aurum
Joined: 12 Jun 18
Posts: 92
Credit: 37,970,693
RAC: 1,582
Message 43128 - Posted: 30 Jul 2020, 0:42:28 UTC

Is there a native CMS now?
ID: 43128
Jim1348

Joined: 15 Nov 14
Posts: 454
Credit: 12,309,720
RAC: 3,723
Message 43129 - Posted: 30 Jul 2020, 1:09:50 UTC - in response to Message 43128.  

Is there a native CMS now?

That is asked from time to time. The answer is no, because it is too complicated.
Apparently CMS covers a variety of experimental groups, all doing their own thing.
It is hard enough to get it to work with VirtualBox, which is relatively easier since it packages them all up the same way.

Beyond that, a real expert will need to answer. But I would hope that they could find a way to, preferably with "runc", which is what they use for Theory.
It avoids the need for Singularity and just requires CVMFS.
ID: 43129
Pavel Hanak

Joined: 5 Mar 06
Posts: 13
Credit: 13,959,975
RAC: 14,921
Message 43311 - Posted: 6 Sep 2020, 9:41:48 UTC

I've crunched a few dozen CMS version 50.00 WUs now and I've noticed they consume a lot of network bandwidth. In fact, when 15+ of them run at the same time, they completely use up the 5 Mbit/s upload limit of my home connection (download is fine). What is the total download and upload size each WU generates during the entire run?
ID: 43311
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1504
Credit: 82,852,795
RAC: 78,870
Message 43312 - Posted: 6 Sep 2020, 10:25:42 UTC - in response to Message 43311.  

I've crunched a few dozen CMS version 50.00 WUs now and I've noticed they consume a lot of network bandwidth. In fact, when 15+ of them run at the same time, they completely use up 5 Mbit/s upload limit of my home connection (download is fine). What is the total download and upload size each WU generates during the entire run?

Each new CMS task downloads about 200MB.
Most of that can be served from a local squid proxy:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5473

A typical CMS subtask writes a result file of roughly 110 MB within 1-3 h (average 2 h).
This file requires about 3 min to be uploaded on your line (5 Mbit/s).
Based on that average, 15 concurrently running CMS tasks require about 38% of your total upload capacity.
A proxy doesn't help to reduce those uploads.
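
The estimate above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: it assumes decimal units (1 MB = 8 Mbit), ignores protocol overhead, and the function names are made up for illustration.

```python
def upload_seconds(result_mb: float, uplink_mbit_s: float) -> float:
    """Time to upload one result file if it has the whole uplink to itself."""
    return result_mb * 8 / uplink_mbit_s


def average_uplink_share(tasks: int, result_mb: float,
                         job_hours: float, uplink_mbit_s: float) -> float:
    """Long-run fraction of the uplink needed by `tasks` concurrent CMS tasks."""
    busy_per_job = upload_seconds(result_mb, uplink_mbit_s)
    return tasks * busy_per_job / (job_hours * 3600)


if __name__ == "__main__":
    secs = upload_seconds(110, 5)                 # 110 MB result, 5 Mbit/s line
    share = average_uplink_share(15, 110, 2, 5)   # 15 tasks, 2 h per job
    print(f"one upload: {secs / 60:.1f} min")
    print(f"15 tasks:   {share:.0%} of the uplink on average")
```

Without rounding the upload time up to 3 min, the share comes out at roughly 37%, which agrees with the figure of about 38% quoted above. Note that this is a long-run average only; it says nothing about how long an individual upload takes when several run at once.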
ID: 43312
Pavel Hanak

Joined: 5 Mar 06
Posts: 13
Credit: 13,959,975
RAC: 14,921
Message 43313 - Posted: 6 Sep 2020, 11:09:57 UTC - in response to Message 43312.  
Last modified: 6 Sep 2020, 11:29:46 UTC

A typical CMS subtask writes a result file of roughly 110 MB within 1-3 h (average 2 h).
This file requires about 3 min to be uploaded on your line (5 Mbit/s).
Based on the average 15 concurrently running CMS tasks require 38% of your total upload capacity.


Thanks for the info. It's not so rosy in practice, though: it seems the crunching stalls until the result file is completely uploaded. When 15+ WUs upload at the same time, each upload's speed fluctuates around 0.3 Mbit/s (or 30 kB/s, welcome to the dial-up era), so the upload takes over an hour. In the meantime, other WUs complete another result and try to upload it. The end result is that only 3 or 4 of the 15 WUs actually crunch; the rest are waiting for upload. Or at least it becomes stuck in this vicious cycle if some other program uses up the upload bandwidth for 20 minutes or so. I need to limit the number of CMS tasks via app_config.xml. Are you sure the CMS result data is only 55 MB/hour on average?
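
A limit of the kind mentioned above could be worked out and written roughly as follows. This is a sketch, not a project recommendation: it assumes the BOINC application name is `CMS` (check the project's applications page or `client_state.xml` for the real name), the 20% target share is an arbitrary conservative choice that leaves headroom for overlapping uploads, and both helper names are hypothetical. The `<app_config>`, `<app>`, `<name>` and `<max_concurrent>` elements are the ones BOINC documents for `app_config.xml`.

```python
import math
import xml.etree.ElementTree as ET


def max_tasks_for_uplink(uplink_mbit_s, result_mb=110.0,
                         job_hours=2.0, target_share=0.2):
    """Largest task count whose average upload demand stays below target_share."""
    # Average upload rate one task needs, in Mbit/s (1 MB = 8 Mbit assumed).
    per_task_mbit_s = result_mb * 8 / (job_hours * 3600)
    return max(1, math.floor(target_share * uplink_mbit_s / per_task_mbit_s))


def build_app_config(app_name, max_concurrent):
    """Return a minimal app_config.xml document as a string."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name  # assumed name, see above
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    limit = max_tasks_for_uplink(5.0)  # 5 Mbit/s uplink
    print(limit)
    print(build_app_config("CMS", limit))
```

The resulting file would go into the project's directory inside the BOINC data directory and be picked up after "Read config files" in BOINC Manager or a client restart. The right limit depends on what else shares the line, so the computed number is a starting point at best.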
ID: 43313
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1504
Credit: 82,852,795
RAC: 78,870
Message 43314 - Posted: 6 Sep 2020, 16:25:24 UTC - in response to Message 43313.  

it seems the crunching stalls until the result file is completely uploaded.

Right.
The CPU remains idle until the upload (~110 MB) has finished and a new job (3-4 MB) has been downloaded.
This is how CMS tasks always work.


And when 15+ WUs upload at the same time, each upload speed fluctuates around 0.3 Mbit/s

Right.
This happens all the time since the internet line is a shared medium and each active connection gets a fraction of the total bandwidth.
Fortunately you normally don't notice it since most uploads are much smaller than the CMS result files.
The 38% is an average value. While uploads are in progress, even just one, you should see 100% bandwidth usage.


... the upload takes over an hour. In the meantime, other WUs complete another result and try to upload it. The end result is that only 3 or 4 WUs of the 15 actually crunch.

Right.
It is very likely that this happens.


... the rest are waiting for upload

No.
They are not waiting. Their uploads are slow but in progress.


Are you sure the CMS result data is only 55 MB/hour on average?

That's an average.
Each job result is around 110 MB (± a few MB).
The fastest computers require about 1 h to complete a job (= subtask); slower ones may need up to 3 h.
ID: 43314
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 696
Credit: 5,575,006
RAC: 2,272
Message 43321 - Posted: 8 Sep 2020, 11:40:17 UTC

You can see some data on job timings, etc., in the job graphs. I grabbed graphs that I felt were most useful, but you can play around with the parameters if you like (in particular, if you click on the back-arrow within a plot, you can see a whole lot of other plots that you can view in full by clicking on the plot title and selecting "View" on the drop-down menu). Note that not all of these graphs are properly populated; CMS@Home is not a high priority for the monitoring crew.
My initial aim when this all started was to run jobs (or sub-tasks, as some call them) that ran for 1-2 hours and returned up to 100 MB of results. This was mainly based on my connection at the time, which was 5-6 Mbps download and 1 Mbps upload, and the assumption that most people would only run one task at a time, or at least adjust the number of tasks to suit their connectivity. There has always been the problem of people being over-enthusiastic about their contribution and running into the sort of problem being discussed here. We also have to choose our tasks carefully: I could easily send you jobs that would tax a 100 Mbps link!
ID: 43321
Pavel Hanak

Joined: 5 Mar 06
Posts: 13
Credit: 13,959,975
RAC: 14,921
Message 43328 - Posted: 9 Sep 2020, 23:14:31 UTC - in response to Message 43321.  

Unfortunately, all VirtualBox apps have always been quite opaque when it comes to actual memory and bandwidth requirements. BOINC Manager is unable to display them, and you can't find this information with Google (I tried before I asked here). The VM console doesn't display what the work unit actually does, and the VirtualBox Manager has no graphs or statistics either. So it's very easy to become "over-enthusiastic" that way, because the average user is left to guesswork with Windows Task Manager or similar tools. :-/
ID: 43328


©2020 CERN