Lack of CMS tasks due to a problem in WMAgent development
Joined: 29 Aug 05 Posts: 1048 Credit: 7,477,327 RAC: 7,692
Unfortunately, I have been unable to submit new workflows to the CMS project since yesterday, and the job queues have now drained. The cause is a change introduced during development of the CMS workflow management system. Such changes are first tested on a development system before being moved to the production system. We currently use the development system to run CMS@Home, so the change is affecting us. I'm trying to find out when a fix will be forthcoming, but until then, please set No New Tasks for CMS or switch to another project. I'm sorry about this. I will let you know when I am able to submit jobs again.
Joined: 27 May 20 Posts: 10 Credit: 3,252,021 RAC: 0
Thank you for letting us know. Keep up the good work, Ivan!
Joined: 29 Aug 05 Posts: 1048 Credit: 7,477,327 RAC: 7,692
> Thank you for letting us know. Keep up the good work, Ivan!

I try, mate, I try. As you know, things are getting fraught here in the UK with the emergence of the Omicron (Ο) COVID variant. CERN has raised its alert status (to orange, IIRC), so it's not certain that all workers will be on deck over the coming weeks.
Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0
You need a subscription to read this, but the gist is that it is moving fast. https://www.telegraph.co.uk/news/2021/12/09/charts-ministers-think-could-hit-million-omicron-cases-day-christmas/ Hang in there.
Joined: 29 Aug 05 Posts: 1048 Credit: 7,477,327 RAC: 7,692
You need a subscription to read this (or NoScript), but the gist is that it is moving fast.
Joined: 14 Jan 10 Posts: 1371 Credit: 9,130,392 RAC: 3,601
> ivan wrote:
> ...

By "development system", do you mean "Development for LHC@home" (https://lhcathomedev.cern.ch/lhcathome-dev/)? If so, there are no CMS tasks available to test with there; there are only Theory workunits.
Joined: 29 Aug 05 Posts: 1048 Credit: 7,477,327 RAC: 7,692
> ivan wrote:
> ...
> By "development system", do you mean "Development for LHC@home" (https://lhcathomedev.cern.ch/lhcathome-dev/)?

No, that project takes its jobs from the same pool as the main LHC@Home application. The problem is with the management system cmsweb-testbed.cern.ch, where projects such as WMCore (including WMAgent, WMStats, etc.) are developed before deployment to the production system cmsweb.cern.ch (if you have the right CMS/CERN credentials, you can inspect those sites). The WMCore team are under some pressure to develop new systems (including the transition to Python 3, I believe) before the next LHC data-acquisition phase. They may have adopted the "agile" philosophy of "move fast and break things"...
Joined: 18 Dec 15 Posts: 1748 Credit: 115,259,856 RAC: 90,669
Ivan, should we expect that there won't be any CMS tasks available for a (short) while?
Joined: 29 Aug 05 Posts: 1048 Credit: 7,477,327 RAC: 7,692
> Ivan, should we expect that there won't be any CMS tasks available for a (short) while?

Erich, I've just been able to submit a new batch of jobs, so the problem seems to have been fixed. Tentatively, start asking for tasks again. Please note that we are testing some new configurations in the hope that they bring us another step closer to contributing to production jobs for CMS. I'll be monitoring this closely, so don't be surprised if I delete workflows that don't shoehorn into the limits I arbitrarily set for your tasks (i.e. jobs take ~2 hours of CPU time and generate no more than 100 MB of result files).
Joined: 29 Aug 05 Posts: 1048 Credit: 7,477,327 RAC: 7,692
OK, for want of anywhere better to put it... We've been playing with a new workflow that uses more modern versions of the CMS Monte Carlo software than the ones we have been using for the last 30(!) months. I think I've got the parameters tuned with respect to how much CPU time each job takes and the volume of result files returned (i.e. bandwidth usage). Please let me know if you think I've got the balance wrong, either here or at my Uni address (I think you all know who I am by now...). Oh, I've gone through a few iterations, so leave it a day or two for the last tweaks to take hold.
Joined: 15 Jun 08 Posts: 2500 Credit: 248,175,370 RAC: 120,198
Between 2021-12-16 17:00 UTC and 2021-12-17 01:00 UTC, the average core time dropped from 2.5-3 h to 1-1.5 h. Within this period I didn't notice any significant change in CPU load.

Bandwidth usage mainly affects the upload of subtask results, and it tracks the core time: shorter core time -> smaller uploads, and vice versa.

The number of HTTP requests per hour changed significantly, from ~30,000/h to nearly 90,000/h. Nearly all of the additional requests were sent to cms-frontier.openhtc.io. Although those files are rather small and don't cost much bandwidth, the network connections may suffer from:
- higher latency on WAN connections (compared to the LAN)
- old home routers slowing down if they don't have enough resources to handle all connections concurrently

Both of these points can be fully covered by a local Squid proxy. To also support volunteers who don't run a local proxy, I would suggest using a configuration that keeps the core time around 2.5 h.
Joined: 29 Aug 05 Posts: 1048 Credit: 7,477,327 RAC: 7,692
Yes, I think this workflow is not especially suitable. The old one generated fewer "interesting" events per CPU-hour, so its result files were relatively smaller. I may have to put this on the back burner due to the upcoming holidays and the massive upswing in COVID cases; I fear London is on the brink of a total shutdown.
Joined: 12 Aug 06 Posts: 429 Credit: 10,235,381 RAC: 19,655
> You need a subscription to read this, but the gist is that it is moving fast.

Deaths are what matter, not cases. Look at the deaths graph; scroll down here: https://www.worldometers.info/coronavirus/country/uk/ The numbers are tiny compared with earlier. And worldwide, fewer than 1 in 1000 people have died, so it's not concerning me at all. A 1 in 1000 chance of something is more or less zero. I wouldn't bet on a horse with that chance of winning.
Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0
> Deaths are what matter, not cases. Look at the deaths graph; scroll down here:

I read The Telegraph for all the statistics. It gives us a couple of weeks of warning in the U.S.
Joined: 3 Jan 16 Posts: 1 Credit: 6,904 RAC: 0
Are we still having issues with work being issued? I saw a few tasks come through a few days ago, but my computer quickly ate them up and I haven't seen anything since. Just wondering what was going on.
Joined: 15 Jun 08 Posts: 2500 Credit: 248,175,370 RAC: 120,198
WMAgent is only used to distribute CMS work (not ATLAS, Theory or SixTrack). A few days ago your computer got nothing but some SixTrack tasks, and the SixTrack queue of tasks ready to send (RTS) is currently empty.
©2024 CERN