Message boards : News : Lack of CMS tasks due to a problem in WMAgent development

ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 879
Credit: 5,850,679
RAC: 484
Message 45844 - Posted: 9 Dec 2021, 15:15:02 UTC

Unfortunately, I have been unable to submit new workflows to the CMS project since yesterday, and the job queues have now drained.
The cause is a change introduced during development of the CMS workflow management system. Such changes are tested first on a development system before being moved to the production system. We currently use the development system to run CMS@Home, so the change is impacting us.
I'm trying to find out when a fix will be forthcoming, but until then, please set No New Tasks for CMS or switch to another project.
I'm sorry about this. I will let you know when I am able to submit jobs again.
ID: 45844
Sayling Low

Joined: 27 May 20
Posts: 9
Credit: 2,697,988
RAC: 5,788
Message 45847 - Posted: 9 Dec 2021, 20:12:19 UTC - in response to Message 45844.  

Thank you for letting us know. Keep up the good work, Ivan!
ID: 45847
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 879
Credit: 5,850,679
RAC: 484
Message 45848 - Posted: 9 Dec 2021, 23:38:08 UTC - in response to Message 45847.  

Thank you for letting us know. Keep up the good work, Ivan!

I try, mate, I try. As you know, things are getting fraught here in the UK with the emergence of the Ο COVID variant. CERN has upgraded its alert status (to orange IIRC), so it's not certain that all workers will be on deck in the coming weeks.
ID: 45848
Jim1348

Joined: 15 Nov 14
Posts: 590
Credit: 21,873,661
RAC: 577
Message 45849 - Posted: 10 Dec 2021, 0:04:02 UTC - in response to Message 45848.  

You need a subscription to read this, but the gist is that it is moving fast.
https://www.telegraph.co.uk/news/2021/12/09/charts-ministers-think-could-hit-million-omicron-cases-day-christmas/
Hang in there.
ID: 45849
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 879
Credit: 5,850,679
RAC: 484
Message 45850 - Posted: 10 Dec 2021, 9:17:20 UTC - in response to Message 45849.  

You need a subscription to read this (or NoScript), but the gist is that it is moving fast.
https://www.telegraph.co.uk/news/2021/12/09/charts-ministers-think-could-hit-million-omicron-cases-day-christmas/
Hang in there.


ID: 45850
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1103
Credit: 6,874,147
RAC: 1,061
Message 45851 - Posted: 10 Dec 2021, 12:25:20 UTC - in response to Message 45844.  

ivan wrote:
...
We currently use the development system to run CMS@Home, so the change is impacting us.
...
By "development system", do you mean "Development for LHC@home" (https://lhcathomedev.cern.ch/lhcathome-dev/)?
If so, there are no CMS tasks available to test with there. There are only Theory workunits.
ID: 45851
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 879
Credit: 5,850,679
RAC: 484
Message 45852 - Posted: 10 Dec 2021, 20:30:13 UTC - in response to Message 45851.  

ivan wrote:
...
We currently use the development system to run CMS@Home, so the change is impacting us.
...
By "development system", do you mean "Development for LHC@home" (https://lhcathomedev.cern.ch/lhcathome-dev/)?
If so, there are no CMS tasks available to test with there. There are only Theory workunits.

No, that project takes its jobs from the same pool as the main LHC@Home application. The problem is with the management system cmsweb-testbed.cern.ch, where projects such as WMCore (including WMAgent, WMStats, etc.) are developed before deployment to the production system cmsweb.cern.ch (if you have the right CMS/CERN credentials you can inspect those sites). The WMCore team are under some pressure to develop new systems (including the transition to Python 3, I believe) before the next LHC data acquisition phase. They may have adopted the "agile" philosophy of "move fast and break things"...
ID: 45852
Erich56

Joined: 18 Dec 15
Posts: 1514
Credit: 43,989,833
RAC: 44,637
Message 45854 - Posted: 11 Dec 2021, 14:24:03 UTC

Ivan, should we expect that there won't be CMS tasks available for a (short) while?
ID: 45854
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 879
Credit: 5,850,679
RAC: 484
Message 45864 - Posted: 14 Dec 2021, 19:29:26 UTC - in response to Message 45854.  

Ivan, should we expect that there won't be CMS tasks available for a (short) while?

Erich, I've just been able to submit a new batch of jobs, so the problem seems to have been fixed. You can tentatively start asking for tasks again.
Please note that we are testing some new configurations in the hope that we'll be another step closer to contributing to production jobs for CMS. I'll be monitoring this closely; don't be surprised if I delete workflows that don't shoehorn into the limits I arbitrarily set for your tasks (i.e. jobs take ~2 hours CPU time, and generate no more than 100 MB of result files).
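As a rough illustration of those limits, here is a minimal Python sketch of such a check. It is not the actual tooling used for CMS@Home: the names `JobStats` and `fits_limits` are hypothetical, and the half-hour tolerance around "~2 hours" is an assumption, since the post does not say how much slack is allowed.

```python
from dataclasses import dataclass

# Limits quoted in the post: ~2 hours CPU time, at most 100 MB of results.
TARGET_CPU_HOURS = 2.0
MAX_RESULT_MB = 100.0
# Assumption: the post does not say how far from 2 h still counts as "~2 hours".
CPU_TOLERANCE_HOURS = 0.5


@dataclass
class JobStats:
    """Hypothetical per-job summary: CPU time used and size of result files."""
    cpu_hours: float
    result_mb: float


def fits_limits(job: JobStats) -> bool:
    """Return True if the job stays within both limits defined above."""
    cpu_ok = abs(job.cpu_hours - TARGET_CPU_HOURS) <= CPU_TOLERANCE_HOURS
    size_ok = job.result_mb <= MAX_RESULT_MB
    return cpu_ok and size_ok
```

With these assumed values, a job using 2.1 CPU-hours and returning 80 MB would pass, while one returning 150 MB would not.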
ID: 45864
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 879
Credit: 5,850,679
RAC: 484
Message 45878 - Posted: 16 Dec 2021, 22:11:47 UTC - in response to Message 45864.  

OK, for want of anywhere better to put it...
We've been playing with a new workflow, using more modern versions of the CMS Monte Carlo software than those we have been using for the last 30(!) months. I think I've got the parameters tuned with respect to how much CPU time each job takes and the volume of result files returned (i.e. bandwidth usage). Please let me know if you think I've mixed up the balance, either here or at my Uni address (I think you all know who I am by now...). Oh, I've gone through a few iterations, so leave it for a day or two to let the last tweaks take hold.
ID: 45878
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1992
Credit: 143,251,012
RAC: 92,622
Message 45884 - Posted: 17 Dec 2021, 7:39:46 UTC - in response to Message 45878.  

Between 2021-12-16 17:00 UTC and 2021-12-17 01:00 UTC the average core time dropped from 2.5-3 h to 1-1.5 h.
Within this period I didn't notice any significant CPU load changes.

Bandwidth usage mainly affects the upload of subtask results.
This corresponds to the core time:
Shorter core time -> smaller uploads and vice versa.

The number of HTTP requests per hour changed significantly - from ~30,000/h to nearly 90,000/h.
Nearly all additional requests were sent to cms-frontier.openhtc.io.
Although those files are rather small and don't cost much bandwidth, the network connections may suffer from:
- higher latency on WAN connections (compared to the LAN)
- old home routers may slow down if they don't have enough resources to handle all connections concurrently


Both of these points can be covered well by a local Squid proxy.
To also support volunteers who are not running a local proxy, I would suggest using a configuration that keeps the core time around 2.5 h.
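To put the request figures above in perspective, here is a small back-of-the-envelope sketch in Python. The two hourly rates are the approximate ones quoted in this post; the helper name `per_second` is purely illustrative.

```python
def per_second(requests_per_hour: float) -> float:
    """Convert an hourly request count into an average per-second rate."""
    return requests_per_hour / 3600.0


# Approximate hourly figures quoted in the post above.
old_rate_per_hour = 30_000
new_rate_per_hour = 90_000

old_rps = per_second(old_rate_per_hour)
new_rps = per_second(new_rate_per_hour)
growth = new_rate_per_hour / old_rate_per_hour

print(f"old: {old_rps:.1f} req/s, new: {new_rps:.1f} req/s, factor: {growth:.1f}x")
```

That works out to roughly 8 requests per second before the change and 25 afterwards, a threefold increase, which is why connection handling on older routers becomes a concern even though the files themselves are small.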
ID: 45884
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 879
Credit: 5,850,679
RAC: 484
Message 45886 - Posted: 17 Dec 2021, 10:47:32 UTC - in response to Message 45884.  

Yes, I think this workflow is not especially suitable. The old one generated fewer "interesting" events per CPU-hour so its result files were relatively smaller. I may have to put this on the back-burner due to the upcoming holidays and the massive upswing in COVID cases -- I fear London is on the brink of total shut-down.
ID: 45886
Peter Hucker of the Scottish B...

Joined: 12 Aug 06
Posts: 294
Credit: 2,002,632
RAC: 365
Message 45928 - Posted: 22 Dec 2021, 11:37:55 UTC - in response to Message 45849.  
Last modified: 22 Dec 2021, 11:38:50 UTC

You need a subscription to read this, but the gist is that it is moving fast.
https://www.telegraph.co.uk/news/2021/12/09/charts-ministers-think-could-hit-million-omicron-cases-day-christmas/
Hang in there.

Deaths is what matters, not cases. Look at the deaths graph, scroll down here:
https://www.worldometers.info/coronavirus/country/uk/
Tiny amount compared with earlier.
And worldwide, fewer than 1 in 1000 have died, which doesn't concern me at all. A 1 in 1000 chance of something is more or less zero. I wouldn't bet on a horse with that chance of winning.
ID: 45928
Jim1348

Joined: 15 Nov 14
Posts: 590
Credit: 21,873,661
RAC: 577
Message 45932 - Posted: 22 Dec 2021, 15:32:45 UTC - in response to Message 45928.  

Deaths is what matters, not cases. Look at the deaths graph, scroll down here:

I read The Telegraph for all the statistics. It gives us a couple of weeks of warning in the U.S.
ID: 45932
gscot_000

Joined: 3 Jan 16
Posts: 1
Credit: 6,904
RAC: 0
Message 46712 - Posted: 2 May 2022, 12:26:16 UTC

Are we still having issues with issuing work packets? I saw a few come through a few days ago, but my computer quickly ate them up and I haven't seen anything since. Just wondering what is going on.
ID: 46712
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1992
Credit: 143,251,012
RAC: 92,622
Message 46713 - Posted: 2 May 2022, 12:45:30 UTC - in response to Message 46712.  

WMAgent is only used to distribute CMS work (not ATLAS, Theory or SixTrack).
Your computer got only a few SixTrack tasks a few days ago, and that rts queue is currently empty.
ID: 46713



©2022 CERN