Message boards : CMS Application : Jobs draining for a WMAgent upgrade

ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 671
Credit: 5,313,849
RAC: 9,768
Message 34381 - Posted: 14 Feb 2018, 13:30:07 UTC
Last modified: 15 Feb 2018, 0:49:01 UTC

We have an upgrade pending for the WMAgent which controls our job batches. Therefore I have to drain out the job queue. I will not submit any new batches for now. The current batch should drain late Sunday or on Monday; I'll try to keep an eye on the queue and let you know when to set No New Tasks, but if you see no jobs available stop your tasks on your own initiative. I will of course announce when new jobs are available again.
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1443
Credit: 76,580,185
RAC: 100,158
Message 34384 - Posted: 15 Feb 2018, 6:03:23 UTC - in response to Message 34381.  

Thanks for the announcement.
m

Joined: 6 Sep 08
Posts: 110
Credit: 6,712,401
RAC: 997
Message 34385 - Posted: 15 Feb 2018, 9:52:12 UTC - in response to Message 34381.  
Last modified: 15 Feb 2018, 10:01:41 UTC

Thanks for the warning, Ivan. It's not easy for those who run different combinations of LHC jobs on different hosts to stop only CMS, so will this result in the BOINC server just not sending out CMS jobs (Laurence's fix) ?
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 671
Credit: 5,313,849
RAC: 9,768
Message 34386 - Posted: 15 Feb 2018, 14:25:14 UTC - in response to Message 34385.  

> Thanks for the warning, Ivan. It's not easy for those who run different combinations of LHC jobs on different hosts to stop only CMS, so will this result in the BOINC server just not sending out CMS jobs (Laurence's fix)?

I hope so, but I'll remind him just in case.
m

Joined: 6 Sep 08
Posts: 110
Credit: 6,712,401
RAC: 997
Message 34387 - Posted: 15 Feb 2018, 19:34:33 UTC - in response to Message 34386.  

> > Thanks for the warning, Ivan. It's not easy for those who run different combinations of LHC jobs on different hosts to stop only CMS, so will this result in the BOINC server just not sending out CMS jobs (Laurence's fix)?
>
> I hope so, but I'll remind him just in case.

OK, many thanks.
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 671
Credit: 5,313,849
RAC: 9,768
Message 34388 - Posted: 15 Feb 2018, 23:51:44 UTC

At the moment we seem to be on target to drain out late on Saturday. I'll check the situation around noon that day, and maybe submit a smaller batch of jobs to keep us running until Monday, filling the gap before Alan can do the upgrade. It's a bit fluid; I'll keep you informed.
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 671
Credit: 5,313,849
RAC: 9,768
Message 34393 - Posted: 17 Feb 2018, 13:08:03 UTC - in response to Message 34388.  

We were starting to run out of jobs this morning, so I submitted a small batch that should tide us over until late tomorrow.
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 671
Credit: 5,313,849
RAC: 9,768
Message 34397 - Posted: 18 Feb 2018, 11:23:05 UTC - in response to Message 34393.  

The queue is draining again, so we'll be out of jobs in another couple of hours. I'll not submit any more, to make sure we mop up as much of the "tail" as possible before tomorrow morning.
mmonnin

Joined: 22 Mar 17
Posts: 44
Credit: 3,801,950
RAC: 0
Message 34400 - Posted: 19 Feb 2018, 0:17:30 UTC

Why are tasks still being sent out if there is no work to do? Nothing but 207 (0x000000CF) EXIT_NO_SUB_TASKS for 8 hours.
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 671
Credit: 5,313,849
RAC: 9,768
Message 34409 - Posted: 19 Feb 2018, 10:51:11 UTC - in response to Message 34400.  

> Why are tasks still being sent out if there is no work to do? Nothing but 207 (0x000000CF) EXIT_NO_SUB_TASKS for 8 hours.

Looks like the empty-queue detection didn't work as expected and tasks kept being created.
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 671
Credit: 5,313,849
RAC: 9,768
Message 34423 - Posted: 19 Feb 2018, 22:39:40 UTC

Well, the update has been completed, late into the night (thanks, Alan!), and I've submitted a small batch of jobs. However, I don't think volunteers' machines are picking them up; I suspect that the CMS task creator didn't start up automagically when jobs became available. Or there may be another problem that will require manual intervention. I'll go to bed soon, and hope things straighten themselves out in the morning.
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 671
Credit: 5,313,849
RAC: 9,768
Message 34424 - Posted: 20 Feb 2018, 8:25:50 UTC - in response to Message 34423.  

Task creation is on again. I've just received my quota on all my PCs. There will be some confusion in the Jobs graphs until we get the new configuration completely sorted.
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 671
Credit: 5,313,849
RAC: 9,768
Message 34456 - Posted: 22 Feb 2018, 22:07:09 UTC

We seem to have a little problem with the WMAgent again. A component has failed and a new batch I submitted is not starting up, so the queue is starting to drain again. This is likely related to work we've been doing behind the scenes to integrate CMS@Home more tightly with CMS Job Production.
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 671
Credit: 5,313,849
RAC: 9,768
Message 34463 - Posted: 23 Feb 2018, 7:18:43 UTC - in response to Message 34456.  

All OK again, thanks to swift action at CERN.
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 671
Credit: 5,313,849
RAC: 9,768
Message 35023 - Posted: 16 Apr 2018, 11:18:54 UTC
Last modified: 16 Apr 2018, 11:19:50 UTC

I'm draining the queue for another WMAgent upgrade. Jobs should start drying up in about 10 hours, so consider setting No New Tasks for your CMS instances sometime in the next 10 hours, or when you see your tasks failing to pick up jobs. I'm hoping the upgrade can be done first thing tomorrow, so we should be back on the air by about lunchtime.
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Avatar

Send message
Joined: 29 Aug 05
Posts: 671
Credit: 5,313,849
RAC: 9,768
Message 35027 - Posted: 17 Apr 2018, 14:26:35 UTC - in response to Message 35023.  

We are back up again now, so please resume CMS tasks.
maeax

Joined: 2 May 07
Posts: 929
Credit: 33,763,579
RAC: 2,867
Message 35030 - Posted: 18 Apr 2018, 6:32:55 UTC

Yes, jobs are running well!
Thank you, Ivan, for the upgrade.

I have a question about old jobs from January and February showing in my computer stats, but none from March.
Some ended with errors and some are finished.
The same applies to Theory.

©2020 CERN