Message boards : News : CMS jobs unavailable Weds 27th September
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 332
Credit: 2,503,562
RAC: 4,536
Message 32525 - Posted: 26 Sep 2017, 12:23:21 UTC

An upgrade to the CMS@Home workflow management system (WMAgent) is planned for tomorrow (Wed Sep 27th). This needs the current batch of jobs to be stopped so that the queue is empty. I plan to do this about 0700-0800 UTC on Wednesday.
To avoid "error while computing" task failures and the resulting back-off of your daily quotas, we suggest you set all your CMS machines to No New Tasks at least 12 hours beforehand to allow current tasks to time out in the normal way. You can stop BOINC once all your tasks are finished, if you wish.
Exactly how long the intervention will take is unclear, and there will be a delay of up to an hour to get a new batch of jobs queued afterwards. I will post here when jobs are available again, hopefully before the end of the day European time.
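
For volunteers who manage machines from the command line, the No New Tasks step suggested above can be scripted. What follows is a minimal sketch, not an official procedure. It assumes the `boinccmd` tool that ships with the BOINC client is on the PATH and is allowed to talk to the local client. `PROJECT_URL` is a placeholder to be replaced with the project URL exactly as your client lists it, and the helper names are made up for illustration. The sketch also works out the latest time to set No New Tasks, taking the earliest planned stop (0700 UTC on 27 Sep 2017) and the suggested 12-hour lead.

```python
import shutil
import subprocess
from datetime import datetime, timedelta, timezone

# Placeholder (assumption): replace with the project URL shown by your BOINC client.
PROJECT_URL = "https://example.org/your-project/"


def latest_no_new_tasks_time(stop_time, lead_hours=12):
    """Return the latest moment to set No New Tasks before the batch is stopped."""
    return stop_time - timedelta(hours=lead_hours)


def no_new_tasks_command(project_url):
    """Build the boinccmd call that sets No New Tasks for a single project."""
    return ["boinccmd", "--project", project_url, "nomorework"]


def set_no_new_tasks(project_url):
    """Run the command; assumes boinccmd is installed and can reach the local client."""
    if shutil.which("boinccmd") is None:
        raise RuntimeError("boinccmd not found on PATH")
    subprocess.run(no_new_tasks_command(project_url), check=True)


if __name__ == "__main__":
    # Earliest planned stop from the announcement: 0700 UTC, Wed 27 Sep 2017.
    stop = datetime(2017, 9, 27, 7, 0, tzinfo=timezone.utc)
    print("Set No New Tasks by:", latest_no_new_tasks_time(stop).isoformat())
    print("Command:", " ".join(no_new_tasks_command(PROJECT_URL)))
```

Running the file only prints the cutoff (19:00 UTC on 26 Sep) and the command line; call `set_no_new_tasks(PROJECT_URL)` to actually apply the setting. Once jobs are available again, the same `boinccmd --project` call with `allowmorework` in place of `nomorework` reverses it.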
ID: 32525
Paul

Joined: 7 Feb 17
Posts: 5
Credit: 82,139
RAC: 1
Message 32526 - Posted: 26 Sep 2017, 16:18:26 UTC - in response to Message 32525.  

I got the message and will stop BOINC when I get the chance.
I won't be home until around 1700. Many thanks.
ID: 32526
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 332
Credit: 2,503,562
RAC: 4,536
Message 32528 - Posted: 26 Sep 2017, 17:36:16 UTC - in response to Message 32526.  
Last modified: 26 Sep 2017, 17:36:26 UTC

I got the message and will stop BOINC when I get the chance.
I won't be home until around 1700. Many thanks.

It's best to let your tasks run out before stopping BOINC, which is why we suggest setting No New Tasks. Otherwise you run the risk of the current job failing when it reports, since its batch will no longer be active.
ID: 32528
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 332
Credit: 2,503,562
RAC: 4,536
Message 32531 - Posted: 27 Sep 2017, 7:41:05 UTC

I've stopped the current batch; now waiting for the queue to drain.
ID: 32531
Erich56

Joined: 18 Dec 15
Posts: 401
Credit: 4,008,743
RAC: 7,889
Message 32532 - Posted: 27 Sep 2017, 7:55:44 UTC

Thanks, Ivan, for the information.

So let's hope that all goes well :-)
ID: 32532
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 332
Credit: 2,503,562
RAC: 4,536
Message 32535 - Posted: 27 Sep 2017, 13:22:26 UTC - in response to Message 32532.  

One particular component is taking time to drain. We may not have more CMS jobs until some time on Thursday. Naturally I'll let you know as soon as we are running again but for now feel free to let your computers relax, or work on some other project overnight.
ID: 32535
Erich56

Joined: 18 Dec 15
Posts: 401
Credit: 4,008,743
RAC: 7,889
Message 32537 - Posted: 27 Sep 2017, 15:33:14 UTC - in response to Message 32535.  

Ivan, thanks for the information.

That's interesting, because a look at this dashboard

https://lhcathomedev.cern.ch/lhcathome-dev/cms_job.php

may suggest that CMS tasks have started getting crunched again.
ID: 32537
Paul

Joined: 7 Feb 17
Posts: 5
Credit: 82,139
RAC: 1
Message 32538 - Posted: 27 Sep 2017, 15:36:52 UTC - in response to Message 32535.  

Thank you very much for thinking of my little ABACUS! See you then, ciao!
ID: 32538
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 332
Credit: 2,503,562
RAC: 4,536
Message 32539 - Posted: 27 Sep 2017, 16:04:34 UTC - in response to Message 32537.  
Last modified: 27 Sep 2017, 16:10:28 UTC

Ivan, thanks for the information.

That's interesting, because a look at this dashboard

https://lhcathomedev.cern.ch/lhcathome-dev/cms_job.php

may suggest that CMS tasks have started getting crunched again.

They are "administrative" tasks I think, probably LogMerge since they seem mainly to be failing. :-( [LogMerge jobs try to write to a machine I don't (yet) have credentials for.] A couple of the WMAgent components were down since the weekend but Alan wasn't around to re-start them; I didn't try any further afield as they didn't seem to be affecting volunteer jobs. I now know another scientist who can restart failed components, and also where to report it if she's also not available.
[Edit] Yes, they are mainly LogCollect jobs. I should have advised Alan to just kill them and do the update; I guess it's too late in the day now so we shall see whether they have all timed out overnight. If they are like normal HTCondor jobs they will retry twice before quitting. [/Edit]
ID: 32539
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 332
Credit: 2,503,562
RAC: 4,536
Message 32544 - Posted: 27 Sep 2017, 21:16:24 UTC - in response to Message 32539.  

OK, the agent has been updated, and I've submitted a small test batch. However, the agent is currently showing a message about its proxy certificate having expired, so I'm not sure if jobs will be queued yet. I've notified Alan, but it's 2315 at CERN so I don't expect him to respond tonight. I'll update if and when things change.
ID: 32544
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 332
Credit: 2,503,562
RAC: 4,536
Message 32545 - Posted: 27 Sep 2017, 21:37:18 UTC - in response to Message 32544.  

OK, the agent has been updated, and I've submitted a small test batch. However, the agent is currently showing a message about its proxy certificate having expired, so I'm not sure if jobs will be queued yet. I've notified Alan, but it's 2315 at CERN so I don't expect him to respond tonight. I'll update if and when things change.

Right, Alan responded already, there are jobs in the queue and 800 are running. In the words of the Antz Pantz ad, "Sic 'em, Rex!"
ID: 32545
Erich56

Joined: 18 Dec 15
Posts: 401
Credit: 4,008,743
RAC: 7,889
Message 32552 - Posted: 28 Sep 2017, 7:15:08 UTC - in response to Message 32545.  

I have restarted CMS tasks on all 3 PCs that I use to run CMS.
Everything seems to work fine.
Thanks to everybody :-)
ID: 32552
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 332
Credit: 2,503,562
RAC: 4,536
Message 33146 - Posted: 27 Nov 2017, 21:10:19 UTC

The CMS job queue is empty, apparently due to some difficulties with an upgrade at CERN. Best to set your CMS machines to "No New Tasks" for now.
ID: 33146
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 332
Credit: 2,503,562
RAC: 4,536
Message 33150 - Posted: 28 Nov 2017, 9:16:54 UTC - in response to Message 33146.  

The CMS job queue is empty, apparently due to some difficulties with an upgrade at CERN. Best to set your CMS machines to "No New Tasks" for now.

There are jobs in the queue again (since 0200 GMT) but there are also still problems "behind the scenes" at CERN. There have been people working extremely long hours on this -- I suspect they may be taking a break for some sleep right now.
ID: 33150



©2017 CERN