CMS jobs unavailable Weds 27th September
Author | Message |
---|---|
Joined: 29 Aug 05 Posts: 1045 Credit: 7,316,974 RAC: 9,752 |
An upgrade to the CMS@Home workflow management system (WMAgent) is planned for tomorrow (Wed Sep 27th). This needs the current batch of jobs to be stopped so that the queue is empty. I plan to do this about 0700-0800 UTC on Wednesday. To avoid "error while computing" task failures and the resulting back-off of your daily quotas, we suggest you set all your CMS machines to No New Tasks at least 12 hours beforehand to allow current tasks to time out in the normal way. You can stop BOINC once all your tasks are finished, if you wish. Exactly how long the intervention will take is unclear, and there will be a delay of up to an hour to get a new batch of jobs queued afterwards. I will post here when jobs are available again, hopefully before the end of the day European time. |
Joined: 7 Feb 17 Posts: 5 Credit: 82,139 RAC: 0 |
I have received the message and will stop BOINC when I get the chance. I won't be home until around 1700. Many thanks. |
Joined: 29 Aug 05 Posts: 1045 Credit: 7,316,974 RAC: 9,752 |
"I have received the message and will stop BOINC when I get the chance." It's best to let your tasks run out before stopping BOINC, which is why we suggest setting No New Tasks. Otherwise you run the risk of the current job failing when it reports, since its batch will no longer be active. |
Joined: 29 Aug 05 Posts: 1045 Credit: 7,316,974 RAC: 9,752 |
|
Joined: 18 Dec 15 Posts: 1735 Credit: 113,242,627 RAC: 73,712 |
Thanks, Ivan, for the information. Let's hope that all goes well :-) |
Joined: 29 Aug 05 Posts: 1045 Credit: 7,316,974 RAC: 9,752 |
|
Joined: 18 Dec 15 Posts: 1735 Credit: 113,242,627 RAC: 73,712 |
Ivan, thanks for the information. That's interesting, because a look at this dashboard https://lhcathomedev.cern.ch/lhcathome-dev/cms_job.php suggests that CMS tasks have started getting crunched again. |
Joined: 7 Feb 17 Posts: 5 Credit: 82,139 RAC: 0 |
Many thanks for thinking of my little ABACUS! See you then, ciao! |
Joined: 29 Aug 05 Posts: 1045 Credit: 7,316,974 RAC: 9,752 |
"Ivan, thanks for the Information." They are "administrative" tasks I think, probably LogMerge since they seem mainly to be failing. :-( [LogMerge jobs try to write to a machine I don't (yet) have credentials for.] A couple of the WMAgent components were down since the weekend but Alan wasn't around to re-start them; I didn't try any further afield as they didn't seem to be affecting volunteer jobs. I now know another scientist who can restart failed components, and also where to report it if she's also not available. [Edit] Yes, they are mainly LogCollect jobs. I should have advised Alan to just kill them and do the update; I guess it's too late in the day now so we shall see whether they have all timed out overnight. If they are like normal HTCondor jobs they will retry twice before quitting. [/Edit] |
Joined: 29 Aug 05 Posts: 1045 Credit: 7,316,974 RAC: 9,752 |
OK, the agent has been updated, and I've submitted a small test batch. However, the agent is currently showing a message about its proxy certificate having expired, so I'm not sure if jobs will be queued yet. I've notified Alan, but it's 2315 at CERN so I don't expect him to respond tonight. I'll update if and when things change. |
Joined: 29 Aug 05 Posts: 1045 Credit: 7,316,974 RAC: 9,752 |
"OK, the agent has been updated, and I've submitted a small test batch. However, the agent is currently showing a message about its proxy certificate having expired, so I'm not sure if jobs will be queued yet. I've notified Alan, but it's 2315 at CERN so I don't expect him to respond tonight. I'll update if and when things change." Right, Alan responded already, there are jobs in the queue and 800 are running. In the words of the Antz Pantz ad, "Sic 'em, Rex!" |
Joined: 18 Dec 15 Posts: 1735 Credit: 113,242,627 RAC: 73,712 |
I have restarted CMS tasks on all 3 PCs on which I run CMS. Everything seems to work fine. Thanks to everybody :-) |
Joined: 29 Aug 05 Posts: 1045 Credit: 7,316,974 RAC: 9,752 |
|
Joined: 29 Aug 05 Posts: 1045 Credit: 7,316,974 RAC: 9,752 |
The CMS job queue is empty, apparently due to some difficulties with an upgrade at CERN. Best to set your CMS machines to "No New Tasks" for now. There are jobs in the queue again (since 0200 GMT) but there are also still problems "behind the scenes" at CERN. There have been people working extremely long hours on this -- I suspect they may be taking a break for some sleep right now. |
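
For anyone who prefers to script the advice from the first post (set No New Tasks at least 12 hours before the queue is stopped), here is a minimal Python sketch. It assumes the standard `boinccmd` command-line tool is installed and that its `nomorework` project operation is what the BOINC Manager's "No New Tasks" button corresponds to. The function names, the project URL, and the example date are all placeholders for illustration and are not taken from the thread; substitute the URL your client is actually attached to. The sketch only builds the command; it does not run it.

```python
from datetime import datetime, timedelta, timezone


def latest_no_new_tasks_time(queue_stop_utc, lead_hours=12):
    """Return the latest moment to set No New Tasks.

    queue_stop_utc: when the batch is expected to be stopped (UTC).
    lead_hours: safety margin; 12 hours is the figure suggested in the thread.
    """
    return queue_stop_utc - timedelta(hours=lead_hours)


def no_new_tasks_command(project_url):
    """Build (but do not execute) the boinccmd call for one project.

    project_url is an assumption supplied by the caller.
    """
    return ["boinccmd", "--project", project_url, "nomorework"]


if __name__ == "__main__":
    # Placeholder date: only the 0700 UTC time of day comes from the thread.
    stop = datetime(2030, 1, 2, 7, 0, tzinfo=timezone.utc)
    print("Set No New Tasks by:", latest_no_new_tasks_time(stop).isoformat())

    # Hypothetical project URL; replace with your own.
    url = "https://example.org/my-boinc-project/"
    print("Command:", " ".join(no_new_tasks_command(url)))
```

With a stop at 0700 UTC, the 12-hour margin works out to 1900 UTC on the previous day. Once jobs are announced as available again, the matching `allowmorework` operation should re-enable work fetch.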
©2024 CERN