Message boards : CMS Application : CMS@Home downtime pencilled in for Thursday
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 672
Credit: 5,336,532
RAC: 10,047
Message 29958 - Posted: 18 Apr 2017, 15:51:11 UTC

CERN want to update the WMAgent we use to submit CMS@Home jobs. This is tentatively scheduled for Thursday. I'll let the queue drain, and possibly kill off some of the smaller batches I've been submitting to get me finer timing control. I expect that we can go another 24 hours before I start draining the queue, but keep an eye on the running jobs graph and be prepared to set No New Tasks when you see it dipping -- or do it beforehand if you really want to protect your daily job quotas.
More news as details become firmer.
ID: 29958
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1443
Credit: 76,783,550
RAC: 99,197
Message 29962 - Posted: 18 Apr 2017, 16:19:23 UTC - in response to Message 29958.  

Wouldn't it be more efficient to also stop the WU generation and perhaps decrease the number of unsent WUs beforehand?
I'm sure a lot of users are still on holiday.
ID: 29962
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 672
Credit: 5,336,532
RAC: 10,047
Message 29965 - Posted: 18 Apr 2017, 17:53:53 UTC - in response to Message 29962.  

Perhaps, but I don't think I can do that myself. My main line of thought is that Laurence's "opportunistic" cluster of machines isn't controlled by BOINC, so if people stop their boxes in a considered manner then his servers will drain the queue with minimum damage to volunteers' PCs. Plus I have deliberately submitted smaller batches lately -- I can abort un-started batches at will to tailor the queue drainage to Alan's timetable.
ID: 29965
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 598
Credit: 373,349,105
RAC: 51,096
Message 29966 - Posted: 18 Apr 2017, 18:35:20 UTC

I agree with computezrmle, but it seems like it's difficult for you and the other CERN staff.

Maybe Laurence can create some easy mechanism for you?
ID: 29966
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 672
Credit: 5,336,532
RAC: 10,047
Message 29969 - Posted: 19 Apr 2017, 13:03:07 UTC

My best estimate at the moment is that the queue will start to drain Thursday morning, and should be dry within a couple of hours.
ID: 29969
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 672
Credit: 5,336,532
RAC: 10,047
Message 29973 - Posted: 19 Apr 2017, 21:56:06 UTC - in response to Message 29962.  

Wouldn't it be more efficient to also stop the WU generation and perhaps decrease the number of unsent WUs beforehand?
I'm sure a lot of users are still on holiday.

Pardon me for only just thinking this through, but the main problem in shutting down the WMAgent system is to make sure that the queues are drained as far as possible before shutting down -- limiting WU creation would actually hinder this.
ID: 29973
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 672
Credit: 5,336,532
RAC: 10,047
Message 29979 - Posted: 20 Apr 2017, 13:51:43 UTC - in response to Message 29969.  

My best estimate at the moment is that the queue will start to drain Thursday morning, and should be dry within a couple of hours.

I've given Alan the go-ahead to do his update. I'll let you know when we have jobs again.
ID: 29979
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 598
Credit: 373,349,105
RAC: 51,096
Message 29982 - Posted: 20 Apr 2017, 17:18:42 UTC

Thanks Ivan, I set NNT this morning as per your advice.
ID: 29982
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 672
Credit: 5,336,532
RAC: 10,047
Message 29985 - Posted: 20 Apr 2017, 19:18:30 UTC - in response to Message 29982.  

A couple of little glitches (details over at -dev) but jobs are available again and I've managed to snare a few of them myself. Enjoy.
ID: 29985
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1443
Credit: 76,783,550
RAC: 99,197
Message 29986 - Posted: 20 Apr 2017, 20:23:56 UTC - in response to Message 29985.  

The first WUs after the update started without errors.
Thank you.
ID: 29986
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 672
Credit: 5,336,532
RAC: 10,047
Message 29987 - Posted: 20 Apr 2017, 20:45:12 UTC - in response to Message 29986.  
Last modified: 20 Apr 2017, 20:45:22 UTC

The first WUs after the update started without errors.
Thank you.

Cheers! Sorry it took a bit longer than we'd hoped.
ID: 29987
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1443
Credit: 76,783,550
RAC: 99,197
Message 29989 - Posted: 20 Apr 2017, 21:09:47 UTC

Task upload also works.

The current tasks have significantly shorter runtimes.
Roughly 65% compared to older tasks (on both of my hosts).
ID: 29989
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 672
Credit: 5,336,532
RAC: 10,047
Message 29990 - Posted: 20 Apr 2017, 22:25:55 UTC - in response to Message 29989.  
Last modified: 20 Apr 2017, 22:34:03 UTC

Task upload also works.

The current tasks have significantly shorter runtimes.
Roughly 65% compared to older tasks (on both of my hosts).

That's a little strange; I'm reasonably sure I didn't change the number of events per job, just the jobs per batch. I'll take a look.

[Added] I've had two tasks fail with a heartbeat failure; the rest are still running, so no data yet. [/Added]
[And again] Job failure rate so far is 2.7%, somewhat less than recent values. [/And again]
ID: 29990



©2020 CERN