Message boards : News : Interruption to CMS@Home, Wednesday 15th July
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1060
Credit: 7,737,455
RAC: 1,317
Message 43057 - Posted: 14 Jul 2020, 13:19:28 UTC

We need to interrupt the CMS project tomorrow to deploy a new Workflow Management Agent. This means that jobs will not be available from sometime tonight. We recommend that you set your CMS machines to No New Tasks as soon as possible, to avoid tasks terminating with an error if a job can't be fetched.
We anticipate jobs will be available again late Wednesday (European time). I'll update this thread when it is OK to proceed.
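
For volunteers who manage hosts from the command line, the No New Tasks setting can also be scripted. The following is a minimal sketch, assuming the stock `boinccmd` tool is on the PATH and that LHC@home is attached under the URL shown; that URL is an assumption, so check the one listed in your own BOINC Manager. The helper names are hypothetical. Note that `nomorework` applies to the whole attached project, not only to its CMS tasks.

```python
import subprocess

# Assumed project URL; replace it with the URL your client shows for LHC@home.
PROJECT_URL = "https://lhcathome.cern.ch/lhcathome/"


def no_new_tasks_cmd(project_url, allow=False):
    """Build the boinccmd argument list that stops (or resumes) work fetch."""
    op = "allowmorework" if allow else "nomorework"
    return ["boinccmd", "--project", project_url, op]


def set_no_new_tasks(project_url=PROJECT_URL, allow=False):
    """Run boinccmd; raises CalledProcessError if the command fails."""
    subprocess.run(no_new_tasks_cmd(project_url, allow), check=True)
```

Calling `set_no_new_tasks()` before the outage and `set_no_new_tasks(allow=True)` once jobs are back would mirror the manual steps described above.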
Mr P Hucker
Joined: 12 Aug 06
Posts: 429
Credit: 10,589,655
RAC: 2,832
Message 43061 - Posted: 15 Jul 2020, 11:41:19 UTC - in response to Message 43057.  

We need to interrupt the CMS project tomorrow to deploy a new Workflow Management Agent. This means that jobs will not be available from sometime tonight. We recommend that you set your CMS machines to No New Tasks as soon as possible, to avoid tasks terminating with an error if a job can't be fetched.
We anticipate jobs will be available again late Wednesday (European time). I'll update this thread when it is OK to proceed.


Why do we all have to set no new tasks? Surely you can just tell the server to stop handing them out?
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1060
Credit: 7,737,455
RAC: 1,317
Message 43063 - Posted: 15 Jul 2020, 15:07:17 UTC - in response to Message 43061.  

Why do we all have to set no new tasks? Surely you can just tell the server to stop handing them out?

The way the system works, if you ask for CMS jobs when none are available then the BOINC task fails after ten minutes with an error, so any work already completed by that task is discarded and you get no credit for it. As well, each time a task fails, your daily BOINC task quota is reduced. Do that enough times and you will only be allocated one task per day. Your quota will only be increased again if/when you complete a successful task.
No, I didn't write the BOINC server code...
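
The quota behaviour described above can be illustrated with a toy model. This is only a sketch: the step sizes (minus one per failed task, doubling after a success) and the ceiling of 8 tasks per day are illustrative assumptions, not values taken from the BOINC server code, and the function names are hypothetical.

```python
def next_quota(quota, task_succeeded, max_quota=8):
    """Return the daily task quota after one reported task (toy model)."""
    if task_succeeded:
        # Assumed recovery rule: double the quota, capped at the ceiling.
        return min(max_quota, quota * 2)
    # Assumed penalty rule: lose one task per day, never dropping below one.
    return max(1, quota - 1)


def simulate(outcomes, start_quota=8, max_quota=8):
    """Apply a sequence of task outcomes (True = success) and record the quota."""
    quota = start_quota
    history = []
    for succeeded in outcomes:
        quota = next_quota(quota, succeeded, max_quota)
        history.append(quota)
    return history
```

Under these assumptions, ten consecutive failures leave a host at one task per day, and the quota only starts to climb again after the first successful task.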
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1060
Credit: 7,737,455
RAC: 1,317
Message 43064 - Posted: 15 Jul 2020, 15:08:07 UTC

Jobs are available again. There was a hiccup, but we hit our target with time to spare.
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1060
Credit: 7,737,455
RAC: 1,317
Message 43067 - Posted: 15 Jul 2020, 21:08:21 UTC - in response to Message 43063.  
Last modified: 15 Jul 2020, 21:09:50 UTC

I should have added: We do stop issuing tasks when we detect that there are no new jobs available -- a problem that we've only just solved was that jobs were available but unbeknownst to our software they were flagged as not to run on volunteer machines. This led to the sort of scenario I described earlier, as BOINC thought jobs were available and kept serving up tasks.
Notwithstanding that, remember that our tasks run for more than 12 hours (usually less than 18), running several jobs consecutively. If we run out of jobs mid-task, that can lead to a task being flagged as failed. This is why I try to give as much warning as possible of an upcoming outage, so that tasks can be left to finish up before jobs run out.
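
The timing argument can be made concrete with a little date arithmetic. A rough sketch, assuming the 18-hour figure from the post as an upper bound on task runtime; the function names and any cutoff time passed in are hypothetical, since the thread gives no exact time for when jobs stop.

```python
from datetime import datetime, timedelta

# Upper bound on task runtime, taken from "usually less than 18" hours.
MAX_TASK_RUNTIME = timedelta(hours=18)


def latest_safe_start(jobs_stop_at, max_runtime=MAX_TASK_RUNTIME):
    """Latest start time for a task that should finish before jobs run out."""
    return jobs_stop_at - max_runtime


def may_run_out_of_jobs(task_start, jobs_stop_at, max_runtime=MAX_TASK_RUNTIME):
    """True if a task started at task_start could still be running at the cutoff."""
    return task_start + max_runtime > jobs_stop_at
```

For example, with a hypothetical cutoff of 22:00 UTC, a task started after 04:00 UTC the same day could still be running when jobs stop, which is why early notice helps.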
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1418
Credit: 9,464,929
RAC: 2,680
Message 43078 - Posted: 16 Jul 2020, 8:41:33 UTC - in response to Message 43067.  

... If we run out of jobs mid-task, that can lead to a task being flagged as failed. ...
This is very bad and should not happen. The LHC@home team should take a larger part in CMS@home and not leave Ivan to struggle alone.
Can CMS not be set up like ATLAS and Theory, running a single job per BOINC task and receiving the needed input data in the shared host folder before the VM is started, rather than over the VM network?



©2024 CERN