Interruption to CMS@Home, Wednesday 15th July
Author | Message |
---|---|
Joined: 29 Aug 05 Posts: 1060 Credit: 7,737,455 RAC: 1,317 |
We need to interrupt the CMS project tomorrow to deploy a new Workflow Management Agent. This means that jobs will not be available from sometime tonight. We recommend that you set your CMS machines to No New Tasks as soon as possible, to avoid tasks terminating with an error if a job can't be fetched. We anticipate jobs will be available again late Wednesday (European time). I'll update this thread when it is OK to proceed. |
Joined: 12 Aug 06 Posts: 429 Credit: 10,589,655 RAC: 2,832 |
Quote: "We need to interrupt the CMS project tomorrow to deploy a new Workflow Management Agent. This means that jobs will not be available from sometime tonight. We recommend that you set your CMS machines to No New Tasks as soon as possible, to avoid tasks terminating with an error if a job can't be fetched." Why do we all have to set No New Tasks? Surely you can just tell the server to stop handing them out? |
Joined: 29 Aug 05 Posts: 1060 Credit: 7,737,455 RAC: 1,317 |
Quote: "We need to interrupt the CMS project tomorrow to deploy a new Workflow Management Agent. This means that jobs will not be available from sometime tonight. We recommend that you set your CMS machines to No New Tasks as soon as possible, to avoid tasks terminating with an error if a job can't be fetched." The way the system works, if you ask for CMS jobs when none are available, the BOINC task fails with an error after ten minutes, so any work already completed by that task is discarded and you get no credit for it. Also, each time a task fails, your daily BOINC task quota is reduced. Do that enough times and you will only be allocated one task per day. Your quota will only be increased again if/when you complete a successful task. No, I didn't write the BOINC server code... |
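A minimal sketch of the quota behaviour described above, for illustration only; it is not the BOINC server code. The class `HostQuota`, the starting value of 8 tasks per day, the size of each reduction (one task) and the recovery rule after a success (doubling, capped at the starting value) are all hypothetical assumptions. Only the shape follows the post: failures shrink the quota down to a floor of one task per day, and a success lets it grow again.

```python
from dataclasses import dataclass


@dataclass
class HostQuota:
    """Hypothetical per-host daily task quota (illustrative, not BOINC's code)."""
    max_per_day: int = 8   # assumed ceiling / starting value
    current: int = 8

    def record_failure(self) -> None:
        # Assumption: each failed task lowers the quota by one,
        # but never below one task per day.
        self.current = max(1, self.current - 1)

    def record_success(self) -> None:
        # Assumption: a successful task lets the quota recover
        # (here: doubled, capped at the ceiling).
        self.current = min(self.max_per_day, self.current * 2)


if __name__ == "__main__":
    q = HostQuota()
    for _ in range(10):      # ten failed tasks during an outage
        q.record_failure()
    print(q.current)         # 1
    q.record_success()       # one good task
    print(q.current)         # 2
```

Under these assumptions, repeatedly fetching tasks during an outage drives a host down to one task per day, and it takes several good tasks to climb back, which is why setting No New Tasks early is cheaper than letting tasks fail.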
Joined: 29 Aug 05 Posts: 1060 Credit: 7,737,455 RAC: 1,317 |
I should have added: we do stop issuing tasks when we detect that there are no new jobs available. A problem that we've only just solved was that jobs were available but, unbeknownst to our software, they were flagged as not to run on volunteer machines. This led to the sort of scenario I described earlier, as BOINC thought jobs were available and kept serving up tasks. Notwithstanding that, remember that our tasks run for more than 12 hours (usually less than 18), running several jobs consecutively. If we run out of jobs mid-task, that can lead to the task being flagged as failed. This is why I try to give as much warning as possible of an upcoming outage, so that tasks can be left to finish before jobs run out. |
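The length of the warning follows from the task length: a task started less than its running time before the jobs run out may not finish. A minimal sketch, assuming the upper figure of 18 hours from the post as the task length; the function `latest_safe_start`, the optional `margin` parameter and the example cutoff time are hypothetical. Since tasks only *usually* finish within 18 hours, a safety margin on top is prudent.

```python
from datetime import datetime, timedelta

# Upper figure from the thread: tasks run more than 12 hours, usually less than 18.
MAX_TASK_DURATION = timedelta(hours=18)


def latest_safe_start(jobs_run_out: datetime,
                      max_duration: timedelta = MAX_TASK_DURATION,
                      margin: timedelta = timedelta(0)) -> datetime:
    """Latest start time for a task that should finish before jobs run out.

    Hypothetical helper: subtracts the assumed maximum task length, plus an
    optional safety margin, from the time the job queue is expected to empty.
    """
    return jobs_run_out - max_duration - margin


if __name__ == "__main__":
    # Hypothetical cutoff: the post only says jobs stop "sometime tonight".
    cutoff = datetime(2020, 7, 14, 22, 0)
    print(latest_safe_start(cutoff))                               # 2020-07-14 04:00:00
    print(latest_safe_start(cutoff, margin=timedelta(hours=2)))    # 2020-07-14 02:00:00
```

In other words, with jobs ending at 22:00, any task that starts after about 04:00 that morning risks running out of jobs mid-task, so No New Tasks is best set well before then.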
Joined: 14 Jan 10 Posts: 1418 Credit: 9,464,929 RAC: 2,680 |
Quote: "... If we run out of jobs mid-task, that can lead to a task being flagged as failed. ..." This is very bad and should not happen. The LHC@home team should take a bigger part in CMS@home and not leave Ivan to struggle alone. Couldn't CMS be set up like ATLAS and Theory, running a single job per BOINC task, with the needed input data placed in the shared host folder before the VM is started rather than fetched over the VM network? |
©2024 CERN