Message boards : News : Warning: possible shortage of CMS jobs - set No New Tasks as a precaution
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 525
Credit: 4,169,812
RAC: 3,891
Message 38388 - Posted: 23 Mar 2019, 11:31:44 UTC

There was an intervention (i.e. upgrade) yesterday afternoon[1] on the cmsweb-testbed system we use to submit CMS workflows, and it left things a bit confused. One problem was fixed, and the monitor shows all good. However, we are running out of CMS jobs -- maybe 10 hours left -- and the new batch I submitted yesterday isn't showing up on the testbed monitor. I submitted another last night, but neither is being shown this morning, so I submitted yet another batch.
At the moment I don't know whether the submission has failed or whether the monitor hasn't picked up the new batches. As a precaution, set No New Tasks on your CMS project(s) to avoid tasks crashing due to lack of jobs. I'll let you know as soon as I'm sure jobs are available again.

[1] How many times do I have to tell people not to touch critical systems on a Friday -- especially Friday afternoon!?
mmonnin

Joined: 22 Mar 17
Posts: 41
Credit: 2,957,149
RAC: 44,043
Message 38389 - Posted: 23 Mar 2019, 11:50:02 UTC - in response to Message 38388.  

> [1] How many times do I have to tell people not to touch critical systems on a Friday -- especially Friday afternoon!?


How many times do we have to ask that tasks not be sent to users w/o jobs...
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 525
Credit: 4,169,812
RAC: 3,891
Message 38391 - Posted: 23 Mar 2019, 13:15:42 UTC - in response to Message 38389.  
Last modified: 23 Mar 2019, 13:24:00 UTC

>> [1] How many times do I have to tell people not to touch critical systems on a Friday -- especially Friday afternoon!?
>
> How many times do we have to ask that tasks not be sent to users w/o jobs...

Good point, well made.

In fairness, though, there is a mechanism in place to stop sending tasks when the job queue is empty. Unfortunately, there is a hysteresis (probably several hystereses) in the system, because delays occur in propagating the state of the queues, etc., across the myriad processes that control the project.
This is compounded by the fact that tasks may still be running when the job queue becomes empty; I don't think we adequately address that yet.
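To make the propagation-delay point concrete, here is a toy simulation, not the project's actual scheduler. It assumes the task server's "queue not empty" gate only sees the queue length as it was `lag_ticks` ago, and that already-running tasks drain the queue at a fixed rate. All names (`simulate`, `lag_ticks`, `drain_per_tick`) are hypothetical. It counts how many tasks are handed out after the queue has actually emptied.

```python
from collections import deque


def simulate(initial_jobs: int, drain_per_tick: int, lag_ticks: int, ticks: int) -> int:
    """Count tasks handed out after the job queue was already empty (toy model).

    The gate reads the queue length from `lag_ticks` ticks ago, standing in for
    the delay in propagating queue state between processes.
    """
    queue_len = initial_jobs
    # Recent queue lengths; history[0] is the stale value the gate sees.
    history = deque([initial_jobs] * (lag_ticks + 1), maxlen=lag_ticks + 1)
    stray_tasks = 0
    for _ in range(ticks):
        observed = history[0]
        if observed > 0 and queue_len == 0:
            # Gate still believes there is work, but the real queue is empty.
            stray_tasks += 1
        # Running tasks consume jobs at a fixed rate (assumption).
        queue_len = max(0, queue_len - drain_per_tick)
        history.append(queue_len)
    return stray_tasks
```

In this toy model the number of stray tasks equals the lag, so any delay in propagating queue state turns directly into tasks sent to an empty queue.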
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 634
Credit: 4,167,381
RAC: 7,505
Message 38393 - Posted: 23 Mar 2019, 16:58:46 UTC - in response to Message 38388.  

> At the moment I don't know whether the submission has failed or whether the monitor hasn't picked up the new batches. As a precaution, set No New Tasks on your CMS project(s) to avoid tasks crashing due to lack of jobs. I'll let you know as soon as I'm sure jobs are available again.

There are a lot of aborted jobs and several failed jobs too.

Are they resubmitted normally?
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 525
Credit: 4,169,812
RAC: 3,891
Message 38397 - Posted: 23 Mar 2019, 17:51:44 UTC - in response to Message 38393.  

>> At the moment I don't know whether the submission has failed or whether the monitor hasn't picked up the new batches. As a precaution, set No New Tasks on your CMS project(s) to avoid tasks crashing due to lack of jobs. I'll let you know as soon as I'm sure jobs are available again.
>
> There are a lot of aborted jobs and several failed jobs too.
>
> Are they resubmitted normally?

Yes, I believe so. As far as I recall, each job gets three picks at the cherry.
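"Three picks at the cherry" amounts to a bounded-retry policy. A minimal sketch, assuming a cap of three attempts per job and a hypothetical `attempt` callable that runs the job once and returns True on success; this is not the actual CMS workflow-management retry code.

```python
from typing import Callable, Tuple

MAX_ATTEMPTS = 3  # assumption, from "each job gets three picks at the cherry"


def run_with_retries(attempt: Callable[[], bool],
                     max_attempts: int = MAX_ATTEMPTS) -> Tuple[bool, int]:
    """Call `attempt` until it succeeds or the attempt budget is spent.

    Returns (succeeded, attempts_used). `attempt` is a hypothetical stand-in
    for one run of a job: True on success, False if it failed or was aborted.
    """
    for n in range(1, max_attempts + 1):
        if attempt():
            return True, n
    return False, max_attempts


if __name__ == "__main__":
    outcomes = iter([False, False, True])  # fails twice, then succeeds
    print(run_with_retries(lambda: next(outcomes)))  # (True, 3)
```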
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 525
Credit: 4,169,812
RAC: 3,891
Message 38398 - Posted: 23 Mar 2019, 17:52:43 UTC

OK, the batch I submitted last night is now showing on the monitor, so you can resume tasks at will.
Bill F
Joined: 2 Jun 07
Posts: 28
Credit: 1,127,888
RAC: 14
Message 38409 - Posted: 24 Mar 2019, 19:08:28 UTC - in response to Message 38397.  

>>> At the moment I don't know whether the submission has failed or whether the monitor hasn't picked up the new batches. As a precaution, set No New Tasks on your CMS project(s) to avoid tasks crashing due to lack of jobs. I'll let you know as soon as I'm sure jobs are available again.
>>
>> There are a lot of aborted jobs and several failed jobs too.
>>
>> Are they resubmitted normally?
>
> Yes, I believe so. As far as I recall, each job gets three picks at the cherry.

Ivan

Why would not having any new work cause a job to crash? They are two different parts of work order processing.

Thanks
Bill F
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 525
Credit: 4,169,812
RAC: 3,891
Message 38430 - Posted: 25 Mar 2019, 16:11:03 UTC - in response to Message 38409.  

> Ivan
>
> Why would not having any new work cause a job to crash? They are two different parts of work order processing.
>
> Thanks
> Bill F

If there are no jobs on the condor server, a request for new work times out after, IIRC, ten minutes. BOINC can mistakenly identify that as an error and mark the task as a failure. We have processes in place not to serve new tasks if the queue is empty, but as yet no reliable way to gracefully end a task if the queue runs out of jobs while the task is running. (That's why I often have sleepless nights checking on the state of the queue...)
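To illustrate why an empty queue can turn into a failed task, here is a sketch of a task loop that pulls jobs from a local `queue.Queue` standing in for the condor job queue. All names are hypothetical, and the 600 s default only reflects the "IIRC, ten minutes" above. With `graceful=False` a timed-out request ends the task as an error, the failure mode described in the post; `graceful=True` shows one possible clean ending, which the post says is not yet reliably available.

```python
import queue
from enum import Enum
from typing import Tuple

# Assumed timeout, from "a request for new work times out after, IIRC, ten minutes".
JOB_REQUEST_TIMEOUT_S = 600.0


class TaskOutcome(Enum):
    SUCCESS = "success"
    ERROR = "error"      # what the task may wrongly be marked as after a timeout
    NO_WORK = "no_work"  # hypothetical clean "queue was empty" ending


def run_task(jobs: queue.Queue, timeout_s: float = JOB_REQUEST_TIMEOUT_S,
             graceful: bool = False) -> Tuple[TaskOutcome, int]:
    """Pull jobs until a request times out; return (outcome, jobs_done)."""
    jobs_done = 0
    while True:
        try:
            jobs.get(timeout=timeout_s)  # blocks up to timeout_s for the next job
        except queue.Empty:
            if not graceful:
                # Behaviour described in the post: the timeout looks like a failure.
                return TaskOutcome.ERROR, jobs_done
            # Hypothetical graceful end: keep the work done, or report "no work".
            return (TaskOutcome.SUCCESS if jobs_done else TaskOutcome.NO_WORK), jobs_done
        # A real task would run the job here; this sketch only counts it.
        jobs_done += 1
        jobs.task_done()
```

Shrinking `timeout_s` (e.g. to 0.01 s) makes this easy to try: two queued jobs followed by a timeout give `(TaskOutcome.ERROR, 2)` by default and `(TaskOutcome.SUCCESS, 2)` with `graceful=True`.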


©2019 CERN