Warning: possible shortage of CMS jobs - set No New Tasks as a precaution
Joined: 29 Aug 05; Posts: 939; Credit: 6,156,338; RAC: 1,076
There was an intervention (i.e. an upgrade) yesterday afternoon[1] on the cmsweb-testbed system we use to submit CMS workflows, and it left things a bit confused. One problem has been fixed, and the monitor shows all good. However, we are running out of CMS jobs (maybe 10 hours' worth left), and the new batch I submitted yesterday isn't showing up on the testbed monitor. I submitted another last night, but neither is shown this morning, so I have submitted yet another batch. At the moment I don't know whether the submissions have failed or whether the monitor just hasn't picked up the new batches. As a precaution, set No New Tasks on your CMS project(s) to avoid tasks crashing due to lack of jobs. I'll let you know as soon as I'm sure jobs are available again.

[1] How many times do I have to tell people not to touch critical systems on a Friday, especially Friday afternoon!?
Joined: 22 Mar 17; Posts: 51; Credit: 9,671,568; RAC: 2,907
> [1] How many times do I have to tell people not to touch critical systems on a Friday, especially Friday afternoon!?

How many times do we have to ask that tasks not be sent to users without jobs...
Joined: 29 Aug 05; Posts: 939; Credit: 6,156,338; RAC: 1,076
> [1] How many times do I have to tell people not to touch critical systems on a Friday, especially Friday afternoon!?

GP, WM

In fairness, though, there is a mechanism in place to stop sending tasks when the job queue is empty. Unfortunately, there is a hysteresis (probably several hystereses) in the system, because it takes time for the state of the queues, etc., to propagate across the myriad processes that control the project. This is compounded by the fact that tasks may still be running when the job queue becomes empty; I don't think we adequately address that yet.
Joined: 14 Jan 10; Posts: 1168; Credit: 7,216,942; RAC: 2,461
> At the moment I don't know whether the submission has failed or whether the monitor hasn't picked up the new batches. As a precaution, set No New Tasks on your CMS project(s) to avoid tasks crashing due to lack of jobs. I'll let you know as soon as I'm sure jobs are available again.

There are a lot of aborted jobs and several failed jobs too. Are they resubmitted normally?
Joined: 29 Aug 05; Posts: 939; Credit: 6,156,338; RAC: 1,076
> At the moment I don't know whether the submission has failed or whether the monitor hasn't picked up the new batches. As a precaution, set No New Tasks on your CMS project(s) to avoid tasks crashing due to lack of jobs. I'll let you know as soon as I'm sure jobs are available again.

Yes, I believe so. As far as I recall, each job gets three bites at the cherry.
Joined: 2 Jun 07; Posts: 31; Credit: 1,552,911; RAC: 221
> At the moment I don't know whether the submission has failed or whether the monitor hasn't picked up the new batches. As a precaution, set No New Tasks on your CMS project(s) to avoid tasks crashing due to lack of jobs. I'll let you know as soon as I'm sure jobs are available again.

Ivan, why would not having any new work cause a job to crash? They are two different parts of work-order processing.

Thanks,
Bill F

In October 1969 I took an oath to support and defend the Constitution of the United States against all enemies, foreign and domestic; there was no expiration date.
Joined: 29 Aug 05; Posts: 939; Credit: 6,156,338; RAC: 1,076
If there are no jobs on the condor server, a request for new work times out after, IIRC, ten minutes. BOINC can mistakenly identify that as an error and mark the task as a failure. We have processes in place not to serve new tasks if the queue is empty, but as yet no reliable way to gracefully end a task if the queue runs out of jobs while the task is running. (That's why I often have sleepless nights checking on the state of the queue...)
©2023 CERN