CMS Application : EXIT_NO_SUB_TASKS
Author | Message |
---|---|
Joined: 29 Aug 05 Posts: 929 Credit: 6,098,080 RAC: 817 |
A new workflow submitted last night is stuck in "staging" rather than progressing to "running", so we are running out of jobs. It's probably best to set No New Tasks until it's sorted. I've submitted another workflow, but I don't think it will bypass the older one in the queue. The WMCore team have been notified. More later. |
Joined: 29 Aug 05 Posts: 929 Credit: 6,098,080 RAC: 817 |
The logjam has just been cleared: the task server has noticed that jobs are available and is sending out tasks again. I've just got two on my first machine. |
Joined: 29 Aug 05 Posts: 929 Credit: 6,098,080 RAC: 817 |
The underlying cause is detailed in https://github.com/dmwm/WMCore/issues/11386 and the fix in https://github.com/dmwm/WMCore/pull/11387. All my machines are now running their quota of tasks. |
Joined: 15 Jun 08 Posts: 2141 Credit: 175,124,875 RAC: 107,306 |
Anybody there to refill the CMS queue this year? |
Joined: 15 Jun 08 Posts: 2141 Credit: 175,124,875 RAC: 107,306 |
Yes. Thanks. |
Joined: 15 Jun 08 Posts: 2141 Credit: 175,124,875 RAC: 107,306 |
There were some glitches regarding the availability of CMS subtasks yesterday evening as well as this afternoon. Any idea why? |
Joined: 15 Jun 08 Posts: 2141 Credit: 175,124,875 RAC: 107,306 |
Looks like there were another two glitches today, at 10:00 UTC and 16:12 UTC. |
Joined: 29 Aug 05 Posts: 929 Credit: 6,098,080 RAC: 817 |
We're trying to understand a downstream problem, which means we have to make a change, submit a new workflow, and cancel the old one. I'm trying to do this in a way that minimises the time when no jobs are available, but sometimes the gap is unavoidably longer. I think we understand the problem now, but patching it is difficult (the WMAgent code is quite opaque, and the experts aren't available to help). |
Joined: 29 Aug 05 Posts: 929 Credit: 6,098,080 RAC: 817 |
Hmm, the current workflow appears to have a problem: almost 100% failures. Investigating. I can't see any connection between these "failures" and the patch we are testing, because the patch should only apply to the post-production processing, which runs on dedicated VMs at CERN, not on volunteer machines. The production jobs are actually running successfully, but the final report says their logs are not being found. Puzzling... |
Joined: 29 Aug 05 Posts: 929 Credit: 6,098,080 RAC: 817 |
Ah, Federica made a typo in our patch that is causing the current "failure". |
Joined: 29 Aug 05 Posts: 929 Credit: 6,098,080 RAC: 817 |
Fixed that, and another problem surfaced: you cannot have a hyphen in a variable name in Python, because it is interpreted as a subtraction operator! |
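The hyphen pitfall from the last post can be reproduced in a few lines. The name `log-file` below is made up for illustration; the thread does not say which variable the patch actually used. Python reads `log-file` as `log - file`, so assigning to it is rejected when the code is parsed, while reading it parses fine as a subtraction and only fails when it runs.

```python
import ast

# Hypothetical name for illustration; not taken from the actual WMCore patch.
BAD_NAME = "log-file"
GOOD_NAME = "log_file"

# 1. Assignment: `log - file = ...` is not a valid assignment target,
#    so the parser rejects it before anything runs.
try:
    ast.parse(f"{BAD_NAME} = '/tmp/job.log'")
    assign_ok = True
except SyntaxError:
    assign_ok = False

# 2. Reading: the same text is a perfectly valid expression, a subtraction
#    of two separate names.
tree = ast.parse(BAD_NAME, mode="eval")
is_subtraction = isinstance(tree.body, ast.BinOp) and isinstance(tree.body.op, ast.Sub)

# 3. So the mistake in a read only shows up at run time: here neither
#    `log` nor `file` is defined, and the left operand is looked up first.
try:
    eval(BAD_NAME, {})
    runtime_error = None
except NameError as exc:
    runtime_error = exc

print(BAD_NAME.isidentifier(), GOOD_NAME.isidentifier())  # False True
print(assign_ok, is_subtraction, type(runtime_error).__name__)
```

The usual fix is to use an underscore (`log_file`), which is legal in Python identifiers. `str.isidentifier()` is a quick way to check a candidate name.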
©2023 CERN