Message boards : CMS Application : EXIT_NO_SUB_TASKS

ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 929
Credit: 6,098,080
RAC: 817
Message 47577 - Posted: 6 Dec 2022, 10:22:56 UTC

A new workflow submitted last night is stuck in "staging" rather than progressing to "running" so we are running out of jobs. Probably best to set No New Tasks until it's sorted. I've submitted another workflow, but I don't think it will bypass the older one in the queue. WMCore team have been notified.
More later.
ID: 47577

ivan
Message 47580 - Posted: 7 Dec 2022, 9:51:35 UTC - in response to Message 47577.  
Last modified: 7 Dec 2022, 9:51:56 UTC

The logjam has just been cleared and the task server has noticed that jobs are available, and is sending out tasks again -- I've just got two on my first machine.
ID: 47580

ivan
Message 47581 - Posted: 7 Dec 2022, 10:24:51 UTC - in response to Message 47580.  

The underlying cause is detailed in https://github.com/dmwm/WMCore/issues/11386 and the cure in https://github.com/dmwm/WMCore/pull/11387.
All my machines are now running their quota of tasks.
ID: 47581

computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 2141
Credit: 175,128,029
RAC: 107,332
Message 47626 - Posted: 30 Dec 2022, 13:44:35 UTC

Anybody there to refill the CMS queue this year?
ID: 47626

ivan
Message 47628 - Posted: 30 Dec 2022, 15:53:19 UTC - in response to Message 47626.  

Yes, it's just that we've been running more jobs than usual lately, and I have been taking it easy...
New batch in the pipeline.
ID: 47628

ivan
Message 47629 - Posted: 30 Dec 2022, 16:57:43 UTC - in response to Message 47628.  

Jobs are available again now.
ID: 47629

computezrmle
Message 47630 - Posted: 30 Dec 2022, 19:09:44 UTC - in response to Message 47629.  

Yes.
Thanks.
ID: 47630

ivan
Message 47712 - Posted: 19 Jan 2023, 10:07:50 UTC

I'll be running down the queues at the weekend in preparation for a WMAgent upgrade next week. Be prepared to set No New Tasks on Sunday or so.
ID: 47712

ivan
Message 47713 - Posted: 20 Jan 2023, 8:12:44 UTC - in response to Message 47712.  

Oops, I miscalculated (based on 10,000 jobs rather than 20,000) -- we'll run out of jobs Monday into Tuesday.
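
As a rough illustration of why the estimate moved: drain time scales linearly with the number of queued jobs, so doubling the job count doubles the estimate. The sketch below assumes a constant consumption rate. The function name and the rate of 250 jobs per hour are hypothetical, chosen only for illustration, and are not figures from the project.

```python
def hours_to_drain(queued_jobs: int, jobs_per_hour: float) -> float:
    """Estimate how long a queue takes to empty at a constant consumption rate."""
    if jobs_per_hour <= 0:
        raise ValueError("jobs_per_hour must be positive")
    return queued_jobs / jobs_per_hour


# Hypothetical rate, assumed for illustration only (not a measured value).
ASSUMED_JOBS_PER_HOUR = 250.0

for queued in (10_000, 20_000):
    estimate = hours_to_drain(queued, ASSUMED_JOBS_PER_HOUR)
    print(f"{queued} jobs -> about {estimate:.0f} hours")
```

In practice the rate varies with the number of active volunteer machines, so an estimate like this is only a first approximation.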
ID: 47713

ivan
Message 47718 - Posted: 22 Jan 2023, 12:50:57 UTC - in response to Message 47713.  

Updated calculation: the queues should be almost drained in 24 hours from now.
ID: 47718

ivan
Message 47726 - Posted: 24 Jan 2023, 12:53:13 UTC

A new workflow is in the pipeline. Jobs should be available in about one hour.
ID: 47726

computezrmle
Message 47763 - Posted: 16 Feb 2023, 13:47:55 UTC

There were some glitches regarding the availability of CMS subtasks yesterday evening as well as this afternoon.
Any idea why?
ID: 47763

computezrmle
Message 47776 - Posted: 21 Feb 2023, 17:03:05 UTC

Looks like there were another 2 glitches today:
10:00 UTC
16:12 UTC
ID: 47776

ivan
Message 47777 - Posted: 22 Feb 2023, 8:17:13 UTC - in response to Message 47763.  
Last modified: 22 Feb 2023, 8:19:48 UTC

We're trying to understand a downstream problem, which means we have to make a change, submit a new workflow, and cancel the old one. I'm trying to do this in a way that minimises the time when no jobs are available, but sometimes the gap is unavoidably longer. I think we understand the problem now, but patching it is difficult (the WMAgent code is quite opaque, and the experts aren't available to help).
ID: 47777

ivan
Message 47778 - Posted: 23 Feb 2023, 14:08:31 UTC

Hmm, current workflow appears to have a problem -- almost 100% failures. Investigating.
ID: 47778

ivan
Message 47781 - Posted: 23 Feb 2023, 16:15:22 UTC - in response to Message 47778.  

I can't see any connection between these "failures" and the patch we are testing -- the patch should only be applicable to the post-production processing, which is done on dedicated VMs at CERN, not on Volunteer machines. Production jobs are actually running successfully, but the final report says that logs are not being found. Puzzling...
ID: 47781

ivan
Message 47782 - Posted: 24 Feb 2023, 10:47:10 UTC - in response to Message 47781.  

Ah, Federica made a typo in our patch that is causing the current "failure".
ID: 47782

ivan
Message 47785 - Posted: 24 Feb 2023, 14:43:07 UTC - in response to Message 47782.  

Fixed that, and another problem surfaced -- you cannot have a hyphen in a variable name in Python, because it is interpreted as the subtraction operator!
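
A minimal sketch of that failure mode, for anyone who hasn't run into it. The names `log_archive` and `log-archive` are hypothetical and are not taken from the actual WMCore patch; the helper `compiles` is likewise just for illustration. It uses the built-in `compile()` to check whether a line of source is accepted by Python.

```python
def compiles(source: str) -> bool:
    """Return True if `source` is accepted by the Python compiler."""
    try:
        compile(source, "<sketch>", "exec")
    except SyntaxError:
        return False
    return True


# Hypothetical names, assumed for illustration only.
print(compiles("log_archive = 1"))  # underscore: a legal identifier
print(compiles("log-archive = 1"))  # hyphen: read as `log - archive = 1`, rejected

# On the right-hand side the hyphen does not raise an error at all;
# it quietly subtracts one variable from the other.
log, archive = 10, 3
print(log-archive)
```

The second form is the nastier one: if both halves of the name happen to exist as variables, the code runs and produces a number instead of failing.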
ID: 47785