Message boards : CMS Application : EXIT_NO_SUB_TASKS

ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1110
Credit: 9,459,089
RAC: 9,274
Message 43916 - Posted: 15 Dec 2020, 9:09:35 UTC - in response to Message 43907.  

We want to make another update to the WMAgent codes on Monday
Ivan, is there a problem with the update? There are still no tasks available.

Yes, it was delayed. We started up again about 2130 UTC.
ID: 43916
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1110
Credit: 9,459,089
RAC: 9,274
Message 43917 - Posted: 15 Dec 2020, 9:13:08 UTC - in response to Message 43893.  

We want to make another update to the WMAgent codes on Monday, ...
A question just out of curiosity: why do they fiddle around with this WMAgent so frequently?

Software evolves. At the moment they are updating everything to run in Kubernetes containers, so there is extra development going on. You don't see every release of WMCore; there are sometimes 2 or 3 per week. Alan only updates our Agent with a stable release when it's needed to keep in step with other systems.
ID: 43917
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2686
Credit: 286,936,279
RAC: 56,063
Message 43918 - Posted: 15 Dec 2020, 9:33:46 UTC - in response to Message 43916.  

... We started up again about 2130 UTC.

Subtasks may be there but ATM there are no new tasks available at the project server.
https://lhcathome.cern.ch/lhcathome/server_status.php
ID: 43918
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1110
Credit: 9,459,089
RAC: 9,274
Message 43923 - Posted: 15 Dec 2020, 16:21:39 UTC - in response to Message 43918.  

... We started up again about 2130 UTC.

Subtasks may be there but ATM there are no new tasks available at the project server.
https://lhcathome.cern.ch/lhcathome/server_status.php

Unfortunately the job queue limits were not adjusted to our normal values last night, and WMAgent/Condor was limiting us to 100 jobs (as you might have noticed in the job graphs). I guess this screwed up Laurence's scripts for creating BOINC tasks depending on the number of queued jobs. I alerted Alan to this and we are back at our normal numbers now.
ID: 43923
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2686
Credit: 286,936,279
RAC: 56,063
Message 43924 - Posted: 15 Dec 2020, 16:31:09 UTC - in response to Message 43923.  

... we are back at our normal numbers now.

Yes, we are.
Thanks.
ID: 43924
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2686
Credit: 286,936,279
RAC: 56,063
Message 44133 - Posted: 18 Jan 2021, 8:59:07 UTC

Since 8:00 UTC this morning fresh CMS tasks fail with "EXIT_NO_SUB_TASKS".
ID: 44133
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1110
Credit: 9,459,089
RAC: 9,274
Message 44139 - Posted: 18 Jan 2021, 12:02:27 UTC - in response to Message 44133.  
Last modified: 18 Jan 2021, 12:08:04 UTC

Looks like there was a problem somewhere for a while. It seems to be recovering now but we're low on jobs; I'll submit some more.
ID: 44139
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1110
Credit: 9,459,089
RAC: 9,274
Message 44276 - Posted: 6 Feb 2021, 23:10:50 UTC

Advance Warning: We want to update our WMCore/WMAgent software next week. If I've done my sums right, we'll start running out of jobs sometime Monday night or early Tuesday. Please set your CMS project to No New Tasks on Monday night, or whenever you see the "Running Jobs" graph start to dip.
Thanks, ivan
ID: 44276
Erich56

Joined: 18 Dec 15
Posts: 1908
Credit: 144,967,734
RAC: 79,594
Message 44277 - Posted: 7 Feb 2021, 5:56:05 UTC - in response to Message 44276.  

Ivan, thanks for the advance information :-)
ID: 44277
NOGOOD

Joined: 18 Nov 17
Posts: 131
Credit: 58,014,547
RAC: 8,410
Message 44295 - Posted: 11 Feb 2021, 12:04:54 UTC - in response to Message 44277.  

I see a number of jobs to send on the LHC status page. Is it time to turn NNT off?
ID: 44295
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2686
Credit: 286,936,279
RAC: 56,063
Message 44297 - Posted: 11 Feb 2021, 12:14:43 UTC - in response to Message 44295.  

They obviously did the planned update this morning.
ATM the number of running subtasks is increasing.
Hence, it may be safe to continue CMS.
ID: 44297
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2686
Credit: 286,936,279
RAC: 56,063
Message 44299 - Posted: 11 Feb 2021, 14:00:22 UTC

Looks like the new CMS subtasks process 5000 records instead of 10000.
By intention?
ID: 44299
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1461
Credit: 9,860,088
RAC: 2,430
Message 44301 - Posted: 11 Feb 2021, 17:54:33 UTC - in response to Message 44299.  

Looks like the new CMS subtasks process 5000 records instead of 10000.
By intention?
I can't confirm that: 'FirstEvent' : 5910001, 'LastEvent' : 5920000
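
For reference, the event range quoted in this post can be checked with a little arithmetic. This is only a minimal sketch, assuming that FirstEvent and LastEvent are both inclusive (the thread does not say so explicitly); the helper name `events_in_range` is made up for illustration and is not part of any CMS or BOINC tool.

```python
def events_in_range(first_event: int, last_event: int) -> int:
    """Count events in an inclusive [first_event, last_event] range."""
    if last_event < first_event:
        raise ValueError("last_event must not be smaller than first_event")
    return last_event - first_event + 1


# Values quoted in the post above.
job = {"FirstEvent": 5910001, "LastEvent": 5920000}
n_events = events_in_range(job["FirstEvent"], job["LastEvent"])
print(n_events)  # 10000
```

Under that assumption the quoted job covers 10000 events, i.e. the usual subtask size rather than 5000.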
ID: 44301
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2686
Credit: 286,936,279
RAC: 56,063
Message 44303 - Posted: 12 Feb 2021, 12:39:39 UTC - in response to Message 44301.  

Only the first 25 subtasks I got after the restart were short ones with 5000 records.
Since then they have run the usual 10000 records.
ID: 44303
Erich56

Joined: 18 Dec 15
Posts: 1908
Credit: 144,967,734
RAC: 79,594
Message 44374 - Posted: 23 Feb 2021, 7:24:23 UTC - in response to Message 44303.  

I've been having another problem since last night; see here: https://lhcathome.cern.ch/lhcathome/results.php?hostid=10679599

207 (0x000000CF) EXIT_NO_SUB_TASKS
...
2021-02-23 08:10:07 (11516): Guest Log: [ERROR] No jobs were available to run.
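
As a side note, the decimal and hexadecimal values in that status line are the same number: 0xCF is 207. The following is a minimal sketch for pulling the code and name out of such a line, assuming the layout shown above; the regular expression and the function name `parse_exit_status` are illustrative and not part of BOINC.

```python
import re

# Assumed layout: "<decimal> (0x<hex>) <NAME>", as in the line quoted above.
STATUS_RE = re.compile(r"^(\d+)\s+\(0x([0-9A-Fa-f]+)\)\s+(\w+)$")


def parse_exit_status(line):
    """Return (code, name), or None if the line does not match the layout."""
    match = STATUS_RE.match(line.strip())
    if match is None:
        return None
    code = int(match.group(1))
    hex_code = int(match.group(2), 16)
    # The two representations should agree; fail loudly if they do not.
    if code != hex_code:
        raise ValueError(f"decimal {code} and hex {hex_code:#x} disagree")
    return code, match.group(3)


status = parse_exit_status("207 (0x000000CF) EXIT_NO_SUB_TASKS")
print(status)  # (207, 'EXIT_NO_SUB_TASKS')
```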
ID: 44374
Erich56

Joined: 18 Dec 15
Posts: 1908
Credit: 144,967,734
RAC: 79,594
Message 44375 - Posted: 23 Feb 2021, 8:27:01 UTC - in response to Message 44374.  
Last modified: 23 Feb 2021, 9:02:26 UTC

At some point there was an automatic stop that prevented new CMS tasks from being downloaded once the system ran out of jobs.
Is this no longer working?
ID: 44375
Erich56

Joined: 18 Dec 15
Posts: 1908
Credit: 144,967,734
RAC: 79,594
Message 44376 - Posted: 23 Feb 2021, 12:41:25 UTC

Ivan, can you estimate when jobs will be available again?
ID: 44376
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1110
Credit: 9,459,089
RAC: 9,274
Message 44378 - Posted: 23 Feb 2021, 15:36:13 UTC - in response to Message 44299.  

Looks like the new CMS subtasks process 5000 records instead of 10000.
By intention?

Sorry, didn't see this earlier. After the intervention, my colleague posted a small workflow with parameters different from the ones I usually use.
ID: 44378
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1110
Credit: 9,459,089
RAC: 9,274
Message 44380 - Posted: 23 Feb 2021, 15:39:40 UTC - in response to Message 44376.  

Ivan, can you estimate when jobs will be available again?

Haven't had a response from anyone at CERN yet, so I don't know if this is a repeat of Sunday night's problem.
ID: 44380
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1110
Credit: 9,459,089
RAC: 9,274
Message 44381 - Posted: 23 Feb 2021, 15:41:52 UTC - in response to Message 44375.  

At some point there was an automatic stop that prevented new CMS tasks from being downloaded once the system ran out of jobs.
Is this no longer working?

It's been a different problem the last few days. We haven't run out of jobs; there are plenty in the queue, but the Condor server isn't sending them out, for reasons I haven't found out yet.
ID: 44381


©2025 CERN