Message boards : CMS Application : EXIT_NO_SUB_TASKS
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2375
Credit: 221,658,953
RAC: 143,665
Message 39425 - Posted: 23 Jul 2019, 13:21:07 UTC

Situation after the hypervisor reboot.

https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5087&postid=39424

Tasks are still failing with EXIT_NO_SUB_TASKS.
Looks like not only my hosts are affected.
ID: 39425
gyllic

Joined: 9 Dec 14
Posts: 202
Credit: 2,533,875
RAC: 0
Message 39426 - Posted: 23 Jul 2019, 22:09:16 UTC - in response to Message 39425.  

Looks like not only my hosts are affected.
Indeed, same here.
ID: 39426
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 990
Credit: 6,264,307
RAC: 191
Message 39427 - Posted: 24 Jul 2019, 12:23:35 UTC - in response to Message 39425.  

Situation after the hypervisor reboot.

https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5087&postid=39424

Tasks are still failing with EXIT_NO_SUB_TASKS.
Looks like not only my hosts are affected.

Yes, we are not picking up jobs from condor, even though plenty are pending. On the one hand:
2-3846-14590.2-3846-14590: Run analysis summary of 1 jobs.
    1 (100.00 %) match both slot and job requirements.
    1 match the requirements of this slot.
    1 have job requirements that match this slot.
but on the other:
179751.002:  Run analysis summary ignoring user priority.  Of 1 machines,
      0 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match and are already running your jobs
      0 match but are serving other users
      0 are available to run your job
I've put out a request for help.
ID: 39427
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 798
Credit: 642,751,197
RAC: 284,269
Message 39460 - Posted: 29 Jul 2019, 19:55:44 UTC

Any news? I'm still seeing plenty of failures.
ID: 39460
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 990
Credit: 6,264,307
RAC: 191
Message 39479 - Posted: 1 Aug 2019, 16:13:05 UTC - in response to Message 39460.  

OK, Federica managed to find the server which needed to be rebooted and jobs are starting to flow again. Thanks for your patience.
ID: 39479
gyllic

Joined: 9 Dec 14
Posts: 202
Credit: 2,533,875
RAC: 0
Message 39480 - Posted: 1 Aug 2019, 21:06:42 UTC - in response to Message 39479.  

OK, Federica managed to find the server which needed to be rebooted and jobs are starting to flow again. Thanks for your patience.
Looking good so far!
ID: 39480
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2375
Credit: 221,658,953
RAC: 143,665
Message 39481 - Posted: 2 Aug 2019, 7:15:13 UTC - in response to Message 39479.  

Yes, works fine since yesterday evening.
Thanks.
ID: 39481
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 990
Credit: 6,264,307
RAC: 191
Message 39488 - Posted: 3 Aug 2019, 9:50:36 UTC
Last modified: 3 Aug 2019, 9:51:40 UTC

Just when you think it's sailing along OK... System Infrastructure want to make some changes to our submission scheme, in preparation for when they take over job submission. So, we need to drain the queue. This will take, I think, several days at the current rate. Keep an eye on your task exit status and be ready to set No New Tasks when you see them failing. I will, of course, post a warning if I can but I only have Internet access at work at the moment.
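
For hosts managed from the command line, No New Tasks can be toggled with boinccmd's project operations. A minimal sketch, assuming boinccmd is on the PATH and can authenticate to the local client, and that the project URL below matches the one your client is attached to (the helper names are made up for illustration). Note that this applies to the whole project, not to CMS alone:

```python
import subprocess

# Assumed project URL; check the one your client is attached to.
PROJECT_URL = "https://lhcathome.cern.ch/lhcathome/"

def nnt_command(enable: bool, url: str = PROJECT_URL) -> list[str]:
    """Build the boinccmd call that sets or clears No New Tasks."""
    op = "nomorework" if enable else "allowmorework"
    return ["boinccmd", "--project", url, op]

def set_nnt(enable: bool) -> None:
    """Run the command; raises CalledProcessError if boinccmd fails."""
    subprocess.run(nnt_command(enable), check=True)

if __name__ == "__main__":
    # Only print the command here; call set_nnt(True) to actually apply it.
    print(" ".join(nnt_command(True)))
```

Calling set_nnt(False) later allows new work again once the all-clear is posted.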
ID: 39488
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2375
Credit: 221,658,953
RAC: 143,665
Message 39489 - Posted: 3 Aug 2019, 10:18:29 UTC - in response to Message 39488.  

System Infrastructure want to make some changes ...

Will this affect settings on the volunteers' side, e.g. server names, firewall ports, etc.?
ID: 39489
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 990
Credit: 6,264,307
RAC: 191
Message 39495 - Posted: 5 Aug 2019, 7:54:43 UTC - in response to Message 39489.  

System Infrastructure want to make some changes ...

Will this affect settings on the volunteers' side, e.g. server names, firewall ports, etc.?

No, just how jobs are apportioned to T3_CH_Volunteer, as far as I know. The condor server and collector will remain the same.
ID: 39495
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 990
Credit: 6,264,307
RAC: 191
Message 39499 - Posted: 6 Aug 2019, 8:46:47 UTC - in response to Message 39488.  

Just when you think it's sailing along OK... System Infrastructure want to make some changes to our submission scheme, in preparation for when they take over job submission. So, we need to drain the queue. This will take, I think, several days at the current rate. Keep an eye on your task exit status and be ready to set No New Tasks when you see them failing. I will, of course, post a warning if I can but I only have Internet access at work at the moment.

I cranked up two of my servers to use the full 40 cores... There are 1700 jobs still to finish and we're retiring about 60/hr so I make that 30 hrs to go; WMStats estimates 22 hours.
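
As a rough check of those figures, here is a minimal sketch that assumes a constant retirement rate (the function name is hypothetical, and the real rate will vary as hosts come and go):

```python
def hours_to_drain(jobs_remaining: int, jobs_per_hour: float) -> float:
    """Estimate hours to empty a queue at a constant completion rate."""
    if jobs_per_hour <= 0:
        raise ValueError("jobs_per_hour must be positive")
    return jobs_remaining / jobs_per_hour

# Figures quoted in the post: 1700 jobs left, about 60 retired per hour.
estimate = hours_to_drain(1700, 60)
print(f"{estimate:.1f} hours")
```

That works out to a little over 28 hours, in line with the rough 30-hour figure; a lower estimate such as the 22 hours from WMStats would correspond to a higher assumed rate.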
ID: 39499
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2375
Credit: 221,658,953
RAC: 143,665
Message 39505 - Posted: 7 Aug 2019, 7:29:57 UTC

Looks like the subtask queue is empty.
Time to set CMS to NNT and wait for Ivan's go to reactivate it.
ID: 39505
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 990
Credit: 6,264,307
RAC: 191
Message 39506 - Posted: 7 Aug 2019, 8:16:46 UTC - in response to Message 39505.  

Looks like the subtask queue is empty.
Time to set CMS to NNT and wait for Ivan's go to reactivate it.

Yes, please do set NNT. I'll alert CERN that the queue is almost empty (down to 99 running); they may pull the plug if some jobs persist.
ID: 39506
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 990
Credit: 6,264,307
RAC: 191
Message 39509 - Posted: 7 Aug 2019, 11:02:29 UTC

There's been a slight change in plans.
"Given that we do not need to redeploy the agent, but only kill jobs in condor and let them get recreated with the JobSubmitter/schedd changes, I think you can go ahead and submit another workflow to [keep] volunteers happy."
So, I'll continue to submit smaller batches and you can resume new tasks.
ID: 39509
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 990
Credit: 6,264,307
RAC: 191
Message 39575 - Posted: 11 Aug 2019, 10:27:02 UTC

It'd help now if you turned on No New Tasks for the next day or so, so that we can run down all the queues ready for an intervention tomorrow night. I'll check again tonight if I can, and top up the jobs if necessary.
ID: 39575
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 990
Credit: 6,264,307
RAC: 191
Message 39582 - Posted: 12 Aug 2019, 15:58:50 UTC

OK, the change has been done and I'm injecting a new (small) workflow. It'll take a little while to see if things have started up again successfully.
ID: 39582
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 990
Credit: 6,264,307
RAC: 191
Message 39583 - Posted: 12 Aug 2019, 16:53:12 UTC

Things look OK, jobs have started again. I'm waiting on word from the US as to whether it was successful (it seems so, but I'm not the expert).
ID: 39583
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 990
Credit: 6,264,307
RAC: 191
Message 39587 - Posted: 12 Aug 2019, 17:42:47 UTC - in response to Message 39583.  

Things look OK, jobs have started again. I'm waiting on word from the US as to whether it was successful (it seems so, but I'm not the expert).

They say that things are looking good. I've submitted more jobs and am off to my internet-deficient temporary digs. Hope things still look rosy tomorrow...
ID: 39587
Luigi R.
Joined: 7 Feb 14
Posts: 99
Credit: 5,180,005
RAC: 0
Message 39622 - Posted: 16 Aug 2019, 10:22:13 UTC

I have a CMS task that has presumably been idle for 12 hours.
I was going to abort it, but it finished while I was writing this post.

https://lhcathome.cern.ch/lhcathome/result.php?resultid=240148745
ID: 39622
Erich56

Joined: 18 Dec 15
Posts: 1681
Credit: 99,340,821
RAC: 109,889
Message 39646 - Posted: 19 Aug 2019, 6:10:48 UTC - in response to Message 39622.  

For the past few hours, all CMS tasks have been failing with:

207 (0x000000CF) EXIT_NO_SUB_TASKS
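
For anyone matching this against client logs, the decimal and hexadecimal forms are the same number. A small sketch (the helper name is made up for illustration; the code value and its name are taken from the message above):

```python
EXIT_NO_SUB_TASKS = 207  # value as reported in the task output above

def format_exit_status(code: int) -> str:
    """Render an exit code as 'decimal (0xHEX)' with 8 hex digits."""
    return f"{code} (0x{code:08X})"

print(format_exit_status(EXIT_NO_SUB_TASKS))
```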
ID: 39646