Message boards : CMS Application : EXIT_NO_SUB_TASKS
Author | Message |
---|---|
Joined: 15 Jun 08 Posts: 2606 Credit: 262,453,808 RAC: 137,141 |
Situation after the hypervisor reboot. https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5087&postid=39424 Tasks are still failing with EXIT_NO_SUB_TASKS. Looks like not only my hosts are affected. |
Joined: 9 Dec 14 Posts: 202 Credit: 2,533,875 RAC: 0 |
"Looks like not only my hosts are affected." Indeed, same here. |
Joined: 29 Aug 05 Posts: 1072 Credit: 8,426,579 RAC: 6,651 |
"Situation after the hypervisor reboot." Yes, we are not picking up jobs from condor, even though plenty are pending. On the one hand: "2-3846-14590.2-3846-14590: Run analysis summary of 1 jobs. 1 (100.00 %) match both slot and job requirements. 1 match the requirements of this slot. 1 have job requirements that match this slot." But on the other: "179751.002: Run analysis summary ignoring user priority. Of 1 machines, 0 are rejected by your job's requirements, 0 reject your job because of their own requirements, 0 match and are already running your jobs, 0 match but are serving other users, 0 are available to run your job." I've put out a request for help. |
Joined: 27 Sep 08 Posts: 859 Credit: 703,615,718 RAC: 155,199 |
Any news? Still plenty of failures for me. |
Joined: 29 Aug 05 Posts: 1072 Credit: 8,426,579 RAC: 6,651 |
|
Joined: 9 Dec 14 Posts: 202 Credit: 2,533,875 RAC: 0 |
"OK, Federica managed to find the server which needed to be rebooted and jobs are starting to flow again. Thanks for your patience." Looking good so far! |
Joined: 15 Jun 08 Posts: 2606 Credit: 262,453,808 RAC: 137,141 |
Yes, works fine since yesterday evening. Thanks. |
Joined: 29 Aug 05 Posts: 1072 Credit: 8,426,579 RAC: 6,651 |
Just when you think it's sailing along OK... System Infrastructure want to make some changes to our submission scheme, in preparation for when they take over job submission. So, we need to drain the queue. This will take, I think, several days at the current rate. Keep an eye on your task exit status and be ready to set No New Tasks when you see them failing. I will, of course, post a warning if I can but I only have Internet access at work at the moment. |
Joined: 15 Jun 08 Posts: 2606 Credit: 262,453,808 RAC: 137,141 |
"System Infrastructure want to make some changes ..." Will this affect settings on the volunteers' side, e.g. server names, firewall ports, etc.? |
Joined: 29 Aug 05 Posts: 1072 Credit: 8,426,579 RAC: 6,651 |
|
Joined: 29 Aug 05 Posts: 1072 Credit: 8,426,579 RAC: 6,651 |
"Just when you think it's sailing along OK... System Infrastructure want to make some changes to our submission scheme, in preparation for when they take over job submission. So, we need to drain the queue. This will take, I think, several days at the current rate. Keep an eye on your task exit status and be ready to set No New Tasks when you see them failing. I will, of course, post a warning if I can but I only have Internet access at work at the moment." I cranked up two of my servers to use the full 40 cores... There are 1700 jobs still to finish and we're retiring about 60/hr, so I make that 30 hrs to go; WMStats estimates 22 hours. |
Joined: 15 Jun 08 Posts: 2606 Credit: 262,453,808 RAC: 137,141 |
Looks like the subtask queue is empty. Time to set CMS to NNT and wait for Ivan's go to reactivate it. |
Joined: 29 Aug 05 Posts: 1072 Credit: 8,426,579 RAC: 6,651 |
|
Joined: 29 Aug 05 Posts: 1072 Credit: 8,426,579 RAC: 6,651 |
There's been a slight change in plans. "Given that we do not need to redeploy the agent, but only kill jobs in condor and let them get recreated with the JobSubmitter/schedd changes, I think you can go ahead and submit another workflow to [keep] volunteers happy." So, I'll continue to submit smaller batches and you can resume new tasks. |
Joined: 29 Aug 05 Posts: 1072 Credit: 8,426,579 RAC: 6,651 |
|
Joined: 29 Aug 05 Posts: 1072 Credit: 8,426,579 RAC: 6,651 |
|
Joined: 29 Aug 05 Posts: 1072 Credit: 8,426,579 RAC: 6,651 |
|
Joined: 29 Aug 05 Posts: 1072 Credit: 8,426,579 RAC: 6,651 |
|
Joined: 7 Feb 14 Posts: 99 Credit: 5,180,005 RAC: 0 |
I have a CMS task that has presumably been idle for 12 hours. I was going to abort it, but it finished while I was writing this post. https://lhcathome.cern.ch/lhcathome/result.php?resultid=240148745 |
Joined: 18 Dec 15 Posts: 1843 Credit: 126,579,016 RAC: 128,156 |
For the past few hours, all CMS tasks have been failing with: 207 (0x000000CF) EXIT_NO_SUB_TASKS |
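
The drain-time estimate quoted earlier in the thread (1700 jobs remaining, about 60 retired per hour) can be checked with a rough sketch like the one below. This is only an illustration: the function name `hours_to_drain` is hypothetical, and the calculation assumes a constant completion rate, which is probably why it differs from the 22 hours WMStats estimated.

```python
def hours_to_drain(jobs_remaining: int, jobs_per_hour: float) -> float:
    """Hours until the queue is empty, assuming a constant completion rate.

    Hypothetical helper for illustration; not part of any CMS or BOINC tool.
    """
    if jobs_per_hour <= 0:
        raise ValueError("jobs_per_hour must be positive")
    return jobs_remaining / jobs_per_hour


# Figures taken from the post: 1700 jobs left, about 60 finishing per hour.
estimate = hours_to_drain(1700, 60)
print(f"{estimate:.1f} hours")  # prints "28.3 hours"
```

That is a little over 28 hours, consistent with the rounded "30 hrs to go" in the post. If the completion rate rises as more hosts pick up work, the real figure would be lower.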
©2025 CERN