CMS Application: EXIT_NO_SUB_TASKS
Author | Message |
---|---|
Joined: 18 Dec 15 Posts: 1686 Credit: 100,400,330 RAC: 102,290 | Tasks do nothing and end with an error after 15-20 minutes, which is typical for "no sub-tasks available". |
Joined: 29 Aug 05 Posts: 998 Credit: 6,264,307 RAC: 71 | *Quote: "Tasks do nothing and end with an error after 15-20 minutes, which is typical for 'no sub-tasks available'."* Yes, sorry about that. We seem to have a problem with queued jobs. There are two queues for each batch submitted, "queued" and "pending", and I believe both have a 2,000-job limit. When I submit a batch (recently 10,000 jobs each), jobs are created as needed, up to that number, to fill the "queued" queue. At the same time, jobs are moved from "queued" to "pending" until that queue is also full. In recent weeks, apparently since an interruption to the DNS service at CERN, taking jobs from the "pending" queue and allocating them to worker machines seems to have been disrupted: only a small fraction get sent. What happened last night was that the current batch's "queued" queue drained and job allocation from the "pending" jobs dropped off (there are currently 1,200 pending and 13 running; in the previous batch, 232 are still pending and 36 running!). At CMS IT's suggestion I've been experimenting with batch priority, but that has had no perceptible effect. For now I'll just have to make sure that I submit new batches before the "queued" queue drains. A new batch is on its way, so things should pick up again soon. I'll contact CERN again and suggest they restart the Condor scheduler. |
Joined: 15 Jun 08 Posts: 2386 Credit: 222,958,903 RAC: 136,893 | Thanks. It's running fine again. |
Joined: 17 Sep 04 Posts: 99 Credit: 30,619,341 RAC: 3,808 | *Quote: "Tasks do nothing and end with an error after 15-20 minutes, which is typical for 'no sub-tasks available'."* At what point will CERN be able to generate CMS jobs themselves, so that you would not be required to submit batches? Regards, Bob P. |
Joined: 29 Aug 05 Posts: 998 Credit: 6,264,307 RAC: 71 | |
Joined: 15 Jun 08 Posts: 2386 Credit: 222,958,903 RAC: 136,893 | All tasks have been failing with EXIT_NO_SUB_TASKS since 6:30 UTC this morning. Just to make you aware. |
Joined: 18 Nov 17 Posts: 119 Credit: 51,297,026 RAC: 20,897 | Please let us know when we can run CMS again. Thank you. |
Joined: 24 Oct 04 Posts: 1114 Credit: 49,503,029 RAC: 3,972 | *Quote: "At what point will CERN be able to generate CMS jobs themselves, so that you would not be required to submit batches?"* As you know, we run version v49.00 over at LHC-dev (average computing 58 GigaFLOPS) and here (average computing 312 GigaFLOPS) on Windows, and the only problem I have had in the last few days is that tasks need to reach HTCondor ping 0 within 15 minutes, or they end up as one of the many different computer errors... but once they do get running they work fine, even with my ISP "bird bath on a pole" (a satellite dish that runs like a 1995 dial-up). As for the goal line, is it because we use the NFL version with a crossbar and uprights? If so, maybe we should switch to that other version where they kick a round ball into a net (or maybe use a free-throw line, just to make it easier). (OK, I'd better get off here so that bird bath can download the new tasks in the near future.) |
Joined: 15 Jun 08 Posts: 2386 Credit: 222,958,903 RAC: 136,893 | It has been working fine again since late morning. |
Joined: 29 Aug 05 Posts: 998 Credit: 6,264,307 RAC: 71 | |
Joined: 18 Nov 17 Posts: 119 Credit: 51,297,026 RAC: 20,897 | Are all tasks failing with EXIT_NO_SUB_TASKS again? |
Joined: 30 Aug 14 Posts: 145 Credit: 10,847,070 RAC: 0 | Maybe related to this? https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5199 (Signature: Why mine when you can research? - GRIDCOIN - Real cryptocurrency without wasting hashes! https://gridcoin.us) |
Joined: 18 Nov 17 Posts: 119 Credit: 51,297,026 RAC: 20,897 | |
Joined: 18 Dec 15 Posts: 1686 Credit: 100,400,330 RAC: 102,290 | 207 (0x000000CF) EXIT_NO_SUB_TASKS since about last midnight :-( |
Joined: 18 Dec 15 Posts: 1686 Credit: 100,400,330 RAC: 102,290 | *Quote: "... since about last midnight :-("* What is also strange: wasn't a mechanism deployed some time ago that would stop sending out tasks once there is a problem with sub-tasks? It obviously did not work this time :-( |
Joined: 24 Oct 04 Posts: 1114 Credit: 49,503,029 RAC: 3,972 | We are having the same problem with these over at CMS-dev (except that mine are just giving me Exit status 1 (0x00000001) Unknown error code). I suspended mine, but I see 207 (0x000000CF) EXIT_NO_SUB_TASKS errors there. It's Friday, so I hope we get this fixed before the end of the day. |
Joined: 29 Aug 05 Posts: 998 Credit: 6,264,307 RAC: 71 | *Quote: "We are having the same problem with these over at CMS-dev"* There was a problem with the Oracle databases at CERN overnight, which stopped job submission. According to https://cern.service-now.com/service-portal/view-outage.do?n=OTG0053449 (if you can reach it), a workaround has been implemented. One of my machines is running tasks but still not getting jobs. It's probably best to set No New Tasks until we can verify that everything is working again. [Added] Our WMAgent is down, with a database-connect error. I'll ask Alan to tickle it. [/Added] |
Joined: 29 Aug 05 Posts: 998 Credit: 6,264,307 RAC: 71 | |
Joined: 24 Oct 04 Posts: 1114 Credit: 49,503,029 RAC: 3,972 | I just gave one a try, still no luck, and I don't see any new Valids anywhere, Ivan. (I will give it another try later.) |
Joined: 18 Dec 15 Posts: 1686 Credit: 100,400,330 RAC: 102,290 | I have a few that have been running for about 4 hours now, successfully so far. |
©2024 CERN