CMS Application :
EXIT_NO_SUB_TASKS
Author | Message |
---|---|
Joined: 14 Jan 10 Posts: 1157 Credit: 7,118,800 RAC: 2,303 |
"During last night, Theory ran out of jobs, and after some time (which was good) the download of new tasks was stopped automatically." Theory? |
Joined: 18 Dec 15 Posts: 1562 Credit: 58,098,667 RAC: 46,945 |
Oh sorry, it should read "CMS", of course. Unfortunately, I cannot edit my original posting any more. |
Joined: 29 Aug 05 Posts: 929 Credit: 6,104,659 RAC: 976 |
Yeah, sorry. I submitted a workflow at the weekend that hung in submission, so I sent another, which worked. Then the "failed" job showed up as "new", so I thought it was OK; it turned out it wasn't. I've submitted another batch and tried setting the "new" batch to "approved". Hopefully things will be up again in 20-30 minutes. |
Joined: 27 Sep 08 Posts: 745 Credit: 557,956,721 RAC: 328,821 |
Hi Ivan, it has run out again on the back end. |
Joined: 27 Sep 08 Posts: 745 Credit: 557,956,721 RAC: 328,821 |
I assume this was the leading edge of the errors that I had. I have 11 running now so Ivan probably fixed it. |
Joined: 29 Aug 05 Posts: 929 Credit: 6,104,659 RAC: 976 |
There's another release of WMAgent waiting to be installed. Since the current batch of jobs will end sometime Saturday, I'll pop in a smaller workflow after that, designed to run out early Monday, to give the WMCore team the chance to get the update done. So, be ready for jobs to start becoming unavailable late Sunday night. |
Joined: 29 Aug 05 Posts: 929 Credit: 6,104,659 RAC: 976 |
Last batch is running now. I'd estimated it would end around 0800 UTC, but there has been some disruption in the pipeline (network?) today, so not as many jobs are running/completing as yesterday. This isn't necessarily bad, as the main person doing the upgrade has just relocated to Notre Dame and is in a rather later time zone. |
Joined: 27 Sep 08 Posts: 745 Credit: 557,956,721 RAC: 328,821 |
No worries, I set NNT (No New Tasks) for CMS, so it's working through the 300 or so in the work buffer. |
Joined: 15 Jun 08 Posts: 2147 Credit: 175,752,326 RAC: 109,424 |
There are no tasks in the BOINC server's queue: https://lhcathome.cern.ch/lhcathome/server_status.php |
Joined: 15 Jun 08 Posts: 2147 Credit: 175,752,326 RAC: 109,424 |
According to Grafana, the subtask queue has been empty since 5:12 UTC. |
Joined: 14 Jan 10 Posts: 1157 Credit: 7,118,800 RAC: 2,303 |
No sub-tasks (jobs) available.
2022-02-04 07:53:15 (9436): Guest Log: [INFO] CMS application starting. Check log files.
2022-02-04 08:18:55 (9436): Guest Log: [INFO] glidein exited with return value 0.
2022-02-04 08:18:55 (9436): Guest Log: [INFO] Shutting Down. |
Joined: 29 Aug 05 Posts: 929 Credit: 6,104,659 RAC: 976 |
"No sub-tasks (jobs) available." Yes, the latest batch (20,000 events) isn't matching the Condor criteria (there are 2,000 jobs in the pending queue). It'll take me an hour or two to get into work and find out why. |
Joined: 29 Aug 05 Posts: 929 Credit: 6,104,659 RAC: 976 |
"No sub-tasks (jobs) available." I had to resort to asking my Italian colleague to work this out for me. It appears that there was a time-out set at 15.98 hours (I haven't worked out why yet) and the jobs were requesting 16 hours! So the jobs didn't start. I've reverted to 10,000 events ("two-hour" jobs) while I ponder the implications of this. |
Joined: 27 Sep 08 Posts: 745 Credit: 557,956,721 RAC: 328,821 |
Hi Ivan, I thought this was set by Lawrence? The runtime should be 12 h, but there is a timeout at 16 h so that they don't run forever if there is an issue? |
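A minimal Python sketch of why the 20,000-event batch sat in the pending queue, assuming the pool simply compares each job's requested runtime against the slot time-out. The function and variable names here are hypothetical illustrations, not actual HTCondor ClassAd attributes or WMAgent settings. A 16-hour request is 57,600 s, which exceeds a 15.98-hour limit of 57,528 s, so no slot matches.

```python
# Hypothetical sketch: a pending job is eligible for a slot only if the
# runtime it requests fits within the slot's time-out. Names are
# illustrative assumptions, not real HTCondor/WMAgent attributes.

SECONDS_PER_HOUR = 3600


def hours_to_seconds(hours: float) -> int:
    """Convert hours to whole seconds (rounded to absorb float error)."""
    return round(hours * SECONDS_PER_HOUR)


def job_matches(requested_runtime_s: int, slot_timeout_s: int) -> bool:
    """A job matches only if its requested runtime does not exceed the time-out."""
    return requested_runtime_s <= slot_timeout_s


slot_timeout = hours_to_seconds(15.98)  # 57,528 s
requested = hours_to_seconds(16)        # 57,600 s

print(slot_timeout, requested, job_matches(requested, slot_timeout))
# prints: 57528 57600 False
```

Under the same assumption, a job requesting the 12 h runtime mentioned above (43,200 s) would fit under either a 15.98 h or a 16 h limit.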
©2023 CERN