Message boards : CMS Application : EXIT_NO_SUB_TASKS

Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1103
Credit: 6,877,206
RAC: 809
Message 45000 - Posted: 25 May 2021, 5:32:34 UTC - in response to Message 44999.  

during last night, Theory ran out of jobs, and after some time - which was good - the download of new tasks was stopped automatically.

Hope that Ivan can do something this morning :-)

Theory?
ID: 45000
Erich56

Joined: 18 Dec 15
Posts: 1514
Credit: 44,334,173
RAC: 47,949
Message 45001 - Posted: 25 May 2021, 6:03:32 UTC - in response to Message 45000.  

during last night, Theory ran out of jobs, and after some time - which was good - the download of new tasks was stopped automatically.

Hope that Ivan can do something this morning :-)

Theory?
Oh sorry, it should read "CMS" of course. Unfortunately, I can no longer edit my original post.
ID: 45001
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 881
Credit: 5,850,679
RAC: 242
Message 45002 - Posted: 25 May 2021, 7:38:58 UTC - in response to Message 45000.  

Yeah, sorry. I submitted a workflow at the weekend that hung in submission; I sent another which worked. Then the "failed" job showed up as "new" so I thought it was OK -- turned out it wasn't. I've submitted another batch, and tried setting the "new" batch to "approved". Hopefully things will be up again in 20-30 minutes.
ID: 45002
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 727
Credit: 482,349,911
RAC: 285,472
Message 45154 - Posted: 23 Jul 2021, 6:04:38 UTC

Hi Ivan,

Run out again on the back end.
ID: 45154
maeax

Joined: 2 May 07
Posts: 1557
Credit: 57,842,221
RAC: 205,592
Message 45155 - Posted: 23 Jul 2021, 6:39:34 UTC

ID: 45155
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 727
Credit: 482,349,911
RAC: 285,472
Message 45157 - Posted: 23 Jul 2021, 15:45:59 UTC - in response to Message 45155.  

I assume this was the leading edge of the errors that I had.

I have 11 running now so Ivan probably fixed it.
ID: 45157
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 881
Credit: 5,850,679
RAC: 242
Message 46117 - Posted: 27 Jan 2022, 19:07:34 UTC
Last modified: 27 Jan 2022, 19:08:24 UTC

There's another release of WMAgent waiting to be installed. Since the current batch of jobs will end sometime Saturday, I'll pop in a smaller workflow after that, designed to run out early Monday, to give the WMCore team the chance to get the update done. So, be ready for jobs to start becoming unavailable late Sunday night.
ID: 46117
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 881
Credit: 5,850,679
RAC: 242
Message 46126 - Posted: 30 Jan 2022, 20:41:32 UTC - in response to Message 46117.  

There's another release of WMAgent waiting to be installed. Since the current batch of jobs will end sometime Saturday, I'll pop in a smaller workflow after that, designed to run out early Monday, to give the WMCore team the chance to get the update done. So, be ready for jobs to start becoming unavailable late Sunday night.

Last batch is running now. I'd estimated it to end around 0800 UTC, but there's been some disruption in the pipeline (network?) today, so not as many jobs are running/completing as yesterday. This may not necessarily be bad as the main person doing the upgrade has just relocated to Notre Dame and is in a rather later time-zone.
ID: 46126
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 881
Credit: 5,850,679
RAC: 242
Message 46130 - Posted: 31 Jan 2022, 13:30:30 UTC

Job queue is now exhausted, so you will be seeing more EXIT_NO_SUB_TASKS messages now.
ID: 46130
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 881
Credit: 5,850,679
RAC: 242
Message 46137 - Posted: 1 Feb 2022, 16:57:55 UTC - in response to Message 46130.  

It seems there is some problem with the WMAgent update, but I've not received any news about it yet.
ID: 46137
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 727
Credit: 482,349,911
RAC: 285,472
Message 46139 - Posted: 1 Feb 2022, 18:13:21 UTC - in response to Message 46137.  

No worries, I set NNT (No New Tasks) for CMS, so it's working through the 300 or so tasks in the work buffer.
ID: 46139
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 881
Credit: 5,850,679
RAC: 242
Message 46141 - Posted: 1 Feb 2022, 18:53:35 UTC
Last modified: 1 Feb 2022, 18:53:46 UTC

OK, the agent is up again, so new tasks will arrive soon. The first two or three batches will be different from usual, so don't freak out! But, do tell me if they create significant problems.
ID: 46141
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 881
Credit: 5,850,679
RAC: 242
Message 46143 - Posted: 1 Feb 2022, 21:07:17 UTC - in response to Message 46141.  

OK, the agent is up again, so new tasks will arrive soon. The first two or three batches will be different from usual, so don't freak out! But, do tell me if they create significant problems.

The first test is over, thank you. Now I'll try running jobs twice as large as those of the past two years.
ID: 46143
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1993
Credit: 143,968,819
RAC: 100,611
Message 46144 - Posted: 2 Feb 2022, 7:20:41 UTC

There are no tasks in the BOINC server's queue:
https://lhcathome.cern.ch/lhcathome/server_status.php
ID: 46144
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 881
Credit: 5,850,679
RAC: 242
Message 46146 - Posted: 2 Feb 2022, 11:29:03 UTC - in response to Message 46144.  

There was a glitch restarting the WMAgent, and we were limited to 100 running jobs. It's fixed now and the number of active jobs is increasing.
ID: 46146
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1993
Credit: 143,968,819
RAC: 100,611
Message 46170 - Posted: 4 Feb 2022, 7:22:19 UTC

According to Grafana, the subtask queue has been empty since 05:12 UTC.
ID: 46170
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1103
Credit: 6,877,206
RAC: 809
Message 46171 - Posted: 4 Feb 2022, 7:23:09 UTC

No sub-tasks (jobs) available.

2022-02-04 07:53:15 (9436): Guest Log: [INFO] CMS application starting. Check log files.
2022-02-04 08:18:55 (9436): Guest Log: [INFO] glidein exited with return value 0.
2022-02-04 08:18:55 (9436): Guest Log: [INFO] Shutting Down.
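
For anyone who wants to check how long a task sat waiting before the glidein gave up, here is a rough Python sketch. It assumes the log lines look exactly like the three above (timestamp in the first 19 characters, same message wording); the function names `parse_timestamp` and `idle_runtime` are made up for this example and are not part of any BOINC or CMS tool.

```python
from datetime import datetime

# The three log lines quoted above, copied verbatim.
LOG = """\
2022-02-04 07:53:15 (9436): Guest Log: [INFO] CMS application starting. Check log files.
2022-02-04 08:18:55 (9436): Guest Log: [INFO] glidein exited with return value 0.
2022-02-04 08:18:55 (9436): Guest Log: [INFO] Shutting Down.
"""


def parse_timestamp(line):
    # Assumption: every line starts with "YYYY-MM-DD HH:MM:SS".
    return datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")


def idle_runtime(log_text):
    """Return the time between application start and glidein exit, or None."""
    start = end = None
    for line in log_text.splitlines():
        if "CMS application starting" in line:
            start = parse_timestamp(line)
        elif "glidein exited" in line:
            end = parse_timestamp(line)
    if start is None or end is None:
        return None
    return end - start


print(idle_runtime(LOG))  # prints 0:25:40
```

So in this case the VM ran for a little under 26 minutes without picking up a job before it shut down.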
ID: 46171
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 881
Credit: 5,850,679
RAC: 242
Message 46173 - Posted: 4 Feb 2022, 9:50:49 UTC - in response to Message 46171.  
Last modified: 4 Feb 2022, 9:51:44 UTC

No sub-tasks (jobs) available.

2022-02-04 07:53:15 (9436): Guest Log: [INFO] CMS application starting. Check log files.
2022-02-04 08:18:55 (9436): Guest Log: [INFO] glidein exited with return value 0.
2022-02-04 08:18:55 (9436): Guest Log: [INFO] Shutting Down.

Yes, the latest batch (20,000 events) aren't matching the condor criteria (there are 2,000 jobs in the pending queue). It'll take me an hour or two to get into work and find out why.
ID: 46173
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 881
Credit: 5,850,679
RAC: 242
Message 46177 - Posted: 4 Feb 2022, 14:34:04 UTC - in response to Message 46173.  

No sub-tasks (jobs) available.

2022-02-04 07:53:15 (9436): Guest Log: [INFO] CMS application starting. Check log files.
2022-02-04 08:18:55 (9436): Guest Log: [INFO] glidein exited with return value 0.
2022-02-04 08:18:55 (9436): Guest Log: [INFO] Shutting Down.

Yes, the latest batch (20,000 events) aren't matching the condor criteria (there are 2,000 jobs in the pending queue). It'll take me an hour or two to get into work and find out why.

I had to resort to asking my Italian colleague to work this out for me. It appears that there was a time-out set at 15.98 hours (haven't worked out why yet) and the jobs were requesting 16 hours! So, the jobs didn't start.
I've reverted to 10,000 events ("two-hour" jobs) while I ponder the implications of this.
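
To illustrate the mismatch, here is a minimal Python sketch. It assumes the matching boils down to comparing the job's requested run time with the time-out; the actual condor matching expression is not shown in this thread, and the name `can_start` and the 8-hour example value are purely hypothetical.

```python
TIMEOUT_HOURS = 15.98  # time-out reported above


def can_start(requested_hours, limit_hours=TIMEOUT_HOURS):
    """Hypothetical rule: a job matches only if its request fits within the limit."""
    return requested_hours <= limit_hours


# A 16-hour request exceeds the 15.98-hour limit by a small margin, so it never matches.
print(can_start(16.0))  # prints False

# Illustrative only: any request below the limit would match under this rule.
print(can_start(8.0))   # prints True
```

Under that assumption a difference of 0.02 hours (72 seconds) is enough to keep every job in the pending queue.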
ID: 46177
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 727
Credit: 482,349,911
RAC: 285,472
Message 46220 - Posted: 10 Feb 2022, 20:27:53 UTC - in response to Message 46177.  

Hi Ivan,

I thought this was set by Lawrence? The runtime should be 12 h, but there is a timeout at 16 h so that jobs don't run forever if there is an issue?
ID: 46220