Message boards : CMS Application : EXIT_NO_SUB_TASKS

ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 817
Credit: 5,717,880
RAC: 330
Message 39650 - Posted: 19 Aug 2019, 9:47:41 UTC

Sorry, there was a large increase in the number of jobs being run so the queue drained a couple of hours before I could get in to work this morning. Should be OK again in 20 minutes or so.
ID: 39650
Harri Liljeroos
Joined: 28 Sep 04
Posts: 556
Credit: 30,653,876
RAC: 15,611
Message 39652 - Posted: 19 Aug 2019, 10:26:16 UTC - in response to Message 39650.  

Sorry, there was a large increase in the number of jobs being run so the queue drained a couple of hours before I could get in to work this morning. Should be OK again in 20 minutes or so.

That may have been because other subprojects stopped sending tasks during the weekend, so volunteers may have pointed their computers at CMS, which didn't suffer from this problem.
ID: 39652
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 817
Credit: 5,717,880
RAC: 330
Message 39676 - Posted: 21 Aug 2019, 16:11:23 UTC

We're trying out some new workflows, so you might see some strange effects in the next few days. Feel free to set No New Tasks if it annoys you.
ID: 39676
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1863
Credit: 127,779,650
RAC: 91,991
Message 39698 - Posted: 23 Aug 2019, 10:02:23 UTC - in response to Message 39676.  

The high rate of "No tasks are available for CMS Simulation" messages is presumably not what was meant by "...trying out some new workflows, so you might see some strange effects...", is it?
ID: 39698
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 817
Credit: 5,717,880
RAC: 330
Message 39702 - Posted: 23 Aug 2019, 13:59:09 UTC - in response to Message 39698.  

The high rate of "No tasks are available for CMS Simulation" messages is presumably not what was meant by "...trying out some new workflows, so you might see some strange effects...", is it?

Hmm, I'm getting tasks, but the jobs are failing (which is what we are investigating). There's a big spike in the job failure rate but still some "old" jobs working their way through the system. When they are eliminated we'll see if one theory for the new ones failing is right. If not, I'll have to put in a new batch of "old" jobs before the long weekend.
ID: 39702
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 817
Credit: 5,717,880
RAC: 330
Message 39705 - Posted: 23 Aug 2019, 16:12:35 UTC

OK, we're giving up on the tests for the long weekend. I've submitted a batch of the "old" jobs, so you should start getting proper Pythia jobs again soon. Some of the test jobs will still hang around for a while, but as far as I can tell they fail quickly without counting as a BOINC task error, so the task continues to run more jobs and clock up CPU credit.
ID: 39705
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 699
Credit: 443,359,145
RAC: 194,144
Message 39878 - Posted: 9 Sep 2019, 16:22:54 UTC

Hi Ivan,

I think the CMS tasks have run out: the computers running CMS are showing low CPU use even though they have CMS tasks running in BOINC.
ID: 39878
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1863
Credit: 127,779,650
RAC: 91,991
Message 39885 - Posted: 10 Sep 2019, 5:29:06 UTC

Yesterday evening the very short CMS job runtimes caused nearly 300,000 internet requests/hour instead of the usual 40,000 requests/hour.
No problem if you run a local proxy but longer jobs would be more efficient.

Since this morning CMS has no work.
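
For scale, here is a quick back-of-the-envelope calculation using the two hourly request rates quoted above. It is only a sketch: the variable names are illustrative, and nothing beyond the two rates comes from the monitoring data.

```python
# Request rates quoted in this post (requests per hour).
usual_rate = 40_000
observed_rate = 300_000

# How much more traffic the very short jobs generated.
increase_factor = observed_rate / usual_rate

# The same rates expressed per second.
usual_per_second = usual_rate / 3600
observed_per_second = observed_rate / 3600

print(f"increase factor: {increase_factor:.1f}x")
print(f"usual: {usual_per_second:.1f} req/s, observed: {observed_per_second:.1f} req/s")
```

That is roughly a 7.5-fold increase, which is why longer jobs (fewer job starts per hour) would be more efficient.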
ID: 39885
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 817
Credit: 5,717,880
RAC: 330
Message 39894 - Posted: 10 Sep 2019, 16:04:22 UTC - in response to Message 39885.  

Yesterday evening the very short CMS job runtimes caused nearly 300,000 internet requests/hour instead of the usual 40,000 requests/hour.
No problem if you run a local proxy but longer jobs would be more efficient.

Since this morning CMS has no work.

I've submitted longer jobs now but there is still a small backlog of the small ones that I put in as a stop-gap while I investigated the run times and file sizes. The bigger jobs should start running some time tonight. They have 5,000 events and will take O(5,000 seconds) (there is no filtering on this workflow so all events are returned). File sizes should be about 60 MB.
Please let me know of any other difficulties this workflow causes.
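
As a rough sanity check of those figures, the per-event numbers can be worked out as below. This is a sketch only: the runtime is an order-of-magnitude estimate, the file size is approximate, and 1 MB is assumed to be 1000 kB.

```python
# Figures quoted above for the longer jobs.
events_per_job = 5_000
runtime_seconds = 5_000   # order-of-magnitude estimate, not a measurement
output_size_mb = 60       # approximate output file size

# Derived per-event values.
seconds_per_event = runtime_seconds / events_per_job
kb_per_event = output_size_mb * 1000 / events_per_job  # assumes 1 MB = 1000 kB
runtime_minutes = runtime_seconds / 60

print(f"{seconds_per_event:.1f} s/event, {kb_per_event:.0f} kB/event")
print(f"about {runtime_minutes:.0f} minutes per job")
```

So each job should run for somewhat under an hour and a half, at about one second and 12 kB of output per event.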
ID: 39894
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 817
Credit: 5,717,880
RAC: 330
Message 39910 - Posted: 12 Sep 2019, 13:26:29 UTC
Last modified: 12 Sep 2019, 13:27:29 UTC

CMS IT want to upgrade the WMAgent software used to submit CMS@Home jobs, so we need to drain the queues. My estimate is that the current batch will finish late on Saturday, so as the queues become empty I will submit a smaller batch calculated to finish early on Monday. Hopefully this will let them complete the work as early as possible.
Please start setting No New Tasks for CMS@Home on Sunday, to minimise the problems with time-outs, etc., as jobs become scarce.
ID: 39910
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 817
Credit: 5,717,880
RAC: 330
Message 39925 - Posted: 14 Sep 2019, 14:24:58 UTC - in response to Message 39910.  

CMS IT want to upgrade the WMAgent software used to submit CMS@Home jobs, so we need to drain the queues. My estimate is that the current batch will finish late on Saturday, so as the queues become empty I will submit a smaller batch calculated to finish early on Monday. Hopefully this will let them complete the work as early as possible.
Please start setting No New Tasks for CMS@Home on Sunday, to minimise the problems with time-outs, etc., as jobs become scarce.

I've just submitted another 4,000 jobs, and there are 600 still in the current batch. As we are averaging about 100/hour I anticipate that we'll start running dry around mid-day (European time) on Monday. So, set No New Tasks before about midnight Sunday.
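
The arithmetic behind that estimate can be sketched as follows. It assumes the throughput stays at the quoted average of about 100 jobs/hour and uses this post's timestamp as the starting point; the variable names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Figures quoted above.
jobs_submitted = 4_000
jobs_remaining = 600
jobs_per_hour = 100  # approximate average throughput

# Hours until both batches are exhausted at a constant rate.
hours_to_drain = (jobs_submitted + jobs_remaining) / jobs_per_hour

# Starting point: the time this message was posted (UTC).
posted_at = datetime(2019, 9, 14, 14, 24, 58, tzinfo=timezone.utc)
estimated_empty = posted_at + timedelta(hours=hours_to_drain)

print(f"{hours_to_drain:.0f} hours to drain")
print(f"queue empty around {estimated_empty:%Y-%m-%d %H:%M} UTC")
```

That gives about 46 hours, i.e. shortly after midday UTC on Monday 16 September, a little later in European local time. Any change in the actual throughput shifts this estimate accordingly.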
ID: 39925
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 817
Credit: 5,717,880
RAC: 330
Message 39949 - Posted: 17 Sep 2019, 13:18:17 UTC - in response to Message 39925.  

I had to pad out the queue overnight as the jobs took longer to run than I anticipated. We ran out earlier today. Alan has now deployed the new WMAgent version, and I am waiting for a new batch of jobs to make its way into the queue.
ID: 39949
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 817
Credit: 5,717,880
RAC: 330
Message 39950 - Posted: 17 Sep 2019, 13:43:28 UTC

Jobs are available again...
ID: 39950
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1863
Credit: 127,779,650
RAC: 91,991
Message 40122 - Posted: 11 Oct 2019, 9:03:27 UTC

Since this morning the job queue seems to be empty.
ID: 40122
MAGIC Quantum Mechanic
Joined: 24 Oct 04
Posts: 1001
Credit: 46,149,684
RAC: 6,836
Message 40124 - Posted: 11 Oct 2019, 10:10:14 UTC - in response to Message 40122.  

Since this morning the job queue seems to be empty.


Same over at -dev

I just sent a message over to Ivan and suspended mine until we can get this fixed.
ID: 40124
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 817
Credit: 5,717,880
RAC: 330
Message 40126 - Posted: 11 Oct 2019, 13:56:00 UTC - in response to Message 40124.  

Since this morning the job queue seems to be empty.


Same over at -dev

I just sent a message over to Ivan and suspended mine until we can get this fixed.

Hmm, you are right. There are still jobs pending but the number running has dropped away. I did submit a new batch this morning but it won't show up on my monitor until other tasks have completed. I'll investigate.
Later: There is a DBStatus error in the WMAgent; I'll alert CERN.

Exception Class: DBSUploadException
Message: Unknown failure while fetching parentage map from WMStats. Error: (6, 'Could not resolve: cmsweb-testbed.cern.ch (Timeout while contacting DNS servers)')
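
The message above reports a DNS lookup timing out on the WMAgent side; the leading 6 looks like libcurl's "could not resolve host" error code, though that is an inference from the message format. As an illustration of what such a failure means, here is a minimal sketch of a name-resolution check. The function name `can_resolve` and its `resolver` parameter are hypothetical helpers for this example, not part of WMAgent.

```python
import socket


def can_resolve(hostname, resolver=socket.getaddrinfo):
    """Return True if the hostname resolves, False on a DNS failure.

    `resolver` is injectable so the check can be exercised without
    network access; by default the system resolver is used.
    """
    try:
        # Port 443 is an assumption (HTTPS); only the lookup matters here.
        resolver(hostname, 443)
        return True
    except socket.gaierror:
        # Raised when the name cannot be resolved.
        return False
```

Calling `can_resolve("cmsweb-testbed.cern.ch")` on the affected machine would show whether the lookup itself is failing, as opposed to the service behind that name.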

ID: 40126
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 817
Credit: 5,717,880
RAC: 330
Message 40135 - Posted: 12 Oct 2019, 15:58:19 UTC - in response to Message 40126.  

The batch I submitted yesterday may have been faulty. I sent another today which did show up on the status monitor, and is sending jobs. All my tasks are now receiving jobs. Older batches are only sending out jobs sporadically.
ID: 40135
MAGIC Quantum Mechanic
Joined: 24 Oct 04
Posts: 1001
Credit: 46,149,684
RAC: 6,836
Message 40142 - Posted: 13 Oct 2019, 7:53:22 UTC

OK we are officially back to work here again.
Tested and running Valids
ID: 40142
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1863
Credit: 127,779,650
RAC: 91,991
Message 40330 - Posted: 30 Oct 2019, 8:27:43 UTC

On the Grafana page, the number of running CMS jobs dropped from ~250 to ~40 last night.
In addition my hosts show lots of EXIT_NO_SUB_TASKS errors.
ID: 40330
NOGOOD

Joined: 18 Nov 17
Posts: 117
Credit: 38,007,863
RAC: 23,850
Message 40331 - Posted: 30 Oct 2019, 8:36:31 UTC - in response to Message 40330.  

Since this morning it seems that VMs do not start.
Tasks do nothing and end with an error after 15-20 minutes.
ID: 40331
©2022 CERN