1) Message boards : CMS Application : EXIT_NO_SUB_TASKS (Message 41415)
Posted 9 hours ago by ivan
Post:
EXIT_NO_SUB_TASKS again :-(
When I remember how often this has happened in the past, CMS is definitely not a well-working sub-project :-(

And once more my question: what happened to Ivan? Does no-one know?

Ivan is not having a very good personal life at the moment. :-(
I'm keeping jobs available, and I'm aware of the dips in the jobs graphs (plus you may have noticed a network interruption at CERN last week). These are due to HTCondor not matching worker machines to job requirements. There are some indications that it depends on available memory, but that can't be the full story because my machines stop getting jobs during the lulls but then build up again afterwards. I'm trying to get a condor expert to look at it.
Sorry for your frustration, I feel it too but I have deeper problems to worry about at the moment.
2) Message boards : CMS Application : EXIT_NO_SUB_TASKS (Message 40866)
Posted 9 Dec 2019 by ivan
Post:
Ah, I think I've found the reason. I'd been playing around with priorities to try to get around the problem we had with condor requests timing out, so all my recent jobs have been submitted with priority 1000. Federica's batches were submitted with the original template value of 600000(!). I submitted another batch at priority 100000 and it's appeared on WMStats, so it looks like the others I have sent are not being acted upon while the current batch is still running at the same priority.
3) Message boards : CMS Application : EXIT_NO_SUB_TASKS (Message 40863)
Posted 9 Dec 2019 by ivan
Post:
@Ivan
Just noticed at the Grafana pages that the number of running CMS jobs has doubled since Sunday afternoon.
Might be that we need a new batch earlier than expected.

Yeah, I've seen that too. I have a batch in the pipeline that's not showing up in WMStats yet. Federica submitted two small tasks last week that appear to have run according to WMStats, but I can't find any output in store -- ah, the unmerged result files are on DataBridge, I must be looking in the wrong place on EOS. I've just put in another batch that's not showing up yet either, even though the submission is reported as successful. I'll have to double-check my input parameters.
4) Message boards : CMS Application : EXIT_NO_SUB_TASKS (Message 40728)
Posted 29 Nov 2019 by ivan
Post:
Thanks.

Got 1 task that started fine.
What factor do you expect regarding the runtime increase per job?

I've gone from 5,000 to 10,000 events per job. Given the fixed startup overhead, the runtime increase should be less than a factor of two (the result file should be approximately twice as big, too). Let me know if it causes any problems. It'll take a while for them to show up; there are 1,000 of the previous size to get through first.
5) Message boards : CMS Application : EXIT_NO_SUB_TASKS (Message 40723)
Posted 29 Nov 2019 by ivan
Post:
OK, thanks to great efforts by the CMS & CERN IT teams, a workaround is in place and we are able to run jobs again! I've submitted a small batch and have jobs running on my boxen. I'll submit a larger batch later, and take the opportunity to increase the job size as the average run-time is less than I would prefer. This should increase our efficiency.
6) Message boards : News : CMS@Home disruption this week (Message 40722)
Posted 29 Nov 2019 by ivan
Post:
OK, thanks to great efforts by the CMS & CERN IT teams, a workaround is in place and we are able to run jobs again! I've submitted a small batch and have jobs running on my boxen. I'll submit a larger batch later, and take the opportunity to increase the job size as the average run-time is less than I would prefer. This should increase our efficiency.
7) Message boards : News : CMS@Home disruption this week (Message 40700)
Posted 27 Nov 2019 by ivan
Post:
It appears that a database intervention at CERN went badly, leaving our data tables empty and us unable to submit new CMS@Home jobs. Advice is that it will take several days to recover -- and as well as that, some of the major players are in the USA, which has holidays for the rest of this week. I'll keep an eye on it, but I'm doubtful we'll be running again this week. Sorry 'bout that!
Happy Thanksgiving...
8) Message boards : CMS Application : EXIT_NO_SUB_TASKS (Message 40699)
Posted 27 Nov 2019 by ivan
Post:
Big problem, I'm afraid -- the database tables appear to be empty!
The problem cannot be fixed quickly. If the tables are empty now, the only option is to repeat the import. And this will take a few days, as the amount of data to be copied is huge and the tables do not have partitions, so it is impossible to parallelise the work.
Also the IO subsystem for the integration databases is not as fast as the production databases.....
An alternative to all this import/export would be needed in the near future....

...and Thursday and Friday are US holidays...
9) Message boards : CMS Application : EXIT_NO_SUB_TASKS (Message 40687)
Posted 26 Nov 2019 by ivan
Post:
Probably need to wait a bit longer. Some difficulties were ironed out today, but not all. I have to go home soon; the North American contingent may sort it out overnight.
10) Message boards : CMS Application : EXIT_NO_SUB_TASKS (Message 40674)
Posted 26 Nov 2019 by ivan
Post:
Apparently a database copy went wrong, complicated by the network problems Thursday night, people working in different time-zones, and the weekend -- plus some lack of communication which led to effort being wasted over the weekend. I'm still waiting on further information from the service ticket (unfortunately this one is CERN internal so most of you won't be able to read it).
11) Message boards : CMS Application : EXIT_NO_SUB_TASKS (Message 40628)
Posted 24 Nov 2019 by ivan
Post:
The new batch on the production server doesn't seem to have created any jobs, so obviously it's not sending any out. We are running down the existing queues on the testbed server at reduced efficiency (i.e. not every job request is met within ten minutes). I'll send out more messages, but it is Sunday...
12) Message boards : CMS Application : EXIT_NO_SUB_TASKS (Message 40611)
Posted 23 Nov 2019 by ivan
Post:
Ah, I had an override switch in my submission script that still pointed to the testbed server. Changed that and the submission went through. Now I'm waiting to see if the workflow actually shows up on the production server.

OK, I see the workflow on the production server now but I may have to go home before it starts farming out any jobs. Digits cruciate...
13) Message boards : CMS Application : EXIT_NO_SUB_TASKS (Message 40610)
Posted 23 Nov 2019 by ivan
Post:
Ah, I had an override switch in my submission script that still pointed to the testbed server. Changed that and the submission went through. Now I'm waiting to see if the workflow actually shows up on the production server.
14) Message boards : CMS Application : EXIT_NO_SUB_TASKS (Message 40609)
Posted 23 Nov 2019 by ivan
Post:
We're working on it -- it's a prolongation of Thursday night's problem. First attempt at using a different server failed -- by the looks of it because my public key is not registered on the production server. I've sent CERN the corresponding key.
15) Message boards : CMS Application : EXIT_NO_SUB_TASKS (Message 40608)
Posted 23 Nov 2019 by ivan
Post:
Oops! I'm getting an error trying to submit new jobs to the queues. Jobs will run out late tonight if we can't get it sorted. I'm setting my machines to No New Tasks to try to lessen the load.
16) Message boards : CMS Application : EXIT_NO_SUB_TASKS (Message 40589)
Posted 22 Nov 2019 by ivan
Post:
OK, we've tickled the tiger's tail and jobs are available again. Have at it!
17) Message boards : CMS Application : EXIT_NO_SUB_TASKS (Message 40587)
Posted 22 Nov 2019 by ivan
Post:
We are having the same problem with these over at the CMS-dev
(well, except mine are just giving me Exit status 1 (0x00000001) Unknown error code)

I suspended mine, but I see 207 (0x000000CF) EXIT_NO_SUB_TASKS running there.

It is Friday, so I hope we get this fixed before the end of the day.

There was a problem with the Oracle databases at CERN overnight, which stopped job submission. According to https://cern.service-now.com/service-portal/view-outage.do?n=OTG0053449 (if you can reach it) a workaround has been implemented. One of my machines is running tasks but still not getting jobs. Probably best to set No New Tasks until we can verify everything is working again.
[Added] Our WMAgent is down, with a database-connect error. I'll ask Alan to tickle it. [/Added]
18) Message boards : News : CMS job shortage Wednesday 13th November (Message 40441)
Posted 14 Nov 2019 by ivan
Post:
Should be up again soon -- just waiting on a database update.

OK, jobs are available now, so you can start running CMS tasks again.
19) Message boards : News : CMS job shortage Wednesday 13th November (Message 40440)
Posted 14 Nov 2019 by ivan
Post:
Should be up again soon -- just waiting on a database update.
20) Message boards : News : CMS job shortage Wednesday 13th November (Message 40435)
Posted 13 Nov 2019 by ivan
Post:
I haven't heard whether the intervention's over, and I'll be without network connectivity until tomorrow, so no jobs tonight, I'm afraid.




©2020 CERN