EXIT_NO_SUB_TASKS
Author | Message |
---|---|
Send message Joined: 29 Aug 05 Posts: 1006 Credit: 6,272,044 RAC: 436 |
Sorry, there was a large increase in the number of jobs being run so the queue drained a couple of hours before I could get in to work this morning. Should be OK again in 20 minutes or so. |
Send message Joined: 28 Sep 04 Posts: 675 Credit: 43,609,995 RAC: 15,775 |
[quote]Sorry, there was a large increase in the number of jobs being run so the queue drained a couple of hours before I could get in to work this morning. Should be OK again in 20 minutes or so.[/quote] That may have been because other subprojects stopped sending tasks over the weekend, so volunteers may have pointed their computers at CMS, which didn't suffer from this problem. |
Send message Joined: 29 Aug 05 Posts: 1006 Credit: 6,272,044 RAC: 436 |
|
Send message Joined: 15 Jun 08 Posts: 2411 Credit: 226,069,741 RAC: 127,020 |
The huge rate of "No tasks are available for CMS Simulation" might not be what was meant by "...trying out some new workflows, so you might see some strange effects..."? |
Send message Joined: 29 Aug 05 Posts: 1006 Credit: 6,272,044 RAC: 436 |
[quote]The huge rate of "No tasks are available for CMS Simulation" might not be what was meant by "...trying out some new workflows, so you might see some strange effects...".[/quote] Hmm, I'm getting tasks, but the jobs are failing (which is what we are investigating). There's a big spike in the job failure rate but still some "old" jobs working their way through the system. When they are eliminated we'll see if one theory for the new ones failing is right. If not, I'll have to put in a new batch of "old" jobs before the long weekend. |
Send message Joined: 29 Aug 05 Posts: 1006 Credit: 6,272,044 RAC: 436 |
OK, we're giving up on the tests for the long weekend. I've submitted a batch of the "old" jobs so you should start getting proper Pythia jobs again soon. Some of the test jobs will still hang around for a while but, as far as I can tell, they fail quickly without counting as a BOINC task error, so the task continues to run more jobs and clock up CPU credit. |
Send message Joined: 27 Sep 08 Posts: 807 Credit: 651,832,230 RAC: 291,503 |
Hi Ivan, I think the CMS tasks have run out: the computers running CMS are showing low CPU use even though they have CMS tasks running in BOINC. |
Send message Joined: 15 Jun 08 Posts: 2411 Credit: 226,069,741 RAC: 127,020 |
Yesterday evening the very short CMS job runtimes caused nearly 300,000 internet requests/hour instead of the usual 40,000 requests/hour. That is no problem if you run a local proxy, but longer jobs would be more efficient. Since this morning CMS has had no work. |
Send message Joined: 29 Aug 05 Posts: 1006 Credit: 6,272,044 RAC: 436 |
[quote]Yesterday evening the very short CMS job runtimes caused nearly 300,000 internet requests/hour instead of the usual 40,000 requests/hour.[/quote] I've submitted longer jobs now but there is still a small backlog of the small ones that I put in as a stop-gap while I investigated the run times and file sizes. The bigger jobs should start running some time tonight. They have 5,000 events and will take O(5,000 seconds) (there is no filtering on this workflow so all events are returned). File sizes should be about 60 MB. Please let me know of any other difficulties this workflow causes. |
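For anyone who wants to put those numbers side by side, here is a rough back-of-the-envelope sketch in Python. It only uses the figures quoted in the two posts above (5,000 events, roughly 5,000 seconds, about 60 MB, and the 300,000 versus 40,000 requests/hour). The variable names are made up for illustration, the runtime is an order-of-magnitude estimate rather than a measurement, and decimal megabytes are assumed.

```python
# Figures quoted in the posts above; all are rough estimates, not measurements.
EVENTS_PER_JOB = 5_000
RUNTIME_SECONDS = 5_000   # "O(5,000 seconds)": order of magnitude only
OUTPUT_MB = 60            # approximate result file size (decimal MB assumed)

SHORT_JOB_REQUESTS_PER_HOUR = 300_000  # reported while the very short jobs ran
USUAL_REQUESTS_PER_HOUR = 40_000       # reported as the usual rate


def per_event(total: float, events: int) -> float:
    """Spread a per-job total evenly over the events in the job."""
    if events <= 0:
        raise ValueError("events must be positive")
    return total / events


seconds_per_event = per_event(RUNTIME_SECONDS, EVENTS_PER_JOB)
kb_per_event = per_event(OUTPUT_MB * 1000, EVENTS_PER_JOB)  # MB -> kB
request_ratio = SHORT_JOB_REQUESTS_PER_HOUR / USUAL_REQUESTS_PER_HOUR

print(f"about {seconds_per_event:.1f} s and {kb_per_event:.0f} kB per event")
print(f"short jobs generated about {request_ratio:.1f}x the usual request rate")
```

So the longer jobs work out to roughly one second and a dozen kilobytes per event, and the short stop-gap jobs were driving several times the usual request rate.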
Send message Joined: 29 Aug 05 Posts: 1006 Credit: 6,272,044 RAC: 436 |
CMS IT want to upgrade the WMAgent software used to submit CMS@Home jobs, so we need to drain the queues. My estimate is that the current batch will finish late on Saturday, so as the queues become empty I will submit a smaller batch calculated to finish early on Monday. Hopefully this will let them complete the work as early as possible. Please start setting No New Tasks for CMS@Home on Sunday, to minimise the problems with time-outs, etc., as jobs become scarce. |
Send message Joined: 29 Aug 05 Posts: 1006 Credit: 6,272,044 RAC: 436 |
[quote]CMS IT want to upgrade the WMAgent software used to submit CMS@Home jobs, so we need to drain the queues. My estimate is that the current batch will finish late on Saturday, so as the queues become empty I will submit a smaller batch calculated to finish early on Monday. Hopefully this will let them complete the work as early as possible.[/quote] I've just submitted another 4,000 jobs, and there are 600 still in the current batch. As we are averaging about 100/hour I anticipate that we'll start running dry around mid-day (European time) on Monday. So, set No New Tasks before about midnight Sunday. |
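The drain estimate above is simple division, sketched here in Python for anyone who wants to redo it with different numbers. The function name is hypothetical, and the sketch assumes the completion rate stays constant at the quoted average of about 100 jobs/hour, which it will not do exactly as the queue empties.

```python
def hours_until_empty(queued_jobs: int, jobs_per_hour: float) -> float:
    """Estimate how long a queue lasts at a constant completion rate."""
    if jobs_per_hour <= 0:
        raise ValueError("jobs_per_hour must be positive")
    return queued_jobs / jobs_per_hour


# Figures from the post: 4,000 newly submitted jobs plus 600 still in the
# current batch, completing at an average of about 100 jobs per hour.
queued = 4_000 + 600
hours = hours_until_empty(queued, 100)
print(f"{queued} jobs at 100/hour last about {hours:.0f} hours")
```

That gives a little under two days of work from the time of the post.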
Send message Joined: 15 Jun 08 Posts: 2411 Credit: 226,069,741 RAC: 127,020 |
Since this morning the job queue seems to be empty. |
Send message Joined: 24 Oct 04 Posts: 1127 Credit: 49,745,762 RAC: 10,798 |
[quote]Since this morning the job queue seems to be empty.[/quote] Same over at -dev. I just sent a message over to Ivan and suspended mine until we can get this fixed. |
Send message Joined: 29 Aug 05 Posts: 1006 Credit: 6,272,044 RAC: 436 |
[quote]Since this morning the job queue seems to be empty.[/quote] Hmm, you are right. There are still jobs pending but the number running has dropped away. I did submit a new batch this morning but it won't show up on my monitor until other tasks have completed. I'll investigate. Later: There is a DBStatus error in the WMAgent; I'll alert CERN. Exception Class: DBSUploadException Message: Unknown failure while fetching parentage map from WMStats. Error: (6, 'Could not resolve: cmsweb-testbed.cern.ch (Timeout while contacting DNS servers)') |
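The error text above points at a name-resolution failure rather than a problem with the jobs themselves. As a minimal sketch (not how WMAgent does it), this Python function checks whether a hostname resolves from the machine it is run on, using the standard library's `socket.getaddrinfo`. The function name `can_resolve` is hypothetical, and a successful lookup from a volunteer's PC says nothing about the DNS servers the WMAgent host was using.

```python
import socket


def can_resolve(hostname: str) -> bool:
    """Return True if the local resolver can turn hostname into an address."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        # Raised for lookup failures such as unknown hosts or DNS timeouts.
        return False


# A numeric address needs no DNS lookup, so this succeeds even offline.
print(can_resolve("127.0.0.1"))
```

Calling `can_resolve("cmsweb-testbed.cern.ch")` would show whether the name in the error message resolves from your own network at that moment.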
Send message Joined: 24 Oct 04 Posts: 1127 Credit: 49,745,762 RAC: 10,798 |
OK, we are officially back to work here again. Tested and running valids. |
Send message Joined: 15 Jun 08 Posts: 2411 Credit: 226,069,741 RAC: 127,020 |
On the Grafana page the number of running CMS jobs dropped from ~250 to ~40 last night. In addition, my hosts show lots of EXIT_NO_SUB_TASKS errors. |
Send message Joined: 18 Nov 17 Posts: 120 Credit: 51,965,106 RAC: 25,063 |
Since this morning it seems that VMs do not start. Tasks do nothing and end with an error after 15-20 minutes. |
©2024 CERN