no new WUs available
Author | Message |
---|---|
Send message Joined: 17 Sep 04 Posts: 105 Credit: 32,824,862 RAC: 59 |
Is the system set up to provide continuous work without human intervention? I thought that was the goal. Thanks. Regards, Bob P. |
Send message Joined: 29 Aug 05 Posts: 1061 Credit: 7,737,455 RAC: 201 |
"Is the system set up to provide continuous work without human intervention? I thought that was the goal?" Not at the moment, although ultimately we would like to do that. There are so many variables to track and assess that it's hard to just let the system run on autopilot. As we have seen in recent weeks, there are so many steps in the chain from a job specification to a body of results in storage that an interruption to any one of them ripples down into a break in service. We also need to be able to plan ahead for interruptions (like the periodic updates to the WMAgent code), so we don't want to build up a backlog of jobs that would take weeks to clear; I therefore try to commit only a few days' worth of jobs at a time. Naturally, I sometimes get caught out on this, by sleeping in too late on a Sunday, for example. [Hmm, we've got the Lionesses playing Ireland in an hour or so, and the Lions vs. España Sunday night; I'd better check my plans for this weekend!] |
Send message Joined: 24 Oct 04 Posts: 1177 Credit: 54,887,670 RAC: 3,877 |
How is this possible, running 4-core CMS tasks hundreds of times? I was just looking around to see how they are running here. |
Send message Joined: 28 Sep 04 Posts: 732 Credit: 49,373,095 RAC: 13,741 |
The Ready to Send queue is dropping; we are down to 90 tasks at the moment. Minor hiccup, or are we running out? Running jobs are still at about 330. |
Send message Joined: 3 Nov 12 Posts: 59 Credit: 142,193,076 RAC: 32,238 |
"How is this possible, running 4-core CMS tasks hundreds of times?" 1,100 credits for 2 minutes of CPU time. One of the best investments I've ever seen, and with no scientific benefit. A typical "thanks for nothing". Or just a waste of time... |
Send message Joined: 29 Aug 05 Posts: 1061 Credit: 7,737,455 RAC: 201 |
"The Ready to Send queue is dropping; we are down to 90 tasks at the moment. Minor hiccup, or are we running out? Running jobs are still at about 330." That was probably because I hadn't got the next workflow into the queue before the current one started to eat into its backlog. We've been seeing an increase in the number of running jobs this week (from ~240 up to ~500), so I've started submitting 5,000 jobs per batch instead of 2,000, to give me a better chance of catching workflows before they run out. We also had an anomaly last weekend caused by one host with a network problem: it burnt through about 600 jobs because it could not connect to the Frontier (conditions database) servers. |
Send message Joined: 18 Dec 15 Posts: 1821 Credit: 118,983,735 RAC: 18,277 |
Ivan, right now the situation is that jobs are available, but no tasks are available to process the jobs :-) |
Send message Joined: 18 Dec 15 Posts: 1821 Credit: 118,983,735 RAC: 18,277 |
Ivan, edit: meanwhile, no jobs either |
Send message Joined: 4 Sep 22 Posts: 92 Credit: 16,008,656 RAC: 8,102 |
There are new tasks, yes, but so far they all seem to be more of those do-nothing tasks. |
Send message Joined: 15 Jun 08 Posts: 2541 Credit: 254,608,838 RAC: 23,290 |
Got fresh tasks/jobs and so far all of them are running fine. |
Send message Joined: 4 Sep 22 Posts: 92 Credit: 16,008,656 RAC: 8,102 |
"Got fresh tasks/jobs and so far all of them are running fine." I was premature with my comment about all of them being do-nothing tasks. There were only a few at first (likely old stuff being sent out again for whatever reason), and then the "real" tasks started up here. Then I got busy doing other things and have only now got back to report. |
Send message Joined: 14 Jan 10 Posts: 1422 Credit: 9,484,585 RAC: 852 |
The CMS patch activated last night affects the process inside the VM. Sometimes coincidences happen, but not getting CMS tasks even though there were enough tasks in the queue has happened here, even with no work at all on that BOINC instance. Since the queue went down to zero and Ivan refilled the batch with jobs, causing BOINC to create new tasks, I finally got a CMS task. https://lhcathome.cern.ch/lhcathome/results.php?hostid=10793807 |
Send message Joined: 29 Aug 05 Posts: 1061 Credit: 7,737,455 RAC: 201 |
The WMAgent died (why does this always happen on a weekend?) because of a miscommunication between the agent and the condor server. As a result, no new jobs were injected into the condor pool, so the pool drained as running jobs finished and there were no fresh jobs to allocate. The BOINC server therefore saw no new jobs in the pool and stopped allocating new tasks to volunteer machines. When I notified CERN of the problem, the affected WMAgent component was restarted, and jobs from the current (unfinished) workflow were again submitted to the pool. When the BOINC server next checked the job pool it saw available jobs again and so restarted task creation. The new tasks (VMs in host machines) were then able to acquire jobs and restart processing. |
Send message Joined: 29 Aug 05 Posts: 1061 Credit: 7,737,455 RAC: 201 |
We've run into a problem with the WMAgent and need to drain the queues. Fortunately we are near the end of a batch of jobs, so we should be able to apply the fix tomorrow. Take the usual countermeasures to cope with a short period of job unavailability starting in 12 hours or so. Intervention is over and jobs are available in the pool -- 18 have already been picked up by volunteer hosts. |
Send message Joined: 28 Sep 04 Posts: 732 Credit: 49,373,095 RAC: 13,741 |
We have run out of CMS work. |
Send message Joined: 18 Dec 15 Posts: 1821 Credit: 118,983,735 RAC: 18,277 |
the queue has run dry |
Send message Joined: 2 May 07 Posts: 2244 Credit: 173,902,375 RAC: 307 |
the queue has run dry |
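
For anyone trying to picture why the task queue runs dry when the WMAgent stops, here is a rough toy model of the chain described earlier in the thread: the agent injects jobs into the pool, and the BOINC server only creates tasks while it can see jobs there. This is only a sketch. The function name, the parameters and all the numbers are made up for illustration; it does not use or reflect the real WMAgent, HTCondor or BOINC code.

```python
def simulate(agent_alive_by_tick, backlog=1000, inject=50, capacity=40):
    """Toy model: return the number of tasks created at each tick.

    agent_alive_by_tick: list of booleans, True when the (hypothetical)
        agent is able to inject jobs during that tick.
    backlog:  jobs left in the current workflow (illustrative number).
    inject:   jobs the agent can push into the pool per tick (assumed).
    capacity: tasks the server will create per tick at most (assumed).
    """
    queued = 0          # jobs sitting in the pool, waiting for a task
    tasks_created = []
    for alive in agent_alive_by_tick:
        # Step 1: the agent injects jobs into the pool only while it is alive.
        if alive:
            injected = min(inject, backlog)
            backlog -= injected
            queued += injected
        # Step 2: the server creates tasks only for jobs it can see in the pool.
        created = min(capacity, queued)
        queued -= created
        tasks_created.append(created)
    return tasks_created


if __name__ == "__main__":
    # Agent runs for two ticks, is down for three, then is restarted.
    print(simulate([True, True, False, False, False, True]))
```

In this model, task creation keeps going for a short while after the agent stops (the pool still holds a few jobs), then drops to zero, and recovers on the first tick after the agent is restarted. That matches the pattern volunteers reported: a short delay, an empty queue, then fresh tasks once jobs were resubmitted.
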
©2024 CERN