Message boards : CMS Application : no new WUs available

Erich56

Joined: 18 Dec 15
Posts: 1343
Credit: 25,070,670
RAC: 22,080
Message 33202 - Posted: 4 Dec 2017, 10:56:38 UTC - in response to Message 33201.  
Last modified: 4 Dec 2017, 10:58:12 UTC

Hm, although the Project Status Page still shows zero tasks available, I received some about half an hour ago.
So there are tasks around again. However, they are still waiting in the queue here, so I cannot tell whether there will be jobs for these tasks.

Is the information from the Project Status Page lagging behind to some extent?
ID: 33202
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 750
Credit: 5,688,626
RAC: 0
Message 33203 - Posted: 4 Dec 2017, 11:54:57 UTC - in response to Message 33202.  

Hm, although the Project Status Page still shows zero tasks available, I received some about half an hour ago.
So there are tasks around again. However, they are still waiting in the queue here, so I cannot tell whether there will be jobs for these tasks.

Is the information from the Project Status Page lagging behind to some extent?

Yes, it's not updated continuously; the time of the snapshot is at the bottom of the page. It's currently saying 1049 GMT, which is an hour out of date.
I seem to be picking up jobs, but my computers are a bit mixed up because of the long SETI@Home drought that has just ended.
ID: 33203
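
The lag described above can be checked with simple arithmetic. The following is only a minimal sketch: it takes the snapshot time quoted in the reply (10:49 GMT) and the time the reply was posted (11:54:57 UTC) and computes how stale the page was. The function name `snapshot_age` is made up for illustration and is not part of any project tool.

```python
from datetime import datetime, timedelta, timezone


def snapshot_age(snapshot: datetime, now: datetime) -> timedelta:
    """Return how far the status-page snapshot lags behind `now`."""
    return now - snapshot


# Times taken from the exchange above (4 Dec 2017).
snapshot = datetime(2017, 12, 4, 10, 49, tzinfo=timezone.utc)
posted = datetime(2017, 12, 4, 11, 54, 57, tzinfo=timezone.utc)

age = snapshot_age(snapshot, posted)
print(age)  # 1:05:57
```

So "an hour out of date" is a fair description: the page was a little over 65 minutes old when the reply was written.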
Erich56

Joined: 18 Dec 15
Posts: 1343
Credit: 25,070,670
RAC: 22,080
Message 33204 - Posted: 4 Dec 2017, 12:02:52 UTC
Last modified: 4 Dec 2017, 12:03:05 UTC

Thanks for the information, Ivan.
I never noticed the time stamp at the bottom of the Project Status Page (shame on me) :-(

Right now it says "11:53:51 UTC" and shows 114 tasks available. So everything is okay again :-)
ID: 33204
Erich56

Joined: 18 Dec 15
Posts: 1343
Credit: 25,070,670
RAC: 22,080
Message 33205 - Posted: 4 Dec 2017, 13:05:37 UTC - in response to Message 33204.  

The number of tasks available is dropping again.
Right now (13:01 hrs UTC) it's at 33.
ID: 33205
Erich56

Joined: 18 Dec 15
Posts: 1343
Credit: 25,070,670
RAC: 22,080
Message 33207 - Posted: 4 Dec 2017, 15:59:37 UTC

The number of available tasks has been back at zero for a couple of hours.
I guess there won't be a solution to the problem before tomorrow (at the earliest)?
ID: 33207
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 750
Credit: 5,688,626
RAC: 0
Message 33208 - Posted: 4 Dec 2017, 16:04:13 UTC - in response to Message 33207.  

The number of available tasks has been back at zero for a couple of hours.
I guess there won't be a solution to the problem before tomorrow (at the earliest)?

I've submitted a small batch of jobs to the "old" WMAgent, and they are just starting to run. These should go out to volunteer machines, so let's see if the situation changes.
ID: 33208
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 750
Credit: 5,688,626
RAC: 0
Message 33259 - Posted: 10 Dec 2017, 11:45:23 UTC

Looks like the WMAgent failed at about 0830 this morning, and the job queue drained 90 minutes later. So, we are out of jobs at the moment -- I'd actually submitted a new batch of jobs before I'd tracked it down to the WMAgent. :-/ So it's best to set No New Tasks until CERN reacts to my e-mail and kicks vocms0267 into life again. I should also mention at this point that we might see more outages over the next month as various year-end maintenance programmes take place, and the whole CERN site shuts down for about two weeks for the holidays.
Good(?) news -- it snowed in London overnight! That doesn't happen often.
ID: 33259
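
For anyone who prefers a script to the BOINC Manager button, "No New Tasks" can also be set from the command line with `boinccmd --project URL nomorework` (and undone with `allowmorework`). The sketch below just wraps that call. It makes two assumptions: that `boinccmd` is on your PATH, and that the project URL shown is the one your client is attached with (check it in your client before relying on it). Note that this setting applies to the whole project, so it stops new Theory and other LHC@home tasks too, not only CMS.

```python
import subprocess
from typing import List

# Assumption: replace with the exact project URL shown by your BOINC client.
PROJECT_URL = "https://lhcathome.cern.ch/lhcathome/"


def set_no_new_tasks(project_url: str, allow: bool = False,
                     dry_run: bool = True) -> List[str]:
    """Build (and optionally run) the boinccmd call for No New Tasks.

    allow=False -> nomorework (stop fetching new tasks)
    allow=True  -> allowmorework (resume fetching)
    """
    operation = "allowmorework" if allow else "nomorework"
    cmd = ["boinccmd", "--project", project_url, operation]
    if not dry_run:
        # Raises CalledProcessError if boinccmd reports a failure.
        subprocess.run(cmd, check=True)
    return cmd


# Dry run: only show the command that would be executed.
print(" ".join(set_no_new_tasks(PROJECT_URL)))
```

Call it with `dry_run=False` to actually apply the setting, and with `allow=True` once jobs are flowing again.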
MAGIC Quantum Mechanic
Joined: 24 Oct 04
Posts: 983
Credit: 42,902,040
RAC: 29,990
Message 33260 - Posted: 10 Dec 2017, 12:03:20 UTC

2017-12-10 03:34:29 (5236): Guest Log: [INFO] CMS application starting. Check log files.

2017-12-10 03:35:25 (5236): Guest Log: [DEBUG] HTCondor ping

2017-12-10 03:35:32 (5236): Guest Log: [DEBUG] 0

2017-12-10 03:47:35 (5236): Guest Log: [ERROR] Condor exited after 729s without running a job.

2017-12-10 03:47:35 (5236): Guest Log: [INFO] Shutting Down.

2017-12-10 03:47:35 (5236): VM Completion File Detected.
2017-12-10 03:47:35 (5236): VM Completion Message: Condor exited after 729s without running a job.


It's 4am here, and CMS keeps doing this over and over, both here and over at -dev.

...and no, I do not get up early to do this... that server needs me to throw a few snowballs at it.
Volunteer Mad Scientist For Life
ID: 33260
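
The failure in the log above has a recognisable signature: the `[ERROR] Condor exited after ...s without running a job.` line. As a rough sketch (not a project tool), the snippet below scans task output for that line and pulls out how long the VM waited before giving up. The function name `idle_exit_seconds` is hypothetical, and the pattern is based only on the wording of the log lines quoted in this thread.

```python
import re
from typing import Optional

# Pattern taken from the error line quoted in the post above.
IDLE_EXIT = re.compile(r"Condor exited after (\d+)s without running a job")


def idle_exit_seconds(log_text: str) -> Optional[int]:
    """Return the idle time in seconds if the task ended without a job, else None."""
    for line in log_text.splitlines():
        match = IDLE_EXIT.search(line)
        if match:
            return int(match.group(1))
    return None


sample = """\
2017-12-10 03:34:29 (5236): Guest Log: [INFO] CMS application starting. Check log files.
2017-12-10 03:47:35 (5236): Guest Log: [ERROR] Condor exited after 729s without running a job.
2017-12-10 03:47:35 (5236): Guest Log: [INFO] Shutting Down.
"""

print(idle_exit_seconds(sample))  # 729
```

A result of 729 means the VM sat for just over 12 minutes waiting for HTCondor to hand it a job before shutting down, which is what you would expect while the job queue is empty.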
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 750
Credit: 5,688,626
RAC: 0
Message 33261 - Posted: 10 Dec 2017, 16:41:50 UTC - in response to Message 33260.  

Yes, sorry for your frustration. I didn't get a reply from Alan, who is usually quite quick to respond, so I opened a ticket with CERN IT. I don't really expect a response from them until some time tomorrow, unfortunately.
Luckily I've still got S@H and Einstein to keep my server ticking over and generate a bit of heat. It kept snowing on-and-off today, so it was a very slushy shopping trip this afternoon.
ID: 33261
ritterm
Joined: 30 May 08
Posts: 93
Credit: 5,160,246
RAC: 0
Message 33262 - Posted: 10 Dec 2017, 16:59:48 UTC - in response to Message 33261.  
Last modified: 10 Dec 2017, 17:01:25 UTC

...so I opened a ticket with CERN IT. I don't really expect a response from them until some time tomorrow, unfortunately.

Sounds like a good time to open the spigot and run a few more Theory jobs... :D
ID: 33262
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 750
Credit: 5,688,626
RAC: 0
Message 33264 - Posted: 10 Dec 2017, 18:21:25 UTC - in response to Message 33262.  

...so I opened a ticket with CERN IT. I don't really expect a response from them until some time tomorrow, unfortunately.

Sounds like a good time to open the spigot and run a few more Theory jobs... :D

Yeah, back-up projects time. Stay warm, everyone (they're forecasting -12 C for areas of the UK for the next two nights!).
ID: 33264
MAGIC Quantum Mechanic
Joined: 24 Oct 04
Posts: 983
Credit: 42,902,040
RAC: 29,990
Message 33265 - Posted: 11 Dec 2017, 1:39:45 UTC - in response to Message 33261.  

Yes, sorry for your frustration. I didn't get a reply from Alan, who is usually quite quick to respond, so I opened a ticket with CERN IT. I don't really expect a response from them until some time tomorrow, unfortunately.
Luckily I've still got S@H and Einstein to keep my server ticking over and generate a bit of heat. It kept snowing on-and-off today, so it was a very slushy shopping trip this afternoon.


Funny, I thought of your snow today while watching an NFL game, the Colts @ Buffalo, played in deep snow all over the stadium. The field was nothing but snow (it ended up an overtime game, with players then making *snow angels* and disappearing in the snow).

It's sort of cold here but still sunny. I just switched all my machines back to Theory tasks, here and over at -dev, and fired up another Einstein GPU machine that has been taking a break all year and running these VB tasks.

I like it when a crunching day ends and I have a long list of Valids.
Volunteer Mad Scientist For Life
ID: 33265
Erich56

Joined: 18 Dec 15
Posts: 1343
Credit: 25,070,670
RAC: 22,080
Message 33266 - Posted: 11 Dec 2017, 6:13:24 UTC - in response to Message 33265.  

Why is it that, out of all the LHC sub-projects, CMS is failing most frequently (at least, that's my impression)?
Particularly since the installation of the new release of the WMAgent some months ago, there have been problems every so often.
It's really too bad :-(
ID: 33266
Erich56

Joined: 18 Dec 15
Posts: 1343
Credit: 25,070,670
RAC: 22,080
Message 33267 - Posted: 11 Dec 2017, 11:50:27 UTC - in response to Message 33266.  

Ivan, what I don't understand is: why can CMS tasks still be downloaded (and consequently started on the volunteer's PC, until they fail after a while), and why are there constantly close to 200 unsent tasks shown on the project status page, if it has been clear for a day that no jobs will be coming in?

I may be mistaken, but I seem to remember reading here, several months ago, that some automated steps had been put in place to stop the creation of new CMS tasks some time after jobs run out. Or am I wrong?
ID: 33267
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 750
Credit: 5,688,626
RAC: 0
Message 33268 - Posted: 11 Dec 2017, 13:38:29 UTC - in response to Message 33267.  

Ivan, what I don't understand is: why can CMS tasks still be downloaded (and consequently started on the volunteer's PC, until they fail after a while), and why are there constantly close to 200 unsent tasks shown on the project status page, if it has been clear for a day that no jobs will be coming in?

I may be mistaken, but I seem to remember reading here, several months ago, that some automated steps had been put in place to stop the creation of new CMS tasks some time after jobs run out. Or am I wrong?

No, that was happening, at least here on LHC@home. It could be that Laurence needs to tweak the script to take account of the new WMAgent we had installed two weeks ago, or maybe it's something to do with the new website set-up. I'll make enquiries.
ID: 33268
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 750
Credit: 5,688,626
RAC: 0
Message 33269 - Posted: 11 Dec 2017, 15:17:17 UTC - in response to Message 33268.  

The problem was to do with the new web site. Task creation is being turned off manually in the meantime.
ID: 33269
ritterm
Joined: 30 May 08
Posts: 93
Credit: 5,160,246
RAC: 0
Message 33270 - Posted: 11 Dec 2017, 15:32:21 UTC

CMS jobs have spiked back up and I just got a task. Are we back for good?
ID: 33270
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 750
Credit: 5,688,626
RAC: 0
Message 33271 - Posted: 11 Dec 2017, 16:32:14 UTC - in response to Message 33270.  

CMS jobs have spiked back up and I just got a task. Are we back for good?

No, not really. I submitted a small batch of jobs with the old WMAgent and they made it through to the Condor server just as we were about to manually disable task creation. There's no sign of the new agent coming back to life, so I guess I'll submit a bigger batch to keep us going until tomorrow.
ID: 33271
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 750
Credit: 5,688,626
RAC: 0
Message 33395 - Posted: 15 Dec 2017, 18:35:12 UTC - in response to Message 33271.  

CMS jobs have spiked back up and I just got a task. Are we back for good?

No, not really. I submitted a small batch of jobs with the old WMAgent and they made it through to the Condor server just as we were about to manually disable task creation. There's no sign of the new agent coming back to life, so I guess I'll submit a bigger batch to keep us going until tomorrow.

OK, we finally got in touch with the WMAgent expert (it turns out he was on holiday, and so is our other expert!), and the new agent is back in operation, so I can start submitting larger job batches again. Just as well, too, as our number of running jobs has increased -- we nearly drained the queue this morning, as I was expecting last night's batch to last 24 hours!
We also have another WMAgent expert "on our books" now, so hopefully we won't have this long an outage again. However, as I said before, it's the winter holiday season, so remedies may be slow to come in some circumstances.
ID: 33395
Erich56

Joined: 18 Dec 15
Posts: 1343
Credit: 25,070,670
RAC: 22,080
Message 33482 - Posted: 23 Dec 2017, 15:24:14 UTC

According to the Project Status Page, CMS has run out of tasks :-(
ID: 33482


©2021 CERN