Message boards : CMS Application : Jobs drying up

ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 696
Credit: 5,582,598
RAC: 2,619
Message 33852 - Posted: 14 Jan 2018, 15:53:49 UTC

Unfortunately, I'm not able to submit new jobs at the moment. I gather there was a system change on Friday that has somehow gone awry. I'm also told that it's a holiday in the US tomorrow so the expert involved may not be able to intervene until Tuesday.
We will be out of jobs within the hour. I'll keep trying to submit a new batch, but for now, better set No New Tasks while we ride out the problem. Sorry 'bout that... :-(

ivan
Message 33853 - Posted: 14 Jan 2018, 16:00:44 UTC - in response to Message 33852.  

Whoops, I might have spoken too soon! It looks like the batch I submitted 20 minutes ago has got through to the WMAgent despite the submission having returned a 500 Internal Server Error. It's yet to start queueing jobs, though, so I'm holding my breath.

ivan
Message 33854 - Posted: 14 Jan 2018, 17:36:44 UTC - in response to Message 33853.  

Still holding my breath -- 295 jobs left in the queue, none yet from the new batch. What a way to spend Sunday afternoon!

computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1504
Credit: 82,979,908
RAC: 79,812
Message 33855 - Posted: 14 Jan 2018, 17:53:04 UTC - in response to Message 33854.  
Last modified: 14 Jan 2018, 18:00:02 UTC

> Still holding my breath -- 295 jobs left in the queue, none yet from the new batch. What a way to spend Sunday afternoon!

Nothing different to watching your favourite football (or anything else) team on TV.
You cannot do anything to help them but you are happy and proud if they finally win.
:-D

Cheers

<edit>
BTW

I may have crashed a handful of them this morning due to a misconfigured stress test.
Shame.
I'm sorry.
But since then it's OK.
</edit>

ivan
Message 33856 - Posted: 14 Jan 2018, 18:35:59 UTC - in response to Message 33855.  

Well, definitely no jobs being queued from the new batch. Time to relax and enjoy the Sunday night TV. Now I've got 40 Mbps download, that's a lot less stressful!

ivan
Message 33873 - Posted: 15 Jan 2018, 14:29:47 UTC

Good news! We have CMS jobs again. It was a "perfect storm" combination of server certificates expiring and a database upgrade that went awry, compounded by a national holiday weekend in the USA.

ivan
Message 33886 - Posted: 17 Jan 2018, 9:00:55 UTC

We just ran out of queued jobs for an hour. I was in the process of writing an e-mail to the CERN crew when we started getting jobs again. No idea yet what the problem was/is.

ivan
Message 34674 - Posted: 15 Mar 2018, 16:35:49 UTC

We've run out of tasks again. e-mails sent.

Erich56
Joined: 18 Dec 15
Posts: 1291
Credit: 23,316,312
RAC: 4,248
Message 34680 - Posted: 16 Mar 2018, 7:41:47 UTC - in response to Message 34674.  

> We've run out of tasks again. e-mails sent.
good morning, Ivan - any idea when new CMS tasks will be available?

ivan
Message 34682 - Posted: 16 Mar 2018, 10:18:56 UTC - in response to Message 34680.  

>> We've run out of tasks again. e-mails sent.
> good morning, Ivan - any idea when new CMS tasks will be available?

Hi Erich. There are tasks again now.

ivan
Message 34714 - Posted: 22 Mar 2018, 14:04:51 UTC

The job queue (not the task queue) has dried up. I'm trying to work out why. You should set No New Tasks to avoid "compute error" failures.

ivan
Message 34715 - Posted: 22 Mar 2018, 15:22:38 UTC - in response to Message 34714.  
Last modified: 22 Mar 2018, 15:22:55 UTC

Jobs are available again. There was an upgrade to the development server we use; I guess they forgot to set up any special-case changes for us.

MAGIC Quantum Mechanic
Joined: 24 Oct 04
Posts: 960
Credit: 40,616,548
RAC: 4,836
Message 34716 - Posted: 22 Mar 2018, 20:50:59 UTC

Still running good over here Ivan
Volunteer Mad Scientist For Life

ivan
Message 34717 - Posted: 22 Mar 2018, 23:35:17 UTC - in response to Message 34716.  
Last modified: 23 Mar 2018, 11:50:56 UTC

> Still running good over here Ivan

Yeah, I'm not sure exactly what the problem was -- there were several "interventions" at CERN today, both with the network and the pre-production WMAgent that we use. Bottom line was that Alan restarted our particular WMAgent instance and all seems well for now...

Erich56
Message 34718 - Posted: 23 Mar 2018, 5:52:09 UTC - in response to Message 34717.  

> ... our particular WMAgent ...
yeah, our famous WMAgent :-)))

ivan
Message 34721 - Posted: 24 Mar 2018, 18:13:55 UTC - in response to Message 34718.  
Last modified: 24 Mar 2018, 18:17:37 UTC

>> ... our particular WMAgent ...
> yeah, our famous WMAgent :-)))

Tja, Erich, I know it's your particular bugaboo. But I have to live with the fact that we are heavily dependent on CERN IT services, some of which don't know about the dependency! I subscribe to several mailing lists that give me forewarning of disruptions, but there are still interventions that I don't have advance knowledge about.

Erich56
Message 35081 - Posted: 24 Apr 2018, 10:19:39 UTC

Ivan, as of a short while ago, all my CMS tasks fail after several minutes with the

207 (0x000000CF) EXIT_NO_SUB_TASKS :-(

error :-(

computezrmle
Message 35082 - Posted: 24 Apr 2018, 10:57:12 UTC - in response to Message 35081.  


ivan
Message 35083 - Posted: 24 Apr 2018, 10:57:56 UTC - in response to Message 35081.  

Yes, see my news item from yesterday (it was supposed to go out as a BOINC notice but maybe I forgot to tick the box...)
There are still a few jobs in the queue but at this stage they are probably mainly, if not all, post-Production jobs that run at CERN and not on volunteer machines.
We're having a big CMS@Home meeting this afternoon -- I'm just preparing my presentation now -- where I hope there will be enough experts present to help explain our current and ongoing difficulties.

Erich56
Message 35084 - Posted: 24 Apr 2018, 11:17:27 UTC - in response to Message 35083.  

Ivan, I wish you (and us) good luck for the meeting !!!


©2020 CERN