Message boards : CMS Application : CMS Tasks Failing

Erich56

Joined: 18 Dec 15
Posts: 1821
Credit: 118,943,683
RAC: 21,125
Message 33129 - Posted: 24 Nov 2017, 9:19:10 UTC - in response to Message 33128.  

Thank you, Ivan, for the information.
It's always valuable :-)
ID: 33129
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1061
Credit: 7,737,455
RAC: 298
Message 33130 - Posted: 24 Nov 2017, 14:56:12 UTC - in response to Message 33129.  

Well, I know what it's like to be a mushroom: Kept in the dark and fed on horsesh*t!
ID: 33130
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1061
Credit: 7,737,455
RAC: 298
Message 33141 - Posted: 27 Nov 2017, 14:43:26 UTC - in response to Message 33128.  

There is another intervention going on at the moment. It doesn't appear to be affecting us -- yet...
ID: 33141
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1061
Credit: 7,737,455
RAC: 298
Message 33143 - Posted: 27 Nov 2017, 17:40:53 UTC - in response to Message 33141.  

There is another intervention going on at the moment. It doesn't appear to be affecting us -- yet...

They are still trying to do a database transfer. Our queue is starting to drain, but luckily I had it doubled to 2,000 last night, so we have 2-3 hours before we run out. Best set your machines to No New Tasks until we see how this plays out.
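For anyone who prefers the command line to the BOINC Manager, setting No New Tasks can be scripted. This is a minimal sketch, assuming `boinccmd` is installed and on your PATH, and that the project URL below matches the one your client is attached to (check it with `boinccmd --get_project_status`). The function names and the `PROJECT_URL` value are illustrative assumptions, not anything the project provides.

```python
import subprocess

# Assumption: replace with the exact project URL your BOINC client reports.
PROJECT_URL = "https://lhcathome.cern.ch/lhcathome/"


def no_new_tasks_command(project_url: str, allow: bool = False) -> list[str]:
    """Build the boinccmd argument list for stopping or resuming work fetch."""
    # 'nomorework' is the command-line equivalent of "No New Tasks";
    # 'allowmorework' reverses it.
    operation = "allowmorework" if allow else "nomorework"
    return ["boinccmd", "--project", project_url, operation]


def set_no_new_tasks(project_url: str, allow: bool = False) -> None:
    """Run boinccmd; raises CalledProcessError if the client rejects the request."""
    subprocess.run(no_new_tasks_command(project_url, allow), check=True)
```

Calling `set_no_new_tasks(PROJECT_URL)` stops work fetch for the project, and `set_no_new_tasks(PROJECT_URL, allow=True)` turns it back on once jobs return. Depending on your installation, `boinccmd` may need to be run from the BOINC data directory or be given the GUI RPC password.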
ID: 33143
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1061
Credit: 7,737,455
RAC: 298
Message 33145 - Posted: 27 Nov 2017, 19:35:35 UTC - in response to Message 33143.  

The queue is now empty, so no more jobs until further notice.
ID: 33145
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1061
Credit: 7,737,455
RAC: 298
Message 33149 - Posted: 28 Nov 2017, 9:13:31 UTC - in response to Message 33145.  

The queue is now empty, so no more jobs until further notice.

OK, there are jobs again. There are still some difficulties at CERN but for now we have jobs to run.
ID: 33149
Erich56

Joined: 18 Dec 15
Posts: 1821
Credit: 118,943,683
RAC: 21,125
Message 33510 - Posted: 26 Dec 2017, 11:42:41 UTC

Since this morning, all my CMS tasks fail after about 12-18 minutes, with final state:
207 (0x000000CF) EXIT_NO_SUB_TASKS

which probably means that there are no jobs available :-(
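As a side note for reading such status lines: the two numbers are the same exit code, once in decimal and once in hexadecimal. A small sketch of that formatting follows; the lookup table is hypothetical and contains only the one code quoted above.

```python
# Only the code quoted in this thread is listed; any other entry would be an assumption.
KNOWN_EXIT_CODES = {207: "EXIT_NO_SUB_TASKS"}


def describe_exit(code: int) -> str:
    """Format an exit code as 'decimal (0xHEX) NAME', matching the task status line."""
    name = KNOWN_EXIT_CODES.get(code, "UNKNOWN")
    # 08X pads the hexadecimal value to eight upper-case digits.
    return f"{code} (0x{code:08X}) {name}"
```

With this, `describe_exit(207)` reproduces the line shown in the task result.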
ID: 33510
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1061
Credit: 7,737,455
RAC: 298
Message 33513 - Posted: 26 Dec 2017, 12:32:36 UTC
Last modified: 26 Dec 2017, 12:36:26 UTC

Oops, I overslept. More jobs should be in the queue soon. (Bad news yesterday: one of my fellow expeditioners from Mawson Base in 1980 had a stroke and died at the weekend. Apparently his daughter married Australia's current wicket-keeper, which explains why the Aussie team was wearing black arm-bands at the start of the Boxing Day test in Melbourne today.)
[Edit] Oh, [naughty word!], the WMAgent appears to have gone down too. The only one on my contact list who isn't on holidays is in Chicago, so it might be a couple of hours yet before he responds. [/Edit]
ID: 33513
Erich56

Joined: 18 Dec 15
Posts: 1821
Credit: 118,943,683
RAC: 21,125
Message 33514 - Posted: 26 Dec 2017, 12:44:22 UTC - in response to Message 33513.  

Bad news yesterday, one of my fellow expeditioners from Mawson Base in 1980 had a stroke and died at the weekend
Ivan, I am sorry to hear this :-(
ID: 33514
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1061
Credit: 7,737,455
RAC: 298
Message 33522 - Posted: 26 Dec 2017, 15:16:29 UTC - in response to Message 33514.  

Thanks Erich. I may yet decide to take a quick trip to Hobart for his funeral, which will not be before the New Year, as I understand it.
ID: 33522
Erich56

Joined: 18 Dec 15
Posts: 1821
Credit: 118,943,683
RAC: 21,125
Message 33524 - Posted: 26 Dec 2017, 17:29:22 UTC - in response to Message 33513.  
Last modified: 26 Dec 2017, 17:29:41 UTC

... the WMAgent appears to have gone down too. The only one on my contact list who isn't on holidays is in Chicago, so it might be a couple of hours yet before he responds.
Ivan, any idea when CMS will be working again?
ID: 33524
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1061
Credit: 7,737,455
RAC: 298
Message 33525 - Posted: 26 Dec 2017, 18:44:50 UTC - in response to Message 33524.  

No, sorry, no response from anyone on holidays. Please set No New Tasks or switch to backup projects. Of course, I'll let you know when there are jobs again, but I'm just about to go to sleep for an extended period... See you in 15 or 18 hours! (I have to refill a cryomagnet's outer heat-shield with liquid nitrogen tomorrow...).
ID: 33525
Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1176
Credit: 54,887,670
RAC: 5,761
Message 33526 - Posted: 26 Dec 2017, 23:43:27 UTC - in response to Message 33525.  

No, sorry, no response from anyone on holidays. Please set No New Tasks or switch to backup projects. Of course, I'll let you know when there are jobs again, but I'm just about to go to sleep for an extended period... See you in 15 or 18 hours! (I have to refill a cryomagnet's outer heat-shield with liquid nitrogen tomorrow...).


https://tinyurl.com/snow-is-cold-enough-for-me

ID: 33526
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1061
Credit: 7,737,455
RAC: 298
Message 33554 - Posted: 28 Dec 2017, 13:00:38 UTC - in response to Message 33526.  

No, sorry, no response from anyone on holidays. Please set No New Tasks or switch to backup projects. Of course, I'll let you know when there are jobs again, but I'm just about to go to sleep for an extended period... See you in 15 or 18 hours! (I have to refill a cryomagnet's outer heat-shield with liquid nitrogen tomorrow...).


https://tinyurl.com/snow-is-cold-enough-for-me


Well, our magnet's just at LHe boiling point, but it runs at 4 Tesla. It runs in persistent mode, which means we haven't had to apply any current in all my 15+ years at the lab. We have to fill with LHe twice a year, and top up the LN2 outer chamber every week.
Sorry, I forgot to mention yesterday that we have jobs again, but I guess most people would have noticed.
ID: 33554
Erich56

Joined: 18 Dec 15
Posts: 1821
Credit: 118,943,683
RAC: 21,125
Message 33604 - Posted: 31 Dec 2017, 16:43:30 UTC

After CMS tasks had run flawlessly for a while, some of them started to error out again after a short time, with stderr reporting the following:

2017-12-31 14:03:23 (12584): Guest Log: [ERROR] Could not connect to Condor server on port 9618

This type of problem has come up many times in the past. Does anyone have any idea why?
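One way to narrow this down from the host side is to check whether a plain TCP connection to the server's port can be opened at all. The sketch below assumes you know the Condor server's hostname, which the log line above does not give, so it is left as a parameter. Port 9618 is the one named in the error message. Note that this only tests reachability from the host, not from inside the VM, and says nothing about why a connection fails.

```python
import socket


def can_reach(host: str, port: int = 9618, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        # create_connection resolves the name and attempts the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

If `can_reach("your.condor.server", 9618)` (hostname hypothetical) returns False while other sites are reachable, a firewall or a server-side outage is a plausible suspect; if it returns True, the problem is more likely intermittent or inside the VM.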
ID: 33604
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1061
Credit: 7,737,455
RAC: 298
Message 33605 - Posted: 31 Dec 2017, 17:03:40 UTC - in response to Message 33604.  

No, I don't know why, but I have seen it myself on my University servers so I guess there is some general network problem.
I'll try to look into it next week; don't feel shy about reminding me if you don't hear anything from me!
ID: 33605
Erich56

Joined: 18 Dec 15
Posts: 1821
Credit: 118,943,683
RAC: 21,125
Message 33606 - Posted: 31 Dec 2017, 17:51:06 UTC - in response to Message 33605.  

I'll try to look into it next week; don't feel shy about reminding me if you don't hear anything from me!
Thanks, Ivan, for your help :-)
Happy New Year!
ID: 33606
Erich56

Joined: 18 Dec 15
Posts: 1821
Credit: 118,943,683
RAC: 21,125
Message 33610 - Posted: 1 Jan 2018, 7:01:50 UTC

Good morning everybody, and a Happy New Year!

Unfortunately, the year starts with another CMS problem: either there are no jobs available, or the WMAgent is down again :-(
ID: 33610
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1061
Credit: 7,737,455
RAC: 298
Message 33612 - Posted: 1 Jan 2018, 11:22:32 UTC - in response to Message 33610.  

Good morning everybody, and a Happy New Year!

Unfortunately, the year starts with another CMS problem: either there are no jobs available, or the WMAgent is down again :-(

Yes, the agent has died. I'll see if I can raise someone at CERN or FNAL.
ID: 33612
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1061
Credit: 7,737,455
RAC: 298
Message 33613 - Posted: 1 Jan 2018, 13:39:01 UTC - in response to Message 33612.  

The agent appears to have just been restarted; I expect jobs in the queue soon.
ID: 33613


©2024 CERN