Message boards : Number crunching : Condor Problems
ritterm
Joined: 30 May 08
Posts: 93
Credit: 5,160,246
RAC: 0
Message 28276 - Posted: 28 Dec 2016, 14:37:00 UTC
Last modified: 28 Dec 2016, 14:44:47 UTC

Sorry if starting a new thread is redundant, but I thought perhaps this discussion should be in a more general location...

I'm seeing a lot of these errors right now (mostly Theory for me, but also LHCb):

2016-12-28 09:09:18 (19627): VM Completion Message: Could not connect to Condor server on port 9618


A recent example is Task 110836548.
Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1169
Credit: 54,226,370
RAC: 56,762
Message 28278 - Posted: 28 Dec 2016, 14:55:01 UTC - in response to Message 28276.  

Yes, it is doing the same thing over at vLHC.

(Copy of my post there from a few seconds ago:)

OK what is going on here right now??

https://lhcathome.cern.ch/vLHCathome/result.php?resultid=7030843

I just got two of those in a row. They seem to connect quickly and then stop all of a sudden, because the Condor is flying around dropping bread on things again.

Maybe I will try one at LHC and see if it is doing this too.

Edit: No luck; it is doing the same thing at LHC.

https://lhcathome.cern.ch/lhcathome/result.php?resultid=110832690
Volunteer Mad Scientist For Life
Laurence
Project administrator
Project developer
Joined: 20 Jun 14
Posts: 380
Credit: 238,712
RAC: 0
Message 28279 - Posted: 28 Dec 2016, 16:00:18 UTC - in response to Message 28278.

Magic Quantum Mechanic wrote:
OK what is going on here right now??
Blocked by external firewall. Hopefully fixed but will not know until the configuration has been refreshed (up to 4 hours).

Thanks for the alert
ritterm
Joined: 30 May 08
Posts: 93
Credit: 5,160,246
RAC: 0
Message 28280 - Posted: 28 Dec 2016, 16:07:23 UTC
Last modified: 28 Dec 2016, 16:08:24 UTC

Laurence wrote:
Hopefully fixed but will not know until the configuration has been refreshed (up to 4 hours)...
Thanks, Laurence.

I have a Theory and an LHCb task that are 11 and 15 hours into their work, respectively, and am wondering if I should suspend them for now. Is it possible that work will be lost if those tasks try to get another job or finish up while connectivity issues continue?
Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1169
Credit: 54,226,370
RAC: 56,762
Message 28281 - Posted: 28 Dec 2016, 16:14:54 UTC - in response to Message 28280.  
Last modified: 28 Dec 2016, 16:21:14 UTC

The ones running should be OK. The worst that usually happens is that you have to wait to send them in, but that should not happen either.

I have lots of them running and will leave them running, and I will check whether this Condor problem is fixed. (It makes no sense that it just started happening when things have been running well lately.)

I see it is also happening over at vLHC-dev.
ritterm
Joined: 30 May 08
Posts: 93
Credit: 5,160,246
RAC: 0
Message 28282 - Posted: 28 Dec 2016, 16:32:28 UTC - in response to Message 28281.  
Last modified: 28 Dec 2016, 16:32:38 UTC

Magic Quantum Mechanic wrote:
The ones running should be ok...

Okay, great. I'll let them go, then.
Ben Segal
Volunteer moderator
Project administrator
Joined: 1 Sep 04
Posts: 140
Credit: 2,579
RAC: 0
Message 28283 - Posted: 28 Dec 2016, 17:04:01 UTC - in response to Message 28279.

Laurence wrote:
Magic Quantum Mechanic wrote:
OK what is going on here right now??

Blocked by external firewall. Hopefully fixed but will not know until the configuration has been refreshed (up to 4 hours).

Thanks for the alert

Well done, Laurence! Condor seems happy again.

Happy holidays to all!

Ben
Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1169
Credit: 54,226,370
RAC: 56,762
Message 28284 - Posted: 28 Dec 2016, 17:05:35 UTC - in response to Message 28282.  
Last modified: 28 Dec 2016, 17:14:59 UTC

ritterm wrote:
Magic Quantum Mechanic wrote:
The ones running should be ok...

Okay, great. I'll let them go, then.

I found out for sure, and I have sent in a few finished tasks here and at vLHC.

I just started a new task and we are back to normal, so it is time to start up a batch of new tasks here again.
Volunteer Mad Scientist For Life
ritterm
Joined: 30 May 08
Posts: 93
Credit: 5,160,246
RAC: 0
Message 28285 - Posted: 28 Dec 2016, 17:25:41 UTC - in response to Message 28284.  

Magic Quantum Mechanic wrote:
I found out for sure, and I have sent in a few finished tasks here and at vLHC.

I just started a new task and we are back to normal, so it is time to start up a batch of new tasks here again.

Yes, indeed!

Ben Segal wrote:
Well done, Laurence! Condor seems happy again.

+1 and thanks for the quick response.



©2024 CERN