Message boards : Theory Application : Problem of the day

MAGIC Quantum Mechanic
Joined: 24 Oct 04
Posts: 631
Credit: 17,842,866
RAC: 17,100
Message 32376 - Posted: 10 Sep 2017, 3:57:39 UTC

https://lhcathome.cern.ch/lhcathome/result.php?resultid=155279014

I got a couple of these tonight, along with others that did start up and run.

So is the Condor asleep again?
The run time was only 13 min 58 sec so it wasn't past the 20 minute wall.

(this is not an internet speed problem on my end)
Volunteer Mad Scientist For Life

Laurence
Project administrator
Project developer
Joined: 20 Jun 14
Posts: 182
Credit: 204,121
RAC: 0
Message 32396 - Posted: 12 Sep 2017, 13:14:31 UTC - in response to Message 32376.  

Yes, it could be that the server was temporarily unavailable.

MAGIC Quantum Mechanic
Message 32696 - Posted: 8 Oct 2017, 15:49:01 UTC
Last modified: 8 Oct 2017, 15:49:44 UTC

Over the last few days I have had this happen once in a while, within 4 seconds, when tasks are suspended.

https://lhcathome.cern.ch/lhcathome/result.php?resultid=158796702

The host has a task running when this happens, and it starts new tasks afterwards.

This is with the new versions of BOINC and VB.

MAGIC Quantum Mechanic
Message 32769 - Posted: 10 Oct 2017, 22:19:46 UTC

https://lhcathome.cern.ch/lhcathome/results.php?userid=5472

Too many of these errors lately, and there would be even more if I didn't have most of the new tasks suspended.

Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 465
Credit: 3,327,099
RAC: 1,438
Message 32790 - Posted: 11 Oct 2017, 13:57:44 UTC

No or not enough Theory jobs available.

Some BOINC-tasks fail after about 13 minutes run time: EXIT_NO_SUB_TASKS
https://lhcathome.cern.ch/lhcathome/result.php?resultid=158871945

Or, if you're lucky and get a first sub-task, there is a good chance you won't get a second one, and the BOINC task ends early.
https://lhcathome.cern.ch/lhcathome/result.php?resultid=158871885

MAGIC Quantum Mechanic
Message 32791 - Posted: 11 Oct 2017, 14:49:23 UTC

I am having better luck with the -dev multi-core Theory tasks.

I would switch to the ATLAS tasks here, except for those huge VDIs that take half a day to download on each host.

I would rather get some SixTrack tasks here now (and let me know when we have those, before they are all loaded onto computers that don't run these VB tasks at all).

Crystal Pellet
Message 32796 - Posted: 11 Oct 2017, 15:48:15 UTC - in response to Message 32790.  

"No or not enough Theory jobs available."

It looks like it was a temporary hiccup - jobs are being delivered normally now.

Crystal Pellet
Message 32797 - Posted: 11 Oct 2017, 16:10:09 UTC - in response to Message 32796.  

"No or not enough Theory jobs available."

"It looks like a temporary hiccup - Normal job delivering now."

Sorry - false hope.

The Theory VMs are being killed one after the other for lack of sub-jobs.

MAGIC Quantum Mechanic
Message 33213 - Posted: 6 Dec 2017, 0:31:40 UTC
Last modified: 6 Dec 2017, 0:32:52 UTC

(just thought I would bring this up since staring at 9 computers and trying to get 50 tasks running with VB can make a person......)

https://lhcathome.cern.ch/lhcathome/result.php?resultid=168740585

OK, I just happened to go back to this PC to make sure the tasks were running.

The LHC-dev Theory task was running with no problem, but the Theory task had the good old "Could not connect to vccs1.cern.ch on port 443" after doing the *Testing VCCS connection to vccs1.cern.ch on port 443*, with the *nc: getaddrinfo: Name or service not known* between those two.

It had no problem with Testing CVMFS connection to lhchomeproxy.cern.ch on port 3125:
Connection to lhchomeproxy.cern.ch 3125 port [tcp/a13-an] succeeded!
: nc: connect to lhchomeproxy.cern.ch port 3125 (tcp) timed out: Operation now in progress
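
For anyone who wants to tell these two kinds of failure apart from the host side, here is a minimal Python sketch. It is not the check the Theory VM itself runs (the log shows that one uses nc); it only uses the standard socket module, and the function name check_endpoint and its return labels are made up for illustration. It reports a name-resolution failure (the getaddrinfo case) separately from a connect timeout or a refused connection.

```python
import socket


def check_endpoint(host: str, port: int, timeout: float = 5.0) -> str:
    """Classify a TCP check as 'ok', 'dns_failure', 'timeout', 'refused' or 'error'."""
    try:
        # Name resolution step; a failure here is what nc reports as
        # "getaddrinfo: Name or service not known".
        addresses = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns_failure"

    result = "error"
    # Try every resolved address (IPv4 and IPv6) until one connects.
    for family, socktype, proto, _canonname, sockaddr in addresses:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return "ok"
        except socket.timeout:
            result = "timeout"
        except ConnectionRefusedError:
            result = "refused"
        except OSError:
            result = "error"
    return result
```

Calling check_endpoint("vccs1.cern.ch", 443) and check_endpoint("lhchomeproxy.cern.ch", 3125) from the host gives one label per server. A "dns_failure" on the first with "ok" on the second would match the pattern in the log above, although a check from the host does not necessarily see the same network path as the VM.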


But as usual the internet speed is not fast enough to make the simple connection to the CERN server and get the credentials so it can then get to the HTCondor ping... so the second try is just a 25-minute computer error. It is a good thing I always try to set the BOINC Manager to *no new tasks* so it doesn't keep doing that until it gets lucky and a task starts (no, a Squid will not help me get credentials).

So, along with the ones that give that *VM Completion Message: Condor exited after 774s without running a job.* after 36 minutes of running... well, it is always a good thing NOT to let them auto-run with BM set to get new tasks all the time.

(of course if it was just running SixTrack tasks non-stop there would not be this problem.....but....)
Volunteer Mad Scientist For Life

MAGIC Quantum Mechanic
Message 33218 - Posted: 6 Dec 2017, 17:03:14 UTC

https://lhcathome.cern.ch/lhcathome/result.php?resultid=168746258

I just saw this one happen and wondered why it had all of a sudden finished after almost 7 hours... so I checked the stderr here and, once again, it was the *VM Heartbeat file specified, but missing heartbeat* message.
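
Since the same handful of stderr messages keep coming up in this thread, here is a rough Python sketch for sorting a saved stderr text by failure type. The message strings are the ones quoted in the posts here; the labels, the PATTERNS table and the function name classify_stderr are made up for illustration and are not part of BOINC or the Theory application.

```python
import re
from typing import List

# Message texts come from the posts in this thread; the labels are illustrative only.
PATTERNS = [
    ("no_sub_tasks", re.compile(r"EXIT_NO_SUB_TASKS")),
    ("condor_idle", re.compile(r"Condor exited after \d+s without running a job")),
    ("vccs_unreachable", re.compile(r"Could not connect to \S+ on port \d+")),
    ("dns_failure", re.compile(r"getaddrinfo: Name or service not known")),
    ("missing_heartbeat", re.compile(r"VM Heartbeat file specified, but missing heartbeat")),
]


def classify_stderr(text: str) -> List[str]:
    """Return the labels of every known pattern found in text, in PATTERNS order."""
    return [label for label, pattern in PATTERNS if pattern.search(text)]
```

Feeding it the stderr text copied from a result page returns a list such as ["missing_heartbeat"], or an empty list when none of the known messages is present.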

Of course I am just pointing this out, and I know there is no actual reason or fix; I have been running these VB tasks daily, 24/7, longer than anyone.

(and probably spend too many hours with a monitor in front of me)

.....just talking to myself.....no big deal and nothing new
Volunteer Mad Scientist For Life

©2018 CERN