Message boards : Theory Application : Condor connection problem

Harri Liljeroos
Joined: 28 Sep 04
Posts: 674
Credit: 43,151,503
RAC: 15,790
Message 28423 - Posted: 10 Jan 2017, 15:18:05 UTC

One task (https://lhcathome.cern.ch/lhcathome/result.php?resultid=111113981) failed on my home desktop with a Condor connection problem. It failed after 1:18 rather than the usual 11 minutes. The error is also new to me: -152 (0xFFFFFF68) ERR_NETOPEN. Has something changed to speed up the handling of connection errors?
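
For reference, the two forms of the error code quoted above are the same 32-bit value: 0xFFFFFF68 is the two's-complement bit pattern of -152. A minimal Python sketch of the conversion follows; the helper names are illustrative only and are not part of BOINC.

```python
def to_unsigned32(code: int) -> int:
    """Return the 32-bit two's-complement bit pattern of a signed integer."""
    return code & 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Interpret a 32-bit pattern as a signed (two's-complement) integer."""
    value &= 0xFFFFFFFF
    # If the top bit is set, the value represents a negative number.
    return value - 0x100000000 if value & 0x80000000 else value


print(f"0x{to_unsigned32(-152):08X}")
print(to_signed32(0xFFFFFF68))
```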
ID: 28423
Laurence
Project administrator
Project developer
Joined: 20 Jun 14
Posts: 372
Credit: 238,712
RAC: 0
Message 28425 - Posted: 10 Jan 2017, 15:59:58 UTC - in response to Message 28423.  

This is how the task should behave when there is no access due to a required port being closed.

The heartbeat mechanism, which is there to detect VMs that failed to boot, fails tasks after about 10 minutes. In this case port 3125 is open, so CVMFS is happy and the VM boots, but port 9618 is closed, so it is not possible to contact the Condor server.
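
One way to check from the volunteer's side whether an outbound TCP port is reachable is a plain connect test. The Python sketch below is only an illustration under stated assumptions: the function names are made up for this example, the thread does not name the servers involved, and a successful TCP connect only shows that the port is reachable, not that Condor or CVMFS is actually working.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_ports(host: str, ports, timeout: float = 5.0) -> dict:
    """Map each port to whether a TCP connection to it succeeded."""
    return {port: port_is_reachable(host, port, timeout) for port in ports}
```

Calling `check_ports("condor.example.org", [3125, 9618])` would test both ports mentioned above; the host name here is a placeholder and has to be replaced with the real server name.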
ID: 28425
Harri Liljeroos
Joined: 28 Sep 04
Posts: 674
Credit: 43,151,503
RAC: 15,790
Message 28427 - Posted: 10 Jan 2017, 18:40:39 UTC - in response to Message 28425.  

Please note that this incident is not on my laptop host, which cannot connect from my office network, but on my home computer, which has been working all the time. No ports are closed from my home network to the outside world. That is probably why the laptop worked when it was on my home network.
ID: 28427
Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1114
Credit: 49,501,728
RAC: 4,157
Message 28430 - Posted: 11 Jan 2017, 2:44:36 UTC
Last modified: 11 Jan 2017, 2:46:46 UTC

https://lhcathome.cern.ch/lhcathome/result.php?resultid=111134738

Well, after the 3-hour download of this new .vdi, the first task I tried got *ACCESSDENIED* with the new 32-bit version that is being moved over here.

And this is on the #1 computer over at vLHC, which runs this exact same task version there: *Theory Simulations 262.60 32bit

Which of course means the problem is at the CERN server end, not on the computer.
Volunteer Mad Scientist For Life
ID: 28430
Laurence
Project administrator
Project developer
Joined: 20 Jun 14
Posts: 372
Credit: 238,712
RAC: 0
Message 28435 - Posted: 11 Jan 2017, 10:38:17 UTC - in response to Message 28430.  

Very strange. This is essentially the same image that is running in vLHC@home. Am investigating now.
ID: 28435
maeax
Joined: 2 May 07
Posts: 2071
Credit: 156,100,795
RAC: 103,685
Message 28436 - Posted: 11 Jan 2017, 11:54:20 UTC
Last modified: 11 Jan 2017, 11:56:24 UTC

I ran a test with XP Pro (x86) as a VM and saw this hypervisor message:

00:46:02.092291 WARNING [COM]: aRC=E_FAIL (0x80004005) aIID={afca788c-4477-787d-60b2-3fa70e56fbbc} aComponent={HostWrap} aText={Could not load the Host USB Proxy Service (VERR_FILE_NOT_FOUND). The service might not be installed on the host computer}, preserve=true aResultDetail=0

https://lhcathome.cern.ch/lhcathome/result.php?resultid=111274345
ID: 28436
Laurence
Project administrator
Project developer
Joined: 20 Jun 14
Posts: 372
Credit: 238,712
RAC: 0
Message 28437 - Posted: 11 Jan 2017, 12:49:26 UTC - in response to Message 28435.  
Last modified: 11 Jan 2017, 12:57:04 UTC

I tested the image directly with VirtualBox using the approach described in the other thread. A new 32-bit Linux VM was created using the Theory32_2017_01_10.vdi image. A shared directory was added containing an init_data.xml file harvested from one of the slot directories. The VM started and is running fine, with the heartbeat file being created.

What does the console show for the VMs that are failing?
ID: 28437
maeax
Joined: 2 May 07
Posts: 2071
Credit: 156,100,795
RAC: 103,685
Message 28438 - Posted: 11 Jan 2017, 13:44:45 UTC
Last modified: 11 Jan 2017, 13:47:01 UTC

Host:10408749

VM-ID: 4a938f8b-2482-4f75-a1ae-b1c5f3f756bb

Yes, it works with the configuration from the weekend, but as 32-bit Scientific Linux.

Hmm, why not in XP Pro...

EDIT: VirtualBox 5.0.30 (too old?)
ID: 28438
Laurence
Project administrator
Project developer
Joined: 20 Jun 14
Posts: 372
Credit: 238,712
RAC: 0
Message 28439 - Posted: 11 Jan 2017, 13:51:30 UTC - in response to Message 28438.  

It is a little puzzling, as the application is the same as the one in vLHC@home. A small modification was made to the image to support the different user IDs, but that is all.
ID: 28439
maeax
Joined: 2 May 07
Posts: 2071
Credit: 156,100,795
RAC: 103,685
Message 28440 - Posted: 11 Jan 2017, 13:52:47 UTC
Last modified: 11 Jan 2017, 13:54:30 UTC

00:01:23.533443 VMMDev: Guest Log: [INFO] Reading volunteer information
00:01:23.847952 VMMDev: Guest Log: [INFO] Volunteer: maeax (75468) Host: 10408749
00:01:23.897518 VMMDev: Guest Log: [INFO] VMID: 4a938f8b-2482-4f75-a1ae-b1c5f3f756bb
00:01:24.309934 VMMDev: Guest Log: [INFO] Requesting an X509 credential from vLHC@home
00:01:24.558761 VMMDev: Guest Log: [INFO] Requesting an X509 credential from vLHC@home-dev
00:01:25.100221 VMMDev: Guest Log: [INFO] Requesting an X509 credential from LHC@home
00:01:26.205864 VMMDev: Guest Log: [INFO] Theory application starting. Check log files.
00:01:26.786247 VMMDev: Guest Log: [DEBUG] HTCondor ping
00:01:27.572807 VMMDev: Guest Log: [DEBUG] 0
00:02:15.292803 VMMDev: Guest Log: [INFO] New Job Starting in slot1
00:02:15.386891 VMMDev: Guest Log: [INFO] Condor JobID: 1007020.0 in slot1
00:02:20.569212 VMMDev: Guest Log: [INFO] MCPlots JobID: 34764235 in slot1

EDIT: vboxwrapper 26.197!
ID: 28440
maeax
Joined: 2 May 07
Posts: 2071
Credit: 156,100,795
RAC: 103,685
Message 28450 - Posted: 12 Jan 2017, 8:06:17 UTC

This test finished correctly.

00:02:15.292803 VMMDev: Guest Log: [INFO] New Job Starting in slot1
00:02:15.386891 VMMDev: Guest Log: [INFO] Condor JobID: 1007020.0 in slot1
00:02:20.569212 VMMDev: Guest Log: [INFO] MCPlots JobID: 34764235 in slot1
00:32:12.555654 VMMDev: Guest Log: [INFO] Job finished in slot1 with 0.
00:32:17.447561 VMMDev: Guest Log: [INFO] New Job Starting in slot1
00:32:17.523384 VMMDev: Guest Log: [INFO] Condor JobID: 1007520.0 in slot1
00:32:22.636481 VMMDev: Guest Log: [INFO] MCPlots JobID: 34764810 in slot1
02:11:52.276934 VMMDev: Guest Log: [INFO] Job finished in slot1 with 0.
02:11:54.875099 VMMDev: Guest Log: [INFO] New Job Starting in slot1
02:11:54.994929 VMMDev: Guest Log: [INFO] Condor JobID: 1009113.0 in slot1
02:12:00.130149 VMMDev: Guest Log: [INFO] MCPlots JobID: 34766331 in slot1
04:36:59.627987 VMMDev: Guest Log: [INFO] Job finished in slot1 with 0.
04:37:02.044171 VMMDev: Guest Log: [INFO] New Job Starting in slot1
04:37:02.105943 VMMDev: Guest Log: [INFO] Condor JobID: 1011562.0 in slot1
04:37:07.237609 VMMDev: Guest Log: [INFO] MCPlots JobID: 34768635 in slot1
06:36:02.905626 VMMDev: Guest Log: [INFO] Job finished in slot1 with 0.
06:36:05.588459 VMMDev: Guest Log: [INFO] New Job Starting in slot1
06:36:05.659675 VMMDev: Guest Log: [INFO] Condor JobID: 1013841.0 in slot1
06:36:10.818245 VMMDev: Guest Log: [INFO] MCPlots JobID: 34771198 in slot1
08:07:35.735357 VMMDev: Guest Log: [INFO] Job finished in slot1 with 0.
08:07:38.740494 VMMDev: Guest Log: [INFO] New Job Starting in slot1
08:07:38.822363 VMMDev: Guest Log: [INFO] Condor JobID: 1015805.0 in slot1
08:07:43.933665 VMMDev: Guest Log: [INFO] MCPlots JobID: 34772967 in slot1
08:43:24.756280 VMMDev: Guest Log: [INFO] Job finished in slot1 with 0.
08:43:27.198427 VMMDev: Guest Log: [INFO] New Job Starting in slot1
08:43:27.265357 VMMDev: Guest Log: [INFO] Condor JobID: 1016513.0 in slot1
08:43:32.370354 VMMDev: Guest Log: [INFO] MCPlots JobID: 34773837 in slot1
08:48:47.090949 VMMDev: Guest Log: [INFO] Job finished in slot1 with 0.
08:48:49.545579 VMMDev: Guest Log: [INFO] New Job Starting in slot1
08:48:49.615464 VMMDev: Guest Log: [INFO] Condor JobID: 1016614.0 in slot1
08:48:54.740128 VMMDev: Guest Log: [INFO] MCPlots JobID: 34773790 in slot1
09:16:48.895998 VMMDev: Guest Log: [INFO] Job finished in slot1 with 0.
09:16:51.281538 VMMDev: Guest Log: [INFO] New Job Starting in slot1
09:16:51.355729 VMMDev: Guest Log: [INFO] Condor JobID: 1017081.0 in slot1
09:16:56.464874 VMMDev: Guest Log: [INFO] MCPlots JobID: 34774237 in slot1
12:39:19.480341 VMMDev: Guest Log: [INFO] Job finished in slot1 with 0.
12:48:51.028361 VMMDev: Guest Log: [INFO] Condor exited with return value N/A.
12:48:51.066879 VMMDev: Guest Log: [INFO] Shutting Down.
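
To see how long each job in a log like the one above ran, the "New Job Starting" and "Job finished" lines can be paired per slot. The following Python sketch assumes the line format shown in this post (VM uptime as HH:MM:SS.ffffff, then "VMMDev: Guest Log: [LEVEL] message"); the function names are made up for this example and the sample is three lines copied from the log.

```python
import re
from datetime import timedelta

# Example line:
# 00:02:15.292803 VMMDev: Guest Log: [INFO] New Job Starting in slot1
LINE_RE = re.compile(r"^(\d+):(\d{2}):(\d{2})\.(\d+) VMMDev: Guest Log: \[\w+\] (.*)$")
START_RE = re.compile(r"New Job Starting in (\S+)")
END_RE = re.compile(r"Job finished in (\S+) with (-?\d+)")


def parse_uptime(hours, minutes, seconds, fraction):
    """Convert the uptime fields of a log line into a timedelta."""
    micro = int(fraction.ljust(6, "0")[:6])
    return timedelta(hours=int(hours), minutes=int(minutes),
                     seconds=int(seconds), microseconds=micro)


def job_durations(lines):
    """Return (slot, duration, exit_code) for every start/finish pair."""
    started = {}  # slot name -> uptime at which the current job started
    jobs = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # not a guest log line in the expected format
        stamp = parse_uptime(*match.group(1, 2, 3, 4))
        message = match.group(5)
        start = START_RE.search(message)
        end = END_RE.search(message)
        if start:
            started[start.group(1)] = stamp
        elif end and end.group(1) in started:
            slot = end.group(1)
            jobs.append((slot, stamp - started.pop(slot), int(end.group(2))))
    return jobs


SAMPLE = """\
00:02:15.292803 VMMDev: Guest Log: [INFO] New Job Starting in slot1
00:02:15.386891 VMMDev: Guest Log: [INFO] Condor JobID: 1007020.0 in slot1
00:32:12.555654 VMMDev: Guest Log: [INFO] Job finished in slot1 with 0.
"""

for slot, duration, exit_code in job_durations(SAMPLE.splitlines()):
    print(slot, duration, exit_code)
```

For the first job in the log this gives a run time of just under 30 minutes with exit code 0.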
ID: 28450
maeax
Joined: 2 May 07
Posts: 2071
Credit: 156,100,795
RAC: 103,685
Message 28461 - Posted: 12 Jan 2017, 16:23:13 UTC

XP Pro (x86) as a VM gives this message:

udev waits for more than 5 minutes; one of the messages is:

udev still not settled... continued in background...

After 11 minutes the task finished.

https://lhcathome.cern.ch/lhcathome/result.php?resultid=111707332
ID: 28461