Message boards : Theory Application : Condor connection problem
Author | Message |
---|---|
Joined: 28 Sep 04 Posts: 675 Credit: 43,653,221 RAC: 15,903 |
I had one task (https://lhcathome.cern.ch/lhcathome/result.php?resultid=111113981) fail on my home desktop with a Condor connection problem. It failed after 1:18 instead of the usual 11 minutes. The error is also new to me: -152 (0xFFFFFF68) ERR_NETOPEN. Has something changed to speed up the handling of connection errors? |
Joined: 20 Jun 14 Posts: 374 Credit: 238,712 RAC: 0 |
This is how the task should behave when there is no access because a required port is closed. The heartbeat mechanism, which is there to detect VMs that failed to boot, fails tasks after about 10 minutes. In this case port 3125 is open, so CVMFS is happy and the VM will boot, but port 9618 is closed, so it is not possible to contact the Condor server. |
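The port check described above can be reproduced from the volunteer's side with a plain TCP connection attempt. The following is only a minimal sketch: the function names are made up for illustration, and the server host names are not given in this thread, so they have to be supplied by whoever runs it.

```python
import socket


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures alike.
        return False


def check_endpoints(endpoints, timeout=5.0):
    """Check a list of (host, port) pairs and return {(host, port): bool}."""
    return {(host, port): port_is_open(host, port, timeout) for host, port in endpoints}


if __name__ == "__main__":
    # Local demonstration only: open a listener on a free port and probe it.
    # For a real check, pass the project's servers (hypothetical here) with
    # the ports named in the post above, 3125 and 9618.
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    demo_port = listener.getsockname()[1]
    print(check_endpoints([("127.0.0.1", demo_port)]))
    listener.close()
```

A result of `False` for port 9618 with `True` for port 3125 would match the situation described in the post: the VM boots, but the Condor server cannot be reached.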
Joined: 28 Sep 04 Posts: 675 Credit: 43,653,221 RAC: 15,903 |
Please note that this incident is not on my laptop host, which cannot connect from my office network, but on my home computer, which has been working all the time. No ports are closed from my home network to the outside world. That's probably why the laptop worked when it was on my home network. |
Joined: 24 Oct 04 Posts: 1127 Credit: 49,751,535 RAC: 8,764 |
https://lhcathome.cern.ch/lhcathome/result.php?resultid=111134738 Well, after the 3-hour download of this new .vdi and trying the first task, I get *ACCESSDENIED* with the new 32-bit version that is trying to move over here. And this is on the #1 computer over at vLHC, running the exact same task version as over there (Theory Simulations 262.60 32bit), which of course means the problem is at the CERN server end, not the computer. Volunteer Mad Scientist For Life |
Joined: 20 Jun 14 Posts: 374 Credit: 238,712 RAC: 0 |
https://lhcathome.cern.ch/lhcathome/result.php?resultid=111134738 Very strange. This is essentially the same image that is running in vLHC@home. I am investigating now. |
Joined: 2 May 07 Posts: 2101 Credit: 159,817,517 RAC: 132,770 |
I ran a test with XP Pro (x86) as a VM and saw this hypervisor message: 00:46:02.092291 WARNING [COM]: aRC=E_FAIL (0x80004005) aIID={afca788c-4477-787d-60b2-3fa70e56fbbc} aComponent={HostWrap} aText={Could not load the Host USB Proxy Service (VERR_FILE_NOT_FOUND). The service might not be installed on the host computer}, preserve=true aResultDetail=0 https://lhcathome.cern.ch/lhcathome/result.php?resultid=111274345 |
Joined: 20 Jun 14 Posts: 374 Credit: 238,712 RAC: 0 |
I tested the image directly with VirtualBox using the approach described in the other thread. A new 32-bit Linux VM was created using the Theory32_2017_01_10.vdi image. A shared directory was added containing an init_data.xml file harvested from one of the slot directories. The VM started and is running fine, with the heartbeat file being created. What does the console show for the VMs that are failing? |
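For anyone wanting to repeat that standalone test, the steps can be scripted around the `VBoxManage` command line. This is a rough sketch and not the procedure from the other thread, which is not quoted here. The VM name, the SATA controller, the shared folder name `shared` and the paths are all assumptions, and memory and network settings are left at VirtualBox defaults, which may need adjusting.

```python
import subprocess


def build_vbox_commands(vm_name, vdi_path, shared_dir):
    """Return the VBoxManage invocations for a standalone 32-bit Linux test VM."""
    return [
        # Create and register an empty VM ("Linux" is the generic 32-bit type).
        ["VBoxManage", "createvm", "--name", vm_name, "--ostype", "Linux", "--register"],
        # Add a disk controller and attach the existing .vdi image to it.
        ["VBoxManage", "storagectl", vm_name, "--name", "SATA", "--add", "sata"],
        ["VBoxManage", "storageattach", vm_name, "--storagectl", "SATA",
         "--port", "0", "--device", "0", "--type", "hdd", "--medium", vdi_path],
        # Expose the directory that holds init_data.xml (folder name is assumed).
        ["VBoxManage", "sharedfolder", "add", vm_name,
         "--name", "shared", "--hostpath", shared_dir],
        ["VBoxManage", "startvm", vm_name],
    ]


def run_all(commands, dry_run=True):
    """Print the commands, or execute them when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder paths; replace them with the real image and slot directory copy.
    cmds = build_vbox_commands("theory32-test", "Theory32_2017_01_10.vdi", "/path/to/shared")
    run_all(cmds, dry_run=True)
```

With `dry_run=True` the script only prints the commands, so they can be reviewed before anything is created.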
Joined: 2 May 07 Posts: 2101 Credit: 159,817,517 RAC: 132,770 |
Host: 10408749 VM-ID: 4a938f8b-2482-4f75-a1ae-b1c5f3f756bb Yes, it works with the configuration from the weekend, but as 32-bit Scientific Linux. Hmmm, why not in XP Pro? EDIT: VirtualBox 5.0.30 (too old?) |
Joined: 20 Jun 14 Posts: 374 Credit: 238,712 RAC: 0 |
It is a little puzzling as the application is the same as what is found in vLHC@home. A small modification was made to the image to support the different userids but that is all. |
Joined: 2 May 07 Posts: 2101 Credit: 159,817,517 RAC: 132,770 |
00:01:23.533443 VMMDev: Guest Log: [INFO] Reading volunteer information
00:01:23.847952 VMMDev: Guest Log: [INFO] Volunteer: maeax (75468) Host: 10408749
00:01:23.897518 VMMDev: Guest Log: [INFO] VMID: 4a938f8b-2482-4f75-a1ae-b1c5f3f756bb
00:01:24.309934 VMMDev: Guest Log: [INFO] Requesting an X509 credential from vLHC@home
00:01:24.558761 VMMDev: Guest Log: [INFO] Requesting an X509 credential from vLHC@home-dev
00:01:25.100221 VMMDev: Guest Log: [INFO] Requesting an X509 credential from LHC@home
00:01:26.205864 VMMDev: Guest Log: [INFO] Theory application starting. Check log files.
00:01:26.786247 VMMDev: Guest Log: [DEBUG] HTCondor ping
00:01:27.572807 VMMDev: Guest Log: [DEBUG] 0
00:02:15.292803 VMMDev: Guest Log: [INFO] New Job Starting in slot1
00:02:15.386891 VMMDev: Guest Log: [INFO] Condor JobID: 1007020.0 in slot1
00:02:20.569212 VMMDev: Guest Log: [INFO] MCPlots JobID: 34764235 in slot1
EDIT: vboxwrapper 26.197! |
Joined: 2 May 07 Posts: 2101 Credit: 159,817,517 RAC: 132,770 |
This test finished correctly.
00:02:15.292803 VMMDev: Guest Log: [INFO] New Job Starting in slot1
00:02:15.386891 VMMDev: Guest Log: [INFO] Condor JobID: 1007020.0 in slot1
00:02:20.569212 VMMDev: Guest Log: [INFO] MCPlots JobID: 34764235 in slot1
00:32:12.555654 VMMDev: Guest Log: [INFO] Job finished in slot1 with 0.
00:32:17.447561 VMMDev: Guest Log: [INFO] New Job Starting in slot1
00:32:17.523384 VMMDev: Guest Log: [INFO] Condor JobID: 1007520.0 in slot1
00:32:22.636481 VMMDev: Guest Log: [INFO] MCPlots JobID: 34764810 in slot1
02:11:52.276934 VMMDev: Guest Log: [INFO] Job finished in slot1 with 0.
02:11:54.875099 VMMDev: Guest Log: [INFO] New Job Starting in slot1
02:11:54.994929 VMMDev: Guest Log: [INFO] Condor JobID: 1009113.0 in slot1
02:12:00.130149 VMMDev: Guest Log: [INFO] MCPlots JobID: 34766331 in slot1
04:36:59.627987 VMMDev: Guest Log: [INFO] Job finished in slot1 with 0.
04:37:02.044171 VMMDev: Guest Log: [INFO] New Job Starting in slot1
04:37:02.105943 VMMDev: Guest Log: [INFO] Condor JobID: 1011562.0 in slot1
04:37:07.237609 VMMDev: Guest Log: [INFO] MCPlots JobID: 34768635 in slot1
06:36:02.905626 VMMDev: Guest Log: [INFO] Job finished in slot1 with 0.
06:36:05.588459 VMMDev: Guest Log: [INFO] New Job Starting in slot1
06:36:05.659675 VMMDev: Guest Log: [INFO] Condor JobID: 1013841.0 in slot1
06:36:10.818245 VMMDev: Guest Log: [INFO] MCPlots JobID: 34771198 in slot1
08:07:35.735357 VMMDev: Guest Log: [INFO] Job finished in slot1 with 0.
08:07:38.740494 VMMDev: Guest Log: [INFO] New Job Starting in slot1
08:07:38.822363 VMMDev: Guest Log: [INFO] Condor JobID: 1015805.0 in slot1
08:07:43.933665 VMMDev: Guest Log: [INFO] MCPlots JobID: 34772967 in slot1
08:43:24.756280 VMMDev: Guest Log: [INFO] Job finished in slot1 with 0.
08:43:27.198427 VMMDev: Guest Log: [INFO] New Job Starting in slot1
08:43:27.265357 VMMDev: Guest Log: [INFO] Condor JobID: 1016513.0 in slot1
08:43:32.370354 VMMDev: Guest Log: [INFO] MCPlots JobID: 34773837 in slot1
08:48:47.090949 VMMDev: Guest Log: [INFO] Job finished in slot1 with 0.
08:48:49.545579 VMMDev: Guest Log: [INFO] New Job Starting in slot1
08:48:49.615464 VMMDev: Guest Log: [INFO] Condor JobID: 1016614.0 in slot1
08:48:54.740128 VMMDev: Guest Log: [INFO] MCPlots JobID: 34773790 in slot1
09:16:48.895998 VMMDev: Guest Log: [INFO] Job finished in slot1 with 0.
09:16:51.281538 VMMDev: Guest Log: [INFO] New Job Starting in slot1
09:16:51.355729 VMMDev: Guest Log: [INFO] Condor JobID: 1017081.0 in slot1
09:16:56.464874 VMMDev: Guest Log: [INFO] MCPlots JobID: 34774237 in slot1
12:39:19.480341 VMMDev: Guest Log: [INFO] Job finished in slot1 with 0.
12:48:51.028361 VMMDev: Guest Log: [INFO] Condor exited with return value N/A.
12:48:51.066879 VMMDev: Guest Log: [INFO] Shutting Down. |
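To see how long each job in a log like the one above ran, the "New Job Starting" and "Job finished" entries can be paired up per slot. The sketch below is just one way to do that. The function name is invented for illustration, and it assumes the timestamp format shown in this log (hours:minutes:seconds with six fractional digits).

```python
import re
from datetime import timedelta

# Matches the start and finish entries of the guest log, whether the log is
# one entry per line or pasted as a single long line.
ENTRY = re.compile(
    r"(\d+):(\d{2}):(\d{2})\.(\d{6}) VMMDev: Guest Log: \[INFO\] "
    r"(New Job Starting in|Job finished in) (slot\d+)"
)


def job_durations(log_text):
    """Return a list of timedeltas, one per completed job, in log order."""
    started = {}
    durations = []
    for h, m, s, us, event, slot in ENTRY.findall(log_text):
        stamp = timedelta(hours=int(h), minutes=int(m),
                          seconds=int(s), microseconds=int(us))
        if event.startswith("New Job"):
            started[slot] = stamp
        elif slot in started:
            # A finish entry closes the most recent start in the same slot.
            durations.append(stamp - started.pop(slot))
    return durations


sample = (
    "00:02:15.292803 VMMDev: Guest Log: [INFO] New Job Starting in slot1 "
    "00:02:15.386891 VMMDev: Guest Log: [INFO] Condor JobID: 1007020.0 in slot1 "
    "00:32:12.555654 VMMDev: Guest Log: [INFO] Job finished in slot1 with 0."
)

if __name__ == "__main__":
    for duration in job_durations(sample):
        print(duration)
```

The sample uses the first job from the log above. Feeding in the whole log text gives one duration per finished job.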
Joined: 2 May 07 Posts: 2101 Credit: 159,817,517 RAC: 132,770 |
XP Pro (x86) as a VM gives this message: udev waits for more than 5 minutes. One message is: udev still not settled... continued in background... After 11 minutes the task finished. https://lhcathome.cern.ch/lhcathome/result.php?resultid=111707332 |