Message boards : Number crunching : Missing heartbeat file errors

computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2541
Credit: 254,608,838
RAC: 56,545
Message 28184 - Posted: 21 Dec 2016, 17:04:47 UTC - in response to Message 28183.  

An example that gets a file from CERN:

telnet lhchomeproxy.cern.ch 3125
Trying 128.142.168.203...
Connected to lhchomeproxy.cern.ch.
Escape character is '^]'.
GET http://cvmfs-stratum-one.cern.ch/cvmfs/grid.cern.ch/.cvmfspublished HTTP/1.0
Host: cvmfs-stratum-one.cern.ch

HTTP/1.1 200 OK
Date: Wed, 21 Dec 2016 16:43:59 GMT
Accept-Ranges: bytes
Content-Length: 515
Content-Type: application/x-cvmfs
Server: Apache/2.4.6 (CentOS) mod_wsgi/3.4 Python/2.7.5
Expires: Wed, 21 Dec 2016 16:46:07 GMT
Cache-Control: max-age=120
X-Cache: MISS from front08.cern.ch
X-Cache-Lookup: HIT from front08.cern.ch:80
Age: 24
X-Cache: HIT from vocms0323.cern.ch/3
Via: 1.1 front08.cern.ch (squid/3.5.20), 1.1 vocms0323.cern.ch/3 (squid/frontier-squid-3.5.22-2.1)
Connection: close

[followed by the contents of .cvmfspublished]
Connection closed by foreign host.
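The same request can be sent from a script instead of being typed into telnet. This is a minimal Python sketch, assuming the proxy at lhchomeproxy.cern.ch:3125 accepts the absolute-URL form of an HTTP/1.0 GET exactly as in the session above. The function names are hypothetical and are not part of any CERN or CVMFS tool.

```python
import socket
from urllib.parse import urlsplit


def build_proxy_request(url: str) -> bytes:
    """Build an HTTP/1.0 GET for a forward proxy.

    The full URL goes on the request line, followed by a Host header and an
    empty line (CRLF CRLF) that marks the end of the header block.
    """
    host = urlsplit(url).hostname
    lines = [f"GET {url} HTTP/1.0", f"Host: {host}", "", ""]
    return "\r\n".join(lines).encode("ascii")


def fetch_via_proxy(url: str, proxy_host: str, proxy_port: int,
                    timeout: float = 10.0) -> bytes:
    """Send the request through the proxy and return the raw response bytes."""
    with socket.create_connection((proxy_host, proxy_port), timeout=timeout) as sock:
        sock.sendall(build_proxy_request(url))
        chunks = []
        # HTTP/1.0 without keep-alive: the server closes the connection after
        # the response ("Connection: close" above), so read until EOF.
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

For example, fetch_via_proxy("http://cvmfs-stratum-one.cern.ch/cvmfs/grid.cern.ch/.cvmfspublished", "lhchomeproxy.cern.ch", 3125) should return the same headers as the telnet session, followed by the file contents.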


Now the same request for /cvmfs/grid.cern.ch/vc/sbin/bootstrap:
telnet lhchomeproxy.cern.ch 3125
Trying 128.142.168.203...
Connected to lhchomeproxy.cern.ch.
Escape character is '^]'.
GET http://cvmfs-stratum-one.cern.ch/cvmfs/grid.cern.ch/vc/sbin/bootstrap HTTP/1.0
Host: cvmfs-stratum-one.cern.ch

HTTP/1.1 404 Not Found
Date: Wed, 21 Dec 2016 16:47:47 GMT
Server: Apache/2.4.6 (CentOS) mod_wsgi/3.4 Python/2.7.5
Content-Length: 234
Content-Type: text/html; charset=iso-8859-1
X-Cache: MISS from front15.cern.ch
X-Cache-Lookup: MISS from front15.cern.ch:80
X-Cache: MISS from vocms0323.cern.ch/3
Via: 1.1 front15.cern.ch (squid/3.5.20), 1.1 vocms0323.cern.ch/3 (squid/frontier-squid-3.5.22-2.1)
Connection: close

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>404 Not Found</title>
</head><body>
<h1>Not Found</h1>
<p>The requested URL /cvmfs/grid.cern.ch/vc/sbin/bootstrap was not found on this server.</p>
</body></html>
Connection closed by foreign host.

Either an incomplete/wrong URL or the file is not where it should be.
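To tell these outcomes apart without reading the whole reply, the status line and headers can be split out of the raw response. This is a minimal sketch that assumes the response looks like the sessions above. Repeated headers such as X-Cache are kept as lists, because each proxy in the chain adds its own.

```python
def parse_response_head(raw: str):
    """Split a raw HTTP response into (status_code, reason, headers).

    headers maps lower-cased header names to a list of values, because
    headers such as X-Cache appear once per proxy hop.
    """
    head = raw.replace("\r\n", "\n").split("\n\n", 1)[0]
    status_line, *header_lines = head.split("\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        # Split on the first colon only; values such as Date contain colons too
        name, _, value = line.partition(":")
        headers.setdefault(name.strip().lower(), []).append(value.strip())
    return int(code), reason, headers
```

A 404 whose X-Cache entries are all MISS, as in the bootstrap request, suggests the answer came from the origin server (here Apache) rather than from a stale error cached in one of the proxies.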
Jesse Viviano

Joined: 12 Feb 14
Posts: 72
Credit: 4,639,155
RAC: 0
Message 28185 - Posted: 21 Dec 2016, 17:32:09 UTC - in response to Message 28183.  

I typed "ping grid.cern.ch" into PowerShell. The IP address it found was 198.105.244.130, and all pings were lost.

Typing "nslookup grid.cern.ch" returns the following text:
Server: dsldevice.attlocal.net
Address: 192.168.1.254

Non-authoritative answer:
Name: grid.cern.ch
Addresses:  198.105.244.130
          198.105.254.130


I was able to connect to http://lhchomeproxy.cern.ch:3125/ with my browser, and to lhchomeproxy.cern.ch port 3125 with telnet as well.

Typing nslookup lhchomeproxy.cern.ch results in the following text:
Server: dsldevice.attlocal.net
Address: 192.168.1.254

Non-authoritative answer:
Name: cmsextproxy.cern.ch
Addresses:  128.142.168.203
          128.142.168.202
Aliases: lhchomeproxy.cern.ch


I checked for DNS problems. The report at http://www.dnsstuff.com/tools#dnsReport|type=domain&&value=lhchomeproxy.cern.ch has an interesting warning on a test named "NS matches parent list", which could be making DNS lookups slower than necessary. I wonder whether the delay caused by this potential problem is behind our error.

I found out where the bad DNS response for grid.cern.ch came from: AT&T uses DNS hijacking for its DNS Error Assist service, which tries to send web browsers to the correct page when users mistype a web address. This breaks standard behavior that relies on getting a proper DNS error when a name does not exist. I am turning that off on our family's account.
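One way to test a resolver for this kind of hijacking is to look up a name that cannot exist: a correct resolver fails the lookup (NXDOMAIN), while a hijacking one returns an address such as the 198.105.244.130 seen above. This is a minimal Python sketch. It assumes the random probe label is not registered under cern.ch, and the helper names are hypothetical.

```python
import socket
import uuid


def resolve_ipv4(name):
    """Return the sorted IPv4 addresses the system resolver gives for name,
    or an empty list if the lookup fails."""
    try:
        infos = socket.getaddrinfo(name, None, socket.AF_INET)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, (address, port))
    return sorted({info[4][0] for info in infos})


def resolver_rewrites_nxdomain(resolve=resolve_ipv4, parent="cern.ch"):
    """Return True if a made-up name under parent resolves to an address.

    Assumption: the random label below does not exist, so an honest
    resolver must fail the lookup.
    """
    probe = f"nonexistent-{uuid.uuid4().hex[:12]}.{parent}"
    return bool(resolve(probe))
```

Running print(resolver_rewrites_nxdomain()) with DNS Error Assist enabled would presumably print True; once it is turned off, it should print False.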
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2541
Credit: 254,608,838
RAC: 56,545
Message 28186 - Posted: 21 Dec 2016, 18:37:11 UTC - in response to Message 28185.  

There is no DNS entry for grid.cern.ch, 198.105.244.130 or 198.105.254.130, at least from outside the CERN network.
Therefore a ping will not work.


dig @8.8.8.8 grid.cern.ch.

; <<>> DiG 9.9.9-P1 <<>> @8.8.8.8 grid.cern.ch.
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 41832
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;grid.cern.ch. IN A

;; AUTHORITY SECTION:
cern.ch. 1799 IN SOA ext-dns-1.cern.ch. external-dns.cern.ch. 2013266176 1200 300 2419200 10800

;; Query time: 84 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Wed Dec 21 18:52:49 CET 2016
;; MSG SIZE rcvd: 100



dig @8.8.8.8 -x 198.105.244.130

; <<>> DiG 9.9.9-P1 <<>> @8.8.8.8 -x 198.105.244.130
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 11854
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;130.244.105.198.in-addr.arpa. IN PTR

;; AUTHORITY SECTION:
198.in-addr.arpa. 1785 IN SOA z.arin.net. dns-ops.arin.net. 2016103225 1800 900 691200 10800

;; Query time: 40 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Wed Dec 21 18:52:16 CET 2016
;; MSG SIZE rcvd: 111



dig @8.8.8.8 -x 198.105.254.130

; <<>> DiG 9.9.9-P1 <<>> @8.8.8.8 -x 198.105.254.130
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 15703
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;130.254.105.198.in-addr.arpa. IN PTR

;; AUTHORITY SECTION:
198.in-addr.arpa. 1797 IN SOA z.arin.net. dns-ops.arin.net. 2016103226 1800 900 691200 10800

;; Query time: 101 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Wed Dec 21 18:55:27 CET 2016
;; MSG SIZE rcvd: 111
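dig -x turns the address into a name under in-addr.arpa, with the octets reversed, and asks for its PTR record. That is why the QUESTION SECTION shows 130.244.105.198.in-addr.arpa. Python's standard library can build the same name and attempt the reverse lookup through the system resolver. This is a sketch; ptr_name and reverse_lookup are hypothetical helper names.

```python
import ipaddress
import socket


def ptr_name(address: str) -> str:
    """Return the name a reverse (PTR) lookup queries, as dig -x does."""
    return ipaddress.ip_address(address).reverse_pointer


def reverse_lookup(address: str):
    """Return the host name from the PTR record, or None if the lookup fails
    (for example NXDOMAIN, as in the answers above)."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(address)
    except (socket.herror, socket.gaierror):
        return None
    return hostname
```

With the answers shown above, reverse_lookup("198.105.244.130") would be expected to return None, although the result depends on which resolver the machine uses.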



The DNS query for lhchomeproxy.cern.ch is OK, as it is an alias (CNAME) for cmsextproxy.cern.ch:

dig @8.8.8.8 lhchomeproxy.cern.ch

; <<>> DiG 9.9.9-P1 <<>> @8.8.8.8 lhchomeproxy.cern.ch
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 33358
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;lhchomeproxy.cern.ch. IN A

;; ANSWER SECTION:
lhchomeproxy.cern.ch. 10799 IN CNAME cmsextproxy.cern.ch.
cmsextproxy.cern.ch. 59 IN A 128.142.168.203
cmsextproxy.cern.ch. 59 IN A 128.142.168.202

;; Query time: 126 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Wed Dec 21 19:06:20 CET 2016
;; MSG SIZE rcvd: 107
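The ANSWER SECTION shows the resolver following a chain: the CNAME record maps lhchomeproxy.cern.ch to cmsextproxy.cern.ch, and two A records give that name's addresses. The sketch below follows the same chain over records copied from dig's answer section. It handles only the simple five-field lines shown here and is not a general zone-file parser.

```python
def parse_answer(lines):
    """Parse dig answer lines (name TTL class type data) into
    {(name, type): [data, ...]}, with trailing dots removed."""
    records = {}
    for line in lines:
        name, _ttl, _cls, rtype, data = line.split()
        key = (name.rstrip(".").lower(), rtype)
        records.setdefault(key, []).append(data.rstrip("."))
    return records


def resolve_a(records, name, max_hops=8):
    """Follow CNAME records from name; return (canonical_name, addresses)."""
    name = name.rstrip(".").lower()
    for _ in range(max_hops):
        targets = records.get((name, "CNAME"))
        if not targets:
            return name, records.get((name, "A"), [])
        name = targets[0].lower()
    raise ValueError("CNAME chain too long or looping")
```

On a live system, socket.gethostbyname_ex("lhchomeproxy.cern.ch") reports the same relationship as (canonical name, aliases, addresses), which matches the nslookup output earlier in the thread.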



As Ivan wrote, the CVMFS files are wrapped in a URL like http://cvmfs-stratum-one.cern.ch/cvmfs/grid.cern.ch/filename.
Because of that, and because of the missing DNS entry, you will not be able to retrieve a file directly from http://grid.cern.ch/filename.
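The mapping described here, from a repository name and a path to a URL on the stratum-one server, fits in a small helper. This sketch follows the thread's assumption that a file is published at /cvmfs/<repository>/<path>. That held for .cvmfspublished in the first telnet session, but the bootstrap request in the same message returned 404, so it is not guaranteed for every file. The server name comes from the thread; cvmfs_url is a hypothetical name.

```python
from urllib.parse import quote

STRATUM_ONE = "http://cvmfs-stratum-one.cern.ch"


def cvmfs_url(repository: str, path: str, server: str = STRATUM_ONE) -> str:
    """Build <server>/cvmfs/<repository>/<path>.

    The repository name (e.g. grid.cern.ch) is only a path component here,
    not a host name, which is why http://grid.cern.ch/... cannot work.
    """
    return f"{server}/cvmfs/{quote(repository)}/{quote(path.lstrip('/'))}"
```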

I would try a project reset to ensure the VM is not corrupt.
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1061
Credit: 7,737,455
RAC: 489
Message 28187 - Posted: 21 Dec 2016, 18:42:20 UTC - in response to Message 28186.  

> There is no DNS entry for grid.cern.ch, 198.105.244.130 or 198.105.254.130, at least from outside the CERN network.
> Therefore a ping will not work.


I used whois on a Cygwin system to get the Boulder, CO information.

> As Ivan wrote, the CVMFS files are wrapped in a URL like http://cvmfs-stratum-one.cern.ch/cvmfs/grid.cern.ch/filename.
> Because of that, and because of the missing DNS entry, you will not be able to retrieve a file directly from http://grid.cern.ch/filename.
>
> I would try a project reset to ensure the VM is not corrupt.

Hey, I'm supposed to say that!
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1061
Credit: 7,737,455
RAC: 489
Message 28188 - Posted: 21 Dec 2016, 18:49:03 UTC - in response to Message 28184.  

> Either an incomplete/wrong URL or the file is not where it should be.


Interesting sleuthing! I know the file exists, because I can see it on one of my servers that runs CVMFS and also after logging in to one of my running VMs. This is getting beyond my pay grade; I hope it gives Laurence or Nils a hint.
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2541
Credit: 254,608,838
RAC: 56,545
Message 28189 - Posted: 21 Dec 2016, 19:11:27 UTC - in response to Message 28188.  

I searched through my proxy logs.
That file was not requested by any of my VMs during the last 3 weeks.
If it is needed, I guess it is either included in the VM itself or packed into another URL.
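A log search like this can be scripted. This is a minimal sketch assuming Squid's default native access.log format, in which the requested URL is the seventh whitespace-separated field; a customized logformat would need different field positions. requests_for is a hypothetical helper name.

```python
def requests_for(log_lines, path_fragment):
    """Yield (timestamp, result_code, url) for log lines whose URL contains
    path_fragment.

    Assumes Squid's native format:
    time elapsed client result/status bytes method URL user hierarchy/peer type
    """
    for line in log_lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # skip blank or truncated lines
        url = fields[6]
        if path_fragment in url:
            yield float(fields[0]), fields[3], url
```

For example, open the log with open("/var/log/squid/access.log") and pass the file object together with "/vc/sbin/bootstrap"; that path is only a common default, so adjust it for your proxy installation.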
Jesse Viviano

Joined: 12 Feb 14
Posts: 72
Credit: 4,639,155
RAC: 0
Message 28190 - Posted: 21 Dec 2016, 20:01:01 UTC - in response to Message 28186.  

I have reset the project repeatedly, to no avail. I still get errors, so to avoid damaging the project further I have temporarily detached until I see signs that this has been resolved.
Laurence
Project administrator
Project developer

Joined: 20 Jun 14
Posts: 380
Credit: 238,712
RAC: 0
Message 28191 - Posted: 21 Dec 2016, 21:03:29 UTC - in response to Message 28182.  

> I just tried to ping grid.cern.ch, and all of the pings failed. Is there a firewall in between my computer and that server dropping pings?

Yes, the external firewall at CERN does not allow ping traffic through. You would need to connect directly to the port with something like netcat.

> Is grid.cern.ch down? DNS was able to resolve that server's IP address as 198.105.244.130. I am trying to see if there is a network issue between my computer and the server.


grid.cern.ch is used as a namespace identifier rather than a hostname. To check CVMFS connectivity you could run the following Linux command:


nc -z -v -w 5 lhchomeproxy.cern.ch 3125 
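On machines without netcat (Windows, for example), a similar check can be done from Python. This minimal sketch does roughly what nc -z -w 5 does: it opens a TCP connection with a 5-second timeout, sends no data and closes it again. port_is_open is a hypothetical helper name.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # DNS failures, refusals and timeouts are all subclasses of OSError
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

port_is_open("lhchomeproxy.cern.ch", 3125) returning True only shows that the proxy port is reachable. Like the nc test, it says nothing about whether a particular CVMFS file can be fetched.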
Laurence
Project administrator
Project developer

Joined: 20 Jun 14
Posts: 380
Credit: 238,712
RAC: 0
Message 28192 - Posted: 21 Dec 2016, 21:12:41 UTC - in response to Message 28190.  

> I have reset the project repeatedly, to no avail. I still get errors, so to avoid damaging the project further I have temporarily detached until I see signs that this has been resolved.


Don't worry, you are not damaging the project. On the contrary, helping to solve issues that may affect others is extremely important work, and your lost cycles are worth all the cycles we gain by fixing this for others. Unfortunately, BOINC credit doesn't take this into account.
Laurence
Project administrator
Project developer

Joined: 20 Jun 14
Posts: 380
Credit: 238,712
RAC: 0
Message 28193 - Posted: 21 Dec 2016, 21:22:56 UTC - in response to Message 28189.  

> I searched through my proxy logs.
> That file was not requested by any of my VMs during the last 3 weeks.
> If it is needed, I guess it is either included in the VM itself or packed into another URL.


Yes, this file is included in the VM and I have checked that it is available in CVMFS. As this issue is not affecting everyone, I am assuming that it is something to do with your local environment (network or machine). If this is a laptop, try in another location.
Jesse Viviano

Joined: 12 Feb 14
Posts: 72
Credit: 4,639,155
RAC: 0
Message 28195 - Posted: 21 Dec 2016, 22:39:38 UTC - in response to Message 28191.  

I am running Windows 10. Since the message at https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4052&postid=28184 shows that an HTTP connection is a valid test, I typed http://lhchomeproxy.cern.ch:3125/ into my browser. It connected and got an HTTP 400 error, which shows that I was able to reach the server.
Laurence
Project administrator
Project developer

Joined: 20 Jun 14
Posts: 380
Credit: 238,712
RAC: 0
Message 28196 - Posted: 21 Dec 2016, 22:59:18 UTC - in response to Message 28195.  

It might be due to the installation of VirtualBox. Have you tried reinstalling it?
Jesse Viviano

Joined: 12 Feb 14
Posts: 72
Credit: 4,639,155
RAC: 0
Message 28197 - Posted: 21 Dec 2016, 23:47:35 UTC - in response to Message 28196.  

I upgraded from VirtualBox 5.1.10 to 5.1.12 today to see if that was the issue; I ran several successful tasks on 5.1.10 last week. The upgrade did not solve the issue.
Jesse Viviano

Joined: 12 Feb 14
Posts: 72
Credit: 4,639,155
RAC: 0
Message 28198 - Posted: 22 Dec 2016, 0:07:01 UTC - in response to Message 28196.  

I doubt that this is due to the VirtualBox installation. I also run ATLAS@home, another project whose work units require network connectivity instead of being properly self-contained like those of Cosmology@home, and its work units process correctly, even on my family's new gigabit fiber connection and my computer's gigabit Ethernet link to it.
Jesse Viviano

Joined: 12 Feb 14
Posts: 72
Credit: 4,639,155
RAC: 0
Message 28199 - Posted: 22 Dec 2016, 0:55:11 UTC

I briefly reverted to the Wi-Fi connection, and that did not solve the problem.
Jesse Viviano

Joined: 12 Feb 14
Posts: 72
Credit: 4,639,155
RAC: 0
Message 28200 - Posted: 22 Dec 2016, 1:27:28 UTC - in response to Message 28196.  

I just did a complete uninstall and reinstall of VirtualBox, and that did not solve my problem.
Harri Liljeroos
Joined: 28 Sep 04
Posts: 732
Credit: 49,336,437
RAC: 25,953
Message 28205 - Posted: 22 Dec 2016, 10:22:04 UTC

I just installed VirtualBox 5.0.18 and the extension pack on a new desktop host. The first Theory task failed with the missing heartbeat file error. This host is at the same location as my laptop, which is also failing.

My other desktop is in a different location and does not have this problem with Theory tasks; it has older versions of VirtualBox (5.0.12) and BOINC. I'm taking the laptop home for Christmas so I can test it from there.
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1061
Credit: 7,737,455
RAC: 489
Message 28208 - Posted: 22 Dec 2016, 15:49:49 UTC - in response to Message 28184.  

I got something similar requesting bootstrap:
[eesridr:BOINC] > telnet lhchomeproxy.cern.ch 3125
Trying 128.142.168.202...
Connected to lhchomeproxy.cern.ch.
Escape character is '^]'.
GET http://cvmfs-stratum-one.cern.ch/cvmfs/grid.cern.ch/vc/sbin/bootstrap HTTP/1.0

HTTP/1.1 400 Bad Request
Server: squid/frontier-squid-3.5.22-2.1
Mime-Version: 1.0
Date: Thu, 22 Dec 2016 15:39:38 GMT
Content-Type: text/html;charset=utf-8
Content-Length: 4151
X-Squid-Error: ERR_INVALID_REQ 0
Vary: Accept-Language
Content-Language: en
X-Cache: MISS from vocms0322.cern.ch/3
Via: 1.1 vocms0322.cern.ch/3 (squid/frontier-squid-3.5.22-2.1)
Connection: close

ERROR: The requested URL could not be retrieved

Invalid Request error was encountered while trying to process the request:

GET http://cvmfs-stratum-one.cern.ch/cvmfs/grid.cern.ch/vc/sbin/bootstrap HTTP/1.0

Some possible problems are:
  • Missing or unknown request method.
  • Missing URL.
  • Missing HTTP Identifier (HTTP/1.0).
  • Request is too large.
  • Content-Length missing for POST or PUT requests.
  • Illegal character in hostname; underscores are not allowed.
  • HTTP/1.1 Expect: feature is being asked from an HTTP/1.0 software.

Your cache administrator is squid.

Connection closed by foreign host.

ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1061
Credit: 7,737,455
RAC: 489
Message 28209 - Posted: 22 Dec 2016, 15:56:01 UTC - in response to Message 28208.  

Hmm, no, that was a 400, not a 404. I guess I messed up.
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2541
Credit: 254,608,838
RAC: 56,545
Message 28213 - Posted: 22 Dec 2016, 18:00:48 UTC - in response to Message 28209.  

> Hmm, no, that was a 400, not a 404. I guess I messed up.

Did you hit "return" twice after the last input line?

GET http://cvmfs-stratum-one.cern.ch/cvmfs/grid.cern.ch/vc/sbin/bootstrap HTTP/1.0<hit return>
<hit return again>


I can't reproduce the 400.
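The second return matters because the header block of an HTTP request ends only at an empty line, that is, at CRLF CRLF. Until the server sees that line, it treats the request as incomplete. A minimal sketch of the check follows. request_head_complete is a hypothetical helper, and accepting bare LF line endings as well is an assumption; how strictly this is handled varies by server.

```python
def request_head_complete(buffer: bytes) -> bool:
    """True once the request line and headers are followed by an empty line."""
    return b"\r\n\r\n" in buffer or b"\n\n" in buffer
```

After the GET line alone the check is still False; only the extra empty line produced by the second return makes it True.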
©2024 CERN