41) Message boards : Number crunching : Missing heartbeat file errors (Message 28237)
Posted 23 Dec 2016 by Jesse Viviano
Post:
How would I check the registry for remains of an old extension pack?
The access rights of the BOINC folders look fine.
The folders are in the same partition.
42) Message boards : Number crunching : Missing heartbeat file errors (Message 28233)
Posted 23 Dec 2016 by Jesse Viviano
Post:
I have another possible hypothesis: after my VDSL service was replaced with gigabit fiber, I noticed that applications that try to geolocate my router's public IP address often fail. I also noticed some posts at https://lhcathome.cern.ch/vLHCathome/forum_thread.php?id=1933 and https://lhcathome.cern.ch/vLHCathome/forum_thread.php?id=1934 about directing users to either CERN or FNAL depending on their location. Could this be what is causing my failures?
43) Message boards : Number crunching : Missing heartbeat file errors (Message 28230)
Posted 23 Dec 2016 by Jesse Viviano
Post:
I was connected to the HTTPS URL.
44) Message boards : Number crunching : Missing heartbeat file errors (Message 28215)
Posted 23 Dec 2016 by Jesse Viviano
Post:
Has anyone, such as a project administrator, developer, or scientist with a computer that successfully processes work units, tried a project reset after letting that computer's LHC@home queue drain completely? If a file is missing from the VM or from CVMFS, I am guessing that work units will begin to fail after such a reset, because the reset will wipe the copy of that file from the machine's cache. That administrator, developer, or scientist could then examine the problem and start debugging it. If work units continue to process successfully, there might be something on the endpoints that this project is incompatible with.
45) Message boards : Number crunching : Missing heartbeat file errors (Message 28200)
Posted 22 Dec 2016 by Jesse Viviano
Post:
I just did a complete uninstall and reinstall of VirtualBox, and that did not solve my problem.
46) Message boards : Number crunching : Missing heartbeat file errors (Message 28199)
Posted 22 Dec 2016 by Jesse Viviano
Post:
I briefly reverted to the Wi-Fi connection, and that did not solve the problem.
47) Message boards : Number crunching : Missing heartbeat file errors (Message 28198)
Posted 22 Dec 2016 by Jesse Viviano
Post:
I doubt that this is due to the installation of VirtualBox, because I also run ATLAS@home, another project whose work units require network connectivity instead of being properly self-contained like those of Cosmology@home. Its work units process properly even on my family's new gigabit fiber connection and my computer's gigabit Ethernet connection to it.
48) Message boards : Number crunching : Missing heartbeat file errors (Message 28197)
Posted 21 Dec 2016 by Jesse Viviano
Post:
I just upgraded from VirtualBox 5.1.10 to 5.1.12 today to see whether that was the issue. I had run several tasks successfully on 5.1.10 last week. The upgrade did not solve the issue.
49) Message boards : Number crunching : Missing heartbeat file errors (Message 28195)
Posted 21 Dec 2016 by Jesse Viviano
Post:
I am running Windows 10. Since the message at https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4052&postid=28184 showed that HTTP connections are valid, I typed http://lhchomeproxy.cern.ch:3125/ into my browser. It connected and got an HTTP 400 error, which shows that I was able to reach the server.
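A browser test like this can also be scripted. The Python sketch below is an illustration, not an official LHC@home diagnostic, and the function name `probe_http_status` is hypothetical. It sends one plain-HTTP GET and returns the status code, so receiving any code at all, including 400, shows that the proxy accepted the connection and answered.

```python
import http.client


def probe_http_status(host: str, port: int, path: str = "/", timeout: float = 10.0) -> int:
    """Send one plain-HTTP GET and return the numeric status code.

    Any status code (even 400) proves that a TCP connection was made and an
    HTTP server answered; DNS failures, refusals, and timeouts raise OSError.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        response.read()  # drain the body so the connection closes cleanly
        return response.status
    finally:
        conn.close()
```

Calling `probe_http_status("lhchomeproxy.cern.ch", 3125)` should reproduce the browser result if the proxy still responds the same way.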
50) Message boards : Number crunching : Missing heartbeat file errors (Message 28190)
Posted 21 Dec 2016 by Jesse Viviano
Post:
I have been repeatedly resetting the project, to no avail. I still get errors, so I have temporarily detached until I see signs that this has been resolved, to avoid damaging the project further.
51) Message boards : Number crunching : Missing heartbeat file errors (Message 28185)
Posted 21 Dec 2016 by Jesse Viviano
Post:
I typed "ping grid.cern.ch" into PowerShell. The IP address it found was 198.105.244.130, and all pings were lost.

Typing "nslookup grid.cern.ch" returns the following text:
Server: dsldevice.attlocal.net
Address: 192.168.1.254

Non-authoritative answer:
Name: grid.cern.ch
Addresses: 198.105.244.130
           198.105.254.130


I was also able to connect to http://lhchomeproxy.cern.ch:3125/ using my browser, and to telnet to lhchomeproxy.cern.ch on port 3125.
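Ping uses ICMP, which firewalls often drop, so lost pings alone do not prove that a host is unreachable. A TCP connection to the actual service port, which is what the telnet test checks, is the more relevant test. A minimal Python sketch of the same check (the helper name `can_connect` is hypothetical):

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds, like a telnet test."""
    try:
        # create_connection resolves the name and tries each returned address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS resolution failures, refused connections, and timeouts.
        return False
```

Here `can_connect("lhchomeproxy.cern.ch", 3125)` returning `True` corresponds to the successful telnet session.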

Typing nslookup lhchomeproxy.cern.ch results in the following text:
Server: dsldevice.attlocal.net
Address: 192.168.1.254

Non-authoritative answer:
Name: cmsextproxy.cern.ch
Addresses: 128.142.168.203
           128.142.168.202
Aliases: lhchomeproxy.cern.ch


I also checked for DNS problems. A report at http://www.dnsstuff.com/tools#dnsReport|type=domain&&value=lhchomeproxy.cern.ch showed an interesting warning that could make DNS slower than necessary. The warning is on a test named "NS matches parent list". I wonder whether the delay caused by this potential problem might be causing our error.

I found out where the bad DNS response for grid.cern.ch came from: AT&T uses DNS hijacking for its DNS Error Assist service, which tries to send web browsers to the correct page when users mistype a web address. This breaks standard behavior that relies on receiving a proper DNS error when something is wrong. I am turning that service off on our family's account.
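One way to tell whether a resolver rewrites nonexistent names is to look up a random name that should not exist and see whether any address comes back. The sketch below assumes that `cern.ch` has no wildcard DNS record (a wildcard would make it report a false positive); the probe-label format and both function names are hypothetical. The resolver is passed in as a function so the logic can be exercised without network access.

```python
import socket
import uuid
from typing import Callable, List


def system_resolve(name: str) -> List[str]:
    """Resolve name with the OS resolver; return [] when the lookup fails."""
    try:
        infos = socket.getaddrinfo(name, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


def resolver_hijacks_nxdomain(parent_domain: str,
                              resolve: Callable[[str], List[str]] = system_resolve) -> bool:
    """Look up a random, almost certainly unregistered name under parent_domain.

    An honest resolver returns no addresses for it; one that substitutes an
    "error assist" page address returns something anyway.
    """
    probe = f"nonexistent-{uuid.uuid4().hex}.{parent_domain}"
    return len(resolve(probe)) > 0
```

With DNS Error Assist active, `resolver_hijacks_nxdomain("cern.ch")` would be expected to return `True`, and `False` once the service is turned off.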
52) Message boards : Number crunching : Missing heartbeat file errors (Message 28182)
Posted 21 Dec 2016 by Jesse Viviano
Post:
I just tried to ping grid.cern.ch, and all of the pings failed. Is there a firewall in between my computer and that server dropping pings? Is grid.cern.ch down? DNS was able to resolve that server's IP address as 198.105.244.130. I am trying to see if there is a network issue between my computer and the server.
53) Message boards : Number crunching : Missing heartbeat file errors (Message 28179)
Posted 21 Dec 2016 by Jesse Viviano
Post:
You are right. I had transcribed the error wrong. It is as below:
Starting libvirtd daemon: [ OK ]
/etc/rc3.d/S99local: line 1: /cvmfs/grid.cern.ch/vc/sbin/bootstrap: No such file or directory
bootlogd: no process killed
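The corrected message says that the startup script `/etc/rc3.d/S99local` could not find `/cvmfs/grid.cern.ch/vc/sbin/bootstrap`. Assuming Python is available inside the VM, a hypothetical check like the one below separates the likely causes. It is not part of the VM image. It also flags a relative path such as the earlier `cvmfs/...` transcription, which would be resolved from the current directory rather than from `/`.

```python
import os


def bootstrap_status(path: str = "/cvmfs/grid.cern.ch/vc/sbin/bootstrap") -> str:
    """Classify why a startup script could fail to run the file at path."""
    if not os.path.isabs(path):
        # A relative path is looked up from the current directory, not from /.
        return "relative path"
    if not os.path.exists(path):
        # Missing file, or the CVMFS repository is not mounted.
        return "missing"
    if not os.access(path, os.X_OK):
        return "not executable"
    return "ok"
```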
54) Message boards : Number crunching : Missing heartbeat file errors (Message 28165)
Posted 20 Dec 2016 by Jesse Viviano
Post:
To clear things up, I am getting messages like below in the VM console for the work units I have tried to run:
Starting libvirtd daemon: [ OK ]
/etc/rc3.d/S99local: line 1: cvmfs/grid.cern.ch/vc/sbin/bootstrap: No such file or directory
bootlogd: no process killed

I think that the VM is trying to read a file that either does not exist or is misnamed, either in the VM itself or in the program that attempts to read it. This file apparently needs to be read before the VM can attempt to contact Condor. Since it cannot be read, the tasks end in a compute error.
55) Message boards : Number crunching : Missing heartbeat file errors (Message 28161)
Posted 20 Dec 2016 by Jesse Viviano
Post:
I get as far as the line below, as seen in your first screenshot:
Starting libvirtd daemon: [ OK ]

Just after that, I get the error message below:
/etc/rc3.d/S99local: line 1: cvmfs/grid.cern.ch/vc/sbin/bootstrap: No such file or directory
bootlogd: no process killed

I think that the VM is trying to read a file that either does not exist or is misnamed, either in the VM itself or in the program that attempts to read it. This file apparently needs to be read before the VM can attempt to contact Condor. Since it cannot be read, the tasks end in a compute error.
56) Message boards : Number crunching : Missing heartbeat file errors (Message 28153)
Posted 19 Dec 2016 by Jesse Viviano
Post:
I am noticing errors with the following error message in my VM consoles:
/etc/rc3.d/S99local: line 1: cvmfs/grid.cern.ch/vc/sbin/bootstrap: No such file or directory
bootlogd: no process killed

I am starting to suspect that we might have a run of bad work units with missing files that just coincidentally happened while my Internet service was being replaced.
57) Message boards : Number crunching : Missing heartbeat file errors (Message 28152)
Posted 19 Dec 2016 by Jesse Viviano
Post:
What do you want me to look for in the VM console? I do not know how to interpret it.
58) Message boards : Number crunching : Missing heartbeat file errors (Message 28151)
Posted 19 Dec 2016 by Jesse Viviano
Post:
I am using the same ISP, AT&T. My family has switched our connection from U-Verse VDSL to U-Verse GigaPower fiber and shut down the overcompressed U-Verse IPTV service, so it no longer competes for throughput. (TV from the antenna is much clearer.)
As for ATLAS@home tasks, they still work well on my machine now.
59) Message boards : Number crunching : Missing heartbeat file errors (Message 28142)
Posted 19 Dec 2016 by Jesse Viviano
Post:
I do not know why my computer has been generating missing heartbeat file errors lately. Tasks run for around 10 minutes and then fail due to missing heartbeat files. What has changed is my Internet connection. I used to have VDSL service, which my computer reached over Wi-Fi that was vulnerable to a microwave oven occasionally wrecking the connection between the VMs and the server. Now I have gigabit fiber-optic service, which my computer reaches over Gigabit Ethernet, which is immune to the microwave oven. I do not see how these changes are relevant to the VMs at all. I fear that my computer is damaging the project with its errors, so I have decided to detach until something changes. To see whether the problem was exclusive to LHC@home or appeared in both projects, I also tried the older vLHC@home project (where I have an account) and still got the same error.
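For readers unfamiliar with the error: a heartbeat file is one that a process updates periodically so that another process can tell it is still alive. The sketch below shows a generic staleness check, assuming the monitor treats the heartbeat as missing when the file is absent or has not been modified within a timeout. The function name and the 600-second figure (chosen only to echo the roughly 10-minute failures) are hypothetical and do not reflect vboxwrapper's actual logic or values.

```python
import os
import time
from typing import Optional


def heartbeat_ok(path: str, max_age_s: float, now: Optional[float] = None) -> bool:
    """Return True if path exists and was modified within the last max_age_s seconds."""
    if now is None:
        now = time.time()
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        return False  # the file was never written: a "missing heartbeat"
    return now - mtime <= max_age_s
```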
60) Message boards : LHCb Application : Condor exited after 608s without running a job (Message 27974)
Posted 27 Nov 2016 by Jesse Viviano
Post:
This also happens if your network connection is unreliable. My family thinks that Ethernet is so yesterday and would rather place the network router near the center of the house and cover the house with Wi-Fi. Wi-Fi has several problems. 2.4 GHz Wi-Fi does penetrate walls, but poorly shielded microwave ovens wreck it. 5.8 GHz Wi-Fi does not penetrate walls well, and there are too many walls between my computer and the router for 5.8 GHz Wi-Fi to run at an acceptable speed. BOINC handles that unreliability well, but LHC@home does not. This is why I added an item to the wish list asking that work units be self-contained instead of having to fetch something from a server in the middle of the work unit.
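The contrast here is between a client that retries failed transfers and a job that gives up after one network hiccup. A common way to tolerate flaky links is to retry with exponential backoff. The Python sketch below shows that generic technique. It is not BOINC's transfer code, and `retry_with_backoff` and its defaults are hypothetical.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(fetch: Callable[[], T], attempts: int = 5,
                       base_delay_s: float = 1.0, max_delay_s: float = 60.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fetch() until it succeeds, doubling the maximum wait after each network error."""
    for attempt in range(attempts):
        try:
            return fetch()
        except OSError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last network error
            delay = min(max_delay_s, base_delay_s * 2 ** attempt)
            sleep(random.uniform(0, delay))  # "full jitter" spreads out retries
    raise ValueError("attempts must be at least 1")
```

The jitter (a random wait up to the computed delay) keeps many clients from retrying in lockstep after a shared outage. The `sleep` parameter exists so the behavior can be tested without real waiting.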

©2024 CERN