Download failures
Author | Message |
---|---|
Send message Joined: 15 Jul 05 Posts: 248 Credit: 5,974,599 RAC: 0 |
From the BOINC point of view it seems ok, my client downloads and uploads ATLAS files:
01-Aug-2017 08:33:12 [LHC@home] Started download of 5W5KDmokewqnSu7Ccp2YYBZmABFKDmABFKDmXNGKDmnFMKDm0GAUfn_input.tar.gz
01-Aug-2017 08:33:12 [LHC@home] Finished upload of w-c8_n20_lhc2016_40_MD-105-16-476-2.5-0.9157__24__s__64.31_59.32__1_2__6__13.5_1_sixvf_boinc5378_1_0
01-Aug-2017 08:33:13 [LHC@home] Finished download of 5W5KDmokewqnSu7Ccp2YYBZmABFKDmABFKDmXNGKDmnFMKDm0GAUfn_input.tar.gz
01-Aug-2017 08:33:13 [LHC@home] Started download of rte_5W5KDmokewqnSu7Ccp2YYBZmABFKDmABFKDmXNGKDmnFMKDm0GAUfn.tar.gz
01-Aug-2017 08:33:14 [LHC@home] Finished download of rte_5W5KDmokewqnSu7Ccp2YYBZmABFKDmABFKDmXNGKDmnFMKDm0GAUfn.tar.gz
01-Aug-2017 08:33:14 [LHC@home] Started download of boinc_job_script.xp0zEy
01-Aug-2017 08:33:15 [LHC@home] Finished download of boinc_job_script.xp0zEy
We've notified our ATLAS colleagues about the possible application/Frontier problem. |
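For anyone who wants to check their own client log the same way, here is a minimal Python sketch that pairs "Started" and "Finished" lines and reports transfers that never finished. It assumes the log lines have exactly the shape shown in the excerpt above; the function name `unfinished_transfers` is made up for this illustration and is not part of BOINC. Any line that does not match the pattern is simply ignored.

```python
import re

# Matches lines shaped like the excerpt above, e.g.
# 01-Aug-2017 08:33:12 [LHC@home] Started download of some_file.tar.gz
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}-\w{3}-\d{4} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<project>[^\]]+)\] "
    r"(?P<event>Started|Finished) (?P<kind>download|upload) of (?P<name>\S+)$"
)


def unfinished_transfers(lines):
    """Return {(kind, file name): start timestamp} for transfers with no 'Finished' line."""
    pending = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # not a transfer line in the assumed format
        key = (match["kind"], match["name"])
        if match["event"] == "Started":
            pending[key] = match["ts"]
        else:
            # A 'Finished' line without a recorded start is ignored.
            pending.pop(key, None)
    return pending
```

Feeding it the lines of a saved client log (for example `unfinished_transfers(open(path))`, where `path` is wherever you saved the log) returns an empty dict when every started transfer also finished.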
Send message Joined: 15 Jun 08 Posts: 2534 Credit: 253,880,596 RAC: 39,051 |
Thank you Nils. |
Send message Joined: 18 Dec 15 Posts: 1814 Credit: 118,509,401 RAC: 31,506 |
"From the BOINC point of view it seems ok, my client downloads and uploads ATLAS files" This seems to be the strange thing: some people do NOT have this problem, others do (here, ping also yields a timeout again). "We've notified our ATLAS colleagues about the possible application/Frontier problem." So let's wait and see what they can/will find out. |
Send message Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0 |
I now run ATLAS on two CPU cores at a time, with no problems downloading work units and no errors running them. I think CERN is sort of ignoring the single-CPU users, to encourage the multi-core version. It is supposed to save on bandwidth, etc. I liked the single-core version for efficiency, but two cores will work for me. |
Send message Joined: 2 May 07 Posts: 2243 Credit: 173,902,375 RAC: 2,013 |
CERN IT found the solution to this problem yesterday morning, and they will find it today as well. So take a break. |
Send message Joined: 15 Jun 08 Posts: 2534 Credit: 253,880,596 RAC: 39,051 |
Erich56 wrote: ... also ping yields a timeout, again ... Hi Erich, As already stated by Yeti, a failing ping is not a criterion to see if the server is running or not. Ping uses the ICMP protocol and may be dropped/rejected by any of the hops between your host and the target system. The file transfer done by the projects is (mostly) done via HTTP. To check the availability of a service on a distinct server, the VMs typically use a command like: nc -z -v -w 5 lhchomeproxy.cern.ch 80 An answer like "Connection to lhchomeproxy.cern.ch 80 port [tcp/http] succeeded!" shows that the service is up and also that the route to the server is not blocked. Unfortunately this is only the network part of a connection. Some services are protected by special credentials on a higher communication level, and these sometimes also cause failures. |
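On a system without nc, roughly the same TCP check can be sketched in Python using only the standard library. This is a minimal illustration, not what the VMs run: the function name `tcp_port_open` and the 5 second default timeout (chosen to mirror `-w 5`) are assumptions made for this example.

```python
import socket


def tcp_port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened within timeout seconds."""
    try:
        # create_connection resolves the name and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Calling `tcp_port_open("lhchomeproxy.cern.ch", 80)` returns True when the port accepts connections. As with nc, this only tests the network part of the connection; it says nothing about credentials or errors on a higher level.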
Send message Joined: 14 Jan 10 Posts: 1418 Credit: 9,470,586 RAC: 3,147 |
It seems that the 120MB files should be downloaded from the boincai04 server, and that server is not reachable for me. Maybe it is reachable for Nils because he's on CERN's LAN. |
Send message Joined: 24 Oct 04 Posts: 1173 Credit: 54,834,089 RAC: 16,184 |
Yes, the boincai04 server is down once again, so you probably should just try one of the other project tasks for now. Volunteer Mad Scientist For Life |
Send message Joined: 23 Jun 14 Posts: 12 Credit: 6,323,760,514 RAC: 1,767,677 |
Hi, The server which is hosting this file was down; that is why there was a download error. Now we have brought back the machine, and the file should be available. Cheers! Ever since the day that there was the issue after the cleanup of old files, I have been experiencing the same issues. LHC was running ATLAS fine for a long time, where it would download 4 tasks and run them through without issue. What I am seeing now (and since the day of the server cleanup) is that my machine will attempt to download files for tasks, and get stuck retrying on a few for several hours. |
Send message Joined: 24 Oct 04 Posts: 1173 Credit: 54,834,089 RAC: 16,184 |
Hi, "Server error: feeder not running" "Server can't open log file (../log_boincai04/scheduler.log)" (did you happen to check your PM over at TEST lately?) |
Send message Joined: 15 Jun 08 Posts: 2534 Credit: 253,880,596 RAC: 39,051 |
Problems with boincai04 affect the download of the EVNT files. It's good to hear that they are solved. However, there may be additional problems independent of boincai04, as the heavy use of the spare server ccfrontier.in2p3.fr and the missing output on console 2 suggest. Has this also been checked/solved? |
Send message Joined: 18 Dec 15 Posts: 1814 Credit: 118,509,401 RAC: 31,506 |
"As already stated by Yeti, a failing ping is not a criterion to see if the server is running or not." Well, my experience in the past days was that ping worked fine when the server was alive, and that ping showed a timeout when the server was down. Maybe it was just a coincidence that it worked that way. I personally am NOT a network specialist at all. "To check the availability of a service on a distinct server the VMs typically use a command like" I now tried this, but I got the message that "nc" is a wrong command. |
Send message Joined: 15 Jun 08 Posts: 2534 Credit: 253,880,596 RAC: 39,051 |
Erich56 wrote: I now tried this, but I got the message that "nc" is a wrong command. Well, it's a command that was originally written for Unix (short form of netcat) and therefore isn't available by default on Windows. |
Send message Joined: 18 Dec 15 Posts: 1814 Credit: 118,509,401 RAC: 31,506 |
"netcat" - sounds sweet :-) |
Send message Joined: 23 Jun 14 Posts: 12 Credit: 6,323,760,514 RAC: 1,767,677 |
Yes, there was a permission issue with the scheduler file on boincai04; it is fixed now. Cheers! |
Send message Joined: 23 Jun 14 Posts: 12 Credit: 6,323,760,514 RAC: 1,767,677 |
Just to summarize the cause of the failure for some files with ATLAS@home:
1. Some of the input files are stored on a test server, boincai04, and it was down yesterday due to heavy load. We modified the job submission script, so all the input files are now stored on more powerful and reliable servers, which should prevent this from happening again.
2. For people who still have hosts attached to the Atlas-test project: the server (boincai04) got stuck a few times in the past few days because it is a tiny machine under a heavy workload. Now we split the workload across different machines, and the boincai04 machine still dispatches a small number of test jobs.
Cheers! |
Send message Joined: 18 Dec 15 Posts: 1814 Credit: 118,509,401 RAC: 31,506 |
Thanks, Wenjing, for the update; have a nice day :-) |
Send message Joined: 28 Sep 04 Posts: 728 Credit: 49,053,473 RAC: 27,046 |
Just got a download error for an Atlas task: https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=76058390 like my wingman did. |
Send message Joined: 2 May 07 Posts: 2243 Credit: 173,902,375 RAC: 2,013 |
For about a week, the ATLAS download file (200 MByte) has been coming down very slowly. The network counter starts at, for example, 100 kps and drops to zero. It takes about 1 hour instead of the usual one minute. |
Send message Joined: 18 Dec 15 Posts: 1814 Credit: 118,509,401 RAC: 31,506 |
"Since about one week, the download-file of Atlas (200MByte) is dropping very slow." Hm, that's strange. Here, all downloads run at the same fast speed as ever (i.e. in about one minute). |
©2024 CERN