Message boards : ATLAS application : Uploading stuck
jelle

Joined: 26 Sep 11
Posts: 37
Credit: 5,855,508
RAC: 1,330
Message 40994 - Posted: 18 Dec 2019, 2:01:14 UTC

For the last few days I have had upload transfers stuck. Two ATLAS tasks have been trying to upload for at least two days now. The upload file size is only 221 bytes. I have the exact same problem on another computer, so it's probably not just a matter of turning it off and on again. I have no problem uploading and reporting to other BOINC projects, so it's unique to LHC@home. Any suggestions?
ID: 40994
lazlo_vii
Joined: 20 Nov 19
Posts: 21
Credit: 1,074,330
RAC: 3
Message 40995 - Posted: 18 Dec 2019, 4:01:41 UTC - in response to Message 40994.  
Last modified: 18 Dec 2019, 4:07:29 UTC

First, I would look at your /etc/boinc-client/cc_config.xml and double check the network settings. I am not saying "It isn't plugged in!" but that should be the first question you answer for yourself. If all is good in your config file I would try to manually update the project from the command line in one X terminal while watching the boinc-client messages in another. First open a terminal on (or to) the host and issue:

watch -n1 boinccmd --get_messages


That terminal will refresh the messages from the boinc-client service every second until you hit Ctrl+C to kill watch.

Open a second terminal on (or to) the same host and issue whichever of these two commands matches your configuration:

boinccmd --project https://lhcathome.cern.ch/lhcathome/ update

or
boinccmd --host localhost --passwd <your_password_for_remote_access> --project https://lhcathome.cern.ch/lhcathome/ update


Switch back to the first terminal and read what boinc-client says about updating.

If that doesn't give you useful information you can try looking at /var/log/syslog and reading man nc, man netstat, and man boinccmd for more clues. Router logs might be useful to you as well.

EDIT: The updating of the project and reading boinc-client's messages can be done easily from the GUI, but what fun is that?
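
For anyone who prefers a single terminal, the two steps above could be combined roughly like this. It is only a sketch: the function name and its two arguments are made up for illustration, and it assumes boinccmd can reach the local client without a password (otherwise add the --host and --passwd options shown above to both calls).

```shell
#!/bin/sh
# Sketch: request a project update, wait briefly, then show recent LHC@home messages.
# PROJECT_URL is the URL used in the thread; the function name and arguments are hypothetical.
PROJECT_URL="https://lhcathome.cern.ch/lhcathome/"

lhc_update_and_show() {
    wait_secs="${1:-10}"   # seconds to give the client time to react
    max_lines="${2:-20}"   # number of recent project lines to print

    # Ask the local client to contact the project now.
    boinccmd --project "$PROJECT_URL" update || return 1

    sleep "$wait_secs"

    # Keep only lines that mention the project; the newest lines come last.
    boinccmd --get_messages | grep -F 'LHC@home' | tail -n "$max_lines"
}
```

Calling lhc_update_and_show 15 30 would wait 15 seconds and print up to 30 matching lines.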
ID: 40995
jelle

Joined: 26 Sep 11
Posts: 37
Credit: 5,855,508
RAC: 1,330
Message 40998 - Posted: 18 Dec 2019, 8:43:25 UTC - in response to Message 40995.  
Last modified: 18 Dec 2019, 8:46:04 UTC

Thank you for those suggestions. I have done another update request, in the BOINC Manager GUI, because using terminals seemed overkill. This is recent output from the event log:

Wed 18 Dec 2019 20:44:57 NZDT | LHC@home | Started upload of gCxMDmN9izvn9Rq4apoT9bVoABFKDmABFKDmt4SaDmABFKDmY4ACQo_0_r369045851_ATLAS_hits
Wed 18 Dec 2019 20:44:57 NZDT | LHC@home | Started upload of kspKDmb7wzvnsSi4apGgGQJmABFKDmABFKDmvDwVDmABFKDmzd6Ztn_0_r115146122_ATLAS_hits
Wed 18 Dec 2019 20:46:01 NZDT | LHC@home | Backing off 03:14:20 on upload of gCxMDmN9izvn9Rq4apoT9bVoABFKDmABFKDmt4SaDmABFKDmY4ACQo_0_r369045851_ATLAS_hits
Wed 18 Dec 2019 20:46:01 NZDT | LHC@home | Backing off 04:27:48 on upload of kspKDmb7wzvnsSi4apGgGQJmABFKDmABFKDmvDwVDmABFKDmzd6Ztn_0_r115146122_ATLAS_hits
Wed 18 Dec 2019 21:20:59 NZDT | Universe@Home | Sending scheduler request: Requested by project.
Wed 18 Dec 2019 21:20:59 NZDT | Universe@Home | Requesting new tasks for CPU
Wed 18 Dec 2019 21:21:02 NZDT | Universe@Home | Scheduler request completed: got 1 new tasks
Wed 18 Dec 2019 21:21:04 NZDT | Universe@Home | Started download of universe_bh2_190723_292_448744552_20000_1-999999_745100
Wed 18 Dec 2019 21:21:08 NZDT | Universe@Home | Finished download of universe_bh2_190723_292_448744552_20000_1-999999_745100
Wed 18 Dec 2019 21:24:43 NZDT |  | Suspending GPU computation - computer is in use
Wed 18 Dec 2019 21:35:26 NZDT | LHC@home | project resumed by user
Wed 18 Dec 2019 21:35:29 NZDT | LHC@home | Sending scheduler request: Requested by project.
Wed 18 Dec 2019 21:35:29 NZDT | LHC@home | Requesting new tasks for CPU
Wed 18 Dec 2019 21:35:33 NZDT | LHC@home | update requested by user
Wed 18 Dec 2019 21:35:33 NZDT | LHC@home | Scheduler request completed: got 1 new tasks
Wed 18 Dec 2019 21:35:35 NZDT | LHC@home | Started download of workspace1_hl14_OnErrors_OnOct_NoBB_col_B1_radial_dp_0.00003__1__s__62.31_60.32__13_13.1__6__84_1_sixvf_boinc11654.zip
Wed 18 Dec 2019 21:35:38 NZDT | LHC@home | Finished download of workspace1_hl14_OnErrors_OnOct_NoBB_col_B1_radial_dp_0.00003__1__s__62.31_60.32__13_13.1__6__84_1_sixvf_boinc11654.zip


The top lines represent the failing upload attempt. As you can see, I have no trouble connecting. I even downloaded a new SixTrack task for my effort. However, those two completed ATLAS tasks are just endlessly retrying their upload. Identical problem on another laptop, so it is not machine dependent.

I will let it run that SixTrack task and see if that can upload.
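
The backoff lines in a log like the one above can be pulled out with a small filter. This is a rough sketch that assumes the event log text is piped in on standard input in the format pasted here; the function name is made up. It prints the backoff in seconds followed by the file name.

```shell
#!/bin/sh
# Sketch: read BOINC event log text on stdin and print "<seconds> <file>" for
# every "Backing off HH:MM:SS on upload of <file>" line.
list_upload_backoffs() {
    awk '
        /Backing off [0-9:]+ on upload of / {
            for (i = 1; i < NF; i++) {
                if ($i == "Backing" && $(i + 1) == "off") {
                    # The field after "off" is the HH:MM:SS backoff interval.
                    split($(i + 2), t, ":")
                    print t[1] * 3600 + t[2] * 60 + t[3], $NF
                    break
                }
            }
        }
    '
}
```

For example, a saved copy of the log could be checked with list_upload_backoffs < eventlog.txt, where eventlog.txt is a hypothetical file name.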
ID: 40998
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1345
Credit: 67,223,877
RAC: 95,641
Message 40999 - Posted: 18 Dec 2019, 8:57:14 UTC - in response to Message 40998.  

Just to be sure your basic network connection works.

You may try:
nc -z -v -w 5 lhcathome-upload.cern.ch 80
Run it from both of your hosts using your BOINC client's account.

Output should look like this:
Connection to lhcathome-upload.cern.ch 80 port [tcp/http] succeeded!


Post any other output here for analysis.
If the command succeeds you may try a reboot.
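
If the check has to be repeated on several hosts, the nc call above could be wrapped in a small function. This is just a sketch: the function name and the OK/FAIL wording are made up, and the host and port default to the values used above.

```shell
#!/bin/sh
# Sketch: test whether a TCP connection to the upload server can be opened.
# Defaults match the nc command in this post; the function name is hypothetical.
check_upload_server() {
    host="${1:-lhcathome-upload.cern.ch}"
    port="${2:-80}"

    # -z: only probe the port, -w 5: give up after 5 seconds.
    if nc -z -w 5 "$host" "$port" >/dev/null 2>&1; then
        echo "OK: $host:$port is reachable"
    else
        echo "FAIL: cannot connect to $host:$port"
        return 1
    fi
}
```

Running check_upload_server with no arguments tests the default host and port. Note that it hides nc's own output, so run the plain nc command if you need the exact message for posting.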
ID: 40999
jelle

Joined: 26 Sep 11
Posts: 37
Credit: 5,855,508
RAC: 1,330
Message 41000 - Posted: 18 Dec 2019, 9:58:13 UTC - in response to Message 40999.  

Just to be sure your basic network connection works.


Thank you for the suggestion. I tried that command and got your expected output, so no problem there.
I can also note that the SixTrack task I got on my last update successfully completed, uploaded and reported, so it's only the ATLAS tasks that are stuck.
ID: 41000
jelle

Joined: 26 Sep 11
Posts: 37
Credit: 5,855,508
RAC: 1,330
Message 41001 - Posted: 18 Dec 2019, 10:11:17 UTC - in response to Message 40999.  

For good measure I just did a reboot as well. Event log from restarting BOINC afterwards (with some irrelevant lines removed) is as follows.

Wed 18 Dec 2019 23:02:53 NZDT |  | Starting BOINC client version 7.9.3 for x86_64-pc-linux-gnu
Wed 18 Dec 2019 23:02:53 NZDT |  | log flags: file_xfer, sched_ops, task
Wed 18 Dec 2019 23:02:53 NZDT |  | Libraries: libcurl/7.58.0 OpenSSL/1.1.1 zlib/1.2.11 libidn2/2.0.4 libpsl/0.19.1 (+libidn2/2.0.4) nghttp2/1.30.0 librtmp/2.3
Wed 18 Dec 2019 23:02:53 NZDT |  | Data directory: /var/lib/boinc-client
Wed 18 Dec 2019 23:02:53 NZDT |  | CUDA: NVIDIA GPU 0: GeForce GTX 1050 (driver version 390.11, CUDA version 9.1, compute capability 6.1, 1999MB, 1744MB available, 1960 GFLOPS peak)
Wed 18 Dec 2019 23:02:53 NZDT |  | OpenCL: NVIDIA GPU 0: GeForce GTX 1050 (driver version 390.116, device version OpenCL 1.2 CUDA, 1999MB, 1744MB available, 1960 GFLOPS peak)
Wed 18 Dec 2019 23:02:53 NZDT |  | [libc detection] gathered: 2.27, Ubuntu GLIBC 2.27-3ubuntu1
Wed 18 Dec 2019 23:02:53 NZDT |  | Host name: ZARX1804
Wed 18 Dec 2019 23:02:53 NZDT |  | Processor: 4 GenuineIntel Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz [Family 6 Model 58 Stepping 9]
Wed 18 Dec 2019 23:02:53 NZDT |  | OS: Linux Ubuntu: Ubuntu 18.04.3 LTS [5.0.0-37-generic|libc 2.27 (Ubuntu GLIBC 2.27-3ubuntu1)]
Wed 18 Dec 2019 23:02:53 NZDT |  | Memory: 15.55 GB physical, 0 bytes virtual
Wed 18 Dec 2019 23:02:53 NZDT |  | Disk: 38.20 GB total, 9.35 GB free
Wed 18 Dec 2019 23:02:53 NZDT |  | Local time is UTC +13 hours
Wed 18 Dec 2019 23:02:53 NZDT |  | VirtualBox version: 6.0.8r130520
Wed 18 Dec 2019 23:02:53 NZDT |  | Config: GUI RPCs allowed from:
Wed 18 Dec 2019 23:02:53 NZDT |  | Last benchmark was 34 days 03:00:57 ago
Wed 18 Dec 2019 23:02:53 NZDT | Asteroids@home | URL http://asteroidsathome.net/boinc/; Computer ID 532806; resource share 25
Wed 18 Dec 2019 23:02:53 NZDT | Einstein@Home | URL http://einstein.phys.uwm.edu/; Computer ID 12751963; resource share 100
Wed 18 Dec 2019 23:02:53 NZDT | LHC@home | URL https://lhcathome.cern.ch/lhcathome/; Computer ID 10543635; resource share 300
Wed 18 Dec 2019 23:02:53 NZDT | Rosetta@home | URL http://boinc.bakerlab.org/rosetta/; Computer ID 3394589; resource share 100
Wed 18 Dec 2019 23:02:53 NZDT | Universe@Home | URL https://universeathome.pl/universe/; Computer ID 489260; resource share 100
Wed 18 Dec 2019 23:02:53 NZDT |  | Running CPU benchmarks
Wed 18 Dec 2019 23:02:53 NZDT |  | Suspending computation - CPU benchmarks in progress
Wed 18 Dec 2019 23:03:24 NZDT |  | Benchmark results:
Wed 18 Dec 2019 23:03:24 NZDT |  | Number of CPUs: 2
Wed 18 Dec 2019 23:03:24 NZDT |  | 4343 floating point MIPS (Whetstone) per CPU
Wed 18 Dec 2019 23:03:24 NZDT |  | 126625 integer MIPS (Dhrystone) per CPU
Wed 18 Dec 2019 23:03:25 NZDT |  | Suspending GPU computation - computer is in use
Wed 18 Dec 2019 23:03:40 NZDT | LHC@home | Started upload of gCxMDmN9izvn9Rq4apoT9bVoABFKDmABFKDmt4SaDmABFKDmY4ACQo_0_r369045851_ATLAS_hits
Wed 18 Dec 2019 23:03:59 NZDT | Rosetta@home | project resumed by user
Wed 18 Dec 2019 23:04:12 NZDT | Universe@Home | project resumed by user
Wed 18 Dec 2019 23:04:17 NZDT | Universe@Home | work fetch resumed by user
Wed 18 Dec 2019 23:04:17 NZDT | Rosetta@home | work fetch resumed by user
Wed 18 Dec 2019 23:04:44 NZDT | LHC@home | Backing off 04:49:32 on upload of gCxMDmN9izvn9Rq4apoT9bVoABFKDmABFKDmt4SaDmABFKDmY4ACQo_0_r369045851_ATLAS_hits
Wed 18 Dec 2019 23:05:24 NZDT | LHC@home | Started upload of kspKDmb7wzvnsSi4apGgGQJmABFKDmABFKDmvDwVDmABFKDmzd6Ztn_0_r115146122_ATLAS_hits
Wed 18 Dec 2019 23:06:26 NZDT | LHC@home | Backing off 03:30:16 on upload of kspKDmb7wzvnsSi4apGgGQJmABFKDmABFKDmvDwVDmABFKDmzd6Ztn_0_r115146122_ATLAS_hits


So the ATLAS files are still receiving a project backoff when they try to upload, while everything else uploads fine.
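
To see at a glance which uploads started but never finished, a filter along these lines could be run over the log text. It is a sketch only: it assumes the client logs "Started upload of" and "Finished upload of" lines with the file name as the last word, as in the output above, and the function name is made up.

```shell
#!/bin/sh
# Sketch: read BOINC event log text on stdin and print every file whose upload
# was started but has no matching "Finished upload of" line.
list_unfinished_uploads() {
    awk '
        /Started upload of / {
            name = $NF
            if (!(name in seen)) {
                seen[name] = 1
                order[++n] = name      # remember first-seen order for stable output
            }
        }
        /Finished upload of / { finished[$NF] = 1 }
        END {
            for (i = 1; i <= n; i++)
                if (!(order[i] in finished)) print order[i]
        }
    '
}
```

With the log above as input, both ATLAS_hits files would be listed, since neither has a finished line.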
ID: 41001
maeax

Joined: 2 May 07
Posts: 875
Credit: 32,389,346
RAC: 46,081
Message 41003 - Posted: 18 Dec 2019, 12:27:04 UTC

One possibility is that the ATLAS server has lost your tasks on the server side.
You can check in the task lists for your hosts whether the ATLAS tasks you are trying to upload have already been finished by another computer.
There is only a small window of three or four days before another user gets the same ATLAS tasks as well.
If that has happened, it is hard, but the best option is to abort your tasks.
I would keep an eye on it next time, because you also have ATLAS tasks that finished correctly.
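
If you do decide to abort, it can be done from the Manager or with boinccmd. The sketch below is a dry run by default and only prints the commands. The function name and the DO_ABORT switch are made up for illustration, and the task names should be copied from the output of boinccmd --get_tasks rather than guessed from the upload file names.

```shell
#!/bin/sh
# Sketch: abort the named LHC@home tasks with boinccmd.
# Dry run unless DO_ABORT=1 is set; abort_tasks and DO_ABORT are hypothetical names.
PROJECT_URL="https://lhcathome.cern.ch/lhcathome/"

abort_tasks() {
    for task in "$@"; do
        if [ "${DO_ABORT:-0}" = "1" ]; then
            boinccmd --task "$PROJECT_URL" "$task" abort
        else
            echo "would run: boinccmd --task $PROJECT_URL $task abort"
        fi
    done
}
```

Check the printed commands first, then repeat the call with DO_ABORT=1 set in the environment to actually abort.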
ID: 41003



©2020 CERN