Message boards : Number crunching : Downloads have stalled

BelgianEnthousiast

Joined: 5 Apr 15
Posts: 18
Credit: 5,910,849
RAC: 0
Message 35671 - Posted: 28 Jun 2018, 14:39:37 UTC - in response to Message 35670.  

I'm sorry Nils, but this is by no means resolved.

I'm still experiencing the problem, and the fact that multiple participants can reproduce it suggests that it is related to either BOINC or VirtualBox.

For information, BOINC version 7.10.2 / 3.0.1
VirtualBox 5.2.12 r 122591

I'm running Windows 10 Pro version 1709, build 16299.431, patched to the latest updates.
Motherboard Asus A-99 vII
CPU Intel Core i7-6850K @3.6 GHz
Memory Corsair 32 GB
GPUs: Asus Strix 1070 Ti 8 GB and 1070 (will have 2 Ti's this weekend)

No overclocking is happening.

I have been running LHC on the CPU and GPUGrid on 2 GPUs for years with few issues.
Alternately, I also run ClimatePrediction, Rosetta, WorldCommunityGrid and MilkyWay.

I dedicate 8 of the 12 virtual cores to BOINC (2 of those 8 are reserved for GPUGrid to manage the GPU WUs),
and the rest are shared by LHC and WCG, with priority 2000 for LHC and 150 for WCG so that LHC runs primarily.
The 5-core ATLAS tasks take 5 cores, and the last core is shared between LHCb, Theory, CMS and SixTrack.
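
For a rough sense of what those two values mean: assuming the "priority" numbers above are BOINC resource shares, and that the client treats them as relative weights over the long run, the split between LHC and WCG can be estimated as follows. The function name and the dictionary are made up for illustration only.

```python
def share_fractions(shares):
    """Turn relative resource-share weights into fractions of the total."""
    total = sum(shares.values())
    return {name: weight / total for name, weight in shares.items()}


# Values quoted in the post; treating them as resource shares is an assumption.
fractions = share_fractions({"LHC": 2000, "WCG": 150})
for name, fraction in fractions.items():
    print(f"{name}: {fraction:.1%}")
```

Under that assumption LHC would receive roughly 93% of the shared CPU time and WCG about 7%.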

As you can probably see from my stats, this has been running quite well, racking up between 5,000 and 10,000 credits
per day.
I only have trouble uploading and downloading LHC WUs.

What I did notice is that when I stop BOINC (after suspending the active WUs, of course), exit it and start the
application again, the downloads and uploads suddenly work again and reach speeds of 3.5 Mbit/s, finishing in
a matter of a few tens of seconds.

Sorry to push but could you please investigate further as this is quite annoying :-S

Many thanks in advance !

K.

[Attached image not available: d:\LHC- ATLAS DOWNLOADS.jpg]
ID: 35671 · Report as offensive     Reply Quote
Profile Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 453
Credit: 193,369,412
RAC: 10,065
Message 35672 - Posted: 28 Jun 2018, 14:40:13 UTC
Last modified: 28 Jun 2018, 14:41:48 UTC

Nils,

Sorry, but the ATLAS download problems really do exist and have already driven crunchers to other projects. Me too!

It looks as if the problem lies on the route to the outside world.

If it would help, you can contact me; we can open a remote session so you can see the problem live.

If I had to take a guess, I would say it is a damaged or faulty switch. I had one at my company and the behaviour was very similar.

Yeti


Supporting BOINC, a great concept !
ID: 35672 · Report as offensive     Reply Quote
Simplex0

Joined: 26 Aug 05
Posts: 68
Credit: 545,660
RAC: 0
Message 35673 - Posted: 28 Jun 2018, 15:01:50 UTC - in response to Message 35665.  

maeax wrote:
read this message in this thread:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4741&postid=35632#35632
Cern-It is working on this problem.


I don't see that they are working on it. They can't even see the problem.

Nils Høimyr wrote:
Download and upload times are like usual on our BOINC clients.

Could this be local, or have been a temporary storage issue? As far as we can see from our monitoring data, I/O is normal over the last 7 days.


+1
ID: 35673 · Report as offensive     Reply Quote
vseven

Joined: 22 Jan 18
Posts: 32
Credit: 2,756,359
RAC: 0
Message 35674 - Posted: 28 Jun 2018, 15:17:30 UTC - in response to Message 35670.  

Sorry, but as mentioned, on our BOINC clients upload speed is normal.


It's definitely a problem. This just started happening to both downloads and uploads. I have over 20 results trying to upload that keep failing. A week ago I never saw this issue. There is something going on, whether you personally see it on your end or not.
ID: 35674 · Report as offensive     Reply Quote
maeax

Joined: 2 May 07
Posts: 2071
Credit: 156,128,280
RAC: 105,358
Message 35675 - Posted: 28 Jun 2018, 15:28:35 UTC

I have Win10 Pro and, inside it, a VM with SL69.
The stalling occurs only in the Windows BOINC client and not in the SL69 BOINC client on the same computer (seen on four computers).
ID: 35675 · Report as offensive     Reply Quote
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 798
Credit: 644,754,072
RAC: 232,912
Message 35676 - Posted: 28 Jun 2018, 15:58:04 UTC

It could be that the new Theory version has triggered some server issues.
ID: 35676 · Report as offensive     Reply Quote
Simplex0

Joined: 26 Aug 05
Posts: 68
Credit: 545,660
RAC: 0
Message 35677 - Posted: 28 Jun 2018, 16:01:59 UTC

I hope that the problem with the extremely slow upload speed, which many of your crunchers see but apparently not the staff of LHC@home, will come to an end, but I have given up hope that it will happen anytime soon.
Compared with Rosetta, which I am running now, I can confirm that in my case the upload speed to Rosetta is 10 times higher than it is on this project.
ID: 35677 · Report as offensive     Reply Quote
gyllic

Joined: 9 Dec 14
Posts: 202
Credit: 2,533,875
RAC: 0
Message 35697 - Posted: 29 Jun 2018, 9:59:02 UTC
Last modified: 29 Jun 2018, 9:59:57 UTC

Additionally, ATLAS result upload is still not working properly:

29.06.2018 08:28:40 | LHC@home | Starting task evxMDm0ywssnyYickojUe11pABFKDmABFKDmZ0NNDmABFKDmt4oKNo_0
29.06.2018 10:47:45 | LHC@home | Computation for task evxMDm0ywssnyYickojUe11pABFKDmABFKDmZ0NNDmABFKDmt4oKNo_0 finished
29.06.2018 10:47:47 | LHC@home | Started upload of evxMDm0ywssnyYickojUe11pABFKDmABFKDmZ0NNDmABFKDmt4oKNo_0_r1830430149_ATLAS_result
29.06.2018 10:50:04 | LHC@home | Backing off 00:03:16 on upload of evxMDm0ywssnyYickojUe11pABFKDmABFKDmZ0NNDmABFKDmt4oKNo_0_r1830430149_ATLAS_result
29.06.2018 10:53:22 | LHC@home | Started upload of evxMDm0ywssnyYickojUe11pABFKDmABFKDmZ0NNDmABFKDmt4oKNo_0_r1830430149_ATLAS_result
29.06.2018 11:02:01 | LHC@home | Backing off 00:07:07 on upload of evxMDm0ywssnyYickojUe11pABFKDmABFKDmZ0NNDmABFKDmt4oKNo_0_r1830430149_ATLAS_result
29.06.2018 11:09:09 | LHC@home | Started upload of evxMDm0ywssnyYickojUe11pABFKDmABFKDmZ0NNDmABFKDmt4oKNo_0_r1830430149_ATLAS_result
29.06.2018 11:11:23 | LHC@home | Backing off 00:12:22 on upload of evxMDm0ywssnyYickojUe11pABFKDmABFKDmZ0NNDmABFKDmt4oKNo_0_r1830430149_ATLAS_result
29.06.2018 11:23:46 | LHC@home | Started upload of evxMDm0ywssnyYickojUe11pABFKDmABFKDmZ0NNDmABFKDmt4oKNo_0_r1830430149_ATLAS_result
29.06.2018 11:26:35 | LHC@home | Backing off 00:19:15 on upload of evxMDm0ywssnyYickojUe11pABFKDmABFKDmZ0NNDmABFKDmt4oKNo_0_r1830430149_ATLAS_result
29.06.2018 11:47:19 | LHC@home | Started upload of evxMDm0ywssnyYickojUe11pABFKDmABFKDmZ0NNDmABFKDmt4oKNo_0_r1830430149_ATLAS_result
29.06.2018 11:49:47 | LHC@home | Backing off 00:40:19 on upload of evxMDm0ywssnyYickojUe11pABFKDmABFKDmZ0NNDmABFKDmt4oKNo_0_r1830430149_ATLAS_result


Maybe the problem lies in the connection from inside CERN to outside CERN, since you (the people inside CERN) don't see problems and we (outside CERN) do.
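
To put numbers on the log in this post, a small script can pair each "Started upload" line with the "Backing off" line that follows it. This is only a sketch: it assumes the event log format shown here (timestamp, project and message separated by " | "), and RESULT is a placeholder standing in for the long result file name.

```python
import re
from datetime import datetime

LINE = re.compile(
    r"^(?P<ts>\d{2}\.\d{2}\.\d{4} \d{2}:\d{2}:\d{2}) \| [^|]+ \| (?P<msg>.*)$"
)
BACKOFF = re.compile(r"^Backing off (\d+):(\d{2}):(\d{2}) on upload of ")

# Timestamps and backoff values copied from the post; RESULT is a placeholder.
SAMPLE_LOG = """\
29.06.2018 10:47:47 | LHC@home | Started upload of RESULT
29.06.2018 10:50:04 | LHC@home | Backing off 00:03:16 on upload of RESULT
29.06.2018 10:53:22 | LHC@home | Started upload of RESULT
29.06.2018 11:02:01 | LHC@home | Backing off 00:07:07 on upload of RESULT
29.06.2018 11:09:09 | LHC@home | Started upload of RESULT
29.06.2018 11:11:23 | LHC@home | Backing off 00:12:22 on upload of RESULT
29.06.2018 11:23:46 | LHC@home | Started upload of RESULT
29.06.2018 11:26:35 | LHC@home | Backing off 00:19:15 on upload of RESULT
29.06.2018 11:47:19 | LHC@home | Started upload of RESULT
29.06.2018 11:49:47 | LHC@home | Backing off 00:40:19 on upload of RESULT
"""


def parse_attempts(log_text):
    """Return (attempt_seconds, backoff_seconds) for each failed upload attempt."""
    attempts = []
    started = None
    for raw in log_text.splitlines():
        match = LINE.match(raw.strip())
        if not match:
            continue
        ts = datetime.strptime(match.group("ts"), "%d.%m.%Y %H:%M:%S")
        msg = match.group("msg")
        if msg.startswith("Started upload of "):
            started = ts
            continue
        backoff = BACKOFF.match(msg)
        if backoff and started is not None:
            hours, minutes, seconds = (int(x) for x in backoff.groups())
            attempt_seconds = int((ts - started).total_seconds())
            attempts.append((attempt_seconds, hours * 3600 + minutes * 60 + seconds))
            started = None
    return attempts


for attempt_s, backoff_s in parse_attempts(SAMPLE_LOG):
    print(f"attempt ran {attempt_s} s, then backed off {backoff_s} s")
```

For this log, every attempt gives up after roughly 2 to 9 minutes, while the wait before the next retry grows from about 3 minutes to about 40 minutes.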
ID: 35697 · Report as offensive     Reply Quote
Harri Liljeroos
Joined: 28 Sep 04
Posts: 674
Credit: 43,152,472
RAC: 15,698
Message 35701 - Posted: 29 Jun 2018, 14:10:52 UTC

A few notes on the download / upload problems:
- For me it seems that my Windows 10 host at home suffers more from this problem than my Windows 7 host. The Win 10 machine has more CPUs and does a lot more work than the other.
- The failing downloads (ATLAS) never seem to go into project backoff or "download pending", but linger in the "downloading" state although no data is being transferred. Restarting the network connection gets data flowing again. This behaviour also blocks any SixTrack downloads when they are available.
ID: 35701 · Report as offensive     Reply Quote
Sid

Send message
Joined: 26 Jul 12
Posts: 18
Credit: 2,456,826
RAC: 0
Message 35707 - Posted: 29 Jun 2018, 18:48:27 UTC

Sorry, but as mentioned, on our BOINC clients upload speed is normal.

Well, can you just try running a BOINC task from home?
On your own computer?
ID: 35707 · Report as offensive     Reply Quote
vseven

Joined: 22 Jan 18
Posts: 32
Credit: 2,756,359
RAC: 0
Message 35709 - Posted: 30 Jun 2018, 1:18:51 UTC

I'm now sitting with over 30 tasks that can't upload. The same machine is running other projects with no issues.
ID: 35709 · Report as offensive     Reply Quote
maeax

Joined: 2 May 07
Posts: 2071
Credit: 156,128,280
RAC: 105,358
Message 35710 - Posted: 30 Jun 2018, 1:56:27 UTC - in response to Message 35709.  

Have you tested:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4741&postid=35632#35632

ID: 35710 · Report as offensive     Reply Quote
keputnam

Joined: 27 Sep 04
Posts: 102
Credit: 7,086,947
RAC: 1,340
Message 35711 - Posted: 30 Jun 2018, 2:31:17 UTC - in response to Message 35710.  

I've tried disabling and re-enabling network activity.

That kicks loose the one or two downloads stuck in limbo, but within one or two files the same behavior is observed again.

Downloads start normally, then "stall", with NO DATA being transferred. As a result, the detected throughput drops to next to zero.
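
A quick illustration of why the displayed rate sinks toward zero during a stall, assuming the client shows something like an average over the whole transfer (an assumption, as is every number below): once the byte count stops growing, the average keeps shrinking as time passes.

```python
def average_rate(bytes_done, elapsed_s):
    """Average transfer rate in bytes per second over the whole transfer."""
    return bytes_done / elapsed_s


# Hypothetical transfer: 5 MB arrive in the first 10 s, then nothing more.
bytes_done = 5_000_000
for elapsed_s in (10, 100, 1000):
    rate_kb = average_rate(bytes_done, elapsed_s) / 1000
    print(f"after {elapsed_s:>4} s: {rate_kb:.0f} kB/s")
```

With these made-up figures the average falls from 500 kB/s to 5 kB/s without a single additional byte being lost or retransmitted.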
ID: 35711 · Report as offensive     Reply Quote
vseven

Joined: 22 Jan 18
Posts: 32
Credit: 2,756,359
RAC: 0
Message 35714 - Posted: 30 Jun 2018, 13:13:59 UTC - in response to Message 35710.  

Have you tested:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4741&postid=35632#35632


Multiple times. It helps downloads but doesn't matter for uploads...they get 4 - 6 Mb uploaded and stall out. All my other projects have no issues on the same machine.
ID: 35714 · Report as offensive     Reply Quote
maeax

Joined: 2 May 07
Posts: 2071
Credit: 156,128,280
RAC: 105,358
Message 35715 - Posted: 30 Jun 2018, 13:25:18 UTC - in response to Message 35714.  

I have it too...
Let's hope they find a solution next week.
ID: 35715 · Report as offensive     Reply Quote
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,944,101
RAC: 137,246
Message 35758 - Posted: 3 Jul 2018, 8:35:18 UTC

The situation didn't change significantly.

Result size: 123 MB
Di 03 Jul 2018 09:55:58 CEST | LHC@home | Started upload of nG4LDmA5FusnyYickojUe11pABFKDmABFKDmjs9aDmABFKDmsXWiim_0_r328315010_ATLAS_result
Di 03 Jul 2018 10:03:33 CEST | LHC@home | Backing off 00:02:54 on upload of nG4LDmA5FusnyYickojUe11pABFKDmABFKDmjs9aDmABFKDmsXWiim_0_r328315010_ATLAS_result
Di 03 Jul 2018 10:06:28 CEST | LHC@home | Started upload of nG4LDmA5FusnyYickojUe11pABFKDmABFKDmjs9aDmABFKDmsXWiim_0_r328315010_ATLAS_result
Di 03 Jul 2018 10:08:20 CEST | LHC@home | Finished upload of nG4LDmA5FusnyYickojUe11pABFKDmABFKDmjs9aDmABFKDmsXWiim_0_r328315010_ATLAS_result


Result size: 93 MB
Di 03 Jul 2018 10:11:32 CEST | LHC@home | Started upload of TlTLDmeXIusnlyackoJh5iwnABFKDmABFKDmb1qRDmABFKDmer21Lm_0_r312902500_ATLAS_result
Di 03 Jul 2018 10:28:17 CEST | LHC@home | Finished upload of TlTLDmeXIusnlyackoJh5iwnABFKDmABFKDmb1qRDmABFKDmer21Lm_0_r312902500_ATLAS_result
ID: 35758 · Report as offensive     Reply Quote
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,944,101
RAC: 137,246
Message 35760 - Posted: 3 Jul 2018, 14:28:18 UTC

It's getting worse.

Result size: 107 MB
Upload speed: 16 kB/s
Di 03 Jul 2018 14:22:53 CEST | LHC@home | Started upload of G4CMDmIi3tsnlyackoJh5iwnABFKDmABFKDmsJMZDmABFKDmmCrP6m_3_r601011395_ATLAS_result
Di 03 Jul 2018 16:13:12 CEST | LHC@home | Finished upload of G4CMDmIi3tsnlyackoJh5iwnABFKDmABFKDmsJMZDmABFKDmmCrP6m_3_r601011395_ATLAS_result
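
The quoted speed can be cross-checked from the timestamps. The sketch below assumes 1 MB = 1000 kB and that start and finish fall on the same day; the helper name is made up for illustration.

```python
from datetime import datetime


def average_rate_kb_per_s(size_mb, started, finished, fmt="%H:%M:%S"):
    """Average transfer rate in kB/s, assuming 1 MB = 1000 kB and same-day times."""
    elapsed = datetime.strptime(finished, fmt) - datetime.strptime(started, fmt)
    return size_mb * 1000 / elapsed.total_seconds()


# 107 MB result from this post
slow = average_rate_kb_per_s(107, "14:22:53", "16:13:12")
# 93 MB result from the previous post, for comparison
earlier = average_rate_kb_per_s(93, "10:11:32", "10:28:17")
print(f"{slow:.1f} kB/s, {earlier:.1f} kB/s")
```

That gives about 16 kB/s for the 107 MB upload, in line with the figure above, against a little over 90 kB/s for the 93 MB upload earlier the same day.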
ID: 35760 · Report as offensive     Reply Quote
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 798
Credit: 644,754,072
RAC: 232,912
Message 35762 - Posted: 3 Jul 2018, 17:38:08 UTC

I just abort them when I get a chance.
ID: 35762 · Report as offensive     Reply Quote
Erich56

Joined: 18 Dec 15
Posts: 1686
Credit: 100,385,422
RAC: 102,187
Message 35765 - Posted: 3 Jul 2018, 19:20:08 UTC

I have seen these problems only with ATLAS in the past few days. Tasks from all the other projects downloaded at normal speed and without interruption.
ID: 35765 · Report as offensive     Reply Quote
vseven

Joined: 22 Jan 18
Posts: 32
Credit: 2,756,359
RAC: 0
Message 35766 - Posted: 4 Jul 2018, 1:16:23 UTC

My downloads are better... some still get stuck, and apparently the "retry" option doesn't actually work... it just throws an "error while downloading". Uploads are starting to get through but are still taking hours.
ID: 35766 · Report as offensive     Reply Quote