Message boards : Number crunching : Downloads have stalled

Erich56
Joined: 18 Dec 15
Posts: 1688
Credit: 103,750,959
RAC: 122,106
Message 35806 - Posted: 7 Jul 2018, 8:51:04 UTC - in response to Message 35805.  

The bottom line: it would be great if the people back at CERN could find the cause of the download problem as soon as possible.

Harri Liljeroos
Joined: 28 Sep 04
Posts: 675
Credit: 43,636,773
RAC: 15,996
Message 35808 - Posted: 7 Jul 2018, 12:34:34 UTC - in response to Message 35805.  

To be clear, I meant suspending the network connection within BOINC Manager (or BoincTasks in my case), i.e. doing the same as bronco's script. If you disconnect the network, for example by switching off your modem or something similar, then your description is right. Anyway, due care should be taken by volunteers.

Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 453
Credit: 193,569,815
RAC: 10,128
Message 35809 - Posted: 7 Jul 2018, 13:07:57 UTC
Last modified: 7 Jul 2018, 13:09:07 UTC

This is the situation 20 hours later:



The client is sitting there idle and waiting for downloads to finish


Supporting BOINC, a great concept !

bronco
Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 35810 - Posted: 7 Jul 2018, 15:59:53 UTC - in response to Message 35802.  

Depending on the project mix that runs on the host, this workaround may have unwanted side effects.

As soon as the network activity is suspended, the BOINC client will start other tasks.
The LHC VMs will typically stay in RAM, and if there is not enough RAM, higher swapping activity may occur.
The latter is suspected of causing timing problems when the VMs resume (especially when they resume concurrently) and may result in a watchdog error.

In short:
Volunteers may be aware of other errors.


I appreciate advice from volunteers with more experience, but I haven't had any unwanted side effects here so far, just ATLAS tasks downloading without me babysitting them.

I confess that I don't have those hosts attached to any other projects. They are both crunching ATLAS and Theory tasks. At the moment both are working on a Theory task, but they both have cached ATLAS tasks and have had them for more than a day. The event logs show network activity suspending and resuming but no abnormal task switching.

bronco
Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 35811 - Posted: 7 Jul 2018, 16:16:53 UTC - in response to Message 35808.  
Last modified: 7 Jul 2018, 16:17:40 UTC

To be clear, I meant suspending the network connection within BOINC Manager (or BoincTasks in my case), i.e. doing the same as bronco's script.

Good point. Earlier in this thread manually suspending/resuming network activity was touted as the way to nurse troublesome ATLAS downloads to completion. There was no mention of undesirable side effects back then. Now that the method is automated it's a concern?


Anyway, due care should be taken by volunteers.

Yes. The fact that it works for me doesn't guarantee it will work for everybody. YMMV.

Harri Liljeroos
Joined: 28 Sep 04
Posts: 675
Credit: 43,636,773
RAC: 15,996
Message 35819 - Posted: 7 Jul 2018, 21:37:32 UTC

I have to apologize for being hasty and wrong in my comments about computezrmle's ideas. I had a chance to observe what actually happens when I suspend the network connection in Boinc while one Atlas task is being downloaded and another is running. The running Atlas task was paused and 3 new SixTrack tasks were started instead. When I resumed the network connection, the VM resumed as well, and it took about a minute for Boinc to pause those 3 additional SixTrack tasks. So for a while I had more CPU cores in use than were actually available, but the situation resolved itself soon and no errors followed.

bronco
Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 35822 - Posted: 8 Jul 2018, 0:10:13 UTC - in response to Message 35819.  

I had a chance to observe what actually happens when I suspend the network connection in Boinc...


So did I, and it turns out I was observing the wrong thing. I was watching the event log, which showed no signs of tasks pausing. However, on the Tasks tab in BOINC Manager, the status of VM tasks switches to "Waiting for network". Yes, computezrmle, take a bow because you were right :)

WARNING: The script works well enough for me because I am running only VM tasks. Anybody running non-VM tasks might experience undesirable task switching and crashed tasks.

Maybe a more sophisticated kludge is in order, something like...
LOOP:  are ATLAS tasks downloading?
     - if no:
           - sleep for a while
           - goto LOOP
     - if yes:
           - suspend computing
           - suspend network activity for 10 seconds
           - resume network activity
           - resume computing
           - sleep 10 minutes
           - goto LOOP
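The pseudocode above could be turned into something like the following Python sketch. It drives the local BOINC client through the standard `boinccmd` options `--get_file_transfers`, `--set_run_mode` and `--set_network_mode`. Two things are assumptions, not taken from this thread: the parser assumes each transfer in the `--get_file_transfers` output carries `name:` and `direction:` lines, and `name_filter` is a hypothetical substring for spotting ATLAS input files (the actual file names aren't given here). Also note that switching both modes back to `auto` ("based on preferences") would override any manual "always"/"never" setting you had before.

```python
import subprocess
import time
from typing import Callable, List


def downloading_files(transfers_text: str, name_filter: str = "") -> List[str]:
    """Return names of pending downloads whose file name contains name_filter.

    Assumption: `boinccmd --get_file_transfers` prints one block per transfer
    with lines like "name: <file>" and "direction: download".
    """
    names = []
    current = None
    for raw in transfers_text.splitlines():
        line = raw.strip()
        if line.startswith("name:"):
            current = line.split(":", 1)[1].strip()
        elif line.startswith("direction:") and current is not None:
            is_download = line.split(":", 1)[1].strip() == "download"
            if is_download and name_filter in current:
                names.append(current)
            current = None
    return names


def boinccmd(*args: str) -> str:
    """Run boinccmd against the local client and return its standard output."""
    result = subprocess.run(["boinccmd", *args], capture_output=True, text=True, check=True)
    return result.stdout


def nudge_once(run: Callable[..., str], pause: Callable[[float], None],
               name_filter: str = "", net_off_seconds: float = 10) -> bool:
    """One pass of the loop; returns True if matching downloads were nudged."""
    if not downloading_files(run("--get_file_transfers"), name_filter):
        return False
    run("--set_run_mode", "never")      # suspend computing so no other task starts
    run("--set_network_mode", "never")  # suspend network activity...
    pause(net_off_seconds)              # ...for about 10 seconds
    run("--set_network_mode", "auto")   # resume network ("based on preferences")
    run("--set_run_mode", "auto")       # resume computing ("based on preferences")
    return True


def main(name_filter: str = "", idle_seconds: float = 60, nudge_seconds: float = 600) -> None:
    """Loop forever: poll every idle_seconds, wait nudge_seconds after a nudge."""
    while True:
        nudged = nudge_once(boinccmd, time.sleep, name_filter)
        time.sleep(nudge_seconds if nudged else idle_seconds)
```

Call `main()` (optionally with a `name_filter`) to start the loop. The 60-second idle interval is an arbitrary stand-in for "sleep for a while"; passing `run` and `pause` into `nudge_once` keeps the decision logic testable without a running client.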
  
maeax
Joined: 2 May 07
Posts: 2099
Credit: 159,815,788
RAC: 143,603
Message 35869 - Posted: 12 Jul 2018, 8:06:44 UTC - in response to Message 35794.  

Edit: I will test limiting the download speed in the BOINC preferences this weekend.

Two PCs are limited to 3500 kbps in the BOINC preferences.
They need 2 min instead of 1 min for the download.
The maximum speed from my ISP is 7500 kbps.
So this setting is a workable fallback if downloading causes trouble again in the future.
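As a rough check on those timings: if a transfer of size $S$ is bandwidth-bound at rate $r$, its duration scales as $1/r$. Assuming the unthrottled download ran at roughly the full 7500 kbps (the post only gives the ISP maximum) and both figures use the same unit:

```latex
t \approx \frac{S}{r}
\quad\Longrightarrow\quad
\frac{t_{3500}}{t_{7500}} \approx \frac{7500}{3500} \approx 2.1
```

So about 1 min at full speed becomes about 2 min with the limit, which matches what was observed.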

Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 453
Credit: 193,569,815
RAC: 10,128
Message 35930 - Posted: 16 Jul 2018, 7:13:21 UTC

For several days now, my downloads have been working fine again.


Supporting BOINC, a great concept !

maeax
Joined: 2 May 07
Posts: 2099
Credit: 159,815,788
RAC: 143,603
Message 35933 - Posted: 16 Jul 2018, 8:01:39 UTC

+1k

bronco
Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 35943 - Posted: 16 Jul 2018, 16:01:09 UTC

Not mine. They were OK for a few days but now the problem has returned.


©2024 CERN