Message boards : Number crunching : Tips for optimizing BOINC file transfers for LHC@home

Jesse Viviano

Joined: 12 Feb 14
Posts: 72
Credit: 4,639,155
RAC: 0
Message 34694 - Posted: 18 Mar 2018, 5:09:22 UTC
Last modified: 18 Mar 2018, 6:01:43 UTC

I used to get computing errors, and at times my internet connection would fail. I then noticed that BOINC was trying to download two or more very large files at the same time from LHC@home. Until those downloads completed, which took a long time because they ran at a really slow rate, I had to suspend my VirtualBox LHC@home tasks to keep them from failing with a compute error. Those two downloads, together with the other traffic that my router has to manage (e.g. providing a MoCA connection to the TV's DVR), were apparently maxing out my router's CPU or routing hardware. Limiting BOINC to one file transfer at a time fixed all three problems: the temporary loss of all other Internet activity, the compute errors, and the slow transfers of the large files.

This can be done by changing two lines in BOINC's cc_config.xml file, which is found in C:\ProgramData\BOINC for most Windows Vista and later computers. I changed the line "<max_file_xfers>8</max_file_xfers>" to "<max_file_xfers>1</max_file_xfers>". I then changed "<max_file_xfers_per_project>2</max_file_xfers_per_project>" to "<max_file_xfers_per_project>1</max_file_xfers_per_project>".
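
For anyone who would rather script the change than edit the file by hand, here is a minimal Python sketch of the edit described above. It assumes the two settings sit inside an <options> element under the <cc_config> root, which is the usual layout of cc_config.xml. The function name limit_transfers is made up for this example, and you should back up the file before trying it.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def limit_transfers(config_path, total=1, per_project=1):
    """Set both transfer limits in cc_config.xml, creating elements if absent.

    Assumes the layout <cc_config><options>...</options></cc_config>.
    """
    path = Path(config_path)
    if path.exists():
        tree = ET.parse(str(path))
        root = tree.getroot()
    else:
        # No config file yet: start from an empty skeleton.
        root = ET.Element("cc_config")
        tree = ET.ElementTree(root)

    options = root.find("options")
    if options is None:
        options = ET.SubElement(root, "options")

    settings = (
        ("max_file_xfers", total),
        ("max_file_xfers_per_project", per_project),
    )
    for tag, value in settings:
        elem = options.find(tag)
        if elem is None:
            elem = ET.SubElement(options, tag)
        elem.text = str(value)

    tree.write(str(path), encoding="utf-8")
    return path
```

Calling limit_transfers(r"C:\ProgramData\BOINC\cc_config.xml") would apply the values from this post. The running client only picks up the change after it re-reads its config files or is restarted.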

This caused these effects:

  • Large file transfers in BOINC sped up immensely. I am guessing that having only one large transfer going on allowed the routing data for the single large high throughput file transfer to stay in my router's CPU's cache instead of constantly being pushed to DRAM, which is much slower than the cache.
  • Transfers of multiple small files slowed down, because BOINC now runs only one file transfer at a time instead of the default of two at a time per project.
  • Computing errors are eliminated because the low throughput connections each of the LHC@home virtual machines are not squeezed out by two high throughput file transfers running at the same time. The router can finally route these while one high throughput file transfer is going on.
  • The temporary internet outages caused by multiple large file transfers maxing out the router's CPU ceased.



These tips are most helpful if your ISP mandates a specific router for your service and you have no option to use your own. If you can use your own, then try that first.

ID: 34694
m

Joined: 6 Sep 08
Posts: 116
Credit: 11,066,721
RAC: 4,727
Message 34700 - Posted: 18 Mar 2018, 21:03:47 UTC - in response to Message 34694.  

Excellent, anything that cuts down the "fake failures" is a good idea - many thanks.
It might also help with the limited number of simultaneous connections provided by the router, presumably limited by the size of the router's RAM. For older routers this can be quite low (and difficult to check).
Thanks again.
ID: 34700
Jesse Viviano

Joined: 12 Feb 14
Posts: 72
Credit: 4,639,155
RAC: 0
Message 34702 - Posted: 19 Mar 2018, 5:34:02 UTC - in response to Message 34694.  

The line "Computing errors are eliminated because the low throughput connections each of the LHC@home virtual machines are not squeezed out by two high throughput file transfers running at the same time. The router can finally route these while one high throughput file transfer is going on." should read "Computing errors are eliminated because the low throughput connections each of the LHC@home virtual machines generate are not squeezed out by two high throughput file transfers running at the same time. The router can finally route these while one high throughput file transfer is going on." However, the time limit for editing that post has expired.
ID: 34702
Jesse Viviano

Joined: 12 Feb 14
Posts: 72
Credit: 4,639,155
RAC: 0
Message 34703 - Posted: 19 Mar 2018, 5:46:34 UTC - in response to Message 34700.  
Last modified: 19 Mar 2018, 5:47:11 UTC

I think my ISP-mandated router was designed for lower speed WAN connections, which it handles fine. However, my ISP then decided to offer gigabit, and at those speeds the router cannot cope with multiple high speed connections at once. It can route one high speed connection fine at gigabit speed, but more than one at a time is too much for it. My guess is that the large amount of data from two or more simultaneous high speed connections pushes all of the routing data out of the cache and into DRAM, which is really slow.

Some cable modems, such as those built on Intel and Texas Instruments cable modem chips, have a hardware routing table in their packet processing engines that can hold only a fixed maximum number of IP connections. Exhausting it causes a denial of service, because no further connections can be created until some of the old ones are torn down. (Intel bought Texas Instruments' cable modem business.) I believe that Broadcom's cable modem chips use software processing, but they perform better than the Intel and TI-based cable modems with their poorly designed hardware processors.
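
To make that exhaustion failure concrete, here is a toy Python model of a fixed-size connection table. It only illustrates the behaviour described above and says nothing about how any particular chip works; the class name and the capacity are invented for the example.

```python
class ConnectionTable:
    """Toy model of a fixed-size connection table (capacity is hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = set()

    def open(self, flow):
        """Try to track a new flow; return False when the table is full."""
        if flow in self.entries:
            return True               # already tracked, nothing to allocate
        if len(self.entries) >= self.capacity:
            return False              # table exhausted: new connection refused
        self.entries.add(flow)
        return True

    def close(self, flow):
        """Tear down a flow and free its slot."""
        self.entries.discard(flow)


def count_refused(table, flows):
    """Return how many of the given flows the table could not accept."""
    return sum(1 for flow in flows if not table.open(flow))
```

With a capacity of 4 and six distinct flows, count_refused reports 2 refusals, and a new flow is only accepted again after close() has freed a slot.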

You cannot really know what is going on inside it. It could be poor software which could be fixed with a firmware update. It could be poorly designed hardware. It could be hardware which was designed for one set of requirements, but is now being pushed to handle much higher requirements than it was designed for and therefore performs poorly at those higher requirements in some cases.
ID: 34703
mmonnin

Joined: 22 Mar 17
Posts: 55
Credit: 10,223,976
RAC: 309
Message 34705 - Posted: 20 Mar 2018, 1:20:40 UTC

That sounds like the problem with the small NAT table in the VZ Actiontec routers for Fios. With too many connections the router trips over itself, because its NAT table is not big enough to keep up. Even if your ISP requires a certain router, there's nothing stopping you from putting your own router behind it. The only device the ISP router sees is your own router, so the NAT table doesn't fill up. This is how I have mine set up. Just assign your router a different IP than the typical 192.168.1.1 that might come with the ISP router.
ID: 34705
Jesse Viviano

Joined: 12 Feb 14
Posts: 72
Credit: 4,639,155
RAC: 0
Message 35402 - Posted: 31 May 2018, 8:58:37 UTC - in response to Message 34705.  

That would only work if you could put the ISP router in bridge mode. Otherwise, the ISP router will still keep track of all connections because it will see all of the different destinations and different source ports and still perform NAT on each connection.
ID: 35402
mmonnin

Joined: 22 Mar 17
Posts: 55
Credit: 10,223,976
RAC: 309
Message 35409 - Posted: 1 Jun 2018, 14:36:23 UTC - in response to Message 35402.  

That would only work if you could put the ISP router in bridge mode. Otherwise, the ISP router will still keep track of all connections because it will see all of the different destinations and different source ports and still perform NAT on each connection.


I still don't have dropped connections with the VZ router not in bridge mode, and I've added more connections since the time I was using it as the main router.

Shouldn't the VZ router only see one device, the aftermarket router, since everything else is behind that NAT?
ID: 35409
Jesse Viviano

Joined: 12 Feb 14
Posts: 72
Credit: 4,639,155
RAC: 0
Message 35423 - Posted: 4 Jun 2018, 4:03:04 UTC - in response to Message 35409.  

It will see one device, but it will still see every connection as a separate one. It has to, so that its firewall can block incoming packets except those belonging to known connections that originated inside the firewall. It therefore keeps track of every connection so that it knows which packets to allow and route back inside, and it drops everything else.
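
Here is a small Python sketch of that point, under simplifying assumptions: each router is modelled as nothing more than a NAT table keyed on source address, source port, destination address and destination port. The class, the addresses and the port numbers are all invented for the illustration.

```python
class NatRouter:
    """Toy NAT gateway: one table entry per outbound connection."""

    def __init__(self, public_ip):
        self.public_ip = public_ip
        self.table = {}          # (src_ip, src_port, dst_ip, dst_port) -> translated port
        self.next_port = 40000   # hypothetical start of the translated port range

    def forward(self, src_ip, src_port, dst_ip, dst_port):
        """Translate an outbound packet, allocating a table entry if needed."""
        key = (src_ip, src_port, dst_ip, dst_port)
        if key not in self.table:
            self.table[key] = self.next_port
            self.next_port += 1
        return (self.public_ip, self.table[key], dst_ip, dst_port)


def demo():
    """Send six connections from three hosts through two stacked routers."""
    own = NatRouter("192.168.1.2")   # own router, on the ISP router's LAN
    isp = NatRouter("203.0.113.5")   # ISP router, documentation-range address
    seen_by_isp = set()
    for host in ("10.0.0.2", "10.0.0.3", "10.0.0.4"):
        for port in (50000, 50001):
            inner = own.forward(host, port, "198.51.100.7", 443)
            seen_by_isp.add(inner[0])    # source address the ISP router sees
            isp.forward(*inner)
    return len(seen_by_isp), len(own.table), len(isp.table)
```

demo() returns (1, 6, 6): the ISP router sees a single source address, yet its table still holds one entry per connection, exactly as many as the inner router's table.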
ID: 35423
Jesse Viviano

Joined: 12 Feb 14
Posts: 72
Credit: 4,639,155
RAC: 0
Message 36733 - Posted: 17 Sep 2018, 7:26:08 UTC

One more benefit I found: since only one work unit can download at a time, the credential server at LHC@home never gets overwhelmed with requests and always supplies credentials. If too many work units start up at the same time, the credential server can create credentials for only a few of them, causing the rest of the work units starting up at the same time to fail.
ID: 36733
Jesse Viviano

Joined: 12 Feb 14
Posts: 72
Credit: 4,639,155
RAC: 0
Message 46902 - Posted: 16 Jun 2022, 23:43:44 UTC

I should have done this earlier, but I finally found out why my old router was causing this trouble. Turning on IPv6 on the old router apparently disabled its hardware acceleration, because the acceleration engine apparently works only on IPv4. With IPv6 turned on, a sorting step has to run before any other processing can be done. The hardware engine was apparently designed to be sent all incoming traffic directly, with no sorting step in front of it, so once IPv6 was enabled everything was done in software. IPv6 is very simple to route in software even at gigabit wire speed, but the NAT that a home IPv4 gateway performs as part of routing is very expensive in software, and that caused these problems. Replacing the router with a newer generation model which can accelerate the sorting and the routing of both IPv4 and IPv6 in hardware solved the problem.
ID: 46902


