Tips for optimizing BOINC file transfers for LHC@home
Author | Message |
---|---|
Send message Joined: 12 Feb 14 Posts: 72 Credit: 4,639,155 RAC: 0 |
I used to get computing errors, along with periods when my internet connection would fail. I then noticed that BOINC was trying to download two or more very large files at the same time from LHC@home. Until those downloads completed, I had to suspend my VirtualBox LHC@home tasks to save them from failing with a compute error, and the downloads themselves proceeded at a really slow rate. Those two downloads and the other tasks that my router has to manage (e.g. providing a MoCA connection to the TV's DVR) were apparently maxing out my router's CPU or routing hardware. I found that limiting BOINC to one file transfer at a time solved all three problems: other Internet activity being temporarily blocked, compute errors, and slow transfers of the large files themselves. This can be done by changing two lines in BOINC's cc_config.xml file, which is found in C:\ProgramData\BOINC on most computers running Windows Vista or later. I changed the line "<max_file_xfers>8</max_file_xfers>" to "<max_file_xfers>1</max_file_xfers>". I then changed "<max_file_xfers_per_project>2</max_file_xfers_per_project>" to "<max_file_xfers_per_project>1</max_file_xfers_per_project>". This caused these effects:
|
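For anyone who would rather script the change than edit the file by hand, here is a minimal Python sketch of the same two-line edit. The function name `limit_transfers` and the sample XML are made up for illustration; the sketch assumes the two settings sit inside the `<options>` section of `cc_config.xml`, and it works on a string so it can be tried without touching a real BOINC installation.

```python
import xml.etree.ElementTree as ET


def limit_transfers(xml_text: str, total: int = 1, per_project: int = 1) -> str:
    """Return cc_config.xml text with both file transfer limits set.

    Hypothetical helper: it only rewrites the two settings named in the post.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "cc_config":
        raise ValueError("expected a <cc_config> root element")

    # Create <options> if the file does not have one yet.
    options = root.find("options")
    if options is None:
        options = ET.SubElement(root, "options")

    settings = (
        ("max_file_xfers", total),
        ("max_file_xfers_per_project", per_project),
    )
    for tag, value in settings:
        node = options.find(tag)
        if node is None:
            node = ET.SubElement(options, tag)
        node.text = str(value)

    return ET.tostring(root, encoding="unicode")


# Illustrative input only, using the starting values mentioned in the post.
sample = """<cc_config>
  <options>
    <max_file_xfers>8</max_file_xfers>
    <max_file_xfers_per_project>2</max_file_xfers_per_project>
  </options>
</cc_config>"""

updated = limit_transfers(sample)
print(updated)
```

Note that `ElementTree` drops XML comments when parsing, so keep a backup of the original file if it contains any. After saving the real file, the client still has to re-read it, for example by restarting BOINC or by using the "Read config files" entry in the BOINC Manager's Options menu.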
Send message Joined: 6 Sep 08 Posts: 117 Credit: 12,328,487 RAC: 20,294 |
Excellent, anything that cuts down the "fake failures" is a good idea - many thanks. It might also help with the limited number of simultaneous connections provided by the router, presumably limited by the size of the router's RAM. For older routers this can be quite low (and difficult to check). Thanks again. |
Send message Joined: 12 Feb 14 Posts: 72 Credit: 4,639,155 RAC: 0 |
The line "Computing errors are eliminated because the low throughput connections each of the LHC@home virtual machines are not squeezed out by two high throughput file transfers running at the same time. The router can finally route these while one high throughput file transfer is going on." should read "Computing errors are eliminated because the low throughput connections each of the LHC@home virtual machines generate are not squeezed out by two high throughput file transfers running at the same time. The router can finally route these while one high throughput file transfer is going on." However, BOINC's time limit for editing this post has expired. |
Send message Joined: 12 Feb 14 Posts: 72 Credit: 4,639,155 RAC: 0 |
I think what is going on with my ISP-mandated router is that it was designed for lower speed WAN connections, which it handles fine. However, my ISP then decided to offer gigabit service, and at those speeds the router cannot handle multiple high speed connections at once. It can handle one high speed connection fine at gigabit speeds, but more than that is too much for it to route at one time. My guess is that the large amount of data it has to process when handling two or more high speed connections at once pushes all of the routing data out of the cache and into DRAM, which is really slow. Some cable modems, such as those which use Intel and Texas Instruments cable modem chips, have a hardware routing table in their packet processing engines which can hold only a limited number of IP connections at once. Exhausting it causes a denial of service, because no further connections can be created until some of the old connections are torn down. (Intel bought Texas Instruments' cable modem business.) I believe that Broadcom's cable modem chips use software processing, but they perform better than the Intel and TI-based cable modems with their poorly designed hardware processors. You cannot really know what is going on inside the router. It could be poor software which could be fixed with a firmware update. It could be poorly designed hardware. It could be hardware which was designed for one set of requirements but is now being pushed well beyond them, and therefore performs poorly in some cases. |
Send message Joined: 22 Mar 17 Posts: 55 Credit: 11,903,170 RAC: 66,644 |
That sounds like the small NAT table that comes with the VZ Actiontec routers for FiOS. With too many connections the router trips over itself, because its NAT table is not large enough to keep up. Even if your ISP requires a certain router, there's nothing stopping you from putting your own router behind it. The only device the ISP router sees is your own router, so the NAT table doesn't fill up. This is how I have mine set up. Just assign your router an IP address other than the typical 192.168.1.1 that the ISP router might use. |
Send message Joined: 12 Feb 14 Posts: 72 Credit: 4,639,155 RAC: 0 |
That would only work if you could put the ISP router in bridge mode. Otherwise, the ISP router will still keep track of all connections because it will see all of the different destinations and different source ports and still perform NAT on each connection. |
Send message Joined: 22 Mar 17 Posts: 55 Credit: 11,903,170 RAC: 66,644 |
"That would only work if you could put the ISP router in bridge mode. Otherwise, the ISP router will still keep track of all connections because it will see all of the different destinations and different source ports and still perform NAT on each connection." I still don't have dropped connections with the VZ router not in bridge mode, and I've added more connections since I was using it as the main router. Shouldn't the VZ router only see one device, the aftermarket router, since everything else is behind that NAT? |
Send message Joined: 12 Feb 14 Posts: 72 Credit: 4,639,155 RAC: 0 |
It will see one device, but it will still see every connection as separate. This is what allows the firewall to block incoming packets except those belonging to known connections that originated inside the firewall. It therefore has to keep track of every connection so that it knows which packets to allow and route back inside, and which to drop. |
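The point about double NAT can be illustrated with a toy model. This is only a sketch under stated assumptions: the `StatefulNat` class, the table sizes, the port numbers, and the documentation-range IP addresses are all invented for illustration, and real routers track more state (protocol, timeouts) than this does. It shows that an outer router behind which a second router sits sees a single source address but still needs one table entry per connection, and that a small table starts refusing new connections once it is full.

```python
from typing import NamedTuple, Optional


class Flow(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


class StatefulNat:
    """Toy NAT/firewall: one table entry per outbound connection."""

    def __init__(self, public_ip: str, max_entries: int) -> None:
        self.public_ip = public_ip
        self.max_entries = max_entries
        self.table = {}          # original Flow -> translated source port
        self.next_port = 40000   # arbitrary starting port for translations

    def outbound(self, flow: Flow) -> Optional[Flow]:
        """Translate an outbound flow, or return None if the table is full."""
        if flow not in self.table:
            if len(self.table) >= self.max_entries:
                return None
            self.table[flow] = self.next_port
            self.next_port += 1
        return Flow(self.public_ip, self.table[flow], flow.dst_ip, flow.dst_port)


# Three LAN hosts, four connections each, all going to the same server.
lan_flows = [
    Flow(f"10.0.0.{host}", 50000 + conn, "198.51.100.7", 443)
    for host in range(2, 5)
    for conn in range(4)
]

inner = StatefulNat("192.168.1.2", max_entries=1000)  # your own router
outer = StatefulNat("203.0.113.5", max_entries=1000)  # ISP router, not bridged

for flow in lan_flows:
    outer.outbound(inner.outbound(flow))

sources_seen = {flow.src_ip for flow in outer.table}
print(len(sources_seen), len(outer.table))

# The same traffic through an ISP router with room for only 8 connections.
tiny = StatefulNat("203.0.113.5", max_entries=8)
refused = sum(1 for flow in lan_flows if tiny.outbound(inner.outbound(flow)) is None)
print(refused)
```

In this model the outer router sees one source address but holds twelve entries, which is why putting a second router behind a non-bridged ISP router does not by itself shrink the ISP router's table.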
Send message Joined: 12 Feb 14 Posts: 72 Credit: 4,639,155 RAC: 0 |
One more benefit I found is that, since only one work unit can download at a time, the credential server at LHC@home never gets overwhelmed with requests and will always supply credentials. If too many work units start up at the same time, the credential server can only create credentials for a few of them, causing the rest of the work units starting up at that time to fail. |
Send message Joined: 12 Feb 14 Posts: 72 Credit: 4,639,155 RAC: 0 |
I should have done this earlier, but I finally found out why my old router was causing this trouble. Turning on IPv6 on the old router apparently disabled hardware acceleration, since its acceleration engine seems to work only on IPv4. With IPv6 enabled, a sorting step has to run before any other processing so that IPv4 and IPv6 traffic can be separated. The hardware engine was apparently designed to receive all incoming traffic directly, with no sorting step in front of it, so once IPv6 was enabled everything was done in software. IPv6 is very simple to route in software even at gigabit wire speed, but performing NAT on IPv4, which is part of the routing process in a home IPv4 gateway, is very expensive in software, and that caused these problems. Replacing the router with a newer generation model which can accelerate the sorting and the routing of both IPv4 and IPv6 in hardware solved the problem. |
©2024 CERN