Setting up a local squid cache for a home cluster - old comments
Author | Message |
---|---|
Joined: 20 Jun 14 Posts: 381 Credit: 238,712 RAC: 0 |
If you have a small cluster at home, it should be possible to set up a local squid cache to reduce the external traffic. A few people have looked into this, but we don't have any detailed instructions for anyone to follow. This thread is for experimenting with this setup. |
Joined: 20 Jun 14 Posts: 381 Credit: 238,712 RAC: 0 |
Once a squid proxy has been set up, BOINC can be configured to use it by setting the "Connect via HTTP proxy server" option. By default squid uses port 3128, and if this port is used, CVMFS will also attempt to use this proxy. |
Joined: 15 Jun 08 Posts: 2606 Credit: 262,347,745 RAC: 135,699 |
A squid will definitely help. Here are some numbers from a test day in mid-January (just before the database trouble). I used CMS WUs because they cause many more HTTP requests than other VMs.

Downloads served by the proxy:
TCP_MEM_HIT: 1,017,914 requests, 1.92 GB
TCP_HIT: 1,516 requests, 4.68 GB
TCP_REFRESH_UNMODIFIED: 8,505 requests, 63.26 MB

Downloads requested from lhc@home:
TCP_MISS: 1,363 requests, 362.30 MB
TCP_REFRESH_MODIFIED: 3,031 requests, 11.54 MB

Result uploads to lhc@home:
TCP_MISS__UPLOAD: 2,037 requests, 28.97 GB

All lhc@home VMs benefit from a squid, BUT:
- not with squid's default config (as it is optimised for surfing)
- the boost still needs some PBR (Policy Based Routing) rules to force the VM's network traffic through the proxy

A draft description may be requested via PM.

"Once a squid proxy has been configured, BOINC can be configured to use this by setting the Connect via HTTP proxy server option. By default squid uses port 3128 and if this port is used, CVMFS will also attempt to use this proxy."

@Laurence Do you mean the proxy configuration form of the BOINC client? If so, my VMs don't use the proxy although it is configured there. Other projects (including SixTrack/ATLAS) use the proxy at least to request/report work. |
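To put the figures in the post above into perspective, here is a minimal sketch that turns them into hit ratios. It assumes that "GB" means 1024 MB (the post does not say) and follows the post's own grouping, counting TCP_REFRESH_UNMODIFIED as served by the proxy. Uploads are left out because they cannot be cached. The names `served_from_cache`, `fetched_from_origin` and `totals` are only illustrative.

```python
# Hit ratios from the squid counters quoted in the post above.
MB_PER_GB = 1024  # assumption: the post does not state 1000 vs 1024

# cache result code -> (requests, megabytes), copied from the post
served_from_cache = {
    "TCP_MEM_HIT": (1_017_914, 1.92 * MB_PER_GB),
    "TCP_HIT": (1_516, 4.68 * MB_PER_GB),
    "TCP_REFRESH_UNMODIFIED": (8_505, 63.26),
}
fetched_from_origin = {
    "TCP_MISS": (1_363, 362.30),
    "TCP_REFRESH_MODIFIED": (3_031, 11.54),
}


def totals(rows):
    """Sum the request counts and megabytes of one group of result codes."""
    requests = sum(count for count, _ in rows.values())
    megabytes = sum(mb for _, mb in rows.values())
    return requests, megabytes


hit_req, hit_mb = totals(served_from_cache)
miss_req, miss_mb = totals(fetched_from_origin)

request_hit_ratio = hit_req / (hit_req + miss_req)
byte_hit_ratio = hit_mb / (hit_mb + miss_mb)

print(f"request hit ratio: {request_hit_ratio:.1%}")
print(f"byte hit ratio:    {byte_hit_ratio:.1%}")
```

Under these assumptions, the large majority of download requests and of download volume on that test day never left the local network.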
Joined: 6 Sep 08 Posts: 118 Credit: 12,880,255 RAC: 4,091 |
I once used a Squid proxy running on an old Raspberry Pi and can confirm that, although BOINC will use it OK (and this alone makes a difference), the VMs don't by default. Unfortunately I didn't (and don't) have a USB hard drive, so I was using some USB sticks combined with LVM, and they gave up after a couple of months, before I managed to make the VMs use the cache. The cache gets a lot of use. Maybe I was running ATLAS then. Maybe it's time for another try and some better USB thingies. |
Joined: 15 Jun 08 Posts: 2606 Credit: 262,347,745 RAC: 135,699 |
"Maybe it's time for another try ..."

You may use the following (or similar) squid settings to reduce write access to your storage:

max_stale 53 days
memory_replacement_policy heap GDSF
maximum_object_size_in_memory 256 KB
cache_mem 192 MB
cache_replacement_policy heap LFUDA
maximum_object_size 6144 MB
cache_dir aufs /var/cache/squid 25000 16 64 min-size=15361 |
Joined: 20 Jun 14 Posts: 381 Credit: 238,712 RAC: 0 |
Yes, the VM should use the proxy. If not, we need to investigate. |
Joined: 2 Sep 04 Posts: 455 Credit: 209,163,573 RAC: 82,420 |
I used a Squid for ATLAS a long time ago, but I was disappointed with the results and stopped. You only benefit easily from a squid proxy if you set it as your default gateway. This was not possible for me, so I had to redirect the ATLAS traffic to the Squid. If you force BOINC to use a proxy setting, this only works for uploads and downloads that are initiated by the BOINC client itself. Traffic that originates from inside the VMs ignores the proxy settings, as it uses different ports than http(s). I had to redirect the traffic with routing tables and was not really happy with this. Maybe my Squid config wasn't perfect for use with BOINC, but there was no one to teach me how to make it better.

Supporting BOINC, a great concept ! |
Joined: 2 Sep 04 Posts: 455 Credit: 209,163,573 RAC: 82,420 |
IIRC, changing the proxy settings doesn't work for running VMs; only newly created VMs will use the changed settings.

Supporting BOINC, a great concept ! |
Joined: 15 Jun 08 Posts: 2606 Credit: 262,347,745 RAC: 135,699 |
1. As I already mentioned, the default squid configuration is not prepared to cover lhc@home's special needs. Therefore I agree that it may not deliver the expected benefits. This can be improved by only a few modifications to squid.conf.
2. Redirecting all HTTP traffic from the VMs through the squid currently doesn't work. The preferred solution would be to read out the BOINC client's proxy configuration and make the VMs use it (Laurence?)
3. A workaround (at least on Linux systems) for point 2 is to do policy based routing, for which I can provide a white paper (includes point 1). |
Joined: 20 Jun 14 Posts: 381 Credit: 238,712 RAC: 0 |
This should already be implemented. If the VM detects that a BOINC proxy has been configured with the default squid port, it will try to use it for CVMFS.

Edit: note that this will currently only work with Theory, LHCb and CMS. |
Joined: 15 Jun 08 Posts: 2606 Credit: 262,347,745 RAC: 135,699 |
It doesn't work although my clients are configured to use my local squid.

CMS configures this:
2018-02-08 20:59:44 (6164): Guest Log: VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE
2018-02-08 20:59:44 (6164): Guest Log: 2.2.0.0 3454 0 21816 5435 14 1 1636005 10240001 2 65024 0 20 95 20792 23 http://s1bnl-cvmfs.openhtc.io/cvmfs/grid.cern.ch DIRECT 1

Theory configures this:
2018-02-08 12:32:11 (30171): Guest Log: VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE
2018-02-08 12:32:11 (30171): Guest Log: 2.2.0.0 3364 0 19764 5434 14 1 479950 10240001 2 65024 0 20 95 20792 21 http://cvmfs-stratum-one.cern.ch/cvmfs/grid.cern.ch http://128.142.33.31:3125 1

LHCb: I don't run it at the moment, as most of the VMs run only 2 short jobs within a few minutes and then start a new VM from scratch.

"Edit: note that this will currently only work with Theory, LHCb and CMS"

ATLAS: works perfectly with policy based routing.
CMS: Besides CVMFS, it uses cmsfrontier.cern.ch:8000 via the local proxy. This is where the huge number of requests comes from. |
Joined: 9 Dec 14 Posts: 202 Credit: 2,533,875 RAC: 0 |
Would a local squid proxy also improve the efficiency of native ATLAS tasks (if it is supported by the native app)? Or is the CVMFS cache sufficient, so that the benefits (if there are any) of an additional squid would be negligible? |
Joined: 15 Jun 08 Posts: 2606 Credit: 262,347,745 RAC: 135,699 |
"would a local squid proxy also enhance the efficiency of native atlas tasks (if it is supported by the native app)?"

All CERN subprojects (except SixTrack) use a local CVMFS instance. Unfortunately their caches get lost when their VMs are removed. ATLAS native preserves the cache on the local filesystem, but this cache can only be used by other ATLAS native tasks on the same host. A dedicated HTTP proxy, e.g. squid, can be used as a parent cache by all CVMFS instances on all hosts. In addition, that proxy can also be used by other data distribution systems, e.g. the frontier cache system, which is heavily used by CMS. |
Joined: 15 Jun 08 Posts: 2606 Credit: 262,347,745 RAC: 135,699 |
At the moment only ATLAS VMs report that they use the proxy that is configured in the BOINC client. CMS, LHCb and Theory seem to ignore the settings although they are listed in slots/n/init_data.xml. Making the proxy available inside the VM would be a breakthrough regarding the use of CVMFS, as it would avoid IP packet routing via iptables. CMS may need additional measures, as it requests lots of data via cmsfrontier.cern.ch and I don't know if this is as easy to configure as CVMFS. |
Joined: 15 Jun 08 Posts: 2606 Credit: 262,347,745 RAC: 135,699 |
host1 - - [23/Feb/2018:09:58:48 +0100] "GET http://lhcathome-upload.cern.ch/lhcathome/download//CMS_2016_10_31.vdi.gz HTTP/1.1" 200 665580401 "-" "BOINC client (x86_64-pc-linux-gnu 7.8.4)" TCP_MISS:HIER_DIRECT
host2 - - [23/Feb/2018:12:30:02 +0100] "GET http://lhcathome-upload.cern.ch/lhcathome/download//CMS_2016_10_31.vdi.gz HTTP/1.1" 200 665580407 "-" "BOINC client (x86_64-pc-linux-gnu 7.8.4)" TCP_REFRESH_UNMODIFIED:HIER_DIRECT
host3 - - [23/Feb/2018:12:53:26 +0100] "GET http://lhcathome-upload.cern.ch/lhcathome/download//CMS_2016_10_31.vdi.gz HTTP/1.1" 200 665580407 "-" "BOINC client (x86_64-pc-linux-gnu 7.8.4)" TCP_REFRESH_UNMODIFIED:HIER_DIRECT
host4 - - [23/Feb/2018:13:17:59 +0100] "GET http://lhcathome-upload.cern.ch/lhcathome/download//CMS_2016_10_31.vdi.gz HTTP/1.1" 200 665580407 "-" "BOINC client (x86_64-pc-linux-gnu 7.8.4)" TCP_REFRESH_UNMODIFIED:HIER_DIRECT

1x downloaded, 3x served from the cache. This is one of the reasons why I like squid, even with a fast flatrate. :-) |
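Counting such entries by hand gets tedious on a busy proxy. The following is a minimal sketch that tallies requests and bytes per squid result code. It assumes the access log uses exactly the layout shown in the post above (a combined-style line with the result code and hierarchy appended); a different `logformat` in squid.conf would need a different pattern. `LINE_RE` and `summarise` are illustrative names, not part of squid.

```python
import re
from collections import Counter

# Pattern for the log layout shown in the post above (an assumption about
# the poster's logformat, not squid's native default format).
LINE_RE = re.compile(
    r'^(?P<client>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+) '
    r'"[^"]*" "[^"]*" '
    r'(?P<result>[A-Z_]+):(?P<hierarchy>[A-Z_]+)$'
)


def summarise(lines):
    """Return (request count, byte count) per squid result code."""
    counts = Counter()
    sizes = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip lines that do not follow the assumed layout
        counts[match["result"]] += 1
        sizes[match["result"]] += int(match["size"])
    return counts, sizes


# Two of the entries from the post, used as sample input.
SAMPLE = [
    'host1 - - [23/Feb/2018:09:58:48 +0100] "GET http://lhcathome-upload.cern.ch/lhcathome/download//CMS_2016_10_31.vdi.gz HTTP/1.1" 200 665580401 "-" "BOINC client (x86_64-pc-linux-gnu 7.8.4)" TCP_MISS:HIER_DIRECT',
    'host2 - - [23/Feb/2018:12:30:02 +0100] "GET http://lhcathome-upload.cern.ch/lhcathome/download//CMS_2016_10_31.vdi.gz HTTP/1.1" 200 665580407 "-" "BOINC client (x86_64-pc-linux-gnu 7.8.4)" TCP_REFRESH_UNMODIFIED:HIER_DIRECT',
]

counts, sizes = summarise(SAMPLE)
for result, count in counts.items():
    print(f"{result}: {count} request(s), {sizes[result]} bytes")
```

To run it against a real log, pass an open file object to `summarise` instead of `SAMPLE`; the log path depends on the installation.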
Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0 |
"A dedicated HTTP proxy, e.g. squid, can be used as a parent cache by all CVMFS instances on all hosts."

Do you need a separate machine for the squid proxy? I usually run only one machine on LHC, or two at most. Can I place the squid on the machine where LHC is running? |
Joined: 15 Jun 08 Posts: 2606 Credit: 262,347,745 RAC: 135,699 |
"Can I place the squid on the machine where LHC is running?"

In principle, yes. It is possible to run it on the same machine. A squid that is configured to serve your BOINC projects runs well with 128 MB RAM and uses only a few % CPU. The most limiting factor would be a very old and slow hard disk, although disk access can nearly be avoided by a good setup. |
Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0 |
"A squid that is configured to serve your BOINC projects runs well with 128 MB RAM and uses only a few % CPU."

Not a problem. I have 32 GB, and devote 12 GB to a write cache, and that is in front of a Samsung 850 EVO. The Ryzen 1700 (which works better on LHC than my Haswell machines, by the way) has cycles to spare. I would like to put them to good use if it would help the project. On the other hand, I have a fast cable modem (50/10 Mbps) and don't know if I really need a squid. Your advice would be helpful. |
Joined: 6 Sep 08 Posts: 118 Credit: 12,880,255 RAC: 4,091 |
Sometimes it works...

2018-03-07 22:37:47 (18466): Guest Log: [INFO] Shared directory mounted, enabling vboxmonitor
2018-03-07 22:37:48 (18466): Guest Log: [DEBUG] Detected squid proxy http://192.168.100.137:3128
2018-03-07 22:38:52 (18466): Guest Log: [DEBUG] Testing network connection to cern.ch on port 80
(...time passes...)
2018-03-07 22:38:55 (18466): Guest Log: [DEBUG] Probing CVMFS ...
2018-03-07 22:38:55 (18466): Guest Log: Probing /cvmfs/grid.cern.ch... OK
2018-03-07 22:38:58 (18466): Guest Log: Probing /cvmfs/sft.cern.ch... OK
2018-03-07 22:38:58 (18466): Guest Log: VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE
2018-03-07 22:38:58 (18466): Guest Log: 2.2.0.0 3429 1 20276 5613 3 1 392693 10240001 2 65024 0 15 100 0 0 http://cernvmfs.gridpp.rl.ac.uk/cvmfs/grid.cern.ch http://192.168.100.137:3128 1
2018-03-07 22:39:06 (18466): Guest Log: [INFO] Reading volunteer information
2018-03-07 22:39:06 (18466): Guest Log: [INFO] Volunteer: m (178) Host: 1422

and sometimes it doesn't...

2018-03-07 22:57:36 (19112): Guest Log: [INFO] Mounting the shared directory
2018-03-07 22:57:36 (19112): Guest Log: [INFO] Shared directory mounted, enabling vboxmonitor
2018-03-07 22:57:36 (19112): Guest Log: [DEBUG] Detected squid proxy http://192.168.100.137:3128
2018-03-07 22:58:51 (19112): Guest Log: [DEBUG] Testing network connection to cern.ch on port 80
2018-03-07 22:58:52 (19112): Guest Log: [DEBUG] Connection to cern.ch 80 port [tcp/http] succeeded!
(... more time goes by...)
2018-03-07 22:58:53 (19112): Guest Log: [DEBUG] Probing CVMFS ...
2018-03-07 22:58:54 (19112): Guest Log: Probing /cvmfs/grid.cern.ch... OK
2018-03-07 22:58:58 (19112): Guest Log: Probing /cvmfs/sft.cern.ch... OK
2018-03-07 22:58:58 (19112): Guest Log: VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE
2018-03-07 22:58:58 (19112): Guest Log: 2.2.0.0 3422 1 22264 5613 3 1 392693 10240001 2 65024 0 15 100 0 0 http://cernvmfs.gridpp.rl.ac.uk/cvmfs/grid.cern.ch DIRECT 1
2018-03-07 22:59:04 (19112): Guest Log: [INFO] Reading volunteer information
2018-03-07 22:59:04 (19112): Guest Log: [INFO] Volunteer: m (178) Host: 1422

Clearly, these logs are a bit old, but the problem remains. At the moment everything is beavering away running SixTrack, so there aren't enough VM tasks to get any idea of the success rate. |
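Whether a VM actually went through the proxy can be read from the PROXY column of the CVMFS status lines in the logs above. Here is a minimal sketch that pairs the header line with the value line and checks that column. It assumes the two lines always appear in the layout shown above, with whitespace-separated columns after the "Guest Log: " marker; `parse_stat` and `uses_proxy` are illustrative helper names, not part of BOINC or CVMFS.

```python
MARKER = "Guest Log: "

# Header and value lines copied from the log excerpts above.
HEADER = (
    "2018-03-07 22:38:58 (18466): Guest Log: VERSION PID UPTIME(M) MEM(K) "
    "REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX "
    "NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE"
)
WITH_PROXY = (
    "2018-03-07 22:38:58 (18466): Guest Log: 2.2.0.0 3429 1 20276 5613 3 1 "
    "392693 10240001 2 65024 0 15 100 0 0 "
    "http://cernvmfs.gridpp.rl.ac.uk/cvmfs/grid.cern.ch "
    "http://192.168.100.137:3128 1"
)
WITHOUT_PROXY = (
    "2018-03-07 22:58:58 (19112): Guest Log: 2.2.0.0 3422 1 22264 5613 3 1 "
    "392693 10240001 2 65024 0 15 100 0 0 "
    "http://cernvmfs.gridpp.rl.ac.uk/cvmfs/grid.cern.ch DIRECT 1"
)


def parse_stat(header_line, value_line):
    """Map each column name of the status header to its value."""
    names = header_line.split(MARKER, 1)[1].split()
    values = value_line.split(MARKER, 1)[1].split()
    if len(names) != len(values):
        raise ValueError("header and value line have different column counts")
    return dict(zip(names, values))


def uses_proxy(stat):
    """True if CVMFS reports a proxy URL instead of DIRECT."""
    return stat["PROXY"] != "DIRECT"


for label, line in (("first VM", WITH_PROXY), ("second VM", WITHOUT_PROXY)):
    stat = parse_stat(HEADER, line)
    print(f"{label}: proxy={stat['PROXY']} used={uses_proxy(stat)}")
```

Run over many task logs, a check like this would give the success rate that the post above could not yet estimate.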
©2025 CERN