Message boards : Number crunching : Setting up a local squid cache for a home cluster
Laurence
Project administrator
Project developer
Joined: 20 Jun 14
Posts: 336
Credit: 237,918
RAC: 0
Message 34323 - Posted: 9 Feb 2018, 9:05:14 UTC

If you have a small cluster at home, it should be possible to set up a local squid cache to reduce external traffic. A few people have looked into this, but we don't have detailed instructions for anyone to follow. This thread is for experimenting with this setup.
ID: 34323
Laurence
Project administrator
Project developer
Joined: 20 Jun 14
Posts: 336
Credit: 237,918
RAC: 0
Message 34324 - Posted: 9 Feb 2018, 9:08:33 UTC - in response to Message 34323.  

Once a squid proxy has been set up, BOINC can be configured to use it by setting the "Connect via HTTP proxy server" option. By default squid listens on port 3128, and if this port is used, CVMFS will also attempt to use the proxy.
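
Before entering the proxy in BOINC, it may be worth confirming that the squid port is reachable from the crunching host. The following is a minimal Python sketch, assuming the default port 3128 mentioned above. The function name and the address in the usage note are placeholders, not part of any real setup.

```python
import socket


def proxy_reachable(host: str, port: int = 3128, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    This only shows that something is listening on the port. It does not
    prove that squid will accept or cache requests from this client.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False
```

For example, `proxy_reachable("192.168.1.10")` (a hypothetical LAN address) should return True once squid is running there and its access rules allow connections from your hosts.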
ID: 34324
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1449
Credit: 77,190,260
RAC: 95,375
Message 34326 - Posted: 9 Feb 2018, 10:06:17 UTC - in response to Message 34324.  

A squid will help, definitely.

Here are some numbers from a test day in mid-January (just before the database trouble).
I used CMS WUs as they cause many more HTTP requests than other VMs.


Downloads served by the proxy
TCP_MEM_HIT              1,017,914 requests     1.92 GB
TCP_HIT                      1,516 requests     4.68 GB
TCP_REFRESH_UNMODIFIED       8,505 requests    63.26 MB

Downloads requested from lhc@home
TCP_MISS                     1,363 requests   362.30 MB
TCP_REFRESH_MODIFIED         3,031 requests    11.54 MB

Result uploads to lhc@home
TCP_MISS__UPLOAD             2,037 requests    28.97 GB
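
To put the download figures above into perspective, the hit ratios can be worked out with a few lines of Python. This is only a sketch: it assumes 1 GB = 1024 MB (the post does not say which convention the statistics use) and it leaves out the uploads, since those cannot be served from a cache.

```python
# Figures copied from the statistics above: (requests, megabytes).
# Assumption: 1 GB = 1024 MB; the original post does not state the convention.
GB = 1024.0

served_by_proxy = {
    "TCP_MEM_HIT": (1_017_914, 1.92 * GB),
    "TCP_HIT": (1_516, 4.68 * GB),
    "TCP_REFRESH_UNMODIFIED": (8_505, 63.26),
}
fetched_from_project = {
    "TCP_MISS": (1_363, 362.30),
    "TCP_REFRESH_MODIFIED": (3_031, 11.54),
}


def totals(rows):
    """Sum requests and megabytes over a dict of (requests, megabytes) rows."""
    requests = sum(r for r, _ in rows.values())
    megabytes = sum(mb for _, mb in rows.values())
    return requests, megabytes


def hit_ratios():
    """Return (request hit ratio, byte hit ratio) for the download traffic."""
    hit_req, hit_mb = totals(served_by_proxy)
    miss_req, miss_mb = totals(fetched_from_project)
    return hit_req / (hit_req + miss_req), hit_mb / (hit_mb + miss_mb)


if __name__ == "__main__":
    by_requests, by_bytes = hit_ratios()
    print(f"served from cache: {by_requests:.1%} of requests, {by_bytes:.1%} of bytes")
```

Under these assumptions well over 99 % of the download requests and roughly 95 % of the download bytes never left the local network on that test day.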



All lhc@home VMs benefit from a squid BUT:
- not with squid's default config (as it is optimised for surfing)
- the boost still needs some PBR rules (Policy Based Routing) to force the VM's network traffic through the proxy


A draft description may be requested via PM.


Once a squid proxy has been configured, BOINC can be configured to use this by setting the Connect via HTTP proxy server option. By default squid uses port 3128 and if this port is used, CVMFS will also attempt to use this proxy.

@Laurence
Do you refer to the proxy configuration form of the BOINC client?
If so, my VMs don't use the proxy although it is configured there.
Other projects (including SixTrack/ATLAS) use the proxy at least to request/report work.
ID: 34326
m
Joined: 6 Sep 08
Posts: 110
Credit: 6,717,286
RAC: 904
Message 34327 - Posted: 9 Feb 2018, 10:44:47 UTC
Last modified: 9 Feb 2018, 10:46:56 UTC

I once used a Squid proxy running on an old Raspberry Pi and can confirm that, although
BOINC will use it OK, and this alone makes a difference, the VMs, by default, don't.
Unfortunately I didn't (and don't) have a USB hard drive and was using some USB sticks
combined using LVM, and they gave up after a couple of months, before I managed to
make the VMs use the cache. The cache gets a lot of use. Maybe I was running ATLAS then.
Maybe it's time for another try and some better USB thingies.
ID: 34327
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1449
Credit: 77,190,260
RAC: 95,375
Message 34328 - Posted: 9 Feb 2018, 11:05:41 UTC - in response to Message 34327.  

Maybe it's time for another try ...

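You may use the following (or similar) squid settings to reduce write access to your storage.
# keep serving a stale object for up to 53 days if revalidation fails
max_stale 53 days

# memory cache: GDSF favours small, frequently requested objects
memory_replacement_policy heap GDSF
maximum_object_size_in_memory 256 KB
cache_mem 192 MB

# disk cache: LFUDA favours popular objects regardless of their size
cache_replacement_policy heap LFUDA
maximum_object_size 6144 MB
# 25000 MB in 16 x 64 subdirectories
cache_dir aufs /var/cache/squid 25000 16 64 min-size=15361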
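
One way to read the size limits in these settings: small objects are only eligible for the memory cache, so they never cause disk writes, while large files such as VM images can only go to disk. The Python sketch below models just that decision. It is a simplified illustration of my reading of the three size options, not a description of squid's internals, and the function name is made up for the example.

```python
KB = 1024
MB = 1024 * KB

# Thresholds copied from the settings above.
MAX_OBJECT_SIZE_IN_MEMORY = 256 * KB  # maximum_object_size_in_memory 256 KB
MAX_OBJECT_SIZE = 6144 * MB           # maximum_object_size 6144 MB
DISK_MIN_SIZE = 15361                 # cache_dir ... min-size=15361 (bytes)


def eligible_stores(size_bytes: int) -> set[str]:
    """Return which caches may hold an object of this size (simplified model)."""
    stores = set()
    if size_bytes <= MAX_OBJECT_SIZE_IN_MEMORY:
        stores.add("memory")
    # Assumption: both disk limits are treated as inclusive here.
    if DISK_MIN_SIZE <= size_bytes <= MAX_OBJECT_SIZE:
        stores.add("disk")
    return stores
```

With these numbers a 4 KB object is memory-only, a 100 KB object may be kept in both caches, and a file of several hundred MB can only be stored on disk.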
ID: 34328
Laurence
Project administrator
Project developer
Joined: 20 Jun 14
Posts: 336
Credit: 237,918
RAC: 0
Message 34331 - Posted: 9 Feb 2018, 12:26:38 UTC - in response to Message 34326.  



@Laurence
Do you refer to the proxy configuration form of the BOINC client?
If so, my VMs don't use the proxy although it is configured there.

Yes, the VM should use the proxy. If not, we need to investigate.
ID: 34331
Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 406
Credit: 96,116,916
RAC: 1
Message 34332 - Posted: 9 Feb 2018, 12:29:55 UTC

I used a Squid for ATLAS a long time ago, but I was disappointed with the results and stopped using it.

You only benefit easily from a squid proxy if you set it as your default gateway. This was not possible for me; I had to redirect the ATLAS traffic to the Squid.

If you force BOINC to use a proxy setting, this only works for uploads and downloads that are initiated by the BOINC client itself. Traffic that originates from inside the VMs ignores the proxy settings, as it uses different ports than http(s).

I had to redirect the traffic with routing tables and was not really happy with this.

Maybe my Squid config wasn't perfect for use with BOINC, but there was no one to teach me how to make it better.


Supporting BOINC, a great concept !
ID: 34332
Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 406
Credit: 96,116,916
RAC: 1
Message 34333 - Posted: 9 Feb 2018, 12:31:39 UTC - in response to Message 34331.  



@Laurence
Do you refer to the proxy configuration form of the BOINC client?
If so, my VMs don't use the proxy although it is configured there.

Yes, the VM should use the proxy. if not we need to investigate.

IIRC, changing the proxy settings doesn't work for running VMs; only newly created VMs will use the changed settings.


Supporting BOINC, a great concept !
ID: 34333
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1449
Credit: 77,190,260
RAC: 95,375
Message 34334 - Posted: 9 Feb 2018, 13:26:14 UTC

1.
As I already mentioned, the default squid configuration is not prepared to cover lhc@home's special needs.
Therefore I agree that it may not deliver the expected benefits.
This can be improved by only a few modifications to squid.conf.


2.
Redirecting all HTTP traffic from the VMs through the squid currently doesn't work.
The preferred solution would be to read out the BOINC client's proxy configuration and make the VMs use it (Laurence?)


3.
A workaround for point 2 (at least on Linux systems) is policy-based routing, for which I can provide a white paper (it also covers point 1).
ID: 34334
Laurence
Project administrator
Project developer
Joined: 20 Jun 14
Posts: 336
Credit: 237,918
RAC: 0
Message 34335 - Posted: 9 Feb 2018, 13:43:22 UTC - in response to Message 34334.  
Last modified: 9 Feb 2018, 13:43:53 UTC


2.Redirecting all HTTP traffic from the VMs through the squid currently doesn't work.
The preferred solution would be to read out the BOINC client's proxy configuration and make the VMs use it (Laurence?)


This should be implemented. If the VM detects a BOINC proxy has been configured with the default squid port, it will try to use it for CVMFS.

Edit: note that this will currently only work with Theory, LHCb and CMS
ID: 34335
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1449
Credit: 77,190,260
RAC: 95,375
Message 34336 - Posted: 9 Feb 2018, 14:03:13 UTC - in response to Message 34335.  


2.Redirecting all HTTP traffic from the VMs through the squid currently doesn't work.
The preferred solution would be to read out the BOINC client's proxy configuration and make the VMs use it (Laurence?)


This should be implemented. If the VM detects a BOINC proxy has been configured with the default squid port, it will try to use it for CVMFS.

It doesn't work although my clients are configured to use my local squid.

CMS configures this:
2018-02-08 20:59:44 (6164): Guest Log: VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE
2018-02-08 20:59:44 (6164): Guest Log: 2.2.0.0 3454 0 21816 5435 14 1 1636005 10240001 2 65024 0 20 95 20792 23 http://s1bnl-cvmfs.openhtc.io/cvmfs/grid.cern.ch DIRECT 1


Theory configures this:
2018-02-08 12:32:11 (30171): Guest Log: VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE
2018-02-08 12:32:11 (30171): Guest Log: 2.2.0.0 3364 0 19764 5434 14 1 479950 10240001 2 65024 0 20 95 20792 21 http://cvmfs-stratum-one.cern.ch/cvmfs/grid.cern.ch http://128.142.33.31:3125 1


LHCb:
I don't run it at the moment, as most of its VMs run only 2 short jobs within a few minutes and then start a new VM from scratch.


Edit: note that this will currently only work with Theory, LHCb and CMS

ATLAS: works perfectly with policy-based routing.

CMS: Besides CVMFS, it uses cmsfrontier.cern.ch:8000 via the local proxy. This is where the huge number of requests comes from.
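
The interesting part of the two status tables above is the PROXY column: DIRECT for CMS, and a CERN address rather than the local squid for Theory. To pull that column out of a BOINC task log without counting fields by eye, something like the following Python sketch may help. It assumes the header and value lines appear as a pair, exactly as in the logs quoted here; the function name is made up for this example.

```python
def parse_cvmfs_stat(header_line: str, value_line: str) -> dict[str, str]:
    """Map the column names of a CVMFS status table to their values.

    Both arguments are raw log lines; any text up to and including the
    "Guest Log: " prefix is ignored.
    """
    marker = "Guest Log: "
    header = header_line.split(marker, 1)[-1].split()
    values = value_line.split(marker, 1)[-1].split()
    if len(header) != len(values):
        raise ValueError(
            f"column count mismatch: {len(header)} names, {len(values)} values"
        )
    return dict(zip(header, values))
```

Feeding it the CMS pair above yields `DIRECT` under the key `"PROXY"`, and the Theory pair yields `http://128.142.33.31:3125`.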
ID: 34336
gyllic
Joined: 9 Dec 14
Posts: 202
Credit: 2,532,884
RAC: 1
Message 34349 - Posted: 9 Feb 2018, 22:09:40 UTC

Would a local squid proxy also improve the efficiency of native ATLAS tasks (if it is supported by the native app)?
Or is the CVMFS cache sufficient, so that the benefits (if any) of an additional squid would be negligible?
ID: 34349
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1449
Credit: 77,190,260
RAC: 95,375
Message 34363 - Posted: 10 Feb 2018, 21:38:07 UTC - in response to Message 34349.  

would a local squid proxy also enhance the efficiency of native atlas tasks (if it is supported by the native app)?
Or is the cvmfs cache sufficient and the benefits (if there are some) of an additional squid could be neglected?

All CERN subprojects (except SixTrack) use a local CVMFS instance.
Unfortunately their caches get lost when their VMs are removed.
ATLAS native preserves the cache on the local filesystem but this cache can only be used by other ATLAS native tasks on the same host.

A dedicated HTTP proxy, e.g. squid, can be used as a parent cache by all CVMFS instances on all hosts.
In addition that proxy can also be used by other data distribution systems, e.g. the frontier cache system, which is heavily used by CMS.
ID: 34363
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1449
Credit: 77,190,260
RAC: 95,375
Message 34442 - Posted: 21 Feb 2018, 10:27:10 UTC

At the moment only ATLAS VMs report that they use the proxy that is configured in the BOINC client.
CMS, LHCb and Theory seem to ignore the settings although they are listed in slots/n/init_data.xml.

Making the proxy available inside the VM would be a breakthrough for the use of CVMFS, as it would avoid routing IP packets via iptables.
CMS may need additional measures, as it requests lots of data via cmsfrontier.cern.ch and I don't know whether this is as easy to configure as CVMFS.
ID: 34442
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1449
Credit: 77,190,260
RAC: 95,375
Message 34467 - Posted: 23 Feb 2018, 13:44:02 UTC

host1 - - [23/Feb/2018:09:58:48 +0100] "GET http://lhcathome-upload.cern.ch/lhcathome/download//CMS_2016_10_31.vdi.gz HTTP/1.1" 200 665580401 "-" "BOINC client (x86_64-pc-linux-gnu 7.8.4)" TCP_MISS:HIER_DIRECT
host2 - - [23/Feb/2018:12:30:02 +0100] "GET http://lhcathome-upload.cern.ch/lhcathome/download//CMS_2016_10_31.vdi.gz HTTP/1.1" 200 665580407 "-" "BOINC client (x86_64-pc-linux-gnu 7.8.4)" TCP_REFRESH_UNMODIFIED:HIER_DIRECT
host3 - - [23/Feb/2018:12:53:26 +0100] "GET http://lhcathome-upload.cern.ch/lhcathome/download//CMS_2016_10_31.vdi.gz HTTP/1.1" 200 665580407 "-" "BOINC client (x86_64-pc-linux-gnu 7.8.4)" TCP_REFRESH_UNMODIFIED:HIER_DIRECT
host4 - - [23/Feb/2018:13:17:59 +0100] "GET http://lhcathome-upload.cern.ch/lhcathome/download//CMS_2016_10_31.vdi.gz HTTP/1.1" 200 665580407 "-" "BOINC client (x86_64-pc-linux-gnu 7.8.4)" TCP_REFRESH_UNMODIFIED:HIER_DIRECT

1x downloaded, 3x served from the cache.
This is one of the reasons why I like squid, even with a fast flatrate.
:-)
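
For longer logs than the four lines above, the same tally can be produced automatically. This is a small Python sketch that assumes the log format shown in this post, where the last field of each line is squid's result code followed by a colon and the hierarchy code (for example `TCP_MISS:HIER_DIRECT`). Other logformat settings would need a different pattern.

```python
import re
from collections import Counter

# Last whitespace-separated field: "<result code>:<hierarchy code>".
RESULT_RE = re.compile(r"\s(TCP_[A-Z_]+):\S+\s*$")


def count_result_codes(lines):
    """Count squid result codes (TCP_MISS, TCP_HIT, ...) over access log lines."""
    counts = Counter()
    for line in lines:
        match = RESULT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Applied to the four lines above it counts one TCP_MISS and three TCP_REFRESH_UNMODIFIED, i.e. one real download and three copies served from the cache after revalidation.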
ID: 34467
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 672
Credit: 5,390,885
RAC: 10,750
Message 34468 - Posted: 23 Feb 2018, 13:54:22 UTC - in response to Message 34467.  
Last modified: 23 Feb 2018, 13:56:40 UTC

Oh, yes! It can be so very economical. I'm not using it myself but for some people it can make a big difference! (Actually, here on campus we have a caching proxy anyhow for http; at home I only have one LHC@Home instance running.)
ID: 34468
Jim1348
Joined: 15 Nov 14
Posts: 440
Credit: 12,003,590
RAC: 4,910
Message 34469 - Posted: 23 Feb 2018, 14:30:15 UTC - in response to Message 34363.  

A dedicated HTTP proxy, e.g. squid, can be used as a parent cache by all CVMFS instances on all hosts.
In addition that proxy can also be used by other data distribution systems, e.g. the frontier cache system, which is heavily used by CMS.

Do you need a separate machine for the squid proxy? I usually run only one machine on LHC, or two at most. Can I place the squid on the machine where LHC is running?
ID: 34469
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1449
Credit: 77,190,260
RAC: 95,375
Message 34471 - Posted: 23 Feb 2018, 14:52:01 UTC - in response to Message 34469.  

Can I place the squid on the machine where LHC is running?

In principle, yes. It is possible to run it on the same machine.
A squid that is configured to serve your BOINC projects runs well with 128 MB RAM and uses only a few % CPU.
The main limiting factor would be a very old and slow hard disk, although a good setup can nearly avoid disk access.
ID: 34471
Jim1348
Joined: 15 Nov 14
Posts: 440
Credit: 12,003,590
RAC: 4,910
Message 34473 - Posted: 23 Feb 2018, 16:13:17 UTC - in response to Message 34471.  

A squid that is configured to serve your BOINC projects runs well with 128 MB RAM and uses only a few % CPU.
Most limiting would be a very old and slow harddisk although disk access can nearly be avoided by a good setup.

Not a problem. I have 32 GB, and devote 12 GB to a write cache, and that is in front of a Samsung 850 EVO. The Ryzen 1700 (which works better on LHC than my Haswell machines by the way) has cycles to spare. I would like to put them to good use if it would help the project.

On the other hand, I have a fast cable modem (50/10 Mbps) and don't know if I really need a squid. Your advice would be helpful.
ID: 34473
m
Joined: 6 Sep 08
Posts: 110
Credit: 6,717,286
RAC: 904
Message 34810 - Posted: 30 Mar 2018, 13:30:46 UTC - in response to Message 34335.  


This should be implemented. If the VM detects a BOINC proxy has been configured with the default squid port, it will try to use it for CVMFS.

Edit: note that this will currently only work with Theory, LHCb and CMS

Sometimes it works...

2018-03-07 22:37:47 (18466): Guest Log: [INFO] Shared directory mounted, enabling vboxmonitor
2018-03-07 22:37:48 (18466): Guest Log: [DEBUG] Detected squid proxy http://192.168.100.137:3128
2018-03-07 22:38:52 (18466): Guest Log: [DEBUG] Testing network connection to cern.ch on port 80

(...time passes...)

2018-03-07 22:38:55 (18466): Guest Log: [DEBUG] Probing CVMFS ...
2018-03-07 22:38:55 (18466): Guest Log: Probing /cvmfs/grid.cern.ch... OK
2018-03-07 22:38:58 (18466): Guest Log: Probing /cvmfs/sft.cern.ch... OK
2018-03-07 22:38:58 (18466): Guest Log: VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE
2018-03-07 22:38:58 (18466): Guest Log: 2.2.0.0 3429 1 20276 5613 3 1 392693 10240001 2 65024 0 15 100 0 0 http://cernvmfs.gridpp.rl.ac.uk/cvmfs/grid.cern.ch http://192.168.100.137:3128 1
2018-03-07 22:39:06 (18466): Guest Log: [INFO] Reading volunteer information
2018-03-07 22:39:06 (18466): Guest Log: [INFO] Volunteer: m (178) Host: 1422

and sometimes it doesn't...

2018-03-07 22:57:36 (19112): Guest Log: [INFO] Mounting the shared directory
2018-03-07 22:57:36 (19112): Guest Log: [INFO] Shared directory mounted, enabling vboxmonitor
2018-03-07 22:57:36 (19112): Guest Log: [DEBUG] Detected squid proxy http://192.168.100.137:3128
2018-03-07 22:58:51 (19112): Guest Log: [DEBUG] Testing network connection to cern.ch on port 80
2018-03-07 22:58:52 (19112): Guest Log: [DEBUG] Connection to cern.ch 80 port [tcp/http] succeeded!

(... more time goes by...)

2018-03-07 22:58:53 (19112): Guest Log: [DEBUG] Probing CVMFS ...
2018-03-07 22:58:54 (19112): Guest Log: Probing /cvmfs/grid.cern.ch... OK
2018-03-07 22:58:58 (19112): Guest Log: Probing /cvmfs/sft.cern.ch... OK
2018-03-07 22:58:58 (19112): Guest Log: VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE
2018-03-07 22:58:58 (19112): Guest Log: 2.2.0.0 3422 1 22264 5613 3 1 392693 10240001 2 65024 0 15 100 0 0 http://cernvmfs.gridpp.rl.ac.uk/cvmfs/grid.cern.ch DIRECT 1
2018-03-07 22:59:04 (19112): Guest Log: [INFO] Reading volunteer information
2018-03-07 22:59:04 (19112): Guest Log: [INFO] Volunteer: m (178) Host: 1422

Clearly, these logs are a bit old, but the problem remains. At the moment everything is beavering away running SixTrack, so there aren't enough VM tasks to get any idea of the success rate.
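
Once VM tasks are flowing again, the success rate could be estimated by comparing, for each task log, the proxy the VM says it detected with the proxy CVMFS ends up using. Here is a rough Python sketch, assuming the log layout shown above ("Detected squid proxy ..." followed later by the status table whose header ends in HOST PROXY ONLINE). The function name is made up for this example.

```python
import re

DETECTED_RE = re.compile(r"Detected squid proxy (\S+)")


def detected_and_used_proxy(log_lines):
    """Return (proxy detected by the VM, value of the CVMFS PROXY column).

    Either element is None if the corresponding line was not found.
    """
    detected = None
    used = None
    expect_values = False
    for line in log_lines:
        text = line.split("Guest Log: ", 1)[-1].strip()
        match = DETECTED_RE.search(text)
        if match:
            detected = match.group(1)
        elif text.startswith("VERSION PID"):
            expect_values = True  # the next line carries the values
        elif expect_values:
            fields = text.split()
            # PROXY is the second-to-last column (header ends "HOST PROXY ONLINE").
            used = fields[-2] if len(fields) >= 2 else None
            expect_values = False
    return detected, used
```

A task counts as a success when both values are equal. For the second log above the function returns the detected squid address together with `DIRECT`, which is the failure case described here.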
ID: 34810


©2020 CERN