Message boards : ATLAS application : Use existing Tier-2 cvmfs squid for boinc ATLAS@home hosts
MPI für Physik

Joined: 20 Mar 15
Posts: 7
Credit: 310,850,648
RAC: 437,777
Message 44325 - Posted: 17 Feb 2021, 7:37:43 UTC

Hi

Before we ramp up our pool of BOINC hosts for ATLAS, we would like to set up the CVMFS squid.

I did have a look at the threads dealing with this, but they are not so helpful, since these cover setting up your own squid instance. We do already have a cvmfs squid for our local WLCG Tier2 cluster and the obvious plan is to make our boinc hosts connect to it.

Does anybody have a working configuration for this scenario and could share it?

Thanks a lot, Stefan
ID: 44325
Henry Nebrensky

Joined: 13 Jul 05
Posts: 147
Credit: 14,665,277
RAC: 0
Message 44328 - Posted: 17 Feb 2021, 17:08:45 UTC - in response to Message 44325.  

We do already have a cvmfs squid for our local WLCG Tier2 cluster and the obvious plan is to make our boinc hosts connect to it.
Does anybody have a working configuration for this scenario and could share it?

Years ago, for Atlas-native, I simply used the same basic CVMFS configuration on the BOINC nodes as on the Grid cluster nodes.
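As a rough illustration of what "the same basic configuration" could look like on a native (non-VM) BOINC host, the sketch below renders a CVMFS client override file that points at an existing site squid. `CVMFS_HTTP_PROXY` and `CVMFS_REPOSITORIES` are standard CVMFS client parameters; the host name `squid.example.org`, the port 3128, the repository list, and the helper name `render_cvmfs_local` are assumptions made for the example, not values from this thread.

```python
def render_cvmfs_local(proxies, repositories, allow_direct=False):
    """Return the text of a CVMFS client override file (e.g. /etc/cvmfs/default.local)."""
    if not proxies:
        raise ValueError("at least one proxy URL is required")
    # Proxies joined with '|' form one load-balanced group.
    proxy_spec = "|".join(proxies)
    if allow_direct:
        # ';' starts a fail-over group; DIRECT bypasses the squid entirely.
        proxy_spec += ";DIRECT"
    lines = [
        'CVMFS_REPOSITORIES="{}"'.format(",".join(repositories)),
        'CVMFS_HTTP_PROXY="{}"'.format(proxy_spec),
    ]
    return "\n".join(lines) + "\n"


# Hypothetical host name and port; replace with the site's real squid.
config_text = render_cvmfs_local(
    ["http://squid.example.org:3128"],
    ["atlas.cern.ch", "atlas-condb.cern.ch"],
)
print(config_text, end="")
```

Leaving `allow_direct` off means the client never falls back to a direct connection, which is the usual choice when the point is to keep traffic on the local squid.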
As you're going with VirtualBox, I have no idea how it will work out: defining a proxy in BOINC will route all web traffic through it, including the VM's Frontier traffic as well as BOINC job requests. Also, AIUI, the Atlas VMs here have the CVMFS preconfigured to use the Cloudflare stratum one server instead of the CERN ones, in which case I don't think you'll get the expected efficiency either, as the squid would end up having to hold two copies of each file. My inclination would be to just run a BOINC-specific squid on each machine to hold the local CVMFS cache across successive VM instances.
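For the VirtualBox case, "defining a proxy in BOINC" typically means setting the HTTP proxy in the client, for instance through a `proxy_info` block in `cc_config.xml`. The sketch below builds such a fragment. The element names (`proxy_info`, `http_server_name`, `http_server_port`, `no_proxy`) are given as I recall them from the BOINC client configuration documentation and should be checked against the installed client version; the host name and port are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_cc_config(proxy_host, proxy_port, no_proxy=""):
    """Build a cc_config.xml document that sets only the HTTP proxy options."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    proxy = ET.SubElement(options, "proxy_info")
    ET.SubElement(proxy, "http_server_name").text = proxy_host
    ET.SubElement(proxy, "http_server_port").text = str(proxy_port)
    if no_proxy:
        # Comma-separated hosts that should bypass the proxy (assumed semantics).
        ET.SubElement(proxy, "no_proxy").text = no_proxy
    return ET.tostring(root, encoding="unicode")


# Hypothetical squid address; replace with the real one.
xml_text = build_cc_config("squid.example.org", 3128)
print(xml_text)
```

Because this setting applies to the client as a whole, the scheduler requests and the VM's Frontier and CVMFS traffic would all go through the same squid, so its access rules need to allow all of those destinations.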
ID: 44328
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester

Joined: 15 Jun 08
Posts: 1818
Credit: 122,900,917
RAC: 76,440
Message 44329 - Posted: 17 Feb 2021, 18:36:23 UTC - in response to Message 44328.  

As you're going with VirtualBox ... defining a proxy in BOINC will route all web traffic through it, including the VM's Frontier traffic as well as BOINC job requests.

Right.

... the Atlas VMs here have the CVMFS preconfigured to use the Cloudflare stratum one server instead of the CERN ones ...

Right.


... in which case I don't think you'll get the expected efficiency either, as the squid would end up having to hold two copies of each file.

For CVMFS, this does in fact happen (fail-over, multiple server aliases...).
For Frontier, it happens only in case of fail-over.
Even then, a local proxy will be more efficient than a direct connection.
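The "two copies" effect follows from how an HTTP cache identifies objects: by default the cache key is derived from the full URL, so the same file requested under two server names is stored twice. The toy model below shows only that keying behaviour; it is not squid's implementation, and the host names and path are made up.

```python
from urllib.parse import urlsplit


class UrlKeyedCache:
    """Toy HTTP cache that, like a default proxy setup, keys objects by full URL."""

    def __init__(self):
        self.store = {}
        self.origin_fetches = 0

    def get(self, url):
        if url not in self.store:
            # Cache miss: pretend to fetch the object from the origin server.
            self.origin_fetches += 1
            self.store[url] = "payload:" + urlsplit(url).path
        return self.store[url]


cache = UrlKeyedCache()
path = "/cvmfs/atlas.cern.ch/data/ab/cdef"  # hypothetical object path
cache.get("http://stratum-a.example.org" + path)  # miss, stored
cache.get("http://stratum-b.example.org" + path)  # same file, other alias: miss again
cache.get("http://stratum-a.example.org" + path)  # hit
print(len(cache.store), cache.origin_fetches)  # 2 2
```

The third request is still served locally, which is the sense in which a proxy with some duplication remains better than a direct connection.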


My inclination would be to just run a BOINC-specific squid on each machine to hold the local CVMFS cache across successive VM instances.

One single squid instance is powerful and efficient enough to serve several hundred worker nodes as long as there's no network bottleneck between squid and the worker nodes.
A separate squid instance on each worker box will reduce efficiency and increase maintenance effort.
ID: 44329
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 687
Credit: 433,603,600
RAC: 60,297
Message 44330 - Posted: 17 Feb 2021, 20:33:22 UTC

I run just one computer with squid and it seems fine, as computezrmle said. I only see up to 50 MB/s of network traffic, so even the network requirements aren't that bad.
ID: 44330
Henry Nebrensky

Joined: 13 Jul 05
Posts: 147
Credit: 14,665,277
RAC: 0
Message 44333 - Posted: 17 Feb 2021, 22:01:04 UTC - in response to Message 44329.  

A separate squid instance on each worker box will reduce efficiency and increase maintenance effort.

As the OP claims to be close to a WLCG Tier2 site, I figured network bandwidth wouldn't be an issue. If there are enough machines that maintenance issues are significant, then you want to be using a configuration management system anyway, in which case I found that keeping machines identical made things simpler! And on a corporate/enterprise network, a separate squid is likely to result in questions from network admins about what it's doing there, who can connect through it, and to where...

For CVMFS the biggest single gain is in maintaining the cache across successive VM initialisations; the local squid route gets that without affecting any other activity on the machine, and without exposing new services on the network.
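The gain from keeping the cache across VM initialisations can be seen in an idealised count of upstream fetches. The model below assumes every VM run requests exactly the same set of files, which overstates the benefit for real jobs; the function name and the numbers are illustrative only.

```python
def upstream_fetches(vm_runs, files_per_run, cache_survives):
    """Count fetches that must leave the machine over several VM runs."""
    cache = set()
    fetches = 0
    for _ in range(vm_runs):
        if not cache_survives:
            # A fresh VM without an external proxy starts with an empty cache.
            cache.clear()
        for file_id in range(files_per_run):
            if file_id not in cache:
                fetches += 1
                cache.add(file_id)
    return fetches


print(upstream_fetches(5, 1000, cache_survives=False))  # 5000
print(upstream_fetches(5, 1000, cache_survives=True))   # 1000
```

In this model the saving is the same whether the surviving cache sits in a squid on the same machine or in a shared one; the two options differ in network exposure and maintenance, not in this count.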
If I were in the OP's situation - and I've been close - I'd start there. Once the OP has some experience and a track record with their networking team, they can always take another step, whether that's the standalone BOINC squid or fudgerating the local squids to use the Tier2 one or whatever else. (It'll also depend on what else the machine is used for - the OP mentioned a separate batch system in another post.)
ID: 44333



©2021 CERN