Message boards : Number crunching : Recommended CVMFS Configuration for Native Apps - Comments and Questions

computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1723
Credit: 108,872,486
RAC: 95,434
Message 44233 - Posted: 30 Jan 2021, 16:33:24 UTC

This is a discussion thread to post comments and questions regarding the CVMFS Configuration used by LHC@home:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5594

A previous version of the HowTo and older comments can be found here:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5342
ID: 44233
[AF] Hydrosaure
Joined: 8 May 17
Posts: 10
Credit: 10,018,784
RAC: 10,665
Message 44240 - Posted: 31 Jan 2021, 9:13:53 UTC - in response to Message 44233.  

Hello,

My systems are all configured to use at least 4GB of local CVMFS cache and according to stats, they seem to reach pretty high hit ratios.
Here is just a look at a couple of them:


Would a proxy still help here ?

Also are there specific Squid options one should set in regards to caching CVMFS data (ie. max object size, retention or refresh policy) ?
ID: 44240
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1723
Credit: 108,872,486
RAC: 95,434
Message 44242 - Posted: 31 Jan 2021, 10:09:33 UTC - in response to Message 44240.  

My systems ... seem to reach pretty high hit ratios.
Would a proxy still help here ?

Yes.
Here are 2 major reasons.

Each CVMFS client serves only the tasks running on the same box (or inside the same VM).
A single Squid serves all boxes and all VMs running at your site.


Tasks like ATLAS or CMS heavily use CERN's Frontier system.
Frontier requests data via HTTP but, unlike CVMFS, it has no local cache of its own.
A local Squid closes this gap and serves most Frontier requests from its cache.



Also are there specific Squid options one should set in regards to caching CVMFS data (ie. max object size, retention or refresh policy) ?

It's all covered by the squid.conf in the Squid HowTo.

Some of Squid's original settings were chosen decades ago and focus on surfing the web via slow connections.
The suggestions in this forum extend the original settings and are based on experience and on analysing the data flow created by LHC@home.
Surfing arbitrary internet pages with these settings is still possible, but since most of them use HTTPS, the hit rate for them would drop to 0 %.
Questions regarding specific Squid options should be asked in the Squid thread.
ID: 44242
maeax

Joined: 2 May 07
Posts: 1168
Credit: 37,660,756
RAC: 5,084
Message 44247 - Posted: 1 Feb 2021, 1:26:38 UTC
Last modified: 1 Feb 2021, 1:34:04 UTC

ID: 44247
Jim1348

Joined: 15 Nov 14
Posts: 527
Credit: 15,466,625
RAC: 1,324
Message 44268 - Posted: 4 Feb 2021, 22:09:55 UTC - in response to Message 44233.  

Very nice; thanks.
But I think it should be pointed out that the automatic configuration download no longer applies, insofar as I can see.
(sudo wget https://lhcathome.cern.ch/lhcathome/download/default.local -O /etc/cvmfs/default.local)

Maybe it could be updated?
ID: 44268
wolfman1360

Joined: 17 Feb 17
Posts: 30
Credit: 702,878
RAC: 0
Message 44404 - Posted: 27 Feb 2021, 7:08:14 UTC - in response to Message 44268.  

Very nice; thanks.
But I think it should be pointed out that the automatic configuration download no longer applies, insofar as I can see.
(sudo wget https://lhcathome.cern.ch/lhcathome/download/default.local -O /etc/cvmfs/default.local)

Maybe it could be updated?

I had this problem, as well. Probing immediately failed.
Perhaps that file could be updated with the minimum needed configuration, although I'm still unclear how one can actually optimize their configuration if it is just 1 or 2 machines on the same connection.
ID: 44404
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1723
Credit: 108,872,486
RAC: 95,434
Message 44405 - Posted: 27 Feb 2021, 8:51:57 UTC - in response to Message 44404.  

Perhaps that file could be updated with the minimum needed configuration ...

The file on the server is already up to date.
Be aware that it includes 2 optional settings (with proxy/without proxy) and one of them has to be activated by the user.
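
A minimal sketch of how one might check whether a proxy setting is actually active before probing. It assumes the two optional settings are `CVMFS_HTTP_PROXY` lines that ship commented out (an assumption, please verify against the downloaded file); `has_active_proxy` is a hypothetical helper name, not part of CVMFS.

```shell
# has_active_proxy FILE
# Hypothetical helper: succeeds if FILE contains an uncommented
# CVMFS_HTTP_PROXY assignment, fails otherwise.
# Assumption: the optional settings in default.local are
# CVMFS_HTTP_PROXY lines that the user has to uncomment.
has_active_proxy() {
    grep -Eq '^[[:space:]]*CVMFS_HTTP_PROXY=' "$1"
}

# Example call (not executed here):
# has_active_proxy /etc/cvmfs/default.local || echo "no proxy setting is active yet"
```

If the check fails, uncommenting exactly one of the two settings and then running `sudo cvmfs_config reload` followed by `cvmfs_config probe` would be the natural next step.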

In general:
Native apps require more settings to be made by the user.
This is easier, faster and more reliable than guessing certain values.
In addition, some steps must be done by root.



although I'm still unclear how one can actually optimize their configuration

The simple answer

Cache as much as possible as close as possible to the point where it is used.
To avoid wasted effort, focus on the major bottlenecks first.


More LHC@home specific

CVMFS is heavily used but has its own cache - one cache instance per machine.
A machine can't share its CVMFS cache with other machines.
Each VM counts as an individual machine.
Outdated or missing data is requested from the project servers.

Frontier is heavily used by ATLAS and CMS. It has no local cache of its own.
Each app sends all Frontier requests to the project servers.


Cloudflare's openhtc.io infrastructure helps to distribute CVMFS and Frontier data.
They run a very fast worldwide network and one of their proxy caches will most likely be located much closer to your clients than any project server.

VBox apps use openhtc.io by default but users running native apps have to set "CVMFS_USE_CDN=yes" in their CVMFS configuration.
This is disabled in the default configuration because lots of computers in various datacenters use special connections and require this to be set "OFF".


A local HTTP proxy closes the gap between openhtc.io and the local clients.
It can cache data for all local CVMFS and Frontier clients as well as offload openhtc.io and the project servers.
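
As a rough sketch, the points above might translate into an `/etc/cvmfs/default.local` like the following. The repository list, the 4 GB cache limit and the proxy host name `squid.example.lan` are illustrative assumptions, not the project's official values; the HowTo thread remains the reference.

```shell
# Sketch of /etc/cvmfs/default.local (illustrative values, adjust to your site)

# Repositories to mount. This list is an assumption, not the official one.
CVMFS_REPOSITORIES="atlas.cern.ch,atlas-condb.cern.ch,grid.cern.ch"

# Local cache limit in MB (4 GB here).
CVMFS_QUOTA_LIMIT=4096

# Use the Cloudflare (openhtc.io) server list instead of the traditional one.
CVMFS_USE_CDN=yes

# With a local Squid: host name and port are placeholders.
# Fall back to a direct connection if the proxy is unreachable.
CVMFS_HTTP_PROXY="http://squid.example.lan:3128;DIRECT"

# Without a local proxy, use this line instead of the one above:
# CVMFS_HTTP_PROXY="DIRECT"
```

After editing the file, `sudo cvmfs_config reload` followed by `cvmfs_config probe` should show whether the repositories mount with the new settings.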
ID: 44405
wolfman1360

Joined: 17 Feb 17
Posts: 30
Credit: 702,878
RAC: 0
Message 44414 - Posted: 27 Feb 2021, 20:36:31 UTC - in response to Message 44405.  

How does one go about caching as much as possible?
Not sure what happened in my case, then, since as soon as I downloaded
https://lhcathome.cern.ch/lhcathome/download/default.local -O /etc/cvmfs/default.local
I got immediate failures after probing.
Running the listed items in the how to fixed my issues, and I believe I also added the line containing openhtc.io.

Thank you for the help and excellent clarification.
ID: 44414
maeax

Joined: 2 May 07
Posts: 1168
Credit: 37,660,756
RAC: 5,084
Message 44417 - Posted: 28 Feb 2021, 9:36:13 UTC - in response to Message 44405.  

VBox apps use openhtc.io by default but users running native apps have to set "CVMFS_USE_CDN=yes" in their CVMFS configuration.
This is disabled in the default configuration because lots of computers in various datacenters use special connections and require this to be set "OFF".

Release notes from the CVMFS documentation for 2.7.5:

2.1 Release Notes for CernVM-FS 2.7.5
CernVM-FS 2.7.5 is a patch release. It contains several bugfixes for the client.
As with previous releases, upgrading clients should be seamless just by installing the new package from the repository. As usual, we recommend to update only a few worker nodes first and gradually ramp up once the new version proves to work correctly.
Please take special care when upgrading a cvmfs client in NFS mode.
Stratum 0 and stratum 1 servers do not necessarily need to update from version 2.7.4.

2.1.1 Bug Fixes and Improvements
• [client] fix rare crash when kernel meta-data caches operate close to 4GB (CVM-1918)
• [client] let mount helper detect when CVMFS_HTTP_PROXY is defined but empty
• [client] add CVMFS_CLIENT_PROFILE and CVMFS_USE_CDN to the list of known parameters in cvmfs_config

Atlas-Applet in Windows is using CVMFS 2.6.3.
ID: 44417
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1723
Credit: 108,872,486
RAC: 95,434
Message 44418 - Posted: 28 Feb 2021, 14:08:31 UTC - in response to Message 44417.  

Atlas-Applet in Windows is using CVMFS 2.6.3.

CVMFS_USE_CDN makes it easier to switch between the traditional CVMFS server list and the Cloudflare server list.
Older setups had to configure this manually, which is still possible.
It's all fine as long as an application from this project uses Cloudflare servers.

Even CMS VMs that use v2.4.4.0 work fine.
With regard to CVMFS_USE_CDN, it is more important to use a recent cvmfs-config-default package than to upgrade the CVMFS client:
http://ecsft.cern.ch/dist/cvmfs/cvmfs-config/
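
One way to check whether a native client really ends up on the Cloudflare server list is to look at the effective configuration. The sketch below assumes that `cvmfs_config showconfig <repository>` prints a `CVMFS_SERVER_URL=` line and that the Cloudflare servers can be recognised by `openhtc.io` in their host names; `uses_cdn` is a hypothetical helper name.

```shell
# uses_cdn
# Hypothetical helper: reads the output of
# `cvmfs_config showconfig <repository>` on stdin and succeeds if the
# CVMFS_SERVER_URL line mentions an openhtc.io host.
uses_cdn() {
    grep -Eq '^CVMFS_SERVER_URL=.*openhtc\.io'
}

# Example call on a machine with CVMFS installed (not executed here):
# cvmfs_config showconfig atlas.cern.ch | uses_cdn && echo "Cloudflare servers in use"
```

If the check fails although CVMFS_USE_CDN=yes is set, an outdated cvmfs-config-default package would be the first thing to look at.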
ID: 44418
Aurum
Joined: 12 Jun 18
Posts: 108
Credit: 37,970,761
RAC: 0
Message 44809 - Posted: 24 Apr 2021, 20:05:50 UTC

What does this mean???
cvmfs_config stat
/usr/bin/cvmfs_config: line 907: cd: /cvmfs/atlas.cern.ch: Transport endpoint is not connected
ID: 44809



©2021 CERN