Recommended CVMFS Configuration for Native Apps - Comments and Questions
Author | Message |
---|---|
Joined: 15 Jun 08 Posts: 1988 Credit: 143,072,249 RAC: 97,107 |
This is a discussion thread for comments and questions about the CVMFS configuration used by LHC@home (HowTo: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5594). A previous version of the HowTo and older comments can be found at https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5342 |
Joined: 8 May 17 Posts: 12 Credit: 11,235,723 RAC: 0 |
Hello, my systems are all configured to use at least 4 GB of local CVMFS cache and, according to the stats, they seem to reach pretty high hit ratios. Here is a look at a couple of them: (screenshot not preserved) Would a proxy still help here? Also, are there specific Squid options one should set for caching CVMFS data (i.e. maximum object size, retention or refresh policy)? |
Joined: 15 Jun 08 Posts: 1988 Credit: 143,072,249 RAC: 97,107 |
Quote: "My systems ... seem to reach pretty high hit ratios." Yes, a proxy still helps, for two major reasons. 1. Each CVMFS client serves only the tasks running on the same box (or inside the same VM), whereas a single Squid serves all boxes and all VMs running at your site. 2. Tasks like ATLAS or CMS make heavy use of CERN's Frontier system. Frontier requests data via HTTP but, unlike CVMFS, it has no local cache of its own. A local Squid closes this gap and serves most Frontier requests from its cache. Quote: "Also, are there specific Squid options one should set for caching CVMFS data (i.e. maximum object size, retention or refresh policy)?" It's all covered by the squid.conf in the Squid HowTo. Some of Squid's default settings were chosen decades ago and focus on surfing the web via slow connections. The suggestions in this forum extend those defaults and are based on experience and on analysing the data flow created by LHC@home. Surfing arbitrary web pages with these settings is still possible, but since most of them use HTTPS, the hit rates for them would drop to 0 %. Questions regarding specific Squid options should be asked in the Squid thread. |
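The two reasons above can be illustrated with a toy calculation. This is a hypothetical model, not measured LHC@home data: it assumes every client (box or VM) needs the same set of objects, requests each of them a fixed number of times, and that nothing ever expires. A CVMFS client misses each object once in its own cache, a client without a local cache (as described for Frontier) sends every request upstream, and a shared Squid lets only its own first miss per object leave the site. The function name `upstream_requests` is made up for this sketch.

```shell
#!/usr/bin/env bash
# Toy model, not measured LHC@home data: CLIENTS boxes or VMs each request
# the same OBJECTS items REPEATS times, and nothing ever expires.
#   local_cache=yes  models a CVMFS client (one cache per machine)
#   local_cache=no   models a client without its own cache (like Frontier)
#   shared_proxy=yes puts one site-wide Squid in front of all clients
upstream_requests() {
  local clients=$1 objects=$2 repeats=$3 local_cache=$4 shared_proxy=$5
  if [[ $shared_proxy == yes ]]; then
    # Only the proxy's first miss per object leaves the site.
    echo $(( objects ))
  elif [[ $local_cache == yes ]]; then
    # Each client fills its own cache once; repeats are local hits.
    echo $(( clients * objects ))
  else
    # No cache anywhere: every request goes to the servers.
    echo $(( clients * objects * repeats ))
  fi
}
```

For example, 3 boxes plus 2 VMs count as 5 clients, since each VM is its own machine: `upstream_requests 5 1000 10 yes no` prints 5000, whereas `upstream_requests 5 1000 10 yes yes` prints 1000. The numbers are placeholders; real savings depend on how much data the clients actually share and on cache expiry.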
Joined: 2 May 07 Posts: 1545 Credit: 55,894,845 RAC: 192,338 |
|
Joined: 15 Nov 14 Posts: 590 Credit: 21,873,661 RAC: 703 |
Very nice; thanks. But I think it should be pointed out that, as far as I can see, the automatic configuration download (sudo wget https://lhcathome.cern.ch/lhcathome/download/default.local -O /etc/cvmfs/default.local) no longer applies. Maybe it could be updated? |
Joined: 17 Feb 17 Posts: 42 Credit: 2,589,736 RAC: 73 |
Very nice; thanks. I had this problem as well: probing failed immediately. Perhaps that file could be updated with the minimum needed configuration, although I'm still unclear how one can actually optimize their configuration if it is just 1 or 2 machines on the same connection. |
Joined: 15 Jun 08 Posts: 1988 Credit: 143,072,249 RAC: 97,107 |
Quote: "Perhaps that file could be updated with the minimum needed configuration ..." The file on the server is already up to date. Be aware that it includes 2 optional settings (with proxy/without proxy), and the user has to activate one of them. In general, native apps require the user to make more settings. This is easier, faster and more reliable than guessing certain values. In addition, some steps have to be done as root. Quote: "although I'm still unclear how one can actually optimize their configuration" The simple answer: cache as much as possible, as close as possible to the point where it is used. To avoid spending effort where it gains little, focus on the major bottlenecks first. More LHC@home specifics: CVMFS is heavily used but has its own cache, one cache instance per machine. A machine can't share its CVMFS cache with other machines, and each VM counts as an individual machine. Outdated or missing data is requested from the project servers. Frontier is heavily used by ATLAS and CMS. It has no local cache of its own, so each app sends all Frontier requests to the project servers. Cloudflare's openhtc.io infrastructure helps to distribute CVMFS and Frontier data. Cloudflare runs a very fast worldwide network, and one of its proxy caches will most likely be located much closer to your clients than any project server. VBox apps use openhtc.io by default, but users running native apps have to set "CVMFS_USE_CDN=yes" in their CVMFS configuration. This is disabled in the default configuration because many computers in various datacenters use special connections and require it to be switched off. A local HTTP proxy closes the gap between openhtc.io and the local clients. It can cache data for all local CVMFS and Frontier clients and offload both openhtc.io and the project servers. |
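Activating one of the optional settings and enabling the CDN both come down to editing /etc/cvmfs/default.local as root. The helper below is a hypothetical sketch, not part of CVMFS or of the LHC@home HowTo. It assumes GNU sed (for `sed -i`) and treats the file as shell-style KEY=VALUE assignments: active lines for the key are removed and the new value is appended, while commented-out alternatives stay in place.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of CVMFS: set KEY="VALUE" in a CVMFS config
# file such as /etc/cvmfs/default.local. Active (uncommented) assignments of
# KEY are deleted and the new one is appended; commented-out lines, such as
# optional alternatives shipped in the file, are left untouched.
set_cvmfs_option() {
  local file=$1 key=$2 value=$3
  # Accept only plain variable names so KEY is safe inside the sed pattern.
  if [[ ! $key =~ ^[A-Z_][A-Z0-9_]*$ ]]; then
    echo "invalid key: $key" >&2
    return 1
  fi
  touch "$file" || return 1
  # If the last line lacks a newline, add one so the append starts a new line.
  if [[ -s $file && -n $(tail -c 1 "$file") ]]; then
    echo >> "$file"
  fi
  # GNU sed in-place edit: drop existing active assignments of KEY.
  sed -i "/^[[:space:]]*${key}=/d" "$file" || return 1
  printf '%s="%s"\n' "$key" "$value" >> "$file"
}
```

Usage, as root, assuming the function was saved to a file named set_cvmfs_option.sh (hypothetical) and that the two optional settings in the downloaded file are CVMFS_HTTP_PROXY variants: `source set_cvmfs_option.sh`, then `set_cvmfs_option /etc/cvmfs/default.local CVMFS_USE_CDN yes`, and for the proxy either `set_cvmfs_option /etc/cvmfs/default.local CVMFS_HTTP_PROXY "http://192.168.1.10:3128"` (a placeholder LAN address of a local Squid on its default port) or `set_cvmfs_option /etc/cvmfs/default.local CVMFS_HTTP_PROXY DIRECT` without a proxy. Run `cvmfs_config reload` afterwards.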
Joined: 17 Feb 17 Posts: 42 Credit: 2,589,736 RAC: 73 |
Quote: "Perhaps that file could be updated with the minimum needed configuration ..." How does one go about caching as much as possible? I'm not sure what happened in my case, then: as soon as I downloaded https://lhcathome.cern.ch/lhcathome/download/default.local to /etc/cvmfs/default.local, probing failed immediately. Running the steps listed in the HowTo fixed my issues, and I believe I also added the line containing openhtc.io. Thank you for the help and the excellent clarification. |
Joined: 2 May 07 Posts: 1545 Credit: 55,894,845 RAC: 192,338 |
Quote: "VBox apps use openhtc.io by default, but users running native apps have to set 'CVMFS_USE_CDN=yes' in their CVMFS configuration." See the release notes in the CVMFS documentation for version 2.7.5. The ATLAS applet on Windows is using CVMFS 2.6.3. |
Joined: 15 Jun 08 Posts: 1988 Credit: 143,072,249 RAC: 97,107 |
Quote: "The ATLAS applet on Windows is using CVMFS 2.6.3." CVMFS_USE_CDN makes it easier to switch between the traditional CVMFS server list and the Cloudflare server list. Older setups had to configure this manually, which is still possible. Everything is fine as long as an application from this project uses Cloudflare servers; even CMS VMs that use v2.4.4.0 work fine. Regarding CVMFS_USE_CDN, it's more important to use a recent cvmfs-config-default package than to upgrade the CVMFS client: http://ecsft.cern.ch/dist/cvmfs/cvmfs-config/ |
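To see whether a host's cvmfs-config-default package is recent, its installed version can be compared with the newest one listed at http://ecsft.cern.ch/dist/cvmfs/cvmfs-config/. The thread doesn't name a minimum version, so the sketch below takes it as a parameter. `version_ge` is a hypothetical helper that relies on GNU sort's version ordering (`-V`) and quiet check mode (`-C`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if version $1 is at least version $2.
# GNU sort -V orders version strings (2.9.1 < 2.10.0); -C only checks
# whether the two lines are already in order and prints nothing.
version_ge() {
  printf '%s\n%s\n' "$2" "$1" | sort -V -C
}
```

On RPM-based systems the installed version can be read with `rpm -q --queryformat '%{VERSION}\n' cvmfs-config-default`, on Debian/Ubuntu with `dpkg-query -W -f='${Version}\n' cvmfs-config-default`. Then, for example, `version_ge "$installed" "$wanted" || echo "consider updating cvmfs-config-default"`, where `$wanted` is whatever version you take from the download page.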
Joined: 12 Jun 18 Posts: 108 Credit: 37,970,761 RAC: 0 |
What does this mean? Running "cvmfs_config stat" prints: /usr/bin/cvmfs_config: line 907: cd: /cvmfs/atlas.cern.ch: Transport endpoint is not connected |
Joined: 4 Jul 06 Posts: 3 Credit: 330,042 RAC: 1,050 |
I have an old native app setup and did the steps in the HowTo V2 today. The CDN wasn't set up until I also installed the latest CVMFS config. Also, will any caching proxy do? I set up an old copy of Polipo. |
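Whether a configuration change actually switched a client to the CDN can be verified from the CVMFS_SERVER_URL line printed by `cvmfs_config showconfig -s atlas.cern.ch`, which should list only *.openhtc.io servers once the CDN is active. The sketch below automates that comparison. `all_servers_use_cdn` is a hypothetical name, and the exact line format it parses (`CVMFS_SERVER_URL=url1;url2;...`, possibly quoted and followed by a `# from <file>` note) is an assumption about showconfig's output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "cvmfs_config showconfig -s <repo>" output on
# stdin and succeed only if every CVMFS_SERVER_URL entry is an openhtc.io
# host. Assumed line format: CVMFS_SERVER_URL=url1;url2;... [# from file]
all_servers_use_cdn() {
  local urls url host
  local -a list
  urls=$(grep -m1 '^CVMFS_SERVER_URL=' |
    sed -E 's/^CVMFS_SERVER_URL=//; s/[[:space:]]+#.*$//; s/["'\'']//g')
  [[ -n $urls ]] || return 1
  IFS=';' read -ra list <<< "$urls"
  for url in "${list[@]}"; do
    [[ -z $url ]] && continue
    # Strip the scheme, then cut at the first '/' or ':' to get the host.
    host=${url#*://}
    host=${host%%[/:]*}
    [[ $host == *.openhtc.io ]] || return 1
  done
  return 0
}
```

Usage: `cvmfs_config showconfig -s atlas.cern.ch | all_servers_use_cdn && echo "CDN active" || echo "still using stratum-one servers"`. Because showconfig reads the configuration files, this also works before `cvmfs_config reload`.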
Joined: 15 Jun 08 Posts: 1988 Credit: 143,072,249 RAC: 97,107 |
Found an ATLAS native log of a task that succeeded:

    [2022-06-09 19:17:48] Checking for CVMFS
    [2022-06-09 19:17:48] Probing /cvmfs/atlas.cern.ch... OK
    [2022-06-09 19:17:48] Probing /cvmfs/atlas-condb.cern.ch... OK
    [2022-06-09 19:17:48] Running cvmfs_config stat atlas.cern.ch
    [2022-06-09 19:17:48] VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE
    [2022-06-09 19:17:48] 2.7.1.0 8420 286 49476 105303 0 78 2841033 4194305 786 65024 0 238387 99.9362 703 40 http://cvmfs-s1fnal.opensciencegrid.org/cvmfs/atlas.cern.ch http://127.0.0.1:8123 1
    [2022-06-09 19:17:48] CVMFS is ok
    [2022-06-09 19:17:48] Efficiency of ATLAS tasks can be improved by the following measure(s):
    [2022-06-09 19:17:48] The CVMFS client on this computer should be configured to use Cloudflare's openhtc.io.
    [2022-06-09 19:17:48] Further information can be found at the LHC@home message board.

Nonetheless, some points should be changed:

1. "CVMFS_USE_CDN=yes" should be set in /etc/cvmfs/default.local. Before running "cvmfs_config reload", you can display the new configuration by running "cvmfs_config showconfig -s atlas.cern.ch |grep CVMFS_SERVER_URL". It should return a list of "*.openhtc.io" servers instead of the stratum-one servers.
2. 127.0.0.1 should not be used as the proxy IP, because only processes on the same box can reach that address. That's why CVMFS can contact a proxy on the same box, but another box in your network can't, not even a VM running on the proxy box. Instead, configure your proxy to listen on the LAN IP of the box (e.g. 192.168.x.y) and configure your clients to connect to the proxy via that IP.

Quote: "Also, will any caching proxy do? I set up an old copy of Polipo." In principle, yes. CVMFS talks HTTP, so any HTTP proxy should be able to handle the requests. Your log above shows that a proxy is used for CVMFS.
But: the Squid configuration given in this forum contains some efficiency settings, e.g. caching large files except ATLAS EVNT files, and I don't know whether they can be applied to other proxies. |
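The two points raised for the ATLAS log above can also be checked mechanically, either from a task log or by running `cvmfs_config stat atlas.cern.ch` directly. The sketch assumes the column layout shown in that log: a header line starting with VERSION, followed by one line of values. `stat_field` and `check_stat` are hypothetical names. The check flags a HOST that is not an openhtc.io server and a PROXY on the loopback address.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: read "cvmfs_config stat <repo>" output on stdin.
# Assumed layout: a header line starting with VERSION, then a value line.
stat_field() {
  awk -v want="$1" '
    $1 == "VERSION" { for (i = 1; i <= NF; i++) col[$i] = i; hdr = 1; next }
    hdr && (want in col) { print $(col[want]); exit }
  '
}

# Print a hint for each of the two problems seen in the ATLAS log.
check_stat() {
  local stat host proxy
  stat=$(cat)
  host=$(stat_field HOST <<< "$stat")
  proxy=$(stat_field PROXY <<< "$stat")
  if [[ $host != *.openhtc.io[/:]* ]]; then
    echo "HOST $host is not an openhtc.io server: consider CVMFS_USE_CDN=yes"
  fi
  if [[ $proxy == *://127.* || $proxy == *://localhost* ]]; then
    echo "PROXY $proxy is a loopback address: other boxes and VMs cannot use it"
  fi
}
```

On a live client: `cvmfs_config stat atlas.cern.ch | check_stat`. For a saved task log (log.txt is a placeholder name), strip the timestamp prefix first: `sed -E 's/^\[[^]]*] //' log.txt | check_stat`. The same helper reads other columns, e.g. `stat_field 'HITRATE(%)'`, or CACHEUSE(K) and CACHEMAX(K) to compare cache use with the configured limit.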
©2022 CERN