Message boards :
Number crunching :
Setting up a local Squid to work with LHC@home - Comments and Questions
Message board moderation
Previous · 1 · 2 · 3 · 4 · 5 . . . 8 · Next
Author | Message |
---|---|
![]() Send message Joined: 15 Jun 08 Posts: 2567 Credit: 258,683,719 RAC: 119,221 ![]() ![]() |
Typical Squid statistics after 34 days runtime. Taken from "squidclient mgr:info" # Squid is configured to use up to 256 MB RAM cache and up to 32 GB disk cache. # Squid is set to limit disk/RAM usage at 90% based on the configured upper limits. ****************************************** Storage Swap size: 29491068 KB Storage Swap capacity: 90.0% used, 10.0% free 23812 on-disk objects ****************************************** ****************************************** # RAM is still not completely used Storage Mem size: 213976 KB Storage Mem capacity: 81.6% used, 18.4% free 36712 StoreEntries with MemObjects ****************************************** Those objects (=files) are used over and over again. => from "squidclient mgr:refresh" Protocol #Calls %Calls HTTP 18716280 95.78 On Store 824463 4.22 "HTTP" means: access to RAM cache "On Store" means: access to disk HTTP histogram (partly): Count %Total Category 16669106 89.06 Fresh: expires time not reached 2043554 10.92 Stale: expires time reached On Store histogram (partly): Count %Total Category 575087 69.75 Fresh: expires time not reached 246319 29.88 Stale: expires time reached 3057 0.37 Stale: refresh_pattern max age rule Average CPU Usage: 3.61% |
Send message Joined: 27 Sep 08 Posts: 854 Credit: 698,699,176 RAC: 160,599 ![]() ![]() ![]() |
Looks OK on one I checked: 2020-12-10 23:55:27 (4424): Guest Log: [INFO] Detected local proxy http://192.168.1.134:3128 in init_data.xml 2020-12-10 23:55:27 (4424): Guest Log: [INFO] Testing connection to 192.168.1.134 on port 3128 2020-12-10 23:55:27 (4424): Guest Log: [INFO] Ncat: Version 7.50 ( https://nmap.org/ncat ) 2020-12-10 23:55:27 (4424): Guest Log: Ncat: Connected to 192.168.1.134:3128. .... 2020-12-10 23:55:35 (4424): Guest Log: 2.5.2.0 4109 0 29640 19355 3 1 264845 4096000 0 65024 0 0 n/a 0 0 http://s1cern-cvmfs.openhtc.io/cvmfs/sft.cern.ch http://192.168.1.134:3128 1 2020-12-10 23:55:36 (4424): Guest Log: Probing /cvmfs/grid.cern.ch... OK |
Send message Joined: 27 Sep 08 Posts: 854 Credit: 698,699,176 RAC: 160,599 ![]() ![]() ![]() |
I rebuilt your script for parsing the logs in powershell, I don't seen any upload data? Downloads served by the proxy TCP_MEM_HIT 80268 requests 402.8 MB TCP_HIT 12131 requests 12.6 GB TCP_REFRESH_UNMODIFIED 0 requests 0 B Downloads requested from lhc@home TCP_MISS 8154 requests 1.1 GB TCP_REFRESH_MODIFIED 1452 requests 176.2 MB Result uploads to lhc@home TCP_MISS__UPLOAD 0 requests 0 B From Squid after a few hours: Storage Swap size: 871816 KB Storage Swap capacity: 4.3% used, 95.7% free 1686 on-disk objects Storage Mem size: 40988 KB Storage Mem capacity: 15.6% used, 84.4% free 7059 StoreEntries with MemObjects HTTP 93927 90.98 On Store 9308 9.02 92460 98.44 Fresh: expires time not reached 1467 1.56 Stale: expires time reached 8175 87.83 Fresh: expires time not reached 1131 12.15 Stale: expires time reached 2 0.02 Stale: refresh_pattern max age rule CPU Usage, 60 minute avg: 0.30% |
![]() Send message Joined: 15 Jun 08 Posts: 2567 Credit: 258,683,719 RAC: 119,221 ![]() ![]() |
Looks OK on one I checked: Perfect! :-) To avoid confusion: Logs from different subprojects have slightly different entries because the scripts are made by other programmers. The output pattern of "cvmfs_config stat ..." always looks like this: VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE 2.5.2.0 4109 0 29640 19355 3 1 264845 4096000 0 65024 0 0 n/a 0 0 http://s1cern-cvmfs.openhtc.io/cvmfs/sft.cern.ch http://192.168.1.134:3128 1 |
![]() Send message Joined: 15 Jun 08 Posts: 2567 Credit: 258,683,719 RAC: 119,221 ![]() ![]() |
Result uploads to lhc@home Reading/Analysing squid's access.log You will never find TCP_MISS__UPLOAD in squid's access.log since this is a term I use in my own filter scripts to combine a couple of different fields. General rule: Uploads can't be cached, hence are never a TCP_MEM_HIT/TCP_HIT/TCP_REFRESH... . Like most other BOINC projects Theory and Sixtrack download/upload from/to the project server via the BOINC client. Nowadays mostly with HTTPS, which looks like this: cruncher1 3128 - - [11/Dec/2020:00:12:09 +0100] "CONNECT lhcathome.cern.ch:443 HTTP/1.1" 200 25666 "-" "BOINC client (x86_64-pc-linux-gnu 7.17.0)" TCP_TUNNEL:HIER_DIRECT From that line you can't see whether it was a download/upload or whether it was Theory/SixTrack. Theory also does some transfers to/from lhcathome-upload.cern.ch but unlike ATLAS' transfers they are rather small. ATLAS gets large EVNT files from lhcathome-upload.cern.ch. Those files are unique, hence will always be a TCP_MISS. In addition squid is configured not to cache them as lots of them would quickly fill the disk cache. cruncher1 3128 - - [11/Dec/2020:00:05:43 +0100] "GET http://lhcathome-upload.cern.ch/lhcathome/download//322/EFILDmMLP7xnsSi4apGgGQJmABFKDmABFKDmMPBMDmABFKDmvwZSWo_EVNT.22950603._000764.pool.root.1 HTTP/1.1" 200 400963022 "-" "BOINC client (x86_64-pc-linux-gnu 7.17.0)" TCP_MISS:HIER_DIRECT "POST" shows this is an upload. In this case an ATLAS HITS file >160 MB. Squid reports it as TCP_MISS but my scripts convert it to a TCP_MISS__UPLOAD. cruncher1 3128 - - [11/Dec/2020:00:54:50 +0100] "POST http://lhcathome-upload.cern.ch/lhcathome_cgi/file_upload_handler HTTP/1.1" 200 167649128 "-" "BOINC client (x86_64-pc-linux-gnu 7.17.0)" TCP_MISS:HIER_DIRECT CMS uploads use PUT instead of POST. They require a special iptables ruleset to appear in the access.log as they are done with gfal-copy from inside the VM. They are also reported as TCP_MISS and converted to TCP_MISS__UPLOAD: cruncher1 3128 - - [11/Dec/2020:01:04:58 +0100] "PUT http://vc-cms-output.s3.cern.ch/store/unmerged/DMWM_Test/RelValSinglePiE50HCAL/GEN-SIM/TC_SLC7_CMS_Home_<cut the rest of the URL> HTTP/1.1" 200 120140144 "-" "gfal2-util/1.5.3 gfal2/2.18.1 neon/0.0.29" TCP_MISS:HIER_DIRECT |
Send message Joined: 2 May 07 Posts: 2257 Credit: 174,360,108 RAC: 19,531 ![]() ![]() ![]() |
What means this Info TCP_SWAPFAIL_MISS:HIER_DIRECT?: xxxx.yyyy.zzz 3128 - - [11/Dec/2020:11:11:57 +0100] "GET http://s1cern-cvmfs.openhtc.io/cvmfs/atlas.cern.ch/data/db/a9998bc3e021441915e84f48a973cdac2a1002 HTTP/1.1" 200 17826 "-" "cvmfs Fuse 2.7.5" TCP_SWAPFAIL_MISS:HIER_DIRECT |
Send message Joined: 27 Sep 08 Posts: 854 Credit: 698,699,176 RAC: 160,599 ![]() ![]() ![]() |
The Tags are in the FAQ: https://wiki.squid-cache.org/SquidFaq/SquidLogs SWAPFAIL The object was believed to be in the cache, but could not be accessed. A new copy was requested from the server. Some old chats recommended to clean your cache. |
Send message Joined: 27 Sep 08 Posts: 854 Credit: 698,699,176 RAC: 160,599 ![]() ![]() ![]() |
I suspected that your UPLOAD was a special category. So far I just included everything that was miss in a miss. I assume I have non of the REFRESH_UNMODIFIED as it takes some time for squid to check that a cache file is truly worthy of a check on age. |
![]() Send message Joined: 15 Jun 08 Posts: 2567 Credit: 258,683,719 RAC: 119,221 ![]() ![]() |
SWAPFAIL vs. MISS There's nothing to worry about and also no practical difference between SWAPFAIL and MISS, hence, it can simply be ignored. After a while it will get sorted out automatically. SWAPFAILs usually happen when Squid restarts without a cleared cache. If you feel bothered by that just follow the suggestion from the Squid FAQ. When do files expire? This can be set by the server/intermediate proxy individually for each URL. The server sends it to the client (your proxy) as an HTTP header: Cache-Control: max-age=61 This example (in seconds) is typical for CVMFS catalogue files like: http://s1cern-cvmfs.openhtc.io/cvmfs/atlas.cern.ch/.cvmfspublished Other files on CVMFS can have "max-age=259200" (3 days) When your squid (or another client) requests a file for the first time the due date is calculated based on max-age and your clients get a HIT whenever they ask squid for that file. When the due date is over your squid asks the server if the file has changed. If not the server sends "UNMODIFIED" together with the same or a new max-age (+ other headers) but without the file itself. If the file has changed the server sends "MODIFIED" together with the updated file. Hence, you will see (UN)MODIFIED less often than HITs. |
Send message Joined: 2 May 07 Posts: 2257 Credit: 174,360,108 RAC: 19,531 ![]() ![]() ![]() |
After a stop of squid and squid -z reconfigure all is ok. |
Send message Joined: 27 Sep 08 Posts: 854 Credit: 698,699,176 RAC: 160,599 ![]() ![]() ![]() |
OK, put the POST and PUT in the TCP_MISS__UPLOAD Downloads served by the proxy TCP_MEM_HIT 447727 requests 2.2 GB TCP_HIT 64986 requests 71 GB TCP_REFRESH_UNMODIFIED 0 requests 0 B Downloads requested from lhc@home TCP_MISS 12914 requests 12.5 GB TCP_REFRESH_MODIFIED 11090 requests 475.7 MB Result uploads to lhc@home TCP_MISS__UPLOAD 452 requests 2.4 GB Do you count the EVNT files? They are in the logs as a miss but you said they are not cached? Looks like CERN will be happy as I saved 71GB in less than 24hrs |
![]() Send message Joined: 15 Jun 08 Posts: 2567 Credit: 258,683,719 RAC: 119,221 ![]() ![]() |
TCP_HIT 64986 requests 71 GB Really 71 GB within 24 h? Much more than I expected! How many active cores were attached to the proxy and what project mix did you run? |
Send message Joined: 27 Sep 08 Posts: 854 Credit: 698,699,176 RAC: 160,599 ![]() ![]() ![]() |
I have 127 cores connected at the moment, seems an even split of ATLAS and Theory. There is couple of CMS. All seems to be working well though. |
Send message Joined: 27 Sep 08 Posts: 854 Credit: 698,699,176 RAC: 160,599 ![]() ![]() ![]() |
Looks to be working perfectly, I see proxy entries for valid CMS, ATLAS & Theory tasks. Looking at the stats also working well: Hits as % of all requests: 5min: 99.4%, 60min: 99.0% Hits as % of bytes sent: 5min: 99.8%, 60min: 88.4% Memory hits as % of hit requests: 5min: 92.2%, 60min: 86.9% Not sure about stats from logfiles are legit cf computezrmle's old numbers. Since the average object size is ~650KB would it be worth to increase the max mem cache item size to push more into ram? |
![]() Send message Joined: 15 Jun 08 Posts: 2567 Credit: 258,683,719 RAC: 119,221 ![]() ![]() |
I guess your stats come from this page: "squidclient mgr:info" Even the 60min values here should be seen as "short term". They can sometimmes be very low, hence don't be afraid if that happens. The "average object size" from the same page is calculated only for files that go to the disk cache. It's also a (very) long term value and Squid's (13 KB) is much too low for LHC@home. I suggest not to change this short term. See: http://www.squid-cache.org/Versions/v3/3.5/cfgman/store_avg_object_size.html Even if both caches (RAM/disk) are close to 90 %. The suggested values used in the squid.conf are well balanced and optimized for LHC@home. I would not change the settings without looking at a couple of long term stats before. Beside that even the files going to the disk cache are still in the OS's cache (RAM), especially the ones that are heavily used. |
Send message Joined: 27 Sep 08 Posts: 854 Credit: 698,699,176 RAC: 160,599 ![]() ![]() ![]() |
Thanks for the comments, I leave it as is then. Looking at the longer term from the logs. Looking over about 2 days. Downloads served by the proxy TCP_MEM_HIT 2765223 requests 12.9 GB TCP_HIT 360039 requests 392.5 GB TCP_REFRESH_UNMODIFIED 50291 requests 237.7 MB Downloads requested from lhc@home TCP_MISS 43739 requests 12.4 GB TCP_REFRESH_MODIFIED 35058 requests 875.9 MB Result uploads to lhc@home TCP_MISS__UPLOAD 2167 requests 19 GB I think the initial 12GB of misses is filling the cache, its not really going up much now. I'm not sure if I calculate the same as you. Only read the lines if they contain, cern.ch, atlas or openhtc.io Count MEM_HIT if line contains mem_hit Count HIT if line contains hit Count if TCP_REFRESH_MODIFIED if line contains TCP_REFRESH_MODIFIED Count if TCP_REFRESH_UNMODIFIED if line contains TCP_REFRESH_UNMODIFIED For Misses If EVNT file don't count if contains GET & TCP_MISS count as MISS If contains POST or PUT & TCP_MISS count as TCP_MISS__UPLOAD Ignore any other tag eg TUNNEL, ABORTED. |
Send message Joined: 2 May 07 Posts: 2257 Credit: 174,360,108 RAC: 19,531 ![]() ![]() ![]() |
squid TCP_Tunnel - This task ended with error during upload: https://lhcathome.cern.ch/lhcathome/result.php?resultid=291005769 xx.yyy.xxx.yyy 3128 - - [13/Dec/2020:18:29:33 +0100] "CONNECT lhcathome.cern.ch:443 HTTP/1.1" 200 58982 "-" "BOINC client (x86_64-pc-linux-gnu 7.16.6)" TCP_TUNNEL:HIER_DIRECT and nightlies without openhtc.io on an other Computer: xx.yyy.xxx.zzz 3128 - - [13/Dec/2020:18:30:01 +0100] "GET http://cvmfs-stratum-one.cern.ch/cvmfs/atlas-nightlies.cern.ch/.cvmfspublished HTTP/1.1" 200 1269 "-" "cvmfs Fuse 2.7.5" TCP_REFRESH_MODIFIED:HIER_DIRECT |
![]() Send message Joined: 15 Jun 08 Posts: 2567 Credit: 258,683,719 RAC: 119,221 ![]() ![]() |
squid TCP_Tunnel - This task ended with error during upload: There are lots of unusual messages in the pilot log (copied to stderr.txt). At the end the task didn't produce a HITS file which is the main reason it failed: [2020-12-13 17:44:13] No HITS result produced Since a non existing file can't be transferred the BOINC client also reported an upload failure: <message> upload failure: <file_xfer_error> <file_name>5uWKDmTvM8xn9Rq4apoT9bVoABFKDmABFKDme0WaDmABFKDmO0VGzn_1_r1369295079_ATLAS_result</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error> </message> and nightlies without openhtc.io on an other Computer: A minor inconsistency regarding the global CVMFS configuration. Caused by the fact that atlas-nightlies is not available on all stratum-1-servers. Volunteers running ATLAS vbox can't fix it since it is configured inside the VM. Volunteers running ATLAS native can fix it following this advice: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5342&postid=41861 Both issues have nothing to do with squid and should have been reported in the ATLAS forum. |
Send message Joined: 2 May 07 Posts: 2257 Credit: 174,360,108 RAC: 19,531 ![]() ![]() ![]() |
[quote]squid TCP_Tunnel xx.yyy.xxx.yyy 3128 - - [13/Dec/2020:18:29:33 +0100] "CONNECT lhcathome.cern.ch:443 HTTP/1.1" 200 58982 "-" "BOINC client (x86_64-pc-linux-gnu 7.16.6)" TCP_TUNNEL:HIER_DIRECT WCG show the same info: xx.yyy.zzz.xxx 3128 - - [15/Dec/2020:11:56:27 +0100] "CONNECT www.worldcommunitygrid.org:443 HTTP/1.1" 200 5980 "-" "BOINC client (x86_64-pc-linux-gnu 7.16.6)" TCP_TUNNEL:HIER_DIRECT |
Send message Joined: 27 Sep 08 Posts: 854 Credit: 698,699,176 RAC: 160,599 ![]() ![]() ![]() |
I think a tunnel is where is was configured not to use the cache, for WCG this is in the config. I'm not sure why BOINC would also do. |
©2025 CERN