Message boards : Number crunching : Setting up a local Squid to work with LHC@home - Comments and Questions
Message board moderation

To post messages, you must log in.

Previous · 1 · 2 · 3 · 4 · 5 . . . 8 · Next

AuthorMessage
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Avatar

Send message
Joined: 15 Jun 08
Posts: 2411
Credit: 226,426,487
RAC: 132,640
Message 43837 - Posted: 10 Dec 2020, 21:58:58 UTC

Typical Squid statistics after 34 days runtime.
Taken from "squidclient mgr:info"
# Squid is configured to use up to 256 MB RAM cache and up to 32 GB disk cache.
# Squid is set to limit disk/RAM usage at 90% based on the configured upper limits.

******************************************
Storage Swap size: 29491068 KB
Storage Swap capacity: 90.0% used, 10.0% free

23812 on-disk objects
******************************************


******************************************
# RAM is still not completely used
Storage Mem size: 213976 KB
Storage Mem capacity: 81.6% used, 18.4% free

36712 StoreEntries with MemObjects
******************************************



Those objects (=files) are used over and over again.

=> from "squidclient mgr:refresh"

Protocol #Calls %Calls
HTTP 18716280 95.78
On Store 824463 4.22

"HTTP" means: access to RAM cache
"On Store" means: access to disk


HTTP histogram (partly):
Count %Total Category
16669106 89.06 Fresh: expires time not reached
2043554 10.92 Stale: expires time reached


On Store histogram (partly):
Count %Total Category
575087 69.75 Fresh: expires time not reached
246319 29.88 Stale: expires time reached
3057 0.37 Stale: refresh_pattern max age rule




Average CPU Usage: 3.61%
ID: 43837 · Report as offensive     Reply Quote
Toby Broom
Volunteer moderator

Send message
Joined: 27 Sep 08
Posts: 807
Credit: 652,395,490
RAC: 281,596
Message 43838 - Posted: 11 Dec 2020, 0:02:07 UTC - in response to Message 43836.  

Looks OK on one I checked:
2020-12-10 23:55:27 (4424): Guest Log: [INFO] Detected local proxy http://192.168.1.134:3128 in init_data.xml
2020-12-10 23:55:27 (4424): Guest Log: [INFO] Testing connection to 192.168.1.134 on port 3128
2020-12-10 23:55:27 (4424): Guest Log: [INFO] Ncat: Version 7.50 ( https://nmap.org/ncat )
2020-12-10 23:55:27 (4424): Guest Log: Ncat: Connected to 192.168.1.134:3128.
....

2020-12-10 23:55:35 (4424): Guest Log: 2.5.2.0 4109 0 29640 19355 3 1 264845 4096000 0 65024 0 0 n/a 0 0 http://s1cern-cvmfs.openhtc.io/cvmfs/sft.cern.ch http://192.168.1.134:3128 1
2020-12-10 23:55:36 (4424): Guest Log: Probing /cvmfs/grid.cern.ch... OK

ID: 43838 · Report as offensive     Reply Quote
Toby Broom
Volunteer moderator

Send message
Joined: 27 Sep 08
Posts: 807
Credit: 652,395,490
RAC: 281,596
Message 43839 - Posted: 11 Dec 2020, 0:51:02 UTC

I rebuilt your script for parsing the logs in powershell, I don't seen any upload data?

Downloads served by the proxy
TCP_MEM_HIT 80268 requests 402.8 MB
TCP_HIT 12131 requests 12.6 GB
TCP_REFRESH_UNMODIFIED 0 requests 0 B

Downloads requested from lhc@home
TCP_MISS 8154 requests 1.1 GB
TCP_REFRESH_MODIFIED 1452 requests 176.2 MB

Result uploads to lhc@home
TCP_MISS__UPLOAD 0 requests 0 B


From Squid after a few hours:

Storage Swap size: 871816 KB
Storage Swap capacity: 4.3% used, 95.7% free
1686 on-disk objects

Storage Mem size: 40988 KB
Storage Mem capacity: 15.6% used, 84.4% free
7059 StoreEntries with MemObjects

HTTP 93927 90.98
On Store 9308 9.02

92460 98.44 Fresh: expires time not reached
1467 1.56 Stale: expires time reached

8175 87.83 Fresh: expires time not reached
1131 12.15 Stale: expires time reached
2 0.02 Stale: refresh_pattern max age rule

CPU Usage, 60 minute avg: 0.30%
ID: 43839 · Report as offensive     Reply Quote
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Avatar

Send message
Joined: 15 Jun 08
Posts: 2411
Credit: 226,426,487
RAC: 132,640
Message 43841 - Posted: 11 Dec 2020, 7:25:41 UTC - in response to Message 43838.  

Looks OK on one I checked:
2020-12-10 23:55:27 (4424): Guest Log: [INFO] Detected local proxy http://192.168.1.134:3128 in init_data.xml
2020-12-10 23:55:27 (4424): Guest Log: [INFO] Testing connection to 192.168.1.134 on port 3128
2020-12-10 23:55:27 (4424): Guest Log: [INFO] Ncat: Version 7.50 ( https://nmap.org/ncat )
2020-12-10 23:55:27 (4424): Guest Log: Ncat: Connected to 192.168.1.134:3128.
....

2020-12-10 23:55:35 (4424): Guest Log: 2.5.2.0 4109 0 29640 19355 3 1 264845 4096000 0 65024 0 0 n/a 0 0 http://s1cern-cvmfs.openhtc.io/cvmfs/sft.cern.ch http://192.168.1.134:3128 1
2020-12-10 23:55:36 (4424): Guest Log: Probing /cvmfs/grid.cern.ch... OK


Perfect!
:-)

To avoid confusion:
Logs from different subprojects have slightly different entries because the scripts are made by other programmers.
The output pattern of "cvmfs_config stat ..." always looks like this:
VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE
2.5.2.0 4109 0 29640 19355 3 1 264845 4096000 0 65024 0 0 n/a 0 0 http://s1cern-cvmfs.openhtc.io/cvmfs/sft.cern.ch http://192.168.1.134:3128 1
ID: 43841 · Report as offensive     Reply Quote
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Avatar

Send message
Joined: 15 Jun 08
Posts: 2411
Credit: 226,426,487
RAC: 132,640
Message 43846 - Posted: 11 Dec 2020, 9:28:37 UTC - in response to Message 43839.  

Result uploads to lhc@home
TCP_MISS__UPLOAD 0 requests 0 B



Reading/Analysing squid's access.log

You will never find TCP_MISS__UPLOAD in squid's access.log since this is a term I use in my own filter scripts to combine a couple of different fields.


General rule: Uploads can't be cached, hence are never a TCP_MEM_HIT/TCP_HIT/TCP_REFRESH... .

Like most other BOINC projects Theory and Sixtrack download/upload from/to the project server via the BOINC client.
Nowadays mostly with HTTPS, which looks like this:
cruncher1 3128 - - [11/Dec/2020:00:12:09 +0100] "CONNECT lhcathome.cern.ch:443 HTTP/1.1" 200 25666 "-" "BOINC client (x86_64-pc-linux-gnu 7.17.0)" TCP_TUNNEL:HIER_DIRECT

From that line you can't see whether it was a download/upload or whether it was Theory/SixTrack.
Theory also does some transfers to/from lhcathome-upload.cern.ch but unlike ATLAS' transfers they are rather small.



ATLAS gets large EVNT files from lhcathome-upload.cern.ch.
Those files are unique, hence will always be a TCP_MISS.
In addition squid is configured not to cache them as lots of them would quickly fill the disk cache.
cruncher1 3128 - - [11/Dec/2020:00:05:43 +0100] "GET http://lhcathome-upload.cern.ch/lhcathome/download//322/EFILDmMLP7xnsSi4apGgGQJmABFKDmABFKDmMPBMDmABFKDmvwZSWo_EVNT.22950603._000764.pool.root.1 HTTP/1.1" 200 400963022 "-" "BOINC client (x86_64-pc-linux-gnu 7.17.0)" TCP_MISS:HIER_DIRECT




"POST" shows this is an upload. In this case an ATLAS HITS file >160 MB.
Squid reports it as TCP_MISS but my scripts convert it to a TCP_MISS__UPLOAD.
cruncher1 3128 - - [11/Dec/2020:00:54:50 +0100] "POST http://lhcathome-upload.cern.ch/lhcathome_cgi/file_upload_handler HTTP/1.1" 200 167649128 "-" "BOINC client (x86_64-pc-linux-gnu 7.17.0)" TCP_MISS:HIER_DIRECT




CMS uploads use PUT instead of POST.
They require a special iptables ruleset to appear in the access.log as they are done with gfal-copy from inside the VM.
They are also reported as TCP_MISS and converted to TCP_MISS__UPLOAD:
cruncher1 3128 - - [11/Dec/2020:01:04:58 +0100] "PUT http://vc-cms-output.s3.cern.ch/store/unmerged/DMWM_Test/RelValSinglePiE50HCAL/GEN-SIM/TC_SLC7_CMS_Home_<cut the rest of the URL> HTTP/1.1" 200 120140144 "-" "gfal2-util/1.5.3 gfal2/2.18.1 neon/0.0.29" TCP_MISS:HIER_DIRECT
ID: 43846 · Report as offensive     Reply Quote
maeax

Send message
Joined: 2 May 07
Posts: 2100
Credit: 159,816,975
RAC: 134,993
Message 43848 - Posted: 11 Dec 2020, 10:16:32 UTC - in response to Message 43846.  

What means this Info TCP_SWAPFAIL_MISS:HIER_DIRECT?:
xxxx.yyyy.zzz 3128 - - [11/Dec/2020:11:11:57 +0100] "GET http://s1cern-cvmfs.openhtc.io/cvmfs/atlas.cern.ch/data/db/a9998bc3e021441915e84f48a973cdac2a1002 HTTP/1.1" 200 17826 "-" "cvmfs Fuse 2.7.5" TCP_SWAPFAIL_MISS:HIER_DIRECT
ID: 43848 · Report as offensive     Reply Quote
Toby Broom
Volunteer moderator

Send message
Joined: 27 Sep 08
Posts: 807
Credit: 652,395,490
RAC: 281,596
Message 43849 - Posted: 11 Dec 2020, 10:51:30 UTC - in response to Message 43848.  

The Tags are in the FAQ:

https://wiki.squid-cache.org/SquidFaq/SquidLogs

SWAPFAIL

The object was believed to be in the cache, but could not be accessed. A new copy was requested from the server.

Some old chats recommended to clean your cache.
ID: 43849 · Report as offensive     Reply Quote
Toby Broom
Volunteer moderator

Send message
Joined: 27 Sep 08
Posts: 807
Credit: 652,395,490
RAC: 281,596
Message 43850 - Posted: 11 Dec 2020, 11:04:42 UTC - in response to Message 43846.  

I suspected that your UPLOAD was a special category. So far I just included everything that was miss in a miss.

I assume I have non of the REFRESH_UNMODIFIED as it takes some time for squid to check that a cache file is truly worthy of a check on age.
ID: 43850 · Report as offensive     Reply Quote
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Avatar

Send message
Joined: 15 Jun 08
Posts: 2411
Credit: 226,426,487
RAC: 132,640
Message 43852 - Posted: 11 Dec 2020, 13:09:43 UTC

SWAPFAIL vs. MISS

There's nothing to worry about and also no practical difference between SWAPFAIL and MISS, hence, it can simply be ignored.
After a while it will get sorted out automatically.
SWAPFAILs usually happen when Squid restarts without a cleared cache.
If you feel bothered by that just follow the suggestion from the Squid FAQ.



When do files expire?

This can be set by the server/intermediate proxy individually for each URL.
The server sends it to the client (your proxy) as an HTTP header:
Cache-Control: max-age=61

This example (in seconds) is typical for CVMFS catalogue files like:
http://s1cern-cvmfs.openhtc.io/cvmfs/atlas.cern.ch/.cvmfspublished


Other files on CVMFS can have "max-age=259200" (3 days)

When your squid (or another client) requests a file for the first time the due date is calculated based on max-age and your clients get a HIT whenever they ask squid for that file.
When the due date is over your squid asks the server if the file has changed.
If not the server sends "UNMODIFIED" together with the same or a new max-age (+ other headers) but without the file itself.
If the file has changed the server sends "MODIFIED" together with the updated file.

Hence, you will see (UN)MODIFIED less often than HITs.
ID: 43852 · Report as offensive     Reply Quote
maeax

Send message
Joined: 2 May 07
Posts: 2100
Credit: 159,816,975
RAC: 134,993
Message 43853 - Posted: 11 Dec 2020, 13:42:11 UTC - in response to Message 43852.  

After a stop of squid and
squid -z reconfigure
all is ok.
ID: 43853 · Report as offensive     Reply Quote
Toby Broom
Volunteer moderator

Send message
Joined: 27 Sep 08
Posts: 807
Credit: 652,395,490
RAC: 281,596
Message 43854 - Posted: 11 Dec 2020, 15:30:13 UTC - in response to Message 43846.  
Last modified: 11 Dec 2020, 15:32:43 UTC

OK, put the POST and PUT in the TCP_MISS__UPLOAD

Downloads served by the proxy
TCP_MEM_HIT 447727 requests 2.2 GB
TCP_HIT 64986 requests 71 GB
TCP_REFRESH_UNMODIFIED 0 requests 0 B

Downloads requested from lhc@home
TCP_MISS 12914 requests 12.5 GB
TCP_REFRESH_MODIFIED 11090 requests 475.7 MB

Result uploads to lhc@home
TCP_MISS__UPLOAD 452 requests 2.4 GB


Do you count the EVNT files? They are in the logs as a miss but you said they are not cached?

Looks like CERN will be happy as I saved 71GB in less than 24hrs
ID: 43854 · Report as offensive     Reply Quote
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Avatar

Send message
Joined: 15 Jun 08
Posts: 2411
Credit: 226,426,487
RAC: 132,640
Message 43859 - Posted: 11 Dec 2020, 18:31:38 UTC - in response to Message 43854.  

TCP_HIT 64986 requests 71 GB

Really 71 GB within 24 h?
Much more than I expected!

How many active cores were attached to the proxy and what project mix did you run?
ID: 43859 · Report as offensive     Reply Quote
Toby Broom
Volunteer moderator

Send message
Joined: 27 Sep 08
Posts: 807
Credit: 652,395,490
RAC: 281,596
Message 43861 - Posted: 11 Dec 2020, 19:55:22 UTC - in response to Message 43859.  

I have 127 cores connected at the moment, seems an even split of ATLAS and Theory. There is couple of CMS.

All seems to be working well though.
ID: 43861 · Report as offensive     Reply Quote
Toby Broom
Volunteer moderator

Send message
Joined: 27 Sep 08
Posts: 807
Credit: 652,395,490
RAC: 281,596
Message 43880 - Posted: 12 Dec 2020, 17:12:23 UTC

Looks to be working perfectly, I see proxy entries for valid CMS, ATLAS & Theory tasks.

Looking at the stats also working well:

Hits as % of all requests: 5min: 99.4%, 60min: 99.0%
Hits as % of bytes sent: 5min: 99.8%, 60min: 88.4%
Memory hits as % of hit requests: 5min: 92.2%, 60min: 86.9%

Not sure about stats from logfiles are legit cf computezrmle's old numbers.

Since the average object size is ~650KB would it be worth to increase the max mem cache item size to push more into ram?
ID: 43880 · Report as offensive     Reply Quote
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Avatar

Send message
Joined: 15 Jun 08
Posts: 2411
Credit: 226,426,487
RAC: 132,640
Message 43881 - Posted: 12 Dec 2020, 17:43:28 UTC - in response to Message 43880.  

I guess your stats come from this page: "squidclient mgr:info"

Even the 60min values here should be seen as "short term".
They can sometimmes be very low, hence don't be afraid if that happens.


The "average object size" from the same page is calculated only for files that go to the disk cache.
It's also a (very) long term value and Squid's (13 KB) is much too low for LHC@home.
I suggest not to change this short term.
See:
http://www.squid-cache.org/Versions/v3/3.5/cfgman/store_avg_object_size.html


Even if both caches (RAM/disk) are close to 90 %.
The suggested values used in the squid.conf are well balanced and optimized for LHC@home.
I would not change the settings without looking at a couple of long term stats before.

Beside that even the files going to the disk cache are still in the OS's cache (RAM), especially the ones that are heavily used.
ID: 43881 · Report as offensive     Reply Quote
Toby Broom
Volunteer moderator

Send message
Joined: 27 Sep 08
Posts: 807
Credit: 652,395,490
RAC: 281,596
Message 43882 - Posted: 12 Dec 2020, 19:16:27 UTC - in response to Message 43881.  
Last modified: 12 Dec 2020, 19:19:20 UTC

Thanks for the comments, I leave it as is then. Looking at the longer term from the logs.

Looking over about 2 days.

Downloads served by the proxy
TCP_MEM_HIT 2765223 requests 12.9 GB
TCP_HIT 360039 requests 392.5 GB
TCP_REFRESH_UNMODIFIED 50291 requests 237.7 MB

Downloads requested from lhc@home
TCP_MISS 43739 requests 12.4 GB
TCP_REFRESH_MODIFIED 35058 requests 875.9 MB

Result uploads to lhc@home
TCP_MISS__UPLOAD 2167 requests 19 GB


I think the initial 12GB of misses is filling the cache, its not really going up much now.

I'm not sure if I calculate the same as you.
Only read the lines if they contain, cern.ch, atlas or openhtc.io

Count MEM_HIT if line contains mem_hit
Count HIT if line contains hit
Count if TCP_REFRESH_MODIFIED if line contains TCP_REFRESH_MODIFIED
Count if TCP_REFRESH_UNMODIFIED if line contains TCP_REFRESH_UNMODIFIED
For Misses
If EVNT file don't count
if contains GET & TCP_MISS count as MISS
If contains POST or PUT & TCP_MISS count as TCP_MISS__UPLOAD
Ignore any other tag eg TUNNEL, ABORTED.
ID: 43882 · Report as offensive     Reply Quote
maeax

Send message
Joined: 2 May 07
Posts: 2100
Credit: 159,816,975
RAC: 134,993
Message 43892 - Posted: 14 Dec 2020, 7:01:12 UTC

squid TCP_Tunnel - This task ended with error during upload:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=291005769
xx.yyy.xxx.yyy 3128 - - [13/Dec/2020:18:29:33 +0100] "CONNECT lhcathome.cern.ch:443 HTTP/1.1" 200 58982 "-" "BOINC client (x86_64-pc-linux-gnu 7.16.6)" TCP_TUNNEL:HIER_DIRECT

and nightlies without openhtc.io on an other Computer:
xx.yyy.xxx.zzz 3128 - - [13/Dec/2020:18:30:01 +0100] "GET http://cvmfs-stratum-one.cern.ch/cvmfs/atlas-nightlies.cern.ch/.cvmfspublished HTTP/1.1" 200 1269 "-" "cvmfs Fuse 2.7.5" TCP_REFRESH_MODIFIED:HIER_DIRECT
ID: 43892 · Report as offensive     Reply Quote
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Avatar

Send message
Joined: 15 Jun 08
Posts: 2411
Credit: 226,426,487
RAC: 132,640
Message 43902 - Posted: 14 Dec 2020, 12:47:46 UTC - in response to Message 43892.  

squid TCP_Tunnel - This task ended with error during upload:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=291005769
xx.yyy.xxx.yyy 3128 - - [13/Dec/2020:18:29:33 +0100] "CONNECT lhcathome.cern.ch:443 HTTP/1.1" 200 58982 "-" "BOINC client (x86_64-pc-linux-gnu 7.16.6)" TCP_TUNNEL:HIER_DIRECT

There are lots of unusual messages in the pilot log (copied to stderr.txt).
At the end the task didn't produce a HITS file which is the main reason it failed:
[2020-12-13 17:44:13] No HITS result produced



Since a non existing file can't be transferred the BOINC client also reported an upload failure:
<message>
upload failure: <file_xfer_error>
  <file_name>5uWKDmTvM8xn9Rq4apoT9bVoABFKDmABFKDme0WaDmABFKDmO0VGzn_1_r1369295079_ATLAS_result</file_name>
  <error_code>-161 (not found)</error_code>
</file_xfer_error>
</message>



and nightlies without openhtc.io on an other Computer:
xx.yyy.xxx.zzz 3128 - - [13/Dec/2020:18:30:01 +0100] "GET http://cvmfs-stratum-one.cern.ch/cvmfs/atlas-nightlies.cern.ch/.cvmfspublished HTTP/1.1" 200 1269 "-" "cvmfs Fuse 2.7.5" TCP_REFRESH_MODIFIED:HIER_DIRECT

A minor inconsistency regarding the global CVMFS configuration.
Caused by the fact that atlas-nightlies is not available on all stratum-1-servers.

Volunteers running ATLAS vbox can't fix it since it is configured inside the VM.
Volunteers running ATLAS native can fix it following this advice:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5342&postid=41861


Both issues have nothing to do with squid and should have been reported in the ATLAS forum.
ID: 43902 · Report as offensive     Reply Quote
maeax

Send message
Joined: 2 May 07
Posts: 2100
Credit: 159,816,975
RAC: 134,993
Message 43920 - Posted: 15 Dec 2020, 11:20:30 UTC - in response to Message 43892.  

[quote]squid TCP_Tunnel
xx.yyy.xxx.yyy 3128 - - [13/Dec/2020:18:29:33 +0100] "CONNECT lhcathome.cern.ch:443 HTTP/1.1" 200 58982 "-" "BOINC client (x86_64-pc-linux-gnu 7.16.6)" TCP_TUNNEL:HIER_DIRECT

WCG show the same info:
xx.yyy.zzz.xxx 3128 - - [15/Dec/2020:11:56:27 +0100] "CONNECT www.worldcommunitygrid.org:443 HTTP/1.1" 200 5980 "-" "BOINC client (x86_64-pc-linux-gnu 7.16.6)" TCP_TUNNEL:HIER_DIRECT
ID: 43920 · Report as offensive     Reply Quote
Toby Broom
Volunteer moderator

Send message
Joined: 27 Sep 08
Posts: 807
Credit: 652,395,490
RAC: 281,596
Message 43921 - Posted: 15 Dec 2020, 11:37:24 UTC - in response to Message 43920.  

I think a tunnel is where is was configured not to use the cache, for WCG this is in the config. I'm not sure why BOINC would also do.
ID: 43921 · Report as offensive     Reply Quote
Previous · 1 · 2 · 3 · 4 · 5 . . . 8 · Next

Message boards : Number crunching : Setting up a local Squid to work with LHC@home - Comments and Questions


©2024 CERN