Message boards : ATLAS application : Native Problem

Bryan
Joined: 30 Jan 14
Posts: 4
Credit: 25,027,828
RAC: 0
Message 39430 - Posted: 24 Jul 2019, 22:17:07 UTC
Last modified: 24 Jul 2019, 22:20:46 UTC

I have ATLAS native running on multiple machines under Mint 19. WUs crunch and validate, so that isn't an issue. However, I have two problems, and one of them is serious.

Following the Mint 19 setup shown HERE, when I get to the following line I get an error that the file/directory doesn't exist:
sudo echo "/cvmfs /etc/auto.cvmfs" > /etc/auto.master.d/cvmfs.autofs
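I suspect the redirect itself is the culprit: the "> file" part runs in my unprivileged shell before sudo takes effect, and /etc/auto.master.d may not exist yet. Something along these lines might be the fix (an untested sketch):

# create the target directory first, then let tee perform the write as root
sudo mkdir -p /etc/auto.master.d
echo "/cvmfs /etc/auto.cvmfs" | sudo tee /etc/auto.master.d/cvmfs.autofs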

The line following that one gives the error "no such command". The restart command and the probe otherwise work correctly.

All machines have failed on the above. I don't know whether it is actually something I need to worry about or not.

SERIOUS problem
I have one machine, a dual E5 v4 Xeon server, that fills up its 1TB drive over a period of 24-36 hours. If I don't reboot the machine and let it do garbage collection, then BOINC stops pulling work because it doesn't have enough disk space. This machine is the same as my other machines, has the exact same Linux Mint 19 installation, and has been set up exactly the same way. Only 1 of my 7 machines is having this problem. I have installed Mint 19 three different times on that machine, and the problem shows up every time.

Watching the Squid, I see that machine frequently hitting the proxy (and getting misses) even when no new WUs are starting up or ending. New WUs will get MEM_HITs, so whatever it is doing is out of the ordinary. The biggest problem is the amount of data usage, since I have a 1TB data cap and pay a premium for anything above that. I've now filled the 1TB HDD three times, so that is a lot of bandwidth being wasted.

It appears that the continual requesting of stuff doesn't begin until the machine has been running for hours; only then do I start seeing the abnormal behavior on the squid.

I'm at a loss on this one so any suggestions would be very welcome :)

df for the HDD

Filesystem Size Used Avail Use% Mounted on
udev 68G 0 68G 0% /dev
tmpfs 14G 2.8M 14G 1% /run
/dev/sda1 984G 803G 131G 86% /
tmpfs 68G 2.3M 68G 1% /dev/shm
tmpfs 5.3M 8.2k 5.3M 1% /run/lock
tmpfs 68G 0 68G 0% /sys/fs/cgroup
tmpfs 14G 21k 14G 1% /run/user/1000
cvmfs2 4.3G 2.9G 1.5G 66% /cvmfs/atlas.cern.ch
cvmfs2 4.3G 2.9G 1.5G 66% /cvmfs/sft.cern.ch
cvmfs2 4.3G 2.9G 1.5G 66% /cvmfs/atlas-condb.cern.ch
cvmfs2 4.3G 2.9G 1.5G 66% /cvmfs/grid.cern.ch
cvmfs2 4.3G 2.9G 1.5G 66% /cvmfs/cernvm-prod.cern.ch
cvmfs2 4.3G 2.9G 1.5G 66% /cvmfs/alice.cern.ch
cvmfs2 4.3G 2.9G 1.5G 66% /cvmfs/atlas-nightlies.cern.ch

Bryan
Joined: 30 Jan 14
Posts: 4
Credit: 25,027,828
RAC: 0
Message 39431 - Posted: 25 Jul 2019, 1:53:24 UTC - in response to Message 39430.  

I just found that even after BOINC has finished all WUs, that machine continues to go through the proxy requesting something and continues to use HDD space. I think the machine is possessed :)

computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2401
Credit: 225,541,788
RAC: 120,716
Message 39432 - Posted: 25 Jul 2019, 7:56:47 UTC - in response to Message 39430.  

A couple of your questions can't be answered as your hosts are hidden.
You may unhide them at your web preferences page:
https://lhcathome.cern.ch/lhcathome/prefs.php?subset=project



Some comments regarding the information you provided.


autofs configuration

As long as cvmfs_config probe is working correctly there's no need to reconfigure the autofs service.
In particular, autofs must not be restarted while an LHC task is running.
Just to explain the error in the Mint 19 setup you linked to:
# wrong systemctl syntax
# this will always fail
sudo systemctl autofs restart

# correct syntax
# use 1 of the examples below
sudo systemctl restart autofs
sudo systemctl restart autofs.service
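
If in doubt, the result can be verified afterwards (a quick sketch, assuming a systemd-based setup like Mint 19):

# should report "active (running)"
sudo systemctl status autofs

# each repository should answer with OK
cvmfs_config probe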




Data that fills your HDD

What kind of data?
Did you identify the subdir the data is written to?
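
If not, a rough way to narrow it down is to watch which subdirectory keeps growing, e.g. with du (a sketch; /var/lib/boinc-client is an assumption and may differ on your setup):

# list the largest subdirectories; repeat after an hour and compare
sudo du -h --max-depth=2 /var/lib/boinc-client 2>/dev/null | sort -rh | head -n 20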



CVMFS/Squid

Your df lines show that CVMFS is configured to use roughly 4 GB cache.
This is a good value for ATLAS native and Theory native and should be kept.
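
For reference, this limit usually comes from /etc/cvmfs/default.local, where CVMFS_QUOTA_LIMIT is given in MB. A typical snippet might look like this (the proxy address is just a placeholder):

# /etc/cvmfs/default.local
CVMFS_QUOTA_LIMIT=4096                      # ~4 GB cache, matching the df output above
CVMFS_HTTP_PROXY="http://your-squid:3128"   # placeholder; point it at your local squid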

In addition, ATLAS frequently requests data from another distribution system, the Frontier caches.
A local Squid serves both: cache misses from CVMFS as well as requests to Frontier.

Watching the Squid I see that machine frequently hitting the proxy...

Might be normal.

... and getting misses ...
... even when there are no new WU starting up or ending ...

Might also be normal.
You may post some examples from your squid's access log.
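
To grab such examples (assuming squid's default log location /var/log/squid/access.log):

# watch the requests live
sudo tail -f /var/log/squid/access.log

# or show only the recent misses
sudo grep MISS /var/log/squid/access.log | tail -n 20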

New WU will get MEM HITS ...

The more the better. A MEM_HIT is the best you can get from your squid.


To see the real ratio between cache hits and misses you would have to run a logfile analyzer (or at least the one-liner sketched below).
On the other hand, there are requests that:
- can never be cached, e.g. all result uploads
- should be excluded from caching, e.g. EVNT downloads
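
Lacking a full analyzer, a rough overview can be pulled with a one-liner (a sketch, assuming squid's native log format, where the result code sits in field 4):

# count requests per result code (TCP_MEM_HIT, TCP_MISS, ...)
sudo awk '{split($4, a, "/"); c[a[1]]++} END {for (r in c) print c[r], r}' /var/log/squid/access.log | sort -rn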


The default squid.conf is not prepared to handle the special needs of LHC@home.
Hence my suggestion here should be used together with squid v3.5.27:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4611&postid=36101
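
Just to illustrate the kind of directive involved (this is not the full configuration from the link, and the pattern below is a made-up example):

# keep large event files out of the squid cache
acl evnt_files urlpath_regex -i \.EVNT\.
cache deny evnt_files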

computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2401
Credit: 225,541,788
RAC: 120,716
Message 39433 - Posted: 25 Jul 2019, 7:59:04 UTC - in response to Message 39431.  

I just found that even after BOINC has finished all WUs, that machine continues to go through the proxy requesting something and continues to use HDD space. I think the machine is possessed :)

What is this "something"?
You may post some examples from your squid's access log.