Message boards : ATLAS application : Native Problem

Bryan
Joined: 30 Jan 14
Posts: 4
Credit: 25,027,828
RAC: 0
Message 39430 - Posted: 24 Jul 2019, 22:17:07 UTC
Last modified: 24 Jul 2019, 22:20:46 UTC

I have ATLAS native running on multiple machines under Mint 19. WUs crunch and validate, so that isn't an issue. However, I have two problems, and one of them is serious.

Following the Mint 19 setup shown HERE, when I get to the following line I get an error that the file/directory doesn't exist:
sudo echo "/cvmfs /etc/auto.cvmfs" > /etc/auto.master.d/cvmfs.autofs
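I suspect the redirect itself is the culprit: the "> file" part runs in my unprivileged shell before sudo takes effect, and /etc/auto.master.d may not exist yet. Something along these lines might be the fix (an untested sketch):

# create the target directory first, then let tee perform the write as root
sudo mkdir -p /etc/auto.master.d
echo "/cvmfs /etc/auto.cvmfs" | sudo tee /etc/auto.master.d/cvmfs.autofs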

The line following that one gives the error "no such command". The restart command and the probe otherwise work correctly.

All machines have failed on the above. I don't know whether it is actually something I need to worry about or not.

SERIOUS problem
I have one machine, a dual E5 v4 Xeon server, that fills up its 1TB drive over a period of 24-36 hours. If I don't reboot the machine and let it do garbage collection, then BOINC stops pulling work because it doesn't have enough disk space. This machine is the same as my other machines, has the exact same Linux Mint 19 installation, and has been set up exactly the same way. Only 1 of my 7 machines is having this problem. I have installed Mint 19 three different times on that machine, and the problem shows up every time.

Watching the Squid, I see that machine frequently hitting the proxy (and getting misses) even when no new WUs are starting up or ending. New WUs will get MEM_HITs, so whatever it is doing is out of the ordinary. The biggest problem is the amount of data usage, since I have a 1TB data cap and pay a premium for anything above that. I've now filled the 1TB HDD three times, so that is a lot of bandwidth being wasted.

It appears that the continual requesting of stuff doesn't begin until the machine has been running for hours; only then do I start seeing the abnormal behavior on the squid.

I'm at a loss on this one so any suggestions would be very welcome :)

df for the HDD

Filesystem Size Used Avail Use% Mounted on
udev 68G 0 68G 0% /dev
tmpfs 14G 2.8M 14G 1% /run
/dev/sda1 984G 803G 131G 86% /
tmpfs 68G 2.3M 68G 1% /dev/shm
tmpfs 5.3M 8.2k 5.3M 1% /run/lock
tmpfs 68G 0 68G 0% /sys/fs/cgroup
tmpfs 14G 21k 14G 1% /run/user/1000
cvmfs2 4.3G 2.9G 1.5G 66% /cvmfs/atlas.cern.ch
cvmfs2 4.3G 2.9G 1.5G 66% /cvmfs/sft.cern.ch
cvmfs2 4.3G 2.9G 1.5G 66% /cvmfs/atlas-condb.cern.ch
cvmfs2 4.3G 2.9G 1.5G 66% /cvmfs/grid.cern.ch
cvmfs2 4.3G 2.9G 1.5G 66% /cvmfs/cernvm-prod.cern.ch
cvmfs2 4.3G 2.9G 1.5G 66% /cvmfs/alice.cern.ch
cvmfs2 4.3G 2.9G 1.5G 66% /cvmfs/atlas-nightlies.cern.ch

Bryan
Joined: 30 Jan 14
Posts: 4
Credit: 25,027,828
RAC: 0
Message 39431 - Posted: 25 Jul 2019, 1:53:24 UTC - in response to Message 39430.  

I just found that even after BOINC has finished all WUs, that machine continues to go through the proxy requesting something and continues to use HDD space. I think the machine is possessed :)

computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2401
Credit: 225,541,788
RAC: 120,716
Message 39432 - Posted: 25 Jul 2019, 7:56:47 UTC - in response to Message 39430.  

A couple of your questions can't be answered as your hosts are hidden.
You may unhide them at your web preferences page:
https://lhcathome.cern.ch/lhcathome/prefs.php?subset=project



Some comments regarding the information you provided.


autofs configuration

As long as cvmfs_config probe is working correctly there's no need to reconfigure the autofs service.
In particular, autofs must not be restarted while an LHC task is running.
Just to explain the error in the Mint 19 setup you linked to:
# wrong systemctl syntax
# this will always fail
sudo systemctl autofs restart

# correct syntax
# use 1 of the examples below
sudo systemctl restart autofs
sudo systemctl restart autofs.service
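
If in doubt, the result can be verified afterwards (a quick sketch, assuming a systemd-based setup like Mint 19):

# should report "active (running)"
sudo systemctl status autofs

# each repository should answer with OK
cvmfs_config probe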




Data that fills your HDD

What kind of data?
Did you identify the subdir the data is written to?
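
If not, a rough way to narrow it down is to watch which subdirectory keeps growing, e.g. with du (a sketch; /var/lib/boinc-client is an assumption and may differ on your setup):

# list the largest subdirectories; repeat after an hour and compare
sudo du -h --max-depth=2 /var/lib/boinc-client 2>/dev/null | sort -rh | head -n 20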



CVMFS/Squid

Your df lines show that CVMFS is configured to use roughly 4 GB cache.
This is a good value for ATLAS native and Theory native and should be kept.
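
For reference, this limit usually comes from /etc/cvmfs/default.local, where CVMFS_QUOTA_LIMIT is given in MB. A typical snippet might look like this (the proxy address is just a placeholder):

# /etc/cvmfs/default.local
CVMFS_QUOTA_LIMIT=4096                      # ~4 GB cache, matching the df output above
CVMFS_HTTP_PROXY="http://your-squid:3128"   # placeholder; point it at your local squid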

In addition, ATLAS frequently requests data from another distribution system, the Frontier caches.
A local Squid serves both: cache misses from CVMFS as well as requests to Frontier.

Watching the Squid I see that machine frequently hitting the proxy...

Might be normal.

... and getting misses ...
... even when there are no new WU starting up or ending ...

Might also be normal.
You may post some examples from your squid's access log.
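
To grab such examples (assuming squid's default log location /var/log/squid/access.log):

# watch the requests live
sudo tail -f /var/log/squid/access.log

# or show only the recent misses
sudo grep MISS /var/log/squid/access.log | tail -n 20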

New WU will get MEM HITS ...

The more the better. A MEM_HIT is the best you can get from your squid.


To see the real ratio between cache hits and misses you would have to run a logfile analyzer (or at least the one-liner sketched below).
On the other hand, there are requests that:
- can never be cached, e.g. all result uploads
- should be excluded from caching, e.g. EVNT downloads
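
Lacking a full analyzer, a rough overview can be pulled with a one-liner (a sketch, assuming squid's native log format, where the result code sits in field 4):

# count requests per result code (TCP_MEM_HIT, TCP_MISS, ...)
sudo awk '{split($4, a, "/"); c[a[1]]++} END {for (r in c) print c[r], r}' /var/log/squid/access.log | sort -rn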


The default squid.conf is not prepared to handle the special needs of LHC@home.
Hence my suggestion here should be used together with squid v3.5.27:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4611&postid=36101
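
Just to illustrate the kind of directive involved (this is not the full configuration from the link, and the pattern below is a made-up example):

# keep large event files out of the squid cache
acl evnt_files urlpath_regex -i \.EVNT\.
cache deny evnt_files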

computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2401
Credit: 225,541,788
RAC: 120,716
Message 39433 - Posted: 25 Jul 2019, 7:59:04 UTC - in response to Message 39431.  

I just found that even after BOINC has finished all WUs, that machine continues to go through the proxy requesting something and continues to use HDD space. I think the machine is possessed :)

What is this "something"?
You may post some examples from your squid's access log.