Message boards : ATLAS application : ATLAS tasks fail after 10 min

Aurum
Joined: 12 Jun 18
Posts: 126
Credit: 53,906,164
RAC: 0
Message 42871 - Posted: 14 Jun 2020, 10:40:27 UTC
Last modified: 14 Jun 2020, 10:50:51 UTC

computezrmle, thanks as always. Using my monkey-see, monkey-do powers, I made these changes for my US location, but I'm still getting nothing but "Validate errors."
sudo xed /etc/cvmfs/default.local
CVMFS_REPOSITORIES="atlas,atlas-condb,grid,cernvm-prod,sft,alice"
CVMFS_SEND_INFO_HEADER=yes
CVMFS_QUOTA_LIMIT=4096
CVMFS_CACHE_BASE=/scratch/cvmfs
CVMFS_HTTP_PROXY=DIRECT

sudo xed /etc/cvmfs/domain.d/cern.ch.local
CVMFS_SERVER_URL="http://s1unl-cvmfs.openhtc.io/cvmfs/@fqrn@;http://s1fnal-cvmfs.openhtc.io/cvmfs/@fqrn@;http://s1bnl-cvmfs.openhtc.io/cvmfs/@fqrn@;http://s1cern-cvmfs.openhtc.io/cvmfs/@fqrn@;http://s1ral-cvmfs.openhtc.io/cvmfs/@fqrn@;http://s1asgc-cvmfs.openhtc.io:8080/cvmfs/@fqrn@;http://s1ihep-cvmfs.openhtc.io/cvmfs/@fqrn@"
CVMFS_USE_GEOAPI=no

sudo xed /etc/cvmfs/config.d/atlas-nightlies.cern.ch.local
CVMFS_SERVER_URL="http://s1cern-cvmfs.openhtc.io/cvmfs/@fqrn@;http://s1bnl-cvmfs.openhtc.io/cvmfs/@fqrn@"
CVMFS_USE_GEOAPI=no

sudo cvmfs_config reload
Is there something else wrong?
This is on a newly built computer, where I configured ATLAS as before from my notes:
wget https://ecsft.cern.ch/dist/cvmfs/cvmfs-release/cvmfs-release-latest_all.deb ; \
 sudo dpkg -i cvmfs-release-latest_all.deb ; \
 rm -f cvmfs-release-latest_all.deb ; \
 sudo apt-get update ; \
 sudo apt-get install cvmfs ; \
 sudo apt install glibc-doc open-iscsi watchdog

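# Note: in "sudo echo ... > file" the redirect runs as the unprivileged user, so the autofs map is written with "sudo tee" instead.
sudo wget https://lhcathomedev.cern.ch/lhcathome-dev/download/default.local -O /etc/cvmfs/default.local ; \
 sudo cvmfs_config setup ; \
 echo "/cvmfs /etc/auto.cvmfs" | sudo tee /etc/auto.master.d/cvmfs.autofs > /dev/null ; \
 sudo systemctl restart autofs ; \
 cvmfs_config probe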

sudo cvmfs_config reload
Aurum
Joined: 12 Jun 18
Posts: 126
Credit: 53,906,164
RAC: 0
Message 42872 - Posted: 14 Jun 2020, 10:58:46 UTC

aurum@Rig-32:~$ cvmfs_config showconfig -s atlas.cern.ch
CVMFS_REPOSITORY_NAME=atlas.cern.ch
CVMFS_BACKOFF_INIT=2    # from /etc/cvmfs/default.conf
CVMFS_BACKOFF_MAX=10    # from /etc/cvmfs/default.conf
CVMFS_BASE_ENV=1    # from /etc/cvmfs/default.conf
CVMFS_CACHE_BASE=/scratch/cvmfs    # from /etc/cvmfs/default.local
CVMFS_CACHE_DIR=/scratch/cvmfs/shared
CVMFS_CHECK_PERMISSIONS=yes    # from /etc/cvmfs/default.conf
CVMFS_CLAIM_OWNERSHIP=yes    # from /etc/cvmfs/default.conf
CVMFS_CONFIG_REPOSITORY=cvmfs-config.cern.ch    # from /etc/cvmfs/default.d/50-cern-debian.conf
CVMFS_DEFAULT_DOMAIN=cern.ch    # from /etc/cvmfs/default.d/50-cern-debian.conf
CVMFS_HOST_RESET_AFTER=1800    # from /etc/cvmfs/default.conf
CVMFS_HTTP_PROXY=DIRECT    # from /etc/cvmfs/default.local
CVMFS_IGNORE_SIGNATURE=no    # from /etc/cvmfs/default.conf
CVMFS_KEYS_DIR=/etc/cvmfs/keys/cern.ch    # from /etc/cvmfs/domain.d/cern.ch.conf
CVMFS_LOW_SPEED_LIMIT=1024    # from /etc/cvmfs/default.conf
CVMFS_MAX_RETRIES=1    # from /etc/cvmfs/default.conf
CVMFS_MOUNT_DIR=/cvmfs    # from /etc/cvmfs/default.conf
CVMFS_MOUNT_RW=no    # from /etc/cvmfs/default.conf
CVMFS_NFILES=65536    # from /etc/cvmfs/default.conf
CVMFS_NFS_SOURCE=no    # from /etc/cvmfs/default.conf
CVMFS_PAC_URLS=http://wpad/wpad.dat    # from /etc/cvmfs/default.conf
CVMFS_PROXY_RESET_AFTER=300    # from /etc/cvmfs/default.conf
CVMFS_QUOTA_LIMIT=4096    # from /etc/cvmfs/default.local
CVMFS_RELOAD_SOCKETS=/var/run/cvmfs    # from /etc/cvmfs/default.conf
CVMFS_REPOSITORIES=atlas,atlas-condb,grid,cernvm-prod,sft,alice    # from /etc/cvmfs/default.local
CVMFS_SEND_INFO_HEADER=yes    # from /etc/cvmfs/default.local
CVMFS_SERVER_URL='http://s1unl-cvmfs.openhtc.io/cvmfs/atlas.cern.ch;http://s1fnal-cvmfs.openhtc.io/cvmfs/atlas.cern.ch;http://s1bnl-cvmfs.openhtc.io/cvmfs/atlas.cern.ch;http://s1cern-cvmfs.openhtc.io/cvmfs/atlas.cern.ch;http://s1ral-cvmfs.openhtc.io/cvmfs/atlas.cern.ch;http://s1asgc-cvmfs.openhtc.io:8080/cvmfs/atlas.cern.ch;http://s1ihep-cvmfs.openhtc.io/cvmfs/atlas.cern.ch'    # from /etc/cvmfs/domain.d/cern.ch.local
CVMFS_SHARED_CACHE=yes    # from /etc/cvmfs/default.conf
CVMFS_STRICT_MOUNT=no    # from /etc/cvmfs/default.conf
CVMFS_TIMEOUT=5    # from /etc/cvmfs/default.conf
CVMFS_TIMEOUT_DIRECT=10    # from /etc/cvmfs/default.conf
CVMFS_USE_GEOAPI=no    # from /etc/cvmfs/domain.d/cern.ch.local
CVMFS_USER=cvmfs    # from /etc/cvmfs/default.conf
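
The output above shows the short repository name "atlas" resolved to atlas.cern.ch and the @fqrn@ placeholder in CVMFS_SERVER_URL replaced by that name. A minimal Python sketch of this substitution follows. It assumes plain string replacement and uses made-up helper names (fqrn, expand_server_urls); it is not the CVMFS client's own code.

```python
# Sketch of how the configuration above appears to be resolved.
# Assumption: plain string substitution; helper names are hypothetical.

DEFAULT_DOMAIN = "cern.ch"  # value of CVMFS_DEFAULT_DOMAIN in the output above


def fqrn(name, default_domain=DEFAULT_DOMAIN):
    """Append the default domain to a short repository name such as "atlas"."""
    return name if "." in name else f"{name}.{default_domain}"


def expand_server_urls(template, repo):
    """Replace @fqrn@ in a ';'-separated CVMFS_SERVER_URL template."""
    return [url.replace("@fqrn@", repo) for url in template.split(";") if url]


# Two of the entries from /etc/cvmfs/domain.d/cern.ch.local
template = (
    "http://s1unl-cvmfs.openhtc.io/cvmfs/@fqrn@;"
    "http://s1asgc-cvmfs.openhtc.io:8080/cvmfs/@fqrn@"
)

for url in expand_server_urls(template, fqrn("atlas")):
    print(url)
```

Comparing the printed URLs with the CVMFS_SERVER_URL line reported by showconfig is a quick way to confirm that a .local override is actually being picked up.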
Erich56

Joined: 18 Dec 15
Posts: 1749
Credit: 115,501,579
RAC: 88,270
Message 42873 - Posted: 14 Jun 2020, 11:58:40 UTC

So, I wanted to give it another try.
However, the result is still negative. The task again errored out after about 10 minutes, with a CPU time of only 1 min 48 secs.

When looking at the stderr, my eye caught one strange thing 43 seconds after start:

2020-06-14 13:31:54 (2632): Guest Log: 00:00:00.003737 main Error: Service 'control' failed to initialize: VERR_INVALID_PARAMETER

So I guess that from this time on the task was lost.

For the complete information, see here:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=277703088

Does no-one from the experts here have any idea what's going wrong?

As I said before, the same thing has been happening on two other systems for 2 days, whereas before, ATLAS crunching under the same settings was no problem at all.
For me, that's even more annoying as I recently bought some additional RAM to upgrade another of my machines for ATLAS crunching :-(
djoser
Joined: 30 Aug 14
Posts: 145
Credit: 10,847,070
RAC: 0
Message 42874 - Posted: 14 Jun 2020, 13:03:25 UTC - in response to Message 42873.  

Erich56 wrote:
Does no-one from the experts here have any idea what's going wrong?

I'm not really sure that they are aware of the current situation.
Normally when things go wrong David Cameron informs us what's going on pretty fast.
The problems occurred on Thursday, and since then there has not been a single word from the experts or official moderators.
Why mine when you can research? - GRIDCOIN - Real cryptocurrency without wasting hashes! https://gridcoin.us
Erich56

Joined: 18 Dec 15
Posts: 1749
Credit: 115,501,579
RAC: 88,270
Message 42876 - Posted: 14 Jun 2020, 13:41:08 UTC - in response to Message 42874.  

djoser wrote:
The problems occurred on Thursday, and since then there has not been a single word from the experts or official moderators.
You said it; this is somewhat strange :-(
At any rate, for the time being I am no longer trying ATLAS; I've switched to CMS.
David Cameron
Project administrator
Project developer
Project scientist

Joined: 13 May 14
Posts: 387
Credit: 15,314,184
RAC: 0
Message 42877 - Posted: 14 Jun 2020, 14:12:48 UTC - in response to Message 42874.  

djoser wrote:
Normally when things go wrong David Cameron informs us what's going on pretty fast.


Sorry about this mess. I made the mistake of taking a day off right after a major update of one of the ATLAS systems, and this update seems to have broken BOINC tasks... I have just reverted the BOINC tasks to the previous version of this particular software, so I hope new tasks will succeed. Over the next few days I'll investigate what the problem was.
Aurum
Joined: 12 Jun 18
Posts: 126
Credit: 53,906,164
RAC: 0
Message 42878 - Posted: 14 Jun 2020, 15:47:14 UTC
Last modified: 14 Jun 2020, 15:47:39 UTC

I turned ATLAS back on hoping to find beautiful bosons but so far all I get are Validation Errors. Do we need to crank through the degenerates first???
Erich56

Joined: 18 Dec 15
Posts: 1749
Credit: 115,501,579
RAC: 88,270
Message 42879 - Posted: 14 Jun 2020, 16:11:03 UTC

David, thanks for the explanation.

So I downloaded a new ATLAS task, but again it errored out after 14 minutes - see here:

https://lhcathome.cern.ch/lhcathome/result.php?resultid=277712354

Still something doesn't seem to work the way it's supposed to.
Jim1348

Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 42880 - Posted: 14 Jun 2020, 16:15:18 UTC - in response to Message 42879.  
Last modified: 14 Jun 2020, 16:15:41 UTC

Native ATLAS is not working for me either, but I am assuming that they are still on restricted staffing at CERN and won't get to it until the middle of the week at the earliest.
CMS is fine.
Erich56

Joined: 18 Dec 15
Posts: 1749
Credit: 115,501,579
RAC: 88,270
Message 42881 - Posted: 14 Jun 2020, 18:02:52 UTC

BTW, I just noticed some interesting figures regarding ATLAS on the Server Status Page:

4,496 tasks in progress - 42 users within the past 24 hours - ???
Jim1348

Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 42882 - Posted: 14 Jun 2020, 18:23:36 UTC - in response to Message 42881.  
Last modified: 14 Jun 2020, 18:25:24 UTC

I find it strange that the average runtime is still around an hour.
It should be close to zero. But maybe they just don't count the invalids?
Erich56

Joined: 18 Dec 15
Posts: 1749
Credit: 115,501,579
RAC: 88,270
Message 42883 - Posted: 14 Jun 2020, 18:52:13 UTC - in response to Message 42882.  

Jim1348 wrote:
I find it strange that the average runtime is still around an hour.
It should be close to zero. But maybe they just don't count the invalids?
Something seems rather wrong with the entries for ATLAS.
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2500
Credit: 248,615,718
RAC: 126,629
Message 42884 - Posted: 15 Jun 2020, 6:33:25 UTC - in response to Message 42872.  

@Aurum

On Saturday you posted this:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5445&postid=42864
This makes me guess that you are running a local squid.

On Sunday you posted your CVMFS configuration:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5368&postid=42872
This configuration bypasses your local squid:
CVMFS_HTTP_PROXY=DIRECT    # from /etc/cvmfs/default.local

You may edit /etc/cvmfs/default.local and set one of the following two variants (not both):
# Option 1: if you have reliable local hostname resolution,
# replace "hostname_of_your_squid" with the hostname of your squid box :-)
# and replace 3128 with the TCP port your squid is listening on (3128 is the default)
CVMFS_HTTP_PROXY="http://hostname_of_your_squid:3128"

# Option 2: use the IP of your squid box instead
# replace the example IP with the one you are using
CVMFS_HTTP_PROXY="http://203.0.113.77:3128"


In addition your local squid has to be set in your BOINC client as this is used to configure all LHC vbox tasks as well as ATLAS (native)'s Frontier client.
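
To make the difference between DIRECT and a proxy entry concrete, here is a small Python sketch that splits a CVMFS_HTTP_PROXY value and reports whether the first entry bypasses the proxy. It assumes that ';' separates failover groups and '|' separates load-balanced proxies within a group. The function names are made up for this example, and it only inspects the string; it does not contact the squid.

```python
# Sketch: inspect a CVMFS_HTTP_PROXY value (string handling only).
# Assumptions: ';' separates failover groups, '|' separates load-balanced
# proxies inside a group, and "DIRECT" means "no proxy".
# parse_proxy_setting and bypasses_proxy are hypothetical helper names.

def parse_proxy_setting(value):
    """Return a list of groups, each a list of proxy URLs or "DIRECT"."""
    groups = []
    for group in value.strip('"').split(";"):
        members = [p.strip() for p in group.split("|") if p.strip()]
        if members:
            groups.append(members)
    return groups


def bypasses_proxy(value):
    """True if the first failover group is a direct connection."""
    groups = parse_proxy_setting(value)
    return bool(groups) and groups[0] == ["DIRECT"]


print(bypasses_proxy("DIRECT"))                      # the setting from default.local
print(bypasses_proxy('"http://203.0.113.77:3128"'))  # the example IP from above
print(parse_proxy_setting("http://203.0.113.77:3128;DIRECT"))
```

With the setting from the posted configuration the first call reports a bypass, which matches the observation that the local squid is not being used.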
David Cameron
Project administrator
Project developer
Project scientist

Joined: 13 May 14
Posts: 387
Credit: 15,314,184
RAC: 0
Message 42885 - Posted: 15 Jun 2020, 8:06:14 UTC

The new tasks submitted since yesterday are working ok, however it takes some time to flush out the bad tasks so you will still see a mixture of success and failure at the moment.
Aurum
Joined: 12 Jun 18
Posts: 126
Credit: 53,906,164
RAC: 0
Message 42886 - Posted: 15 Jun 2020, 11:07:47 UTC - in response to Message 42884.  
Last modified: 15 Jun 2020, 11:38:58 UTC

computezrmle, yes, you talked me into installing a squid :-) You had me put this in the squid.conf:
# INSERT YOUR OWN RULE(S) HERE TO ALLOW ACCESS FROM YOUR CLIENTS
# see ACL definition above
# Examples:
# http_access allow crunchers
# http_access allow localnet
http_access allow localhost
http_access deny all
http_access allow crunch

# http_port
# Don't bind it to an IP accessible from outside unless you know what you're doing. E.g.,
http_port 192.168.1.227:3128
And trust me I don't know what I'm doing.

So I assume my /etc/cvmfs/default.local should be this on all my computers:
CVMFS_REPOSITORIES="atlas,atlas-condb,grid,cernvm-prod,sft,alice"
CVMFS_SEND_INFO_HEADER=yes
CVMFS_QUOTA_LIMIT=4096
CVMFS_CACHE_BASE=/scratch/cvmfs
CVMFS_HTTP_PROXY="http://192.168.1.227:3128"

I do not run vbox and have these two lines in my BOINC cc_config (same on all computers):
<dont_use_vbox>1</dont_use_vbox>
<vbox_window>0</vbox_window>

computezrmle wrote:
In addition your local squid has to be set in your BOINC client as this is used to configure all LHC vbox tasks as well as ATLAS (native)'s Frontier client.
I don't see any line in my BOINC cc_config file that might do this. How do I do this???

(In thinking about squids, I'm reminded of what my physics professors used to say a century ago: "Don't worry, the exam will be conceptual." :-)
Erich56

Joined: 18 Dec 15
Posts: 1749
Credit: 115,501,579
RAC: 88,270
Message 42890 - Posted: 16 Jun 2020, 5:33:48 UTC - in response to Message 42885.  

David Cameron wrote:
The new tasks submitted since yesterday are working ok, however it takes some time to flush out the bad tasks so you will still see a mixture of success and failure at the moment.
Thanks, David, for the information. I have now received tasks that worked well.
Something seems to have happened to the credit calculation, though: I get between 10 and 12 points per task :-(
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2500
Credit: 248,615,718
RAC: 126,629
Message 42992 - Posted: 9 Jul 2020, 19:51:08 UTC - in response to Message 42886.  

Just stumbled over your unanswered question:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5368&postid=42886

Sorry for the delay, the last weeks were very busy.

Aurum wrote:
http_access deny all
http_access allow crunch


The 2nd line will never be evaluated.
Since order matters, evaluation stops at the 1st line, because "deny all" matches every request.

To give "crunch" a chance, you should at least swap the two lines:
http_access allow crunch
http_access deny all
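
The effect of the rule order can be illustrated with a small Python sketch. It is a simplified, hypothetical model of first-match evaluation (the function and variable names are made up for this example), not Squid's actual implementation, and it ignores everything about ACLs except their names.

```python
# Hypothetical, simplified model of first-match http_access evaluation.
# Only the rule order is modelled; real ACL matching is far richer.

def first_match(rules, client_acls):
    """Return the action of the first rule whose ACL matches the client.

    rules: list of (action, acl_name) tuples in squid.conf order.
    client_acls: set of ACL names the requesting client belongs to.
    The special ACL name "all" matches every client.
    """
    for action, acl in rules:
        if acl == "all" or acl in client_acls:
            return action
    return None  # no rule matched; not modelled further here


# Order from the posted squid.conf: "deny all" shadows "allow crunch".
original = [("allow", "localhost"), ("deny", "all"), ("allow", "crunch")]
# Swapped order: specific allow rules first, catch-all deny last.
swapped = [("allow", "localhost"), ("allow", "crunch"), ("deny", "all")]

cruncher = {"crunch"}  # a client that matches only the "crunch" ACL
print(first_match(original, cruncher))  # deny
print(first_match(swapped, cruncher))   # allow
```

In this model a client that matches none of the named ACLs is still denied in both orderings, so swapping the lines does not open the proxy to everyone.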



A better idea would be to check your squid.conf against the revised version here:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5473


©2024 CERN