41) Message boards : ATLAS application : ATLAS tasks fail after 10 min (Message 42878)
Posted 14 Jun 2020 by Aurum
Post:
I turned ATLAS back on hoping to find beautiful bosons but so far all I get are Validation Errors. Do we need to crank through the degenerates first???
42) Message boards : ATLAS application : ATLAS tasks fail after 10 min (Message 42872)
Posted 14 Jun 2020 by Aurum
Post:
aurum@Rig-32:~$ cvmfs_config showconfig -s atlas.cern.ch
CVMFS_REPOSITORY_NAME=atlas.cern.ch
CVMFS_BACKOFF_INIT=2    # from /etc/cvmfs/default.conf
CVMFS_BACKOFF_MAX=10    # from /etc/cvmfs/default.conf
CVMFS_BASE_ENV=1    # from /etc/cvmfs/default.conf
CVMFS_CACHE_BASE=/scratch/cvmfs    # from /etc/cvmfs/default.local
CVMFS_CACHE_DIR=/scratch/cvmfs/shared
CVMFS_CHECK_PERMISSIONS=yes    # from /etc/cvmfs/default.conf
CVMFS_CLAIM_OWNERSHIP=yes    # from /etc/cvmfs/default.conf
CVMFS_CONFIG_REPOSITORY=cvmfs-config.cern.ch    # from /etc/cvmfs/default.d/50-cern-debian.conf
CVMFS_DEFAULT_DOMAIN=cern.ch    # from /etc/cvmfs/default.d/50-cern-debian.conf
CVMFS_HOST_RESET_AFTER=1800    # from /etc/cvmfs/default.conf
CVMFS_HTTP_PROXY=DIRECT    # from /etc/cvmfs/default.local
CVMFS_IGNORE_SIGNATURE=no    # from /etc/cvmfs/default.conf
CVMFS_KEYS_DIR=/etc/cvmfs/keys/cern.ch    # from /etc/cvmfs/domain.d/cern.ch.conf
CVMFS_LOW_SPEED_LIMIT=1024    # from /etc/cvmfs/default.conf
CVMFS_MAX_RETRIES=1    # from /etc/cvmfs/default.conf
CVMFS_MOUNT_DIR=/cvmfs    # from /etc/cvmfs/default.conf
CVMFS_MOUNT_RW=no    # from /etc/cvmfs/default.conf
CVMFS_NFILES=65536    # from /etc/cvmfs/default.conf
CVMFS_NFS_SOURCE=no    # from /etc/cvmfs/default.conf
CVMFS_PAC_URLS=http://wpad/wpad.dat    # from /etc/cvmfs/default.conf
CVMFS_PROXY_RESET_AFTER=300    # from /etc/cvmfs/default.conf
CVMFS_QUOTA_LIMIT=4096    # from /etc/cvmfs/default.local
CVMFS_RELOAD_SOCKETS=/var/run/cvmfs    # from /etc/cvmfs/default.conf
CVMFS_REPOSITORIES=atlas,atlas-condb,grid,cernvm-prod,sft,alice    # from /etc/cvmfs/default.local
CVMFS_SEND_INFO_HEADER=yes    # from /etc/cvmfs/default.local
CVMFS_SERVER_URL='http://s1unl-cvmfs.openhtc.io/cvmfs/atlas.cern.ch;http://s1fnal-cvmfs.openhtc.io/cvmfs/atlas.cern.ch;http://s1bnl-cvmfs.openhtc.io/cvmfs/atlas.cern.ch;http://s1cern-cvmfs.openhtc.io/cvmfs/atlas.cern.ch;http://s1ral-cvmfs.openhtc.io/cvmfs/atlas.cern.ch;http://s1asgc-cvmfs.openhtc.io:8080/cvmfs/atlas.cern.ch;http://s1ihep-cvmfs.openhtc.io/cvmfs/atlas.cern.ch'    # from /etc/cvmfs/domain.d/cern.ch.local
CVMFS_SHARED_CACHE=yes    # from /etc/cvmfs/default.conf
CVMFS_STRICT_MOUNT=no    # from /etc/cvmfs/default.conf
CVMFS_TIMEOUT=5    # from /etc/cvmfs/default.conf
CVMFS_TIMEOUT_DIRECT=10    # from /etc/cvmfs/default.conf
CVMFS_USE_GEOAPI=no    # from /etc/cvmfs/domain.d/cern.ch.local
CVMFS_USER=cvmfs    # from /etc/cvmfs/default.conf
43) Message boards : ATLAS application : ATLAS tasks fail after 10 min (Message 42871)
Posted 14 Jun 2020 by Aurum
Post:
computezrmle thanks as always. Using my monkey-see monkey-do powers I made these changes for my US location but I'm still getting nothing but "Validate errors."
sudo xed /etc/cvmfs/default.local
CVMFS_REPOSITORIES="atlas,atlas-condb,grid,cernvm-prod,sft,alice"
CVMFS_SEND_INFO_HEADER=yes
CVMFS_QUOTA_LIMIT=4096
CVMFS_CACHE_BASE=/scratch/cvmfs
CVMFS_HTTP_PROXY=DIRECT

sudo xed /etc/cvmfs/domain.d/cern.ch.local
CVMFS_SERVER_URL="http://s1unl-cvmfs.openhtc.io/cvmfs/@fqrn@;http://s1fnal-cvmfs.openhtc.io/cvmfs/@fqrn@;http://s1bnl-cvmfs.openhtc.io/cvmfs/@fqrn@;http://s1cern-cvmfs.openhtc.io/cvmfs/@fqrn@;http://s1ral-cvmfs.openhtc.io/cvmfs/@fqrn@;http://s1asgc-cvmfs.openhtc.io:8080/cvmfs/@fqrn@;http://s1ihep-cvmfs.openhtc.io/cvmfs/@fqrn@"
CVMFS_USE_GEOAPI=no

sudo xed /etc/cvmfs/config.d/atlas-nightlies.cern.ch.local
CVMFS_SERVER_URL="http://s1cern-cvmfs.openhtc.io/cvmfs/@fqrn@;http://s1bnl-cvmfs.openhtc.io/cvmfs/@fqrn@"
CVMFS_USE_GEOAPI=no
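For what it's worth, CVMFS substitutes the repository's fully qualified name for the @fqrn@ placeholder when it reads CVMFS_SERVER_URL, which is why the showconfig output above already shows the expanded atlas.cern.ch URLs. A minimal sketch of that substitution, using one URL from the config above:

```shell
# Sketch of CVMFS's @fqrn@ expansion: the placeholder in CVMFS_SERVER_URL
# is replaced with the fully qualified repository name, e.g. atlas.cern.ch.
fqrn="atlas.cern.ch"
template="http://s1unl-cvmfs.openhtc.io/cvmfs/@fqrn@"
expanded=$(printf '%s' "$template" | sed "s/@fqrn@/$fqrn/")
echo "$expanded"   # http://s1unl-cvmfs.openhtc.io/cvmfs/atlas.cern.ch
```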

sudo cvmfs_config reload
Is there something else wrong?
This is on a new build computer where I configured ATLAS as before from my notes:
wget https://ecsft.cern.ch/dist/cvmfs/cvmfs-release/cvmfs-release-latest_all.deb ; \
 sudo dpkg -i cvmfs-release-latest_all.deb ; \
 rm -f cvmfs-release-latest_all.deb ; \
 sudo apt-get update ; \
 sudo apt-get install cvmfs ; \
 sudo apt install glibc-doc open-iscsi watchdog

sudo wget https://lhcathomedev.cern.ch/lhcathome-dev/download/default.local -O /etc/cvmfs/default.local ; \
 sudo cvmfs_config setup ; \
 echo "/cvmfs /etc/auto.cvmfs" | sudo tee /etc/auto.master.d/cvmfs.autofs ; \
 sudo systemctl restart autofs ; \
 cvmfs_config probe

sudo cvmfs_config reload
44) Message boards : ATLAS application : ATLAS tasks fail after 10 min (Message 42865)
Posted 13 Jun 2020 by Aurum
Post:
Searching for "strftime" at the top of this page will deliver the answer.
Hmm, was that answer retracted???
Sorry, couldn't find anything matching your search query.
45) Message boards : ATLAS application : Squid proxies may need restart (Message 42864)
Posted 13 Jun 2020 by Aurum
Post:
You may also insert the following line in your squid.conf and do a "squid -k reconfigure".
shutdown_lifetime 3 seconds
This avoids the 60-second default delay when you shut down/restart Squid, but I'm not 100% sure whether changing this timeout requires a "squid -k restart". At least Squid will be prepared for the next restart.
Not sure where to stick it but this spot felt oh so right:
# You don't believe this is enough? For sure, it is!
cache_mem 192 MB
maximum_object_size_in_memory 24 KB
memory_replacement_policy heap GDSF

shutdown_lifetime 3 seconds

No idea what I'm doing. Is there an LHC Squid Care & Feeding Guide anywhere?
46) Message boards : ATLAS application : Validate error on all tasks, and short run time with 1 core only (Message 42854)
Posted 12 Jun 2020 by Aurum
Post:
In addition to the validate errors on ATLAS, I now have trouble getting other LHC workunits.
I had trouble too until I read the LHC BOINC messages and saw that no CPU work was requested because the queue was full and none was needed. I suspended unstarted WUs and LHC immediately DLed a boatload. Hopefully this batch won't fail instantly.

Edit: Not looking good: Valids zero, Invalids 73. Validation error.
47) Message boards : Theory Application : New Native Theory Version 1.1 (Message 40834)
Posted 7 Dec 2019 by Aurum
Post:
The local BOINC client will simply ignore xml tags that are not defined for app_config.xml.
Among those ignored tags are:
<maintain>18</maintain>
<priority>1</priority>
Duh. So invent them.
48) Message boards : Theory Application : New version 300.00 (Message 40831)
Posted 7 Dec 2019 by Aurum
Post:
Set 'No Limit' and you will get as many tasks as you have cores, or the number of jobs if that is less.
None of my rigs have been supplied with more than ten Theory WUs, and all specify No Limit/No Limit in Prefs.
49) Message boards : Theory Application : New Native Theory Version 1.1 (Message 40830)
Posted 7 Dec 2019 by Aurum
Post:
On computers with lots of cores it might be worth setting up additional BOINC client instances.
This is more work than I'm willing to do.
I have a much better idea. Add BOINC commands that tell the server what to do:
<app_config>
<app>
    <name>ATLAS</name>
    <!-- Xeon E5-2699 v4  22c44t  32 GB RAM L3 Cache = 55 MB  -->
    <maintain>18</maintain>
    <max_concurrent>16</max_concurrent>
</app>
<app_version>
    <app_name>ATLAS</app_name>
    <plan_class>native_mt</plan_class>
    <avg_ncpus>1</avg_ncpus>
    <cmdline>--nthreads 1</cmdline>
</app_version>
<app>
    <name>sixtrack</name>
    <maintain>9</maintain>
    <max_concurrent>6</max_concurrent>
</app>
<app>
    <name>Theory</name>
    <maintain>44</maintain>
</app>
<app>
    <name>CMS</name>
    <maintain>0</maintain>
</app>
</app_config>
And even better would be:
<app_config>
<app>
    <name>ATLAS</name>
    <!-- Xeon E5-2699 v4  22c44t  32 GB RAM L3 Cache = 55 MB  -->
    <priority>1</priority>
    <max_concurrent>16</max_concurrent>
</app>
<app_version>
    <app_name>ATLAS</app_name>
    <plan_class>native_mt</plan_class>
    <avg_ncpus>1</avg_ncpus>
    <cmdline>--nthreads 1</cmdline>
</app_version>
<app>
    <name>sixtrack</name>
    <priority>3</priority>
</app>
<app>
    <name>Theory</name>
    <priority>2</priority>
</app>
<app>
    <name>CMS</name>
    <priority>0</priority>
</app>
</app_config>
50) Message boards : Theory Application : New Native Theory Version 1.1 (Message 40826)
Posted 7 Dec 2019 by Aurum
Post:
I thought nT was in production but it's limited to a fearful 10 WUs per rig. Because of the RAM-hungry ATLAS WUs I have to run ST to fill out my threads. I think BOINC runs best with fewer projects, but I'm stuck running three now.
51) Message boards : Theory Application : New Native Theory Version 1.1 (Message 40824)
Posted 7 Dec 2019 by Aurum
Post:
Thanks Gunde, I checked the Preferences/Theory Simulation box and 300s started flowing down.
52) Message boards : Theory Application : New Native Theory Version 1.1 (Message 40820)
Posted 6 Dec 2019 by Aurum
Post:
But maeax appears to be the world record holder for longest running nTheory WU :-)
The point is there's only one left, as shown on Server Stats. Are you saying nT 1.1 is done?
Just wondering if we'll get more nT WUs.
53) Message boards : Theory Application : New Native Theory Version 1.1 (Message 40815)
Posted 6 Dec 2019 by Aurum
Post:
Now that maeax has the last nT WU running will nT 1.1 WUs be released to the public???
54) Message boards : Theory Application : Sherpa - longest runtime with Success - native (Message 40811)
Posted 6 Dec 2019 by Aurum
Post:
Ten days, you must be a very patient person :-)
55) Message boards : Sixtrack Application : Wrong Factor sent by Project Server (Message 40768)
Posted 3 Dec 2019 by Aurum
Post:
For a home-made fix, can we add this to our app_configs???
<app_config>
<app>
    <name>ATLAS</name>
    <!-- Xeon E5-2699 v4  22c44t  L3 Cache = 55 MB  -->
    <max_concurrent>6</max_concurrent>
</app>
<app_version>
    <app_name>ATLAS</app_name>
    <plan_class>native_mt</plan_class>
    <avg_ncpus>6</avg_ncpus>
    <cmdline>--nthreads 6</cmdline>
</app_version>
<app>
    <name>sixtrack</name>
    <max_concurrent>38</max_concurrent>
</app>
<app_version>
    <app_name>sixtrack</app_name>
    <plan_class>avx</plan_class>
    <avg_ncpus>1</avg_ncpus>
</app_version>
<app_version>
    <app_name>sixtrack</app_name>
    <plan_class>sse2</plan_class>
    <avg_ncpus>1</avg_ncpus>
</app_version>
</app_config>
And how do we handle the multiple plan_classes???
56) Message boards : Number crunching : Max # jobs and Max # CPUs (Message 40708)
Posted 27 Nov 2019 by Aurum
Post:
Max #tasks
- should act like <project_max_concurrent>
It would be even better if Max #tasks behaved like <max_concurrent> and there was a setting in Preferences for each project.
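For reference, a project-wide cap already exists as a client-side option: <project_max_concurrent> in app_config.xml limits how many of a project's tasks run at once, which approximates the first half of this wish locally. A minimal sketch (the limit value is just an example):

```xml
<!-- app_config.xml in the LHC@home project directory.
     Caps the number of LHC tasks running at once across all apps
     (example value; per-app <max_concurrent> can still be added). -->
<app_config>
    <project_max_concurrent>16</project_max_concurrent>
</app_config>
```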
57) Message boards : ATLAS application : ATLAS native version 2.73 (Message 40671)
Posted 25 Nov 2019 by Aurum
Post:
maeax, I don't think we're talking about the same dog's breakfast :-)

computezrmle, Thx! That's the behavior I've seen and now I know what's going on in the black box :-)
I think I'll go with:
<app_config>
<app>
    <name>ATLAS</name>
    <!-- Xeon E5-2699v4  22c44t  L3 Cache = 55 MB  -->
    <max_concurrent>2</max_concurrent>
</app>
<app_version>
    <app_name>ATLAS</app_name>
    <plan_class>native_mt</plan_class>
    <avg_ncpus>6</avg_ncpus>
    <cmdline>--nthreads 6</cmdline>
</app_version>
</app_config>
and leave No Limit/No Limit with ST & nT checked for a few days to see how it shakes out.
58) Message boards : ATLAS application : ATLAS native version 2.73 (Message 40666)
Posted 25 Nov 2019 by Aurum
Post:
I wish I understood the impact of changing the number of CPUs in an ATLAS WU. Does an ATLAS WU run faster with more CPUs?
Since the return file size remains the same whether I use 1, 2, 3, or 16 CPUs in a WU, I might as well just use 16c WUs and run only one at a time to reduce the number of files I have to upload.
Which means I'll get far too many 16c ATLAS WUs DLed and few or no nT and ST WUs.

Fixing Preferences so that one could specify the maximum number of WUs each project would send to a computer (like WCG does) would give crunchers flexibility. ATLAS would have the added field #CPUs. E.g., if I set the limit for ATLAS to 3 WUs and I have 4 on my computer, LHC@home would not send me another ATLAS WU until I got down to two, and then send just one.
59) Message boards : ATLAS application : ATLAS native version 2.73 (Message 40663)
Posted 25 Nov 2019 by Aurum
Post:
If CERN wants to maximize BOINC work then they should see if they can reduce the size of the largest return files.
ATLAS is optimized; you can find some news about it in the ATLAS folder.
I read the titles for last 2 years and found nothing relevant. Be glad to read it if I knew what you suggest I read.
I have 30 Mbit/s upload, and CERN connects at 1.5 Mbit/s for the moment under ATLAS.
And this is OK: 3 min for 240 MB!
How many ATLAS WUs are you uploading at once??? I'm trying to UL a couple hundred from the same IP.
By "ATLAS is optimized" do you mean the file size is as small as it humanly can be and it can never get smaller, or something else???
60) Message boards : ATLAS application : ATLAS native version 2.73 (Message 40658)
Posted 25 Nov 2019 by Aurum
Post:
Yea, shortly after I said that I saw four 4c WUs that were 250 MB. No rhyme or reason.
Since the return files are so large the slow upload speed of ADSL connections is easily swamped.
I have to cut my ATLAS work in half if I stand any chance of clearing my upload logjam.
If CERN wants to maximize BOINC work then they should see if they can reduce the size of the largest return files.

