1) Questions and Answers : Unix/Linux : ATLAS (native_mt) Python incompatibility on Linux (Message 34982)
Posted 13 Apr 2018 by AGLT2
Post:
This is fixed. It should now use python2 instead of python3.
2) Message boards : ATLAS application : No credits for perfectly good workunits (Message 33763)
Posted 10 Jan 2018 by AGLT2
Post:
Hi, Djoser

Thanks for reporting the problem. Yes, we replicate a workunit if its first instance does not return within 3 days, but we give credit to all finished instances that pass validation and return within the deadline (7 days).
I checked both workunits you gave; they actually failed validation because of our storage problems. Unfortunately, since the end of December the ATLAS BOINC storage area has been saturated, so a large fraction of workunits fail at upload or at validation (validation fails because the uploaded result file becomes inaccessible to the validator). We are working on migrating to a different storage solution, which will take some time.
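
The replication and credit rules described above can be sketched in a few lines of Python. This is only an illustration of the stated policy (replica after 3 days, credit for validated results returned within 7 days); the function names are hypothetical and this is not the actual server code.

```python
from datetime import datetime, timedelta

# Values taken from the post above; the function names are hypothetical.
REPLICATION_DELAY = timedelta(days=3)   # a second instance is sent after this
DEADLINE = timedelta(days=7)            # results must come back within this


def needs_replica(sent_at, now, returned):
    """True if the first instance is still out after the replication delay."""
    return (not returned) and (now - sent_at) >= REPLICATION_DELAY


def earns_credit(sent_at, returned_at, validated):
    """An instance gets credit if it validates and returns before the deadline."""
    return validated and (returned_at - sent_at) <= DEADLINE


sent = datetime(2018, 1, 1)
# Slow host: still running on day 4, so a replica has been issued by then.
print(needs_replica(sent, sent + timedelta(days=4), returned=False))
# It returns on day 5 and still earns credit if validation succeeds.
print(earns_credit(sent, sent + timedelta(days=5), validated=True))
# Same timing, but the validator could not read the uploaded file.
print(earns_credit(sent, sent + timedelta(days=5), validated=False))
```

The last case is the one that applies to the two workunits in question: the result came back in time, but validation failed because of the storage problem.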

Hello,

I have two workunits which my machine finished in time (regarding the deadline) and without any error.
But I didn't get any credits for my work, just because those workunits were handed out a second time (to machines running the native_mt app) and those machines finished the workunits faster than my computer.

This is very frustrating, because my computer only has a very tiny CPU and WUs take quite some time.
In this particular case I wasted about 100 hours of CPU time, or about 0,3€ in electricity (1 kW).

I'm okay with WUs being canceled by the server because another computer was faster, as long as the WUs are in the queue and not currently being crunched, but I'm not okay with wasting resources.
Why are WUs handed out a second time in the first place as long as they don't error out?

Workunits: 83133808 and 83264910

I think I will stop crunching ATLAS and head over to CMS and LHCb!

Regards, djoser.
3) Message boards : ATLAS application : ATLAS native app (Message 32273)
Posted 5 Sep 2017 by AGLT2
Post:
Based on this, we released a new version, v2.51, so you no longer need to hack the wrapper to run it on Ubuntu.




14,16c14,16
<   os.system("cvmfs_config probe 1>&2>/dev/null")
<   ret1=os.system("cvmfs_config stat atlas.cern.ch 1>&2>/dev/null")
<   ret2=os.system("cvmfs_config stat atlas-condb.cern.ch 1>&2>/dev/null")
---
>   os.system("cvmfs_config probe")
>   ret1=os.system("cvmfs_config stat atlas.cern.ch")
>   ret2=os.system("cvmfs_config stat atlas-condb.cern.ch")
28c28
<   ret=os.system("singularity --version 1>&2>/dev/null")
---
>   ret=os.system("singularity --version")
132,135c132,134
<   if int(THREADS)!=1:
<     prefix="export ATHENA_PROC_NUMBER=%s;"%THREADS
<     sys.stderr.write(prefix)
<     os.system("sed -i -e '/set -x/a\%s' start_atlas.sh"%prefix)
---
>   prefix="export ATHENA_PROC_NUMBER=%s;"%THREADS
>   sys.stderr.write(prefix)
>   os.system("sed -i -e '/set -x/a\%s' start_atlas.sh"%prefix)


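
For readers unfamiliar with the `sed` call in the last hunk of the diff above: `/set -x/a\...` appends the `export ATHENA_PROC_NUMBER=...;` line after each line of `start_atlas.sh` that matches `set -x`. A rough pure-Python equivalent is sketched below. The function name and the sample script are hypothetical, since the real `start_atlas.sh` is not shown in the thread.

```python
def add_proc_number(script_text, threads):
    """Insert the ATHENA_PROC_NUMBER export after every line containing 'set -x'."""
    export_line = "export ATHENA_PROC_NUMBER=%s;" % threads
    out = []
    for line in script_text.splitlines():
        out.append(line)
        if "set -x" in line:
            out.append(export_line)
    return "\n".join(out) + "\n"


# Hypothetical stand-in for start_atlas.sh; the real script is not shown here.
sample = "#!/bin/bash\nset -x\necho running\n"
patched = add_proc_number(sample, 4)
print(patched)
```

On the `>` side of that hunk the export is written for any thread count, including 1, because the `if int(THREADS)!=1` guard is gone.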
4) Message boards : ATLAS application : No tasks are available for ATLAS Simulation (Message 32239)
Posted 5 Sep 2017 by AGLT2
Post:
ATLAS jobs recently gained a "priority" feature. With the old configuration on the lhc@home server, only very reliable hosts could get these priority jobs, which is why for the past few days some volunteer hosts could not get any tasks even though there were unsent (high-priority) jobs. We have reconfigured the server, so all volunteer hosts can now get tasks!
5) Message boards : News : Deadline change for ATLAS jobs (Message 31961)
Posted 16 Aug 2017 by AGLT2
Post:
Due to the tight deadline of the ATLAS tasks, we are changing the deadline of ATLAS jobs from 2 weeks to 1 week. An ATLAS job takes about 3-4 CPU hours to finish on a moderate CPU (2.5 GFLOPS).
6) Message boards : News : New ATLAS app version released for Linux hosts (Message 31914)
Posted 11 Aug 2017 by AGLT2
Post:

Good point. The native app is only shown on atlasathome.cern.ch and not on lhcathome.cern.ch or the dev site.


Actually, it is on lhcathome.cern.ch. As I mentioned earlier, it is set as a beta version, so you need to enable "test app" in the lhcathome preferences in your account; otherwise you will not receive jobs from the native app.
7) Message boards : News : New ATLAS app version released for Linux hosts (Message 31906)
Posted 10 Aug 2017 by AGLT2
Post:


1. Why is it hosted on the old ATLAS server and not on the dev server?
2. Is it based on the version David Cameron has tested a few months ago?
3. Why is it restricted to the mentioned distributions? Simply not tested on others or due to some specific requirements?


1. Only the script file for installing cvmfs/singularity is hosted on the old ATLAS server; the app is still on the lhcathome server.
2. Yes, it is based on the native version David tested a few months ago.
3. So far we have only tested it on SLC6 and CentOS 7. It might work on other Linux distributions, depending on whether CVMFS and Singularity can be installed successfully there.
So please feel free to try; we would appreciate feedback on that.
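
Before trying the native app on another distribution, a quick way to see whether the two prerequisites are at least installed is to look for their commands on PATH. The sketch below assumes the command names `cvmfs_config` and `singularity`, which are the ones the wrapper itself calls; the helper name is hypothetical.

```python
import shutil

# Command names the wrapper calls; the helper below is only an illustration.
REQUIRED_TOOLS = ("cvmfs_config", "singularity")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [name for name in tools if which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("CVMFS and Singularity commands are both on PATH")
```

This only checks that the commands exist. It does not verify that the ATLAS CVMFS repositories can actually be mounted, which is what `cvmfs_config probe` is for.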

Cheers!
8) Message boards : News : New ATLAS app version released for Linux hosts (Message 31904)
Posted 10 Aug 2017 by AGLT2
Post:
We released a new version of the ATLAS app today, 2.41, for the x86_64-pc-linux-gnu platform.
The new features of this version include:
1. It requires the host OS to be either Scientific Linux 6 or CentOS 7.
2. It requires CVMFS and Singularity instead of VirtualBox to run the ATLAS jobs.
3. It is more efficient, because it avoids using VirtualBox.
Currently, this version is set as a beta version.

For people who want to try it out, we provide a script to install everything, including CVMFS and Singularity, here,


Try it if you are interested!
9) Message boards : ATLAS application : Download failures (Message 31783)
Posted 2 Aug 2017 by AGLT2
Post:
Just to summarize the causes of the download failures for some files with ATLAS@home:
1. Some of the input files were stored on a test server, boincai04, which was down yesterday due to heavy load. We modified the job submission script so that all input files are stored on more powerful and reliable servers, which should prevent this from happening again.
2. For people whose hosts are still attached to the Atlas-test project, the server (boincai04) got stuck a few times in the past few days because the workload was too heavy for such a small machine. We have now split the workload across different machines, and boincai04 still dispatches a small number of test jobs.

Cheers!
10) Message boards : ATLAS application : Download failures (Message 31781)
Posted 2 Aug 2017 by AGLT2
Post:
Yes, there was a permission issue with the scheduler file on boincai04; it is fixed now.

Cheers!

Hi,
The server hosting this file was down, which is why there was a download error. We have now brought the machine back, and the file should be available.

Cheers!

Ever since the day that there was the issue after the cleanup of old files, I have been experiencing the same issues. LHC was running Atlas fine for a long time, where it would download 4 tasks and run them through without issue. What I am seeing now (and since the day of the server cleanup) is my machine will attempt to download files for tasks, and get stuck retrying on a few for several hours.

7/29/2017 7:22:43 PM | LHC@home | Started download of jf_f3ff3ac08153d0ee04ea606f0dea9a0e
7/29/2017 7:23:05 PM | | Project communication failed: attempting access to reference site
7/29/2017 7:23:05 PM | LHC@home | Temporarily failed download of jf_f3ff3ac08153d0ee04ea606f0dea9a0e: connect() failed
7/29/2017 7:23:05 PM | LHC@home | Backing off 01:05:41 on download of jf_f3ff3ac08153d0ee04ea606f0dea9a0e
7/29/2017 7:23:06 PM | | Internet access OK - project servers may be temporarily down.

I have Updated, Restarted, Removed and re-added the LHC project several times, over several days. Suggestions? Tried 2 different networks (home and work), same issues. Connection to other projects are no issue.


Server error: feeder not running

Server can't open log file (../log_boincai04/scheduler.log)


(did you happen to check your pm over at TEST lately? )
11) Message boards : ATLAS application : Download failures (Message 31762)
Posted 1 Aug 2017 by AGLT2
Post:
Hi,
The server hosting this file was down, which is why there was a download error. We have now brought the machine back, and the file should be available.

Cheers!

Ever since the day that there was the issue after the cleanup of old files, I have been experiencing the same issues. LHC was running Atlas fine for a long time, where it would download 4 tasks and run them through without issue. What I am seeing now (and since the day of the server cleanup) is my machine will attempt to download files for tasks, and get stuck retrying on a few for several hours.

7/29/2017 7:22:43 PM | LHC@home | Started download of jf_f3ff3ac08153d0ee04ea606f0dea9a0e
7/29/2017 7:23:05 PM | | Project communication failed: attempting access to reference site
7/29/2017 7:23:05 PM | LHC@home | Temporarily failed download of jf_f3ff3ac08153d0ee04ea606f0dea9a0e: connect() failed
7/29/2017 7:23:05 PM | LHC@home | Backing off 01:05:41 on download of jf_f3ff3ac08153d0ee04ea606f0dea9a0e
7/29/2017 7:23:06 PM | | Internet access OK - project servers may be temporarily down.

I have Updated, Restarted, Removed and re-added the LHC project several times, over several days. Suggestions? Tried 2 different networks (home and work), same issues. Connection to other projects are no issue.
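
When reporting this kind of problem, it can help to list which files the client is backing off on and for how long. The sketch below pulls that out of log text. The regular expression is written against the log lines quoted above only, so other BOINC client messages may need a different pattern; the function name is hypothetical.

```python
import re
from datetime import timedelta

# Pattern written against the log lines quoted above; other BOINC messages
# may be formatted differently.
BACKOFF_RE = re.compile(r"Backing off (\d+):(\d{2}):(\d{2}) on download of (\S+)")


def parse_backoffs(log_text):
    """Return (file_name, back-off duration) pairs found in the log."""
    results = []
    for match in BACKOFF_RE.finditer(log_text):
        hours, minutes, seconds, name = match.groups()
        delay = timedelta(hours=int(hours), minutes=int(minutes), seconds=int(seconds))
        results.append((name, delay))
    return results


log = (
    "7/29/2017 7:23:05 PM | LHC@home | Temporarily failed download of "
    "jf_f3ff3ac08153d0ee04ea606f0dea9a0e: connect() failed\n"
    "7/29/2017 7:23:05 PM | LHC@home | Backing off 01:05:41 on download of "
    "jf_f3ff3ac08153d0ee04ea606f0dea9a0e\n"
)
for name, delay in parse_backoffs(log):
    print(name, delay)
```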



©2023 CERN