Message boards : ATLAS application : ATLAS issues

cIsCo
Joined: 30 Aug 18
Posts: 3
Credit: 1,002
RAC: 0
Message 36707 - Posted: 14 Sep 2018, 15:21:57 UTC - in response to Message 36706.  

To add more details to my query:

HITS file got generated, so I assume that some WUs did get processed properly.

2018-09-14 16:48:41 (6112): Guest Log: output list
2018-09-14 16:48:41 (6112): Guest Log: HITS.15285626._010961.pool.root.1 srm://srm.ndgf.org:8443;autodir=no;spacetoken=ATLASDATADISK/srm/managerv2?SFN=/atlas/disk/atlasdatadisk/rucio/mc16_13TeV/33/61/HITS.15285626._010961.pool.root.1:checksumtype=adler32:checksumvalue=b20c09b5
2018-09-14 16:48:41 (6112): Guest Log: log.15285626._010961.job.log.tgz.1 srm://srm.ndgf.org:8443;autodir=no;spacetoken=ATLASDATADISK/srm/managerv2?SFN=/atlas/disk/atlasdatadisk/rucio/mc16_13TeV/34/c4/log.15285626._010961.job.log.tgz.1:checksumtype=adler32:checksumvalue=bcdb22a5
2018-09-14 16:48:41 (6112): Guest Log: HITS file was successfully produced
2018-09-14 16:48:41 (6112): Guest Log: -rw------- 1 atlas01 atlas01 138252547 Sep 14 16:37 /home/atlas01/RunAtlas/HITS.15285626._010961.pool.root.1


There were some errors when the program had to be suspended as the laptop had to be powered off, but the VM always resumed later on, and went on processing the stuff, so I assume that nothing got corrupted.

If I could really get an idea on how to avoid this, that would be great, else no point in running these jobs for hours only to get failures at the end always.

cIsCo
Joined: 30 Aug 18
Posts: 3
Credit: 1,002
RAC: 0
Message 36709 - Posted: 14 Sep 2018, 16:00:44 UTC - in response to Message 36707.  

Did some further lookup. The Panda ID is 4045385593, for task 206483510 and WU 101296936. The Panda page and the WU details page state that the job ended successfully at 2018-09-12T01:19:11. At my end the job finished 2 days later, which means it was also running on someone else's system.

1) Why would the same job be sent out to multiple computers at the same time?
2) Why did the client at my end not pick up the information that the job had already finished successfully, and stop its own processing?

I don't have a complete understanding, hence the various questions coming to my mind. They may be stupid, or the answers may already be out there somewhere and I have not been able to find them yet. Thanks.

bronco
Joined: 13 Apr 18
Posts: 443
Credit: 8,196,738
RAC: 10,577
Message 36710 - Posted: 14 Sep 2018, 16:42:31 UTC - in response to Message 36707.  

> There were some errors when the program had to be suspended as the laptop had to be powered off, but the VM always resumed later on, and went on processing the stuff, so I assume that nothing got corrupted.
>
> If I could really get an idea on how to avoid this, that would be great,

ATLAS tasks don't like being interrupted.
Don't interrupt ATLAS tasks.

> else no point in running these jobs for hours only to get failures at the end always.

Precisely.

Conan
Joined: 6 Jul 06
Posts: 107
Credit: 511,942
RAC: 0
Message 37087 - Posted: 24 Oct 2018, 12:37:45 UTC
Last modified: 24 Oct 2018, 12:39:30 UTC

I am trying to get ATLAS native working but I am running into a lot of problems.
A good deal of the issues I think I have sorted out (maybe), but I can't compile CVMFS at all.

The OP said there was a file that will install "everything" needed (both CVMFS and Singularity) however this is not the case and nothing really got installed at all.

So I went to the Singularity web site, downloaded and installed this programme, and that seemed to work.

Then I went to the CVMFS site and, as I did not really know what files to download, I downloaded a number of them and installed them one by one to see if they worked. They didn't.
I found the CVMFS Package and this mostly installed what was needed but still did not compile.

So I then found the link to the post by gyllic on how to install from source and I have followed that (I now have a lot of replication but that is not a real issue).
Everything goes as planned till I need to use 'cmake' in the 'build' directory and it all fails with the following error:

"CMake Error at CMakeLists.txt:10 (project)
No CMAKE_CXX_COMPILER could be found

Tell CMAKE where to find the compiler by setting either the environment variable "CXX" or CMake cache entry
CMAKE_CXX_COMPILER to the full path to the compiler, or the compiler name if it is in the PATH"

CMAKE_CXX_COMPILER is in a sub-directory of the build directory, so I don't know why it can't be found.
The installation was as directed (both from the site and from gyllic).

My knowledge of Linux commands is limited, so finding PATHs is not easy, but I don't see why I should need to if everything has been extracted to the relevant directory. Why has it placed the items in a directory where they can't be found?

I can't progress past cmake to make and install the CVMFS programme.

Any help would be appreciated as I have spent 2 days on this already.

I am installing this on a Fedora 25 system, (if I get it working I was going to install on a Fedora 21 system as well)

(PS this was posted in the news section first, about ATLAS native, then reposted here)

Thanks
Conan

gyllic
Joined: 9 Dec 14
Posts: 201
Credit: 2,500,279
RAC: 651
Message 37096 - Posted: 25 Oct 2018, 8:27:30 UTC - in response to Message 37087.  
Last modified: 25 Oct 2018, 8:43:11 UTC

> I found the CVMFS Package and this mostly installed what was needed but still did not compile.

Which cvmfs package do you mean? If you already have cvmfs installed from a package, there is no need to compile it any more (you can try to type the command "cvmfs2 --version" into the terminal and see if you get any output). You can download rpm packages from here: https://cernvm.cern.ch/portal/filesystem/downloads. Since you are on Fedora 25, I don't know if the packages that are provided for Fedora 27 and 28 will work for you. You can also add the yum repositories and try to install from them.

This may help you:
https://cvmfs.readthedocs.io/en/stable/cpt-quickstart.html

> "CMake Error at CMakeLists.txt:10 (project)
> No CMAKE_CXX_COMPILER could be found
>
> Tell CMAKE where to find the compiler by setting either the environment variable "CXX" or CMake cache entry
> CMAKE_CXX_COMPILER to the full path to the compiler, or the compiler name if it is in the PATH"

Have you installed all packages needed to build cvmfs (e.g. cmake, gcc)? Since the guide is for Debian systems, the packages you have to install in order to compile cvmfs may be named differently on Fedora, and you may have to install more than is listed in the guide.
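
The CMake message quoted above means that no C++ compiler was found on the PATH; on Fedora the g++ compiler normally comes from the gcc-c++ package. A minimal sketch of a check that can be run before cmake, assuming a POSIX shell. The function name find_cxx and the list of candidate compiler names are illustrative only and are not part of CMake or of the guide:

```shell
#!/bin/sh
# Print the full path of the first C++ compiler found.
# Honours $CXX if it is set, then tries a few common names.
# (find_cxx and the candidate list are hypothetical, for illustration.)
find_cxx() {
    for candidate in "${CXX:-}" g++ c++ clang++; do
        if [ -n "$candidate" ] && command -v "$candidate" >/dev/null 2>&1; then
            command -v "$candidate"
            return 0
        fi
    done
    echo "no C++ compiler found in PATH" >&2
    return 1
}
```

If find_cxx prints a path, that path can be handed to CMake through the CXX environment variable, as the error text itself suggests. If it prints nothing, the compiler package is most likely missing.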

Conan
Joined: 6 Jul 06
Posts: 107
Credit: 511,942
RAC: 0
Message 37098 - Posted: 26 Oct 2018, 10:08:29 UTC - in response to Message 37096.  
Last modified: 26 Oct 2018, 10:15:01 UTC


Thanks for the reply gyllic,

I got the package from the CVMFS site as a pkg file first, which seemed to install but not properly, and then I found out that this is most likely an OSX package. Then I found a tar file on the site so tried that, yet again it falls short of fully installing CVMFS and seems to put things in the wrong folders. I did manage to get at least one more step closer to installation, only to find that when I did a check on the install, I had at least 10 errors and a warning that all is not well.
I also tried to install on another system which is clean (a Fedora 21 system) but it came up with REPO failures and can't connect to any mirrors either, so I have just given up for the time being.
Another day wasted, making 3 now and still no closer to getting this bloody project running, GRRRR, very frustrating.
I have now done a complete DNF (YUM) clean all and update, which fixed up over 1,000 packages, so I can start afresh in a day or two.

Really it shouldn't be this darn hard to get a single programme running. All the hoops I am trying to jump through are just leaving me very unhappy and tired, not enjoying this science exercise at all.

Thanks for any assistance you have given but I will maybe try another day.

Conan

Cphipps
Joined: 1 Aug 14
Posts: 15
Credit: 2,832,660
RAC: 2,238
Message 37099 - Posted: 26 Oct 2018, 18:16:29 UTC - in response to Message 36706.  

I have been experiencing the same errors since early Sept on all LHC ATLAS tasks, no matter what computer I process the work on. Can someone look into this? My id is cphipps

gyllic
Joined: 9 Dec 14
Posts: 201
Credit: 2,500,279
RAC: 651
Message 37100 - Posted: 26 Oct 2018, 23:55:36 UTC - in response to Message 37098.  
Last modified: 27 Oct 2018, 0:04:15 UTC

Hi conan,

with the things you described, probably the easiest way to get cvmfs running for you is to build it from sources.
To do so on fedora 25 workstation, follow these steps:

1. login with your default user (that has sudo rights) and clean dnf:
sudo dnf clean all
2. Install all packages that are required to build cvmfs on fedora 25 as well as other packages like nano (I'm not sure if all following packages are needed, but to simplify things just install all of them)
sudo dnf install -y cmake make automake gcc gcc-c++ kernel-devel autofs fuse fuse-devel python-devel libcap-devel git attr valgrind-devel sqlite-devel libuuid-devel uuid-devel tar patch bzip2 zlib-devel openssl-devel unzip nano
3. continue with step 3 in the "build and install cvmfs" section from this guide https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4840&postid=36880#36880, i.e.:
cd
mkdir cvmfs_source
git clone https://github.com/cvmfs/cvmfs.git cvmfs_source
...


Hopefully this solves your problem. On my minimal installation of fedora 25 workstation cvmfs compiled and worked after the additional set up steps that are described in the guide.
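
Before moving on to the cmake step, it can save time to confirm that the build tools from step 2 actually ended up on the PATH. A small sketch, assuming a POSIX shell; the helper name missing_tools and the choice of commands to check are illustrative and not taken from the guide:

```shell
#!/bin/sh
# Print every command from the argument list that is NOT found in PATH.
# Nothing is installed or changed; this only reports.
# (missing_tools is a hypothetical helper name.)
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# Example call (commented out so the snippet has no side effects):
# missing_tools cmake make gcc g++ git
```

An empty result means every listed command was found. Any name that is printed points at a package that still needs to be installed with dnf.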

Conan
Joined: 6 Jul 06
Posts: 107
Credit: 511,942
RAC: 0
Message 37135 - Posted: 30 Oct 2018, 14:02:42 UTC - in response to Message 37100.  

G'Day gyllic,
Thanks for responding, but all your help and Google's and over 5 days of my own input have failed to get this bloody project running.

Every time I move forward one step, I get knocked back two steps.

I found that in /var/cache/dnf there is a file called "expired_repos.json", and it seems the main entry that keeps getting passed to this file is ["cvmfs"].
This seems to cause me to get an error in terminal saying "can't synchronize repos".

After I deleted that 'dnf' folder (it gets recreated) I was able to actually make and install CVMFS, Singularity and Golang.
It all seemed to go OK. I then found v3 of Singularity and installed that as well (up from v2.5.1).
Testing Singularity --version, produces the version number so I thought all was now OK.

Boy was I wrong.
Doing 'cvmfs_config setup' and 'chksetup' seemed to work (at least at first), but 'cvmfs_config probe' fails every time.
Found that the label for /scratch/cvmfs/ is set wrong, but no matter what I do and try I can't get the label to change (from 'default_t' to 'cvmfs_cache_t'), I keep getting the error that I have used an "Invalid argument".
I have checked this over and over and I am using the chcon command correctly but it keeps failing.

CVMFS does not seem to start and this is borne out by my failed work units, which say
"no cvmfs_config in /usr/local/bin"
"Can't find cvmfs_config does not exist"
"No CVMFS"

The cvmfs_config file is where it looks in /usr/local/bin/ but I have come to the conclusion that because CVMFS is not running then cvmfs_config can't then be located.

When I do 'cvmfs_config probe' the result is
"Probing /cvmfs/atlas.cern.ch... Failed"
"Probing /cvmfs/atlas-condb.cern.ch... Failed"
"Probing /cvmfs/grid.cern.ch... Failed"

When I look in the folder /cvmfs/ there is nothing in it.
I found a tutorial about this on the net and it said to try to access a file whilst in the directory and this should make the files appear. Well they didn't.

I have followed your guide, which is very good. I have followed the guides direct from the project sites (CVMFS, Golang, Singularity) and a few others that I found, all saying similar things but none of them have gotten me working.
All the troubleshooting I have done has just made my head spin.
I couldn't tell you how many things I have downloaded, installed, tried to install, tried to run, start and probe, check, then check again and again, move and copy.

I have had 3 screens up, one for the computer I am working on, one showing the error messages on the work units and my laptop showing me my search results in Google.

It has so far been to no avail. I am going to have a break at trying to get this going.
I may not even bother to try again as this has just been a nightmare.

It would be so much easier if there were just a few files to be downloaded and installed. I still don't know what the correct file or files are from the CVMFS project; I think I ended up downloading all of them but it has not helped.

So it looks like this volunteer just won't be helping ATLAS do anything. I can at least run Sixtrack I suppose.

Conan

gyllic
Joined: 9 Dec 14
Posts: 201
Credit: 2,500,279
RAC: 651
Message 37136 - Posted: 30 Oct 2018, 18:13:26 UTC - in response to Message 37135.  
Last modified: 30 Oct 2018, 18:24:32 UTC

> Thanks for responding, but all your help and Google's and over 5 days of my own input have failed to get this bloody project running.

I'm sorry that getting this native ATLAS app up and running is such a pain in the ass for you. Maybe it should be mentioned that the native app is still in beta, so things may be harder to set up compared with other applications.
It is a little bit weird because I have tested it and basically copy-pasted all the commands from the previous post and from the guide, and it all worked well.

> I found that in /var/cache/dnf there is a file called "expired_repos.json", and it seems the main entry that keeps getting passed to this file is ["cvmfs"].
> This seems to cause me to get an error in terminal saying "can't synchronize repos".
> After I deleted that 'dnf' folder (it gets recreated) I was able to actually make and install CVMFS, Singularity and Golang.

If you added a dnf/yum repository earlier that does not work or that you no longer need, you can try to disable it with "sudo dnf config-manager --set-disabled repository" where "repository" is the name of the corresponding repository.
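
To see which repository ids dnf has marked as expired before disabling one, the file mentioned in the quote can be read directly. A rough sketch, assuming the file is a flat JSON array of strings such as ["cvmfs"], as described in the quoted post; the helper name expired_repo_ids is made up for illustration:

```shell
#!/bin/sh
# Print the repo ids from an expired_repos.json-style file, one per line.
# Assumption: the file holds a flat JSON array of plain strings,
# e.g. ["cvmfs"]; anything more complex needs a real JSON parser.
expired_repo_ids() {
    tr -d '[]" \n' < "$1" | tr ',' '\n' | sed '/^$/d'
}

# Example call (commented out; the path is the one named in the thread):
# expired_repo_ids /var/cache/dnf/expired_repos.json
```

Each id that is printed is a candidate for the "repository" placeholder in the config-manager command above.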

> Testing Singularity --version, produces the version number so I thought all was now OK.

This sounds good, although (as we learned just a couple of days/weeks ago) getting the version output alone is not sufficient to determine whether the ATLAS tasks will work. But as you already said, for now cvmfs is the reason why your tasks are not successful.

> Doing 'cvmfs_config setup' and 'chksetup' seemed to work (at least at first), but 'cvmfs_config probe' fails every time.

As long as the probing fails, all native ATLAS tasks will also fail. What is the output of the chksetup command?

> Found that the label for /scratch/cvmfs/ is set wrong, but no matter what I do and try I can't get the label to change (from 'default_t' to 'cvmfs_cache_t'), I keep getting the error that I have used an "Invalid argument".
> I have checked this over and over and I am using the chcon command correctly but it keeps failing.

Not sure what you mean by that. If you followed the guide, then "/scratch/cvmfs/" will be the directory where cvmfs will place/search for the local cache. You have to create that folder manually first, and you can choose a different location if you want.
To simplify things for the moment, I recommend removing the lines "CVMFS_CACHE_BASE=/scratch/cvmfs" and "CVMFS_QUOTA_LIMIT=4096" from your "/etc/cvmfs/default.local" file. You will probably have to execute "sudo cvmfs_config setup" or "sudo cvmfs_config reload" again.
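
With those two lines removed, a stripped-down /etc/cvmfs/default.local might look like the sketch below. The repository list is taken from the probe output quoted in this thread; CVMFS_HTTP_PROXY=DIRECT is an assumption (no local proxy in use), and the file described in the guide may contain other settings:

```shell
# /etc/cvmfs/default.local (minimal sketch, not the guide's exact file)

# Repositories to mount; names taken from the probe output in this thread.
CVMFS_REPOSITORIES=atlas.cern.ch,atlas-condb.cern.ch,grid.cern.ch

# Assumption: no local squid proxy, so connect directly.
CVMFS_HTTP_PROXY=DIRECT

# Cache location and quota (in MB) are left at their defaults for now:
# CVMFS_CACHE_BASE=/scratch/cvmfs
# CVMFS_QUOTA_LIMIT=4096
```

Once probing works with the default cache location, the two commented lines can be restored if a separate cache directory is still wanted.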

> CVMFS does not seem to start and this is borne out by my failed work units, which say
> "no cvmfs_config in /usr/local/bin"
> "Can't find cvmfs_config does not exist"
> "No CVMFS"

I'm not sure why it says that, maybe because the probing fails. If you type "cvmfs_config --help" in the terminal, do you get any output?

> When I do 'cvmfs_config probe' the result is
> "Probing /cvmfs/atlas.cern.ch... Failed"
> "Probing /cvmfs/atlas-condb.cern.ch... Failed"
> "Probing /cvmfs/grid.cern.ch... Failed"

Obviously this should not be the case. Have you tried to run the command "sudo service autofs restart" (or the Fedora equivalent) and then tried to probe again?
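
On a systemd-based system such as Fedora 25, the restart would normally be "sudo systemctl restart autofs". To pick out only the repositories that still fail afterwards, the probe output can be filtered. A small sketch, assuming the "Probing /cvmfs/<repo>... Failed" line format quoted above; failed_probes is a hypothetical helper name:

```shell
#!/bin/sh
# Read `cvmfs_config probe` output on stdin and print the name of every
# repository whose probe line ends in "Failed".
# Assumption: lines look like "Probing /cvmfs/<repo>... Failed".
failed_probes() {
    sed -n 's|^Probing /cvmfs/\(.*\)\.\.\. Failed$|\1|p'
}

# Example call (commented out; needs a working cvmfs_config on the PATH):
# cvmfs_config probe | failed_probes
```

An empty result after the autofs restart would mean that no probe line reported "Failed".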

> When I look in the folder /cvmfs/ there is nothing in it.

Which directory do you mean? "/etc/cvmfs"?

> So it looks like this volunteer just won't be helping ATLAS do anything. I can at least run Sixtrack I suppose.

You can still try to run the virtualbox based ATLAS app or the other virtualbox applications (Theory and LHCb at the moment). For that, yeti has written a very nice checklist which you will find on the message boards.

If you want, you can send me a PM and I will look at your computer through TeamViewer and see if we can get it up and running (but I am no Fedora expert).

tullio
Joined: 19 Feb 08
Posts: 588
Credit: 3,594,033
RAC: 1,586
Message 37407 - Posted: 23 Nov 2018, 6:13:26 UTC

I completed my first Atlas task on the new Ryzen 5 1400 CPU. It used 4 cores and 6 GB RAM of the 8 available despite the presence of 5 Einstein@home tasks.
Tullio

maeax
Joined: 2 May 07
Posts: 732
Credit: 27,366,054
RAC: 39,219
Message 37408 - Posted: 23 Nov 2018, 6:46:22 UTC - in response to Message 37407.  

:-)). May your Ryzen keep crunching for a long time.

tullio
Joined: 19 Feb 08
Posts: 588
Credit: 3,594,033
RAC: 1,586
Message 37422 - Posted: 23 Nov 2018, 21:37:24 UTC
Last modified: 23 Nov 2018, 21:38:05 UTC

After finally enabling AMD-V on my new Ryzen 5 1400 CPU I downloaded 2 Theory Simulation tasks and several Atlas tasks. One Theory task started on 8 cores (the Ryzen has 4) but I soon noticed, via VirtualBox Manager, that it was not doing anything so I aborted it. This was the standard behavior of my former Windows 10 PC with its A10-6700 CPU, but I had hoped that the new CPU would run Condor. It does not. All Atlas tasks run well and produce HITS files.
Tullio

tullio
Joined: 19 Feb 08
Posts: 588
Credit: 3,594,033
RAC: 1,586
Message 37425 - Posted: 24 Nov 2018, 8:53:19 UTC
Last modified: 24 Nov 2018, 8:53:52 UTC

No issues with Atlas, it runs on 4 cores on my new Ryzen CPU and on one core on the old Opteron 1210 of the SUN workstation running Linux. Same for the SixTrack tasks, which also run on the AMD E-450 of the Linux laptop. The only problem is with Theory tasks.
Tullio

tullio
Joined: 19 Feb 08
Posts: 588
Credit: 3,594,033
RAC: 1,586
Message 37732 - Posted: 10 Jan 2019, 10:15:12 UTC

I get a number of computation errors after the HITS file has been produced. Here is the message:
<message>
upload failure: <file_xfer_error>
<file_name>hXELDmoyN1tnyYickojUe11pABFKDmABFKDmQe8SDmABFKDm4XIGHm_1_r1914038861_ATLAS_result</file_name>
<error_code>-240 (stat() failed)</error_code>
</file_xfer_error>
</message>
Tullio

maeax
Joined: 2 May 07
Posts: 732
Credit: 27,366,054
RAC: 39,219
Message 37947 - Posted: 7 Feb 2019, 6:46:45 UTC

2019-02-07 07:36:53 (7064): Guest Log: mv: cannot stat `metadata-*.xml': No such file or directory
2019-02-07 07:36:53 (7064): Guest Log: ERROR: Missing metadata.xml
2019-02-07 07:36:53 (7064): Guest Log: Listing of results directory
2019-02-07 07:36:53 (7064): Guest Log: total 372432

computezrmle
Joined: 15 Jun 08
Posts: 1133
Credit: 55,674,412
RAC: 104,837
Message 37948 - Posted: 7 Feb 2019, 7:07:15 UTC

I notice an increasing number of invalid ATLAS native tasks on different hosts.
Example:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=215546270

The logs show these lines:
*****************The last 100 lines of the pilot log******************
2019-02-07 05:56:10|3360|aria2cSiteMo| WARNING: Rucio python modules not available
2019-02-07 05:56:10|3360|EventRangesP| pp: unable to import module 'requests', which is necessary if panda proxy is used

Richie_unstable
Joined: 26 Oct 18
Posts: 33
Credit: 778,722
RAC: 25
Message 37949 - Posted: 7 Feb 2019, 7:19:03 UTC

Three validate errors during the last few hours here (VBox version). There had been a good period for a day or so before this started again.

Guiri-One[Andalucia]
Joined: 1 Feb 06
Posts: 43
Credit: 9,723
RAC: 0
Message 38048 - Posted: 22 Feb 2019, 7:43:39 UTC
Last modified: 22 Feb 2019, 7:43:52 UTC

Will anyone explain why, after hours of CPU usage, an Atlas task is wasted?

https://lhcathome.cern.ch/lhcathome/result.php?resultid=216826770

I am starting to feel very frustrated...

Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 739
Credit: 6,027,121
RAC: 1,035
Message 38049 - Posted: 22 Feb 2019, 8:04:34 UTC - in response to Message 38048.  

> Will anyone explain why, after hours of CPU usage, an Atlas task is wasted?
>
> https://lhcathome.cern.ch/lhcathome/result.php?resultid=216826770
>
> I am starting to feel very frustrated...

ATLAS doesn't like long suspends over several days. It's best to process the task in one uninterrupted run.