Message boards :
ATLAS application :
ATLAS issues
Author | Message |
---|---|
Send message Joined: 30 Aug 18 Posts: 3 Credit: 1,002 RAC: 0 |
To add more details to my query: HITS file got generated, so I assume that some WUs did get processed properly.
2018-09-14 16:48:41 (6112): Guest Log: output list
2018-09-14 16:48:41 (6112): Guest Log: HITS.15285626._010961.pool.root.1 srm://srm.ndgf.org:8443;autodir=no;spacetoken=ATLASDATADISK/srm/managerv2?SFN=/atlas/disk/atlasdatadisk/rucio/mc16_13TeV/33/61/HITS.15285626._010961.pool.root.1:checksumtype=adler32:checksumvalue=b20c09b5
2018-09-14 16:48:41 (6112): Guest Log: log.15285626._010961.job.log.tgz.1 srm://srm.ndgf.org:8443;autodir=no;spacetoken=ATLASDATADISK/srm/managerv2?SFN=/atlas/disk/atlasdatadisk/rucio/mc16_13TeV/34/c4/log.15285626._010961.job.log.tgz.1:checksumtype=adler32:checksumvalue=bcdb22a5
2018-09-14 16:48:41 (6112): Guest Log: HITS file was successfully produced
2018-09-14 16:48:41 (6112): Guest Log: -rw------- 1 atlas01 atlas01 138252547 Sep 14 16:37 /home/atlas01/RunAtlas/HITS.15285626._010961.pool.root.1
There were some errors when the program had to be suspended as the laptop had to be powered off, but the VM always resumed later on, and went on processing the stuff, so I assume that nothing got corrupted. If I could really get an idea on how to avoid this, that would be great, else no point in running these jobs for hours only to get failures at the end always. |
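The output list above records an adler32 checksum for each file (for example `checksumvalue=b20c09b5` for the HITS file). A minimal sketch of how a local copy could be compared against that value, assuming Python's standard `zlib` module and assuming the logged value is the zero-padded hexadecimal form. The helper names `adler32_of_file` and `checksum_from_log_line` are made up for illustration and are not part of BOINC or the ATLAS pilot:

```python
import re
import zlib


def adler32_of_file(path, chunk_size=1024 * 1024):
    """Return the adler32 checksum of a file as 8 lowercase hex digits."""
    value = 1  # adler32 of empty input is 1, so this is the running start value
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            # Feed the previous value back in to checksum the file in pieces
            value = zlib.adler32(chunk, value)
    return format(value & 0xFFFFFFFF, "08x")


def checksum_from_log_line(line):
    """Extract the hex value after 'checksumvalue=' or return None."""
    match = re.search(r"checksumvalue=([0-9a-fA-F]{8})", line)
    return match.group(1).lower() if match else None
```

With these helpers, `adler32_of_file(local_path) == checksum_from_log_line(log_line)` would indicate that the local file matches what the log reported. It says nothing about whether the upload or validation on the server side succeeded.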
Send message Joined: 30 Aug 18 Posts: 3 Credit: 1,002 RAC: 0 |
Did some further lookup. The Panda ID is 4045385593, for task 206483510, for WU 101296936. The Panda page and the WU details page state that the job ended successfully at 2018-09-12T01:19:11. At my end the job finished 2 days later, which means it was also running on someone else's system. 1) Why would the same job be sent out to multiple computers at the same time? 2) Why did the client at my end not pick up the information that the job had already finished successfully and stop its own processing? I don't have a complete understanding, hence the various questions coming to mind, which may be stupid, or the answers may already be out there somewhere and I have not been able to find them yet. Thanks. |
Send message Joined: 13 Apr 18 Posts: 443 Credit: 8,438,885 RAC: 0 |
"There were some errors when the program had to be suspended as the laptop had to be powered off, but the VM always resumed later on, and went on processing the stuff, so I assume that nothing got corrupted." ATLAS tasks don't like being interrupted. Don't interrupt ATLAS tasks. "else no point in running these jobs for hours only to get failures at the end always." Precisely. |
Send message Joined: 6 Jul 06 Posts: 108 Credit: 663,175 RAC: 0 |
I am trying to get ATLAS native working but I am running into a lot of problems. A good deal of the issues I think I have sorted out (maybe), but I can't compile CVMFS at all. The OP said there was a file that will install "everything" needed (both CVMFS and Singularity); however, this is not the case and nothing really got installed at all. So I went to the Singularity web site, downloaded and installed the programme, and that seemed to have worked. Then I went to the CVMFS site and, as I did not really know which files to download, I downloaded a number of them and installed them one by one to see if they worked. They didn't. I found the CVMFS Package and this mostly installed what was needed but still did not compile. So I then found the link to the post by gyllic on how to install from source and I have followed that (I now have a lot of replication but that is not a real issue). Everything goes as planned until I need to use 'cmake' in the 'build' directory, and it all fails with the following error: "CMake Error at CMakeLists.txt:10 (project): No CMAKE_CXX_COMPILER could be found. Tell CMake where to find the compiler by setting either the environment variable "CXX" or CMake cache entry CMAKE_CXX_COMPILER to the full path to the compiler, or the compiler name if it is in the PATH." CMAKE_CXX_COMPILER is in a sub-directory of the build directory, so I don't know why it can't be found. The installation was as directed (both from the site and from gyllic). My knowledge of Linux commands is limited, so finding PATHs is not easy, but I don't see why I need to if everything has been extracted to the relevant directory. Why has it placed the items in a directory where they can't be found? I can't progress past cmake to make and install the CVMFS programme. Any help would be appreciated as I have spent 2 days on this already.
I am installing this on a Fedora 25 system (if I get it working, I was going to install it on a Fedora 21 system as well). PS: this was posted in the news section first, about ATLAS native, then reposted here. Thanks, Conan |
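The CMake error quoted above names two ways to supply a compiler: the `CXX` environment variable, or a compiler name that can be found on the `PATH`. It most likely refers to the C++ compiler program itself rather than to any file in the build tree. A minimal sketch of that lookup, assuming Python is available on the machine; the function name `find_cxx_compiler` and the list of candidate names are illustrative assumptions, not something CMake or CVMFS provides:

```python
import os
import shutil


def find_cxx_compiler(env=None, candidates=("g++", "c++", "clang++")):
    """Return the path of a usable C++ compiler, or None if none is found.

    Mirrors the two options named in the CMake error: the CXX environment
    variable first, then well-known compiler names looked up on the PATH.
    """
    env = os.environ if env is None else env
    names = []
    if env.get("CXX"):
        names.append(env["CXX"])
    names.extend(candidates)
    for name in names:
        # shutil.which falls back to the process PATH when path is None
        path = shutil.which(name, path=env.get("PATH"))
        if path:
            return path
    return None


if __name__ == "__main__":
    compiler = find_cxx_compiler()
    if compiler:
        print("C++ compiler found:", compiler)
    else:
        print("No C++ compiler on PATH; install one (for example gcc-c++ on Fedora).")
```

If this reports no compiler, installing one and re-running `cmake` in a clean build directory would be the next thing to try.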
Send message Joined: 9 Dec 14 Posts: 202 Credit: 2,533,875 RAC: 0 |
"I found the CVMFS Package and this mostly installed what was needed but still did not compile." Which cvmfs package do you mean? If you already have cvmfs installed from a package, there is no need to compile it any more (you can try typing the command "cvmfs2 --version" into the terminal and see if you get any output). You can download rpm packages from here: https://cernvm.cern.ch/portal/filesystem/downloads. Since you are on Fedora 25, I don't know if the packages that are provided for Fedora 27 and 28 will work for you. You can also add the yum repositories and try to install from them. This may help you: https://cvmfs.readthedocs.io/en/stable/cpt-quickstart.html "CMake Error at CMakeLists.txt:10 (project)" Have you installed all packages needed to build cvmfs (i.e. cmake, gcc, etc.)? Since the guide is for Debian systems, the packages you have to install in order to compile cvmfs may have different names on Fedora, and you may have to install more than the guide lists. |
Send message Joined: 6 Jul 06 Posts: 108 Credit: 663,175 RAC: 0 |
"I found the CVMFS Package and this mostly installed what was needed but still did not compile." "Which cvmfs package do you mean? If you already have cvmfs installed from a package, there is no need to compile it any more (you can try to type the command 'cvmfs2 --version' into the terminal and see if you get any output). You can download rpm packages from here: https://cernvm.cern.ch/portal/filesystem/downloads. Since you are on Fedora 25, I don't know if the packages that are provided for Fedora 27 and 28 will work for you. You can also add the yum repositories and try to install from them." Thanks for the reply gyllic. I got the package from the CVMFS site as a pkg file first, which seemed to install but not properly, and then I found out that this is most likely an OSX package. Then I found a tar file on the site so I tried that, yet again it falls short of fully installing CVMFS and seems to put things in the wrong folders. I did manage to get at least one step closer to installation, only to find that when I did a check on the install I had at least 10 errors and a warning that all is not well. I also tried to install on another system which is clean (a Fedora 21 system), but it came up with REPO failures and can't connect to any mirrors either, so I have just given up for the time being. Another day wasted, making 3 now, and still no closer to getting this bloody project running, GRRRR, very frustrating. I have now done a complete DNF (YUM) clean all and update, which fixed up over 1,000 packages, so I can start afresh in a day or two. Really it shouldn't be this darn hard to get a single programme running; all the hoops I am trying to jump through are just leaving me very unhappy and tired, and I am not enjoying this science exercise at all. Thanks for the assistance you have given, but I will maybe try another day. Conan |
Send message Joined: 1 Aug 14 Posts: 15 Credit: 7,749,239 RAC: 1,813 |
I have been experiencing the same errors since early Sept on all LHC ATLAS tasks, no matter what computer I process the work on. Can someone look into this? My id is cphipps |
Send message Joined: 9 Dec 14 Posts: 202 Credit: 2,533,875 RAC: 0 |
Hi conan, with the things you described, probably the easiest way to get cvmfs running for you is to build it from sources. To do so on Fedora 25 Workstation, follow these steps:
1. Log in with your default user (that has sudo rights) and clean dnf:
sudo dnf clean all
2. Install all packages that are required to build cvmfs on Fedora 25 as well as other packages like nano (I'm not sure if all of the following packages are needed, but to simplify things just install all of them):
sudo dnf install -y cmake make automake gcc gcc-c++ kernel-devel autofs fuse fuse-devel python-devel libcap-devel git attr valgrind-devel sqlite-devel libuuid-devel uuid-devel tar patch bzip2 zlib-devel openssl-devel unzip nano
3. Continue with step 3 in the "build and install cvmfs" section from this guide https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4840&postid=36880#36880, i.e.:
cd
mkdir cvmfs_source
git clone https://github.com/cvmfs/cvmfs.git cvmfs_source
...
Hopefully this solves your problem. On my minimal installation of Fedora 25 Workstation, cvmfs compiled and worked after the additional setup steps that are described in the guide. |
Send message Joined: 6 Jul 06 Posts: 108 Credit: 663,175 RAC: 0 |
"Hi conan," G'Day gyllic, Thanks for responding, but all your help and Google's and over 5 days of my own input have failed to get this bloody project running. Every time I move forward one step, I get knocked back two steps. I found that in /var/cache/dnf there is a file called "expired_repos.json", and it seems the main entry that keeps getting passed to this file is ["cvmfs"]. This seems to cause me to get an error in terminal saying "can't syncronize repos". After I deleted that 'dnf' folder (it gets recreated) I was able to actually make and install CVMFS, Singularity and Golang. It all seemed to go OK. I then found v3 of Singularity and installed that as well (up from v2.5.1). Testing Singularity --version produces the version number, so I thought all was now OK. Boy was I wrong. Doing 'cvmfs_config setup' and 'chksetup' seemed to work (at least at first), but 'cvmfs_config probe' fails every time. Found that the label for /scratch/cvmfs/ is set wrong, but no matter what I do and try I can't get the label to change (from 'default_t' to 'cvmfs_cache_t'); I keep getting the error that I have used an "Invalid argument". I have checked this over and over and I am using the chcon command correctly, but it keeps failing. CVMFS does not seem to start and this is borne out by my failed work units, which say "no cvmfs_config in /usr/local/bin" "Can't find cvmfs_config does not exist" "No CVMFS". The cvmfs_config file is where it looks, in /usr/local/bin/, but I have come to the conclusion that because CVMFS is not running, cvmfs_config can't be located. When I do 'cvmfs_config probe' the result is "Probing /cvmfs/atlas.cern.ch... Failed" "Probing /cvmfs/atlas-condb.cern.ch... Failed" "Probing /cvmfs/grid.cern.ch... Failed". When I look in the folder /cvmfs/ there is nothing in it. I found a tutorial about this on the net and it said to try to access a file whilst in the directory, and this should make the files appear. Well, they didn't.
I have followed your guide, which is very good. I have followed the guides direct from the project sites (CVMFS, Golang, Singularity) and a few others that I found, all saying similar things, but none of them have gotten me working. All the troubleshooting I have done has just made my head spin. I couldn't tell you how many things I have downloaded, installed, tried to install, tried to run, start and probe, check, then check again and again, move and copy. I have had 3 screens up: one for the computer I am working on, one showing the error messages on the work units, and my laptop showing me my search results in Google. It has so far been to no avail. I am going to take a break from trying to get this going. I may not even bother to try again as this has just been a nightmare. It would be so much easier if there were just a few single files to be downloaded and installed. I still don't know what the correct file or files are from the CVMFS project; I think I ended up downloading all of them but it has not helped. So it looks like this volunteer just won't be helping ATLAS do anything. I can at least run Sixtrack I suppose. Conan |
Send message Joined: 9 Dec 14 Posts: 202 Credit: 2,533,875 RAC: 0 |
"Thanks for responding, but all your help and Google's and over 5 days of my own input have failed to get this bloody project running." I'm sorry that getting this native ATLAS app up and running is such a pain in the ass for you. Maybe it should be mentioned that the native app is still in beta, so things may be harder to set up compared with other applications. It is a little bit weird because I have tested it and basically copy-pasted all the commands from the previous post and from the guide, and it all worked well.
"I found that in /var/cache/dnf there is a file called 'expired_repos.json', and it seems the main entry that keeps getting passed to this file is ['cvmfs']." If you added a dnf/yum repository earlier that does not work or that you don't need anymore, you can try to disable it with "sudo dnf config-manager --set-disabled repository", where "repository" is the name of the corresponding repository.
"Testing Singularity --version, produces the version number so I thought all was now OK." This sounds good, although (as we learned just a couple of days/weeks ago) getting just the output is not sufficient to determine whether the ATLAS tasks will work. But as you already said, for now cvmfs is the reason why your tasks are not successful.
"Doing 'cvmfs_config setup' and 'chksetup' seemed to work (at least at first), but 'cvmfs_config probe' fails every time." As long as the probing fails, all native ATLAS tasks will also fail. What is the output of the chksetup command?
"Found that the label for /scratch/cvmfs/ is set wrong, but no matter what I do and try I can't get the label to change (from 'default_t' to 'cvmfs_cache_t'), I keep getting the error that I have used an 'Invalid argument'." Not sure what you mean by that. If you followed the guide, then "/scratch/cvmfs/" will be the directory where cvmfs will place/search for the local cache. You have to create that folder manually first, and you can choose a different location if you want.
To simplify things for the moment, I recommend that you remove the lines "CVMFS_CACHE_BASE=/scratch/cvmfs" and "CVMFS_QUOTA_LIMIT=4096" from your "/etc/cvmfs/default.local" file. You will probably have to execute "sudo cvmfs_config setup" or "sudo cvmfs_config reload" again.
"CVMFS does not seem to start and this is borne out by my failed work units, which say" I'm not sure why it says that, maybe because the probing fails. If you type "cvmfs_config --help" in the terminal, do you get an output?
"When I do 'cvmfs_config probe' the result is" Obviously this should not be the case. Have you tried to run the command "sudo service autofs restart", i.e. the Fedora equivalent command, and then tried to probe again?
"When I look in the folder /cvmfs/ there is nothing in it." Which directory do you mean? "/etc/cvmfs"?
"So it looks like this volunteer just won't be helping ATLAS do anything. I can at least run Sixtrack I suppose." You can still try to run the VirtualBox-based ATLAS app or the other VirtualBox applications (Theory and LHCb at the moment). For that, yeti has written a very nice checklist which you will find on the message boards. If you want, you can send me a PM and I will look at your computer through TeamViewer and see if we can get it up and running (but I am no Fedora expert). |
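Since native ATLAS tasks fail as long as the probe fails, it can help to see at a glance which repositories are affected. A minimal sketch that parses probe output in the format quoted in this thread ("Probing /cvmfs/... Failed"), assuming a successful probe ends in "OK"; the function name `failed_repositories` and the sample text are illustrative only:

```python
import re

# One probe line looks like: "Probing /cvmfs/atlas.cern.ch... Failed"
PROBE_LINE = re.compile(r"Probing\s+(\S+?)\.\.\.\s*(\w+)")


def failed_repositories(probe_output):
    """Return the repositories whose probe result is anything other than OK."""
    failed = []
    for repo, status in PROBE_LINE.findall(probe_output):
        if status.upper() != "OK":
            failed.append(repo)
    return failed


# Sample copied from the output reported earlier in this thread
sample = """Probing /cvmfs/atlas.cern.ch... Failed
Probing /cvmfs/atlas-condb.cern.ch... Failed
Probing /cvmfs/grid.cern.ch... Failed"""
print(failed_repositories(sample))
```

An empty list would mean every probed repository answered; any entry in the list points at a repository that still needs attention before tasks can succeed.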
Send message Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
I completed my first Atlas task on the new Ryzen 5 1400 CPU. It used 4 cores and 6 GB RAM of the 8 available despite the presence of 5 Einstein@home tasks. Tullio |
Send message Joined: 2 May 07 Posts: 2244 Credit: 173,902,375 RAC: 169 |
:-)). May your Ryzen keep crunching for a long time. |
Send message Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
After finally enabling AMD-V on my new Ryzen 5 1400 CPU I downloaded 2 Theory Simulation tasks and several Atlas tasks. One Theory task started on 8 cores (the Ryzen has 4), but I soon noticed, via VirtualBox Manager, that it was not doing anything, so I aborted it. This was the standard behavior of my former Windows 10 PC with its A10-6700 CPU, but I had hoped that the new CPU would run Condor. It does not. All Atlas tasks run well and produce HITS files. Tullio |
Send message Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
No issues with Atlas, it runs on 4 cores on my new Ryzen CPU and on one core on the old Opteron 1210 of the SUN workstation running Linux. Same for the SixTrack tasks, which also run on the AMD E-450 of the Linux laptop. The only problem is with Theory tasks. Tullio |
Send message Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
I get a number of computation errors after the HITS file has been produced. Here is the message: <message> upload failure: <file_xfer_error> <file_name>hXELDmoyN1tnyYickojUe11pABFKDmABFKDmQe8SDmABFKDm4XIGHm_1_r1914038861_ATLAS_result</file_name> <error_code>-240 (stat() failed)</error_code> </file_xfer_error> </message> Tullio |
Send message Joined: 2 May 07 Posts: 2244 Credit: 173,902,375 RAC: 169 |
2019-02-07 07:36:53 (7064): Guest Log: mv: cannot stat `metadata-*.xml': No such file or directory
2019-02-07 07:36:53 (7064): Guest Log: ERROR: Missing metadata.xml
2019-02-07 07:36:53 (7064): Guest Log: Listing of results directory
2019-02-07 07:36:53 (7064): Guest Log: total 372432 |
Send message Joined: 15 Jun 08 Posts: 2541 Credit: 254,608,838 RAC: 12,857 |
I notice an increasing number of invalid ATLAS native tasks on different hosts. Example: https://lhcathome.cern.ch/lhcathome/result.php?resultid=215546270 The logs show these lines:
*****************The last 100 lines of the pilot log******************
2019-02-07 05:56:10|3360|aria2cSiteMo| WARNING: Rucio python modules not available
2019-02-07 05:56:10|3360|EventRangesP| pp: unable to import module 'requests', which is necessary if panda proxy is used |
Send message Joined: 26 Oct 18 Posts: 96 Credit: 4,188,598 RAC: 0 |
Three validate errors during last few hours here (VBox version). There had been a good period for a day or so until this again. |
Send message Joined: 1 Feb 06 Posts: 66 Credit: 9,723 RAC: 0 |
Will anyone explain why after hours of CPU usage...Atlas task is wasted? https://lhcathome.cern.ch/lhcathome/result.php?resultid=216826770 I am starting to feel very frustrated... |
Send message Joined: 14 Jan 10 Posts: 1422 Credit: 9,484,585 RAC: 470 |
"Will anyone explain why after hours of CPU usage...Atlas task is wasted?" ATLAS doesn't like long suspends over several days. It's best to process the task in one uninterrupted run. |