Message boards :
ATLAS application :
Creation of container failed
Author | Message |
---|---|
Joined: 30 Aug 14 Posts: 145 Credit: 10,847,070 RAC: 0 |
For the last few minutes I have been getting the following error on ALL of my ATLAS native tasks: FATAL: container creation failed: hook function for tag prelayer returns error: failed to create /var/lib/condor directory: mkdir /var/lib/condor: permission denied Since this only started today and affects more than one machine, I don't think this is a local problem. I didn't change anything on my machines, which had been working flawlessly for months. Oh wait... there was an update for my OS today. Unfortunately I do not remember which package was updated. I hope this is not the problem... Anyone else with this problem? Any hints? Regards, djoser. Why mine when you can research? - GRIDCOIN - Real cryptocurrency without wasting hashes! https://gridcoin.us |
Joined: 12 Jul 11 Posts: 95 Credit: 1,129,876 RAC: 0 |
Hi, I have a different error with most of my recent ATLAS native tasks, with a very interesting "validate the error" status: [2021-04-14 02:29:47] *** Error codes and diagnostics *** See that one for example. |
Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0 |
I had 21 native ATLAS invalids yesterday, apparently due to segfaults. https://lhcathome.cern.ch/lhcathome/result.php?resultid=311786429 https://lhcathome.cern.ch/lhcathome/result.php?resultid=311785736 https://lhcathome.cern.ch/lhcathome/result.php?resultid=311783201 However, usually someone else was able to complete them validly, if they were running CentOS or Scientific Linux. |
Joined: 13 May 14 Posts: 387 Credit: 15,314,184 RAC: 0 |
A few hours ago the version of Singularity on CVMFS that is used to run ATLAS native tasks was updated from 3.2.1 to 3.7.2. This could explain the problem you see. Would be good to know if others are experiencing similar issues. This new version was thoroughly tested by ATLAS but here on LHC@Home there are a lot of different platforms and environments so it's possible it might cause a problem. |
Joined: 30 Aug 14 Posts: 145 Credit: 10,847,070 RAC: 0 |
This would explain the sudden problem with creating the Singularity container on my machines. I don't have Singularity installed locally, so my machines use the CVMFS version, which was updated and doesn't seem to work with my machines. So I guess the simplest solution is to install Singularity locally... |
Joined: 13 May 14 Posts: 387 Credit: 15,314,184 RAC: 0 |
I have asked our singularity expert to have a look at your errors. I am not sure if the /var/lib/condor dir is significant - are you running condor on your machines? It could be that this dir exists in the image and has different permissions to your local dir. |
Joined: 30 Aug 14 Posts: 145 Credit: 10,847,070 RAC: 0 |
Thanks for looking into my problem. No, I'm not using condor. The directory /var/lib/condor doesn't even exist on my machines. On one of my machines I will try to run ATLAS native with a locally installed Singularity, to see if this helps. |
Joined: 30 Aug 14 Posts: 145 Credit: 10,847,070 RAC: 0 |
"On one of my machines I will try to run ATLAS native with a locally installed Singularity, to see if this helps." No, unfortunately even with a locally installed Singularity, ATLAS native doesn't work. https://lhcathome.cern.ch/lhcathome/result.php?resultid=312206373 The error in the logfile shouldn't occur, because the user is a member of both groups "boinc" and "singularity" and should therefore have all the permissions needed. |
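The group membership described in the post above can be double-checked with a few lines of Python. This is only a minimal sketch: the group names "boinc" and "singularity" come from the post, while the user name passed to `groups_of` is an assumption (the account that runs the BOINC client may be named differently on your system).

```python
import grp
import pwd


def groups_of(user):
    """Return the names of all groups the user belongs to (primary and supplementary)."""
    names = {g.gr_name for g in grp.getgrall() if user in g.gr_mem}
    primary_gid = pwd.getpwnam(user).pw_gid
    try:
        names.add(grp.getgrgid(primary_gid).gr_name)
    except KeyError:
        # The primary GID has no entry in the group database; skip it.
        pass
    return names


def missing_groups(user_groups, required=("boinc", "singularity")):
    """Return the required groups that are absent from user_groups, sorted by name."""
    return sorted(set(required) - set(user_groups))
```

For example, `missing_groups(groups_of("boinc"))` returns an empty list if the (assumed) user "boinc" is in both groups. Note that group changes only take effect for processes started after the change, so the BOINC client may need a restart.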
Joined: 2 May 07 Posts: 2244 Credit: 173,902,375 RAC: 677 |
These are the Singularity messages from your last successful ATLAS native tasks. From your first PC: [2021-04-13 20:53:28] Using singularity image /cvmfs/atlas.cern.ch/repo/containers/images/singularity/x86_64-centos7.img [2021-04-13 20:53:28] Checking for singularity binary... [2021-04-13 20:53:28] which: no singularity in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin) [2021-04-13 20:53:28] Singularity is not installed, using version from CVMFS [2021-04-13 20:53:28] Checking singularity works with /cvmfs/atlas.cern.ch/repo/containers/sw/singularity/x86_64-el7/current/bin/singularity exec -B /cvmfs /cvmfs/atlas.cern.ch/repo/containers/images/singularity/x86_64-centos7.img hostname [2021-04-13 20:53:32] INFO: Convert SIF file to sandbox... einstein INFO: Cleaning up image... [2021-04-13 20:53:32] Singularity works From your second PC: [2021-04-14 15:17:40] Checking for singularity binary... [2021-04-14 15:17:40] which: no singularity in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin) [2021-04-14 15:17:40] Singularity is not installed, using version from CVMFS [2021-04-14 15:17:40] Checking singularity works with /cvmfs/atlas.cern.ch/repo/containers/sw/singularity/x86_64-el7/current/bin/singularity exec -B /cvmfs /cvmfs/atlas.cern.ch/repo/containers/images/singularity/x86_64-centos7.img hostname [2021-04-14 15:17:42] INFO: Convert SIF file to sandbox... hawking INFO: Cleaning up image... [2021-04-14 15:17:42] Singularity works I use only CentOS 7 and CentOS 8, both with a locally installed Singularity: [2021-04-14 07:07:54] Running /usr/bin/singularity --version [2021-04-14 07:07:54] singularity version 3.7.1-1.el8 [2021-04-13 14:57:19] Running /usr/bin/singularity --version [2021-04-13 14:57:19] singularity version 3.4.0-1.2.el7 |
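The check sequence visible in these log lines (look for a local binary, fall back to the CVMFS copy, then run `hostname` inside the image) can be reproduced by hand. The following is a rough sketch that only mirrors what the log shows; it is not the actual ATLAS wrapper script, and the function names are made up for illustration. The two paths are copied from the log.

```python
import shutil
import subprocess

# Both paths are copied verbatim from the log lines above.
CVMFS_SINGULARITY = (
    "/cvmfs/atlas.cern.ch/repo/containers/sw/singularity/"
    "x86_64-el7/current/bin/singularity"
)
IMAGE = "/cvmfs/atlas.cern.ch/repo/containers/images/singularity/x86_64-centos7.img"


def choose_singularity(local_binary):
    """Prefer a locally installed binary; otherwise fall back to the CVMFS copy."""
    return local_binary or CVMFS_SINGULARITY


def build_check_command(binary):
    """Build the same test command that appears in the log."""
    return [binary, "exec", "-B", "/cvmfs", IMAGE, "hostname"]


def run_check():
    """Run the test command and report whether the container started."""
    binary = choose_singularity(shutil.which("singularity"))
    try:
        result = subprocess.run(
            build_check_command(binary),
            capture_output=True,
            text=True,
            timeout=300,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        print(f"Could not run {binary}: {exc}")
        return False
    if result.returncode != 0:
        # On a failing host this is where a FATAL message would show up.
        print(result.stderr.strip())
    return result.returncode == 0
```

Calling `run_check()` as the same user that runs the BOINC client should show whether the container can be created outside of a task, which helps separate a Singularity problem from a task problem.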
Joined: 30 Aug 14 Posts: 145 Credit: 10,847,070 RAC: 0 |
Yes, those are tasks from BEFORE the Singularity update David mentioned. All tasks AFTER this update are failing. |
Joined: 2 May 07 Posts: 2244 Credit: 173,902,375 RAC: 677 |
What does this INFO message mean: "Convert SIF file to sandbox..."? |
Joined: 12 Jul 11 Posts: 95 Credit: 1,129,876 RAC: 0 |
The latest task that failed for me first says [2021-04-15 05:40:32] Using singularity image /cvmfs/atlas.cern.ch/repo/containers/images/singularity/x86_64-centos7.img and then (15 minutes later) [2021-04-15 05:54:39] *** Error codes and diagnostics *** Shortly before that, another task failed, but its error log is completely different and so is its status ("error", whereas the one above shows "validate the error"). Before all this I had no problems with ATLAS native. |
Joined: 2 May 07 Posts: 2244 Credit: 173,902,375 RAC: 677 |
OK, we have to wait for the answer from David and the Singularity-Expert. |
Joined: 13 May 14 Posts: 387 Credit: 15,314,184 RAC: 0 |
This error usually means that the host ran out of memory. I have seen it often with vbox tasks when the VM was not given enough memory, but not in native tasks. However your host has only 4GB of memory which is kind of on the limit for running ATLAS tasks. It could be the latest batch uses slightly more memory and you reached the limit. |
Joined: 13 May 14 Posts: 387 Credit: 15,314,184 RAC: 0 |
"On one of my machines I will try to run ATLAS native with a locally installed Singularity, to see if this helps." The feedback I got said there was a configuration change related to setuid in the latest Singularity, and he pointed me to this page. However, from your last sentence it seems you have already done what is suggested there, so I'm not really sure what to do. |
Joined: 12 Jul 11 Posts: 95 Credit: 1,129,876 RAC: 0 |
Thanks for the info. So I said "OK, never mind, let's remove ATLAS from the list", and then I realized that this machine is set to the school location, and my school preferences specifically request not to run any ATLAS tasks. I do have the "if no other work available..." checkbox ticked, BUT there is always other work (SixTrack and native Theory are selected, and I have never seen the machine without other tasks and forced to fetch ATLAS). So I have unticked it to see if that changes anything, but I am surprised. (The machine is a small hosted command-line Linux VM and I won't even try to put VirtualBox on it. I don't even know if that's possible, and as you said it has very limited resources.) |
Joined: 13 May 14 Posts: 387 Credit: 15,314,184 RAC: 0 |
One thing you may try is adding some swap space to the VM. I remember some time ago testing ATLAS tasks on a similar 4GB VM, and they failed with this same error. Adding 1GB of swap space was enough to fix it, and the swap was only used very briefly at the start of the task. |
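To see whether a host is in the situation described above, the RAM and swap totals can be read from `/proc/meminfo`. The sketch below is only a rough heuristic: the 4 GB and 1 GB thresholds are taken from the figures mentioned in this thread, not from any official ATLAS requirement, and the function names are hypothetical.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (in kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def memory_report(meminfo, ram_limit_gb=4.0, min_swap_gb=1.0):
    """Flag hosts with at most ram_limit_gb of RAM and less than min_swap_gb of swap.

    Both thresholds are assumptions based on the numbers quoted in this thread.
    """
    kb_per_gb = 1024 ** 2  # /proc/meminfo reports values in units of 1024 bytes
    ram_gb = meminfo.get("MemTotal", 0) / kb_per_gb
    swap_gb = meminfo.get("SwapTotal", 0) / kb_per_gb
    return {
        "ram_gb": ram_gb,
        "swap_gb": swap_gb,
        "tight": ram_gb <= ram_limit_gb and swap_gb < min_swap_gb,
    }
```

On a Linux host, `memory_report(parse_meminfo(open("/proc/meminfo").read()))` gives the current values; a result with `"tight": True` suggests that adding swap, as described above, may be worth trying.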
Joined: 12 Jul 11 Posts: 95 Credit: 1,129,876 RAC: 0 |
I have done this (added a 10 GB swap file) and enabled ATLAS again. I'll let you know. |
Joined: 30 Aug 14 Posts: 145 Credit: 10,847,070 RAC: 0 |
"On one of my machines I will try to run ATLAS native with a locally installed Singularity, to see if this helps." That was the one machine I equipped with a local installation of Singularity, and yes, I have done what the readme file suggests. Strange that it doesn't work anyway... But what about my initial problem on the machine without a locally installed Singularity? I tried one more task today and still have the same problem. https://lhcathome.cern.ch/lhcathome/result.php?resultid=312621601 Did the Singularity expert have anything to say about that particular problem? Thanks and regards! |
Joined: 20 Aug 10 Posts: 4 Credit: 2,216,637 RAC: 0 |
Hi, I'm having exactly the same issue as djoser. All my ATLAS jobs have ended in error since 14/04, with the same error (but not the same folder): [2021-04-18 11:27:14] FATAL: container creation failed: hook function for tag prelayer returns error: failed to create /var/lib/alternatives directory: mkdir /var/lib/alternatives: permission denied For example, one of my last tasks: https://lhcathome.cern.ch/lhcathome/result.php?resultid=313045776 Regards |
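Since the two reports in this thread differ only in the directory named in the FATAL line, it can help to pull that directory out of task logs when comparing hosts. Here is a small sketch; the pattern is written against the two messages quoted in this thread and may not match other variants of the error.

```python
import re

# Pattern derived from the two FATAL messages quoted in this thread.
FAILED_MKDIR = re.compile(
    r"failed to create (?P<path>\S+) directory: "
    r"mkdir (?P=path): permission denied"
)


def failed_directory(log_line):
    """Return the directory Singularity could not create, or None if no match."""
    match = FAILED_MKDIR.search(log_line)
    return match.group("path") if match else None
```

Applied to the two messages above, it yields `/var/lib/condor` and `/var/lib/alternatives` respectively, both under `/var/lib`.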
©2024 CERN