error on Atlas native: 195 (0x000000C3) EXIT_CHILD_FAILED
Author | Message |
---|---|
Send message Joined: 17 Feb 17 Posts: 42 Credit: 2,589,736 RAC: 0 |
Hello, I am getting this error on ATLAS native tasks like https://lhcathome.cern.ch/lhcathome/result.php?resultid=256862538 or https://lhcathome.cern.ch/lhcathome/result.php?resultid=256862442 even though I'm pretty sure I have everything installed correctly: tasks like https://lhcathome.cern.ch/lhcathome/result.php?resultid=256735752 finish successfully. I'm at a loss as to what to do here. All machines are running BOINC 7.9.3. I am suspending ATLAS on these machines for the time being. A quick search of the forum suggested the cause was CMS not being installed (it is). Any help is appreciated. |
Send message Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0 |
"I am getting this error on Atlas native tasks like https://lhcathome.cern.ch/lhcathome/result.php?resultid=256862538 or https://lhcathome.cern.ch/lhcathome/result.php?resultid=256862442 " Been there, done that. The only way I can get native ATLAS to run is by creating a second BOINC account (BOINC2). https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5169&postid=40851#40851 That is a bit of a pain if you have never done it before, but easy enough thereafter. It removes the "lock" on the files that is somehow preventing it from running. A bit of research on the stderr error message suggests it may be significant: "container creation failed: mount ->/var error: can't remount /var: operation not permitted" https://lhcathome.cern.ch/lhcathome/result.php?resultid=256777262 It seems to have something to do with how the local storage is mounted. https://github.com/sylabs/singularity/issues/2282 EDIT: Even after all that, you have to do the following: (1) Attach to LHC and download native ATLAS. (2) Allow a native ATLAS task to start up. (3) Allow access by all (or ATLAS will fail) with "sudo chmod -R 777 /var/lib/boinc-client" and "sudo chmod -R 777 /var/lib/boinc2". I am out at this point. The experts need to get back to work. |
Send message Joined: 15 Jun 08 Posts: 2401 Credit: 225,444,584 RAC: 123,615 |
The stderr.txt clearly states what is missing, namely python2: "/usr/bin/env: ‘python2’: No such file or directory". Python2 is still required for ATLAS native but not for Theory native. Hence this example succeeded: https://lhcathome.cern.ch/lhcathome/result.php?resultid=256735752 |
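A quick way to confirm this on a host is to check whether a command named python2 is on the PATH, since that is the name /usr/bin/env looks up. The sketch below uses a hypothetical helper, have_interpreter, which is not part of BOINC or ATLAS. Note that a host can have python or python3 installed and still lack the python2 name.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a command named "$1" can be found on PATH,
# which is the lookup that `/usr/bin/env python2` depends on.
have_interpreter() {
    command -v "$1" >/dev/null 2>&1
}

if have_interpreter python2; then
    echo "python2 found at: $(command -v python2)"
else
    echo "python2 not found on PATH; /usr/bin/env python2 would fail"
fi
```

If the second message is printed, installing the distribution's Python 2 package (or otherwise providing a python2 command) is the likely fix for the error quoted above.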
Send message Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0 |
I have python, or it would not work at all. It seems to have problems accessing it. |
Send message Joined: 15 Jun 08 Posts: 2401 Credit: 225,444,584 RAC: 123,615 |
David Cameron is working on a solution to make python2 obsolete. See: https://lhcathomedev.cern.ch/lhcathome-dev/forum_thread.php?id=507&postid=6893#6893 |
Send message Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0 |
Good. I hope they can make it more like native Theory. That works. |
Send message Joined: 17 Feb 17 Posts: 42 Credit: 2,589,736 RAC: 0 |
Thank you for catching that. Apparently trying to read things at 2 in the morning is hard. I have installed Python 2.7. I have not set up a second BOINC instance, but I have also run "sudo chmod -R 777 /var/lib/boinc-client". Hopefully everything is good from now on. I did not see that Python was a prerequisite for ATLAS native either, though I may of course have missed that. It will be fantastic when this is no longer needed; hopefully that comes out of dev soon. |
Send message Joined: 9 Jan 15 Posts: 151 Credit: 431,596,822 RAC: 0 |
Does anyone have a solution for this? 2020-01-16 14:28:22,074: Checking singularity with cmd:/cvmfs/atlas.cern.ch/repo/containers/sw/singularity/x86_64-el7/current/bin/singularity exec -B /cvmfs /cvmfs/atlas.cern.ch/repo/containers/images/singularity/x86_64-centos7.img hostname 2020-01-16 14:28:23,158: Singularity isnt working: INFO: Convert SIF file to sandbox... FATAL: while extracting /cvmfs/atlas.cern.ch/repo/containers/images/singularity/x86_64-centos7.img: root filesystem extraction failed: failed to copy content in staging file: write /tmp/archive-846395633: no space left on device Sometimes a host runs into this trouble; it has happened before and resolved itself after a while. One host is affected more than the others, although all hosts are set up the same way. This host uses the default values in its config. Would it help to increase the cache? Or is there any other parameter I could adjust? Task: https://lhcathome.cern.ch/lhcathome/result.php?resultid=259466119 Device storage info: Cache 46080 KB, Swap space 4 GB, Total disk space 410.56 GB, Free disk space 393.62 GB. Another host with the same issue: Task: https://lhcathome.cern.ch/lhcathome/result.php?resultid=258968040 I could try resizing swap. |
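The failing write in the log above is to /tmp, and the free disk space that BOINC reports may belong to a different filesystem than the one holding /tmp. A minimal sketch for checking this, assuming a POSIX df and awk; avail_kb is a hypothetical helper name:

```shell
#!/bin/sh
# Print the available space, in kilobytes, on the filesystem holding path $1.
# -P forces the POSIX one-line-per-filesystem format; -k reports 1024-byte blocks.
avail_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# /tmp is where the log shows the staging file being written.
for dir in /tmp /; do
    echo "$dir: $(avail_kb "$dir") KB available"
done
```

If the filesystem holding /tmp shows little or no space while the BOINC data disk has plenty, the extraction step is the part running out of room.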
Send message Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0 |
Are you using all 72 cores? I normally provide at least 2 GB per core when running native ATLAS as single-core tasks. So you may need more memory if you ever run only native ATLAS (running 2 cores per task would be the easy fix). |
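The rule of thumb above is simple arithmetic. As a rough sketch, assuming a Linux host with /proc/meminfo, the following compares the memory needed with the installed RAM. The task count of 36 is only an illustrative value, and needed_gb and total_ram_gb are hypothetical helper names:

```shell
#!/bin/sh
# Memory needed, in GB, for $1 concurrent tasks at $2 GB each.
needed_gb() {
    echo $(( $1 * $2 ))
}

# Total RAM in whole GB, read from /proc/meminfo (Linux only; MemTotal is in kB).
total_ram_gb() {
    awk '/^MemTotal:/ { print int($2 / 1024 / 1024) }' /proc/meminfo
}

tasks=36        # illustrative: number of single-core ATLAS tasks running at once
gb_per_task=2   # the rule of thumb quoted above
echo "needed: $(needed_gb "$tasks" "$gb_per_task") GB, installed: $(total_ram_gb) GB"
```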
Send message Joined: 2 May 07 Posts: 2090 Credit: 158,816,631 RAC: 127,244 |
Found this from github: https://github.com/sylabs/singularity/issues/2719 |
Send message Joined: 9 Jan 15 Posts: 151 Credit: 431,596,822 RAC: 0 |
Yes, I use all 72 cores. But half of them are used for ATLAS, with the rest running SixTrack and other projects. At the highest I have counted, memory use went up to 70 GB on this system, and I only had an issue when Rosetta took 4 GB per task and when running yoyo ECM P2. With ATLAS running on this host, it is currently at around 50 GB of RAM. I noticed that /dev/cl/root was full; by default it was set to 50 GiB. For now I will update and restart to see if that clears up some space. If not, I will try resizing it or reinstall the OS. |
Send message Joined: 9 Jan 15 Posts: 151 Credit: 431,596,822 RAC: 0 |
Thanks for looking up SIF. It looks like SIF is read-only, but I am not sure how much that tells us about Singularity for ATLAS. Root filled up after I tried to create a new swapfile, and after updates and a reboot the host stalled. There was no rescuing the host, so I reinstalled the OS and gave root and swap more space with a custom setup instead of the defaults. I just hope that solves it, and then I can deal with all the other hosts. |
Send message Joined: 14 Sep 08 Posts: 43 Credit: 50,599,809 RAC: 105,358 |
I am running into the same problem. Is this /var on the host filesystem? I probably don't want Singularity to remount /var on my host system, but if it's trying to remount because of some missing flags, I can probably check what they do and add them so that the remount becomes a no-op and succeeds. https://lhcathome.cern.ch/lhcathome/result.php?resultid=260003929 If I can't resolve this, is there a way to disable native ATLAS while still allowing native Theory, without refusing ATLAS work entirely? |
Send message Joined: 13 May 14 Posts: 387 Credit: 15,314,184 RAC: 0 |
The BOINC data directory must be mounted inside the container, and with a default installation this is /var/lib/boinc-client/slots. If there are problems mounting /var you could try a different data directory or install BOINC in a different place. For example on my desktop I run boinc-client from my home directory because the root partition is too small. |
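To see which mount backs the BOINC data directory, and with which options, the mount table can be searched for the longest mount point that contains the path. This is only a diagnostic sketch for Linux: mount_for_path is a hypothetical helper, and it does not say which option (if any) causes the remount failure.

```shell
#!/bin/sh
# Print "mountpoint options" for the longest mount point containing path $1.
# $2 is the mount table to read (defaults to /proc/mounts on Linux).
mount_for_path() {
    awk -v p="$1" '
        {
            mp = $2
            # A mount point matches if it is "/", equals the path,
            # or is a directory prefix of the path.
            if (mp == "/" || p == mp || index(p, mp "/") == 1) {
                if (length(mp) >= length(best)) { best = mp; opts = $4 }
            }
        }
        END { if (best != "") print best, opts }
    ' "${2:-/proc/mounts}"
}

if [ -r /proc/mounts ]; then
    mount_for_path /var/lib/boinc-client/slots
fi
```

If the output shows a separate mount for /var or for the data directory, moving the data directory as suggested above changes which mount the container has to bind.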
Send message Joined: 14 Sep 08 Posts: 43 Credit: 50,599,809 RAC: 105,358 |
"The BOINC data directory must be mounted inside the container, and with a default installation this is /var/lib/boinc-client/slots. If there are problems mounting /var you could try a different data directory or install BOINC in a different place. For example on my desktop I run boinc-client from my home directory because the root partition is too small." Thanks for the reply. It looks like it's a bind mount, so I should be able to reproduce this easily without wasting WUs. However, it does seem to work locally, assuming that seeing the error message below means the container was set up properly with the remount. $ sudo su -l boinc -s /bin/bash -c '/cvmfs/atlas.cern.ch/repo/containers/sw/singularity/x86_64-el7/current/bin/singularity exec --pwd /var/lib/boinc-client/slots/32 -B /cvmfs,/var /cvmfs/atlas.cern.ch/repo/containers/images/singularity/x86_64-centos7.img sh ls' INFO: Convert SIF file to sandbox... /usr/bin/ls: /usr/bin/ls: cannot execute binary file INFO: Cleaning up image... Now I wonder if it's some setting in the default unit file that came with Ubuntu 19.10: https://pastebin.com/akEe8cyY. I am not that familiar with systemd unit files, but nothing looked suspicious after searching the man page. Clearly the symlink /var/lib/boinc should have been resolved, given that all WUs read and write /var/lib/boinc-client/ without a problem. Any ideas where I should look next? |
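One detail about the reproduction above: "sh ls" asks sh to interpret a file named ls as a script instead of running the ls command, which would explain the "cannot execute binary file" line once the container is up. A minimal sketch of the difference, run outside any container:

```shell
#!/bin/sh
# "sh ls" passes ls as a script file for sh to read, so it fails
# (the exact message depends on which shell provides sh).
if sh ls >/dev/null 2>&1; then
    echo "sh ls: succeeded (is there a script named ls here?)"
else
    echo "sh ls: failed, as expected"
fi

# "sh -c CMD" passes CMD as a command string, so ls actually runs.
sh -c 'ls /' >/dev/null && echo "sh -c 'ls /': ran ls"
```

In the singularity command, ending with "ls" or "sh -c ls" instead of "sh ls" would presumably list the slot directory. This is an untested assumption, not something confirmed in the thread.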
Send message Joined: 30 Aug 14 Posts: 145 Credit: 10,847,070 RAC: 0 |
Hello, I just noticed that I had 2 WUs with that error today: https://lhcathome.cern.ch/lhcathome/result.php?resultid=271962666 and https://lhcathome.cern.ch/lhcathome/result.php?resultid=271977499 The one running right now seems to have the same problem: it is running, but with low CPU usage. I know what this error means, but does anyone know why there are no sub tasks? Greetings, djoser. Why mine when you can research? - GRIDCOIN - Real cryptocurrency without wasting hashes! https://gridcoin.us |
Send message Joined: 15 Jun 08 Posts: 2401 Credit: 225,444,584 RAC: 123,615 |
No guarantee, but you may check whether the following options in /etc/cvmfs/default.local solve the issue (or slow down your CVMFS): CVMFS_MAX_RETRIES=3 CVMFS_KCACHE_TIMEOUT=3 Run "cvmfs_config reload" after you have saved the config file. |
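Laid out as they would appear in the file, the two settings look like this. An existing /etc/cvmfs/default.local may already contain other settings, which should be left in place. The "cvmfs_config probe" step is an optional extra check and was not part of the suggestion above.

```shell
# Fragment for /etc/cvmfs/default.local (values as suggested above).
CVMFS_MAX_RETRIES=3
CVMFS_KCACHE_TIMEOUT=3

# After saving the file, as root:
#   cvmfs_config reload
#   cvmfs_config probe    # optional: checks that the configured repositories mount
```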
Send message Joined: 30 Aug 14 Posts: 145 Credit: 10,847,070 RAC: 0 |
I set up a machine with Ubuntu 20.04 (ID: 10651915) which also fails to create the container due to permission problems. During the CVMFS setup procedure I noticed a message saying that the distribution is not supported and that the settings for 18.04 are being used. Maybe the two are related? BTW: The 'chmod -R 777 /var/lib/boinc-client' command did not resolve the issue for me. |
Send message Joined: 9 Jan 15 Posts: 151 Credit: 431,596,822 RAC: 0 |
I am also running 20.04 and there are no packages for it. The setup suggests 18.04 instead, and that fails for me too. We will need to wait, though some might be able to build it themselves or use a nightly. I have not tried that and use VirtualBox instead. CERN, could we get focal added to dists? http://cvmrepo.web.cern.ch/cvmrepo/apt/dists/ |
Send message Joined: 30 Aug 14 Posts: 145 Credit: 10,847,070 RAC: 0 |
I already downgraded to 18.04. Now I'm running into serious problems with CVMFS, see here: https://lhcathome.cern.ch/lhcathome/result.php?resultid=273707970 Also, all my jobs are running much longer and CPU efficiency has dropped significantly. I'm almost sure that is network related (CVMFS connections to CERN). |
©2024 CERN