Message boards : ATLAS application : error on Atlas native: 195 (0x000000C3) EXIT_CHILD_FAILED

Previous · 1 · 2 · 3 · Next

Greger

Joined: 9 Jan 15
Posts: 151
Credit: 431,596,822
RAC: 0
Message 42533 - Posted: 18 May 2020, 9:23:36 UTC - in response to Message 42527.  

You can drop that; it won't work. I got the same error as you, and CERN needs to build it for 20.04 to make it work.
djoser
Joined: 30 Aug 14
Posts: 145
Credit: 10,847,070
RAC: 0
Message 42999 - Posted: 10 Jul 2020, 8:11:13 UTC
Last modified: 10 Jul 2020, 8:34:00 UTC

I have nothing but failing tasks since yesterday evening.

stderr-output:
process exited with code 195 (0xc3, -61)
Probing /cvmfs/atlas.cern.ch... Failed!
cvmfs_config probe failed, aborting the job
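As an aside, the three numbers in the first line of that output are the same exit status in three encodings: 195 decimal, 0xC3 hex, and -61 when the byte is read as signed. This is plain arithmetic, not project tooling; a quick sketch:

```python
# The wrapper's exit status shown three ways: decimal, hex, and as a
# signed 8-bit value (195 wraps to -61 when read as a signed byte).
status = 195

hex_form = hex(status)
signed = status - 256 if status > 127 else status

print(hex_form, signed)  # 0xc3 -61
```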

Furthermore, I have two tasks which ran abnormally long and failed:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=279391027
https://lhcathome.cern.ch/lhcathome/result.php?resultid=279390878

The latter is still running, but will doubtlessly fail...
Why mine when you can research? - GRIDCOIN - Real cryptocurrency without wasting hashes! https://gridcoin.us
djoser
Joined: 30 Aug 14
Posts: 145
Credit: 10,847,070
RAC: 0
Message 43000 - Posted: 10 Jul 2020, 8:33:08 UTC - in response to Message 42533.  

You can drop that; it won't work. I got the same error as you, and CERN needs to build it for 20.04 to make it work.

By the way: CVMFS for Focal (20.04) is available now!
djoser
Joined: 30 Aug 14
Posts: 145
Credit: 10,847,070
RAC: 0
Message 43002 - Posted: 10 Jul 2020, 10:16:30 UTC - in response to Message 42999.  

I have nothing but failing tasks since yesterday evening.

It seems that there was a glitch on my machine. After a reboot things are back to normal. All-clear.
Greger

Joined: 9 Jan 15
Posts: 151
Credit: 431,596,822
RAC: 0
Message 43016 - Posted: 10 Jul 2020, 21:23:31 UTC

Great, I will try it.
Greger

Joined: 9 Jan 15
Posts: 151
Credit: 431,596,822
RAC: 0
Message 43018 - Posted: 10 Jul 2020, 22:48:51 UTC - in response to Message 43016.  
Last modified: 10 Jul 2020, 22:49:09 UTC

[2020-07-11 00:32:55] Job failed
[2020-07-11 00:32:55] INFO:    Convert SIF file to sandbox...
[2020-07-11 00:32:55] INFO:    Cleaning up image...
[2020-07-11 00:32:55] FATAL:   container creation failed: mount ->/var error: can't remount /var: operation not permitted


Looks like a permissions issue on the latest build, or just on 2.7.3.0.

Did it work for you?
djoser
Joined: 30 Aug 14
Posts: 145
Credit: 10,847,070
RAC: 0
Message 43024 - Posted: 11 Jul 2020, 8:06:58 UTC - in response to Message 43018.  

Hello Gunde,

Hmmm...this seems to be the same problem we already had. That's strange...

I have to admit that I haven't tried it yet, because I decided to stay with Lubuntu 18.04 on my main machine until support ends (April 2021), according to the principle "never touch a running system". On another machine I tried CentOS 8 instead of Ubuntu, just for testing and fun :-) It does not feel very different from Ubuntu, because both use Gnome as the standard desktop environment, but setting up BOINC on CentOS is a little bit more work at first.

Greetings!
Greger

Joined: 9 Jan 15
Posts: 151
Credit: 431,596,822
RAC: 0
Message 43026 - Posted: 11 Jul 2020, 8:32:59 UTC - in response to Message 43024.  

OK, I will let you know when it gets fixed, in case you would like to change to Focal later.
djoser
Joined: 30 Aug 14
Posts: 145
Credit: 10,847,070
RAC: 0
Message 43029 - Posted: 11 Jul 2020, 8:56:44 UTC - in response to Message 43026.  
Last modified: 11 Jul 2020, 9:08:29 UTC

That would be awesome, thank you!
I only hope that CERN staff know about this permission problem with Focal...
Greger

Joined: 9 Jan 15
Posts: 151
Credit: 431,596,822
RAC: 0
Message 43034 - Posted: 12 Jul 2020, 1:16:19 UTC

OK, I have managed to get it running, and it looks to have been Singularity all along. The Singularity that is included does not work on Focal, or with version 2.7.3.0. I installed Singularity 3.5.2, which had worked well on Ubuntu and CentOS before, and it started.

Before I tested that, I made a build from git and got CVMFS 2.8.0.0 instead of the 2.7.3.0 that is pushed out for Focal. So try installing Singularity first, and if that fails you can clone CVMFS from git and build the latest version.
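For anyone following along: before relying on a locally installed Singularity, it may be worth checking that it is at least the 3.5.2 that worked here. A minimal sketch of a dotted-version comparison (the version string would come from `singularity --version`; the 3.5.2 threshold is just the version reported working in this thread):

```python
# Minimum Singularity version reported to work in this thread.
REQUIRED = (3, 5, 2)

def version_ok(version_string):
    """Compare a dotted version string like '3.5.2' against REQUIRED.

    Tuple comparison handles multi-digit components correctly
    (e.g. 3.10.0 > 3.5.2), unlike plain string comparison.
    """
    parts = tuple(int(p) for p in version_string.split("."))
    return parts >= REQUIRED

print(version_ok("3.5.2"))  # True
```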
maeax

Joined: 2 May 07
Posts: 2071
Credit: 156,091,407
RAC: 103,245
Message 43035 - Posted: 12 Jul 2020, 4:26:03 UTC - in response to Message 43034.  

This is info from David Cameron, from 19 Dec 2019:
This host got an auto-update to singularity 3.5.1 a couple of days ago and since then tasks have been completing successfully. I suppose that 3.5.0 has some bugs. This is a good reason to use singularity from CVMFS since it is always validated to work with ATLAS tasks.
Greger

Joined: 9 Jan 15
Posts: 151
Credit: 431,596,822
RAC: 0
Message 43036 - Posted: 12 Jul 2020, 9:07:11 UTC - in response to Message 43035.  
Last modified: 12 Jul 2020, 9:33:04 UTC

The thing is that we can't use the Singularity that comes with CVMFS on Focal. So we need to install Singularity separately to avoid the copy included in CVMFS. Most versions work solidly; just try the latest stable release.

This is a good reason to use singularity from CVMFS since it is always validated to work with ATLAS tasks.


This is incorrect for hosts that use Ubuntu 20.04: there are permission issues on every task. The log reports that Singularity is OK, but it is not.

[2020-07-11 23:56:12] Using singularity image /cvmfs/atlas.cern.ch/repo/containers/images/singularity/x86_64-centos7.img
[2020-07-11 23:56:12] Checking for singularity binary...
[2020-07-11 23:56:12] Singularity is not installed, using version from CVMFS
...........
[2020-07-11 23:56:23] Job failed
[2020-07-11 23:56:23] INFO:    Convert SIF file to sandbox...
[2020-07-11 23:56:23] INFO:    Cleaning up image...
[2020-07-11 23:56:23] FATAL:   container creation failed: mount ->/var error: can't remount /var: operation not permitted
[2020-07-11 23:56:23] ./runtime_log.err
[2020-07-11 23:56:23] ./runtime_log
00:06:24 (3073789): run_atlas exited; CPU time 12.010002
00:06:24 (3073789): app exit status: 0x1
00:06:24 (3073789): called boinc_finish(195)
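The "Singularity is not installed, using version from CVMFS" line comes from a simple fallback: the wrapper looks for a singularity binary on the host, and otherwise uses the copy shipped in CVMFS. Roughly like this (a sketch only; the real run_atlas script is more involved, and the CVMFS path is the one printed in the log above):

```python
import shutil

# Path of the Singularity copy shipped inside CVMFS, as seen in the log.
CVMFS_SINGULARITY = ("/cvmfs/atlas.cern.ch/repo/containers/sw/singularity/"
                     "x86_64-el7/current/bin/singularity")

def pick_singularity(name="singularity"):
    """Prefer a host-installed binary; fall back to the CVMFS copy."""
    local = shutil.which(name)  # searches PATH, returns None if absent
    if local:
        return local
    return CVMFS_SINGULARITY
```

Installing Singularity on the host flips which branch is taken, which matches the observation that a local 3.5.2 install made the tasks succeed.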


This happens with the old CVMFS 2.7.x.x builds for 18.04/19.04, with the latest CVMFS 2.7.3.0 from the CernVM-FS package repositories for 20.04, and also with 2.8.0.0 cloned from git and built. All error out with the same line: container creation failed: mount ->/var error: can't remount /var: operation not permitted.

So for me, I can only say that the included Singularity does not work on all systems.
Sesson

Joined: 4 Apr 19
Posts: 31
Credit: 3,549,068
RAC: 14,360
Message 43037 - Posted: 12 Jul 2020, 9:40:46 UTC

My (Theory-only) Ubuntu VM generated similar errors when I upgraded to 20.04. Then I executed the following commands and my VM could continue working:

sudo /sbin/create-boinc-cgroup
sudo systemctl daemon-reload
sudo systemctl restart boinc-client

I don't know if this applies to ATLAS as well, or if it works after a reboot. Since it is a VM, I don't need to shut it down when I turn off my computer.
David Cameron
Project administrator
Project developer
Project scientist

Joined: 13 May 14
Posts: 387
Credit: 15,314,184
RAC: 0
Message 43042 - Posted: 13 Jul 2020, 10:58:45 UTC - in response to Message 43036.  

Hi Gunde,

The thing is that we can't use the Singularity that comes with CVMFS on Focal.


Can you expand on why this doesn't work? Is it just incompatible binaries/libraries on Ubuntu, or something else?

Thanks,
David
Greger

Joined: 9 Jan 15
Posts: 151
Credit: 431,596,822
RAC: 0
Message 43052 - Posted: 13 Jul 2020, 20:10:08 UTC - in response to Message 43042.  

I have experienced a permission issue, as you can see in the post above. I have never run into this type of issue before on Ubuntu, CentOS or Arch. My conclusion for now is that the container part, Singularity (Theory with runc not tested), hits permission issues that I could not get past without the application installed on the host. I did not find anything in the error log in the slots folder suggesting it needs extra libs; the same lines are posted to the task's stderr.
I have not fully traced the process yet, but Singularity claims to be fine early in the startup and then fails later, at around 10 min 4 sec, probably when it tries to start the job.

So it could be that the binaries do not end up with the correct group, or something else.

Before there was a package for Focal, I first tried the 18.04 package of CVMFS, and it failed. I had been waiting for the Focal package, and when it was released I used the .deb and it failed, so I wiped it and got the package from the repository; it was indeed the same file, but I tested it anyway and it failed too.
Then I focused on making a build, so I cloned from git, adding zlib and libssl-dev to the build as they were required. At the end I started native again: same issue, just a later version. So I tried installing the Singularity version I had used before on both Ubuntu and CentOS, which is Singularity 3.5.2. This had a different effect: it worked perfectly. So in my view, either the Singularity inside the CVMFS image did not get proper permissions on my host, as posted in the log, or the new 3.5.2 install on the host changed something during installation, or the Singularity in the image has something missing or broken.

I didn't need to change any groups for CVMFS or BOINC when I installed Singularity. It confused me that simply installing the container runtime changed the outcome from a /var permission error.

What I am trying to understand is how it can report that the container works but then fail. The check apparently does not cover permissions, and I don't know what changes when you choose to install on the host instead. I have installed Singularity on another Ubuntu host, as it was required before, and left it running, so I would need to try on yet another host.

A sanity check is needed, so I will try on a fresh Focal system tomorrow before doing anything more. I am not good at debugging, and if I could find the cause it would be pure luck. My conclusion for now is that it works to clone from git and install the container runtime on the host; a bonus is that you also run the latest version.
Greger

Joined: 9 Jan 15
Posts: 151
Credit: 431,596,822
RAC: 0
Message 43070 - Posted: 16 Jul 2020, 0:26:39 UTC

Update: I did a fresh install on a new 20.04 system and got the same issue as before: no permission on /var. I am sure that the Singularity image is part of this, since an additional host install corrected the permissions on my main system.

For further info, I experienced mount issues on both Ubuntu and CentOS with 2.7.3.0 and 2.7.3.0-1. These appear to start fine with a basic repository config for CVMFS, but after adding an additional repository and a proxy it becomes non-responsive and CVMFS stalls on any command.
The application responds fine to the help command, but on any execution command, or a restart of the service, it stalls or even locks up until it gives up.

In the coming days I will go back to an older system to check. In my view the main Singularity issue applies only to Ubuntu, not to CentOS; it could be that CentOS gets the same permissions because it is the same system the image is made for. Mixing versions and systems made testing harder, but in general, for now, I have found 2.7.3 up to 2.8.0 unstable on newer systems. On top of this, CVMFS tends to stall when run as root after a restart/probe, which makes it harder; a forced reboot to unmount was my temporary solution during testing.

Ubuntu lacks some libs, and CentOS needs squashfs-tools and python. CVMFS tends to report its requirements and is good at reporting issues, but this time a fresh OS with permission issues and stalls makes it near impossible to debug.
I need to look more at the network, the repository and proxy config, or the kernel, as these could break CVMFS.

Fresh 20.04 system with only the client installed, using the Singularity image; strangely it reports using 2.7.3.0 instead of 2.7.3.0-1 or the default 2.7.3.1 that is out now:
<core_client_version>7.16.6</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
03:33:11 (24630): wrapper (7.7.26015): starting
03:33:11 (24630): wrapper: running run_atlas (--nthreads 9)
[2020-07-15 03:33:11] Arguments: --nthreads 9
[2020-07-15 03:33:11] Threads: 9
[2020-07-15 03:33:11] Checking for CVMFS
[2020-07-15 03:33:14] Probing /cvmfs/atlas.cern.ch... OK
[2020-07-15 03:33:16] Probing /cvmfs/atlas-condb.cern.ch... OK
[2020-07-15 03:33:22] Probing /cvmfs/grid.cern.ch... OK
[2020-07-15 03:33:26] VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE
[2020-07-15 03:33:26] 2.7.3.0 24790 0 23984 66905 3 1 101657 4096001 0 65024 0 0 n/a 17158 6248 http://cernvmfs.gridpp.rl.ac.uk/cvmfs/atlas.cern.ch DIRECT 1
[2020-07-15 03:33:26] CVMFS is ok
[2020-07-15 03:33:26] Using singularity image /cvmfs/atlas.cern.ch/repo/containers/images/singularity/x86_64-centos7.img
[2020-07-15 03:33:26] Checking for singularity binary...
[2020-07-15 03:33:26] Singularity is not installed, using version from CVMFS
[2020-07-15 03:33:26] Checking singularity works with /cvmfs/atlas.cern.ch/repo/containers/sw/singularity/x86_64-el7/current/bin/singularity exec -B /cvmfs /cvmfs/atlas.cern.ch/repo/containers/images/singularity/x86_64-centos7.img hostname
[2020-07-15 03:36:33] INFO:  Convert SIF file to sandbox... ripper3-System INFO:  Cleaning up image...
[2020-07-15 03:36:33] Singularity works
[2020-07-15 03:36:33] Set ATHENA_PROC_NUMBER=9
[2020-07-15 03:36:33] Starting ATLAS job with PandaID=4786077036
[2020-07-15 03:36:33] Running command: /cvmfs/atlas.cern.ch/repo/containers/sw/singularity/x86_64-el7/current/bin/singularity exec --pwd /var/lib/boinc-client/slots/1 -B /cvmfs,/var /cvmfs/atlas.cern.ch/repo/containers/images/singularity/x86_64-centos7.img sh start_atlas.sh
[2020-07-15 03:36:37] Job failed
[2020-07-15 03:36:37] INFO:  Convert SIF file to sandbox...
[2020-07-15 03:36:37] INFO:  Cleaning up image...
[2020-07-15 03:36:37] FATAL:  container creation failed: mount ->/var error: can't remount /var: operation not permitted
[2020-07-15 03:36:37] ./runtime_log.err
[2020-07-15 03:36:37] ./runtime_log
03:46:38 (24630): run_atlas exited; CPU time 10.650186
03:46:38 (24630): app exit status: 0x1
03:46:38 (24630): called boinc_finish(195)

</stderr_txt>
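Those two long VERSION/PID lines in the middle of the log above are a stats header row and its values (CVMFS's stat output, printed by the wrapper); pairing them up makes the fields readable. A small sketch using the exact values from the log:

```python
# Header and value rows exactly as printed in the wrapper log above.
header = ("VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS "
          "CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN "
          "HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE")
values = ("2.7.3.0 24790 0 23984 66905 3 1 101657 4096001 0 65024 0 0 "
          "n/a 17158 6248 "
          "http://cernvmfs.gridpp.rl.ac.uk/cvmfs/atlas.cern.ch DIRECT 1")

# Zip the whitespace-separated columns into a field -> value mapping.
stat = dict(zip(header.split(), values.split()))

print(stat["VERSION"], stat["PROXY"], stat["ONLINE"])  # 2.7.3.0 DIRECT 1
```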
CentOS 8.2.2004 system, not able to mount:
<core_client_version>7.16.6</core_client_version>
<![CDATA[
<message>
aborted by user</message>
<stderr_txt>
22:29:13 (13069): wrapper (7.7.26015): starting
22:29:13 (13069): wrapper: running run_atlas (--nthreads 12)
[2020-07-15 22:29:13] Arguments: --nthreads 12
[2020-07-15 22:29:13] Threads: 12
[2020-07-15 22:29:13] Checking for CVMFS

</stderr_txt>
]]>


Yesterday, before adding python-pip and additional tools:

[2020-07-15 11:38:50]   File "/home/ripper3/boinc/slots/1/pilot2/pilot/common/exception.py", line 434, in run
[2020-07-15 11:38:50]     self._Thread__target(**self._Thread__kwargs)
[2020-07-15 11:38:50]   File "/home/ripper3/boinc/slots/1/pilot2/pilot/control/job.py", line 1785, in queue_monitor
[2020-07-15 11:38:50]     update_server(job, args)
[2020-07-15 11:38:50]   File "/home/ripper3/boinc/slots/1/pilot2/pilot/control/job.py", line 1835, in update_server
[2020-07-15 11:38:50]     send_state(job, args, job.state, xml=dumps(job.fileinfo), metadata=metadata)
[2020-07-15 11:38:50]   File "/home/ripper3/boinc/slots/1/pilot2/pilot/control/job.py", line 244, in send_state
[2020-07-15 11:38:50]     data = get_data_structure(job, state, args, xml=xml, metadata=metadata)
[2020-07-15 11:38:50]   File "/home/ripper3/boinc/slots/1/pilot2/pilot/control/job.py", line 543, in get_data_structure
[2020-07-15 11:38:50]     data['cpuConsumptionUnit'] = job.cpuconsumptionunit + "+" + get_cpu_model()
[2020-07-15 11:38:50]   File "/home/ripper3/boinc/slots/1/pilot2/pilot/util/workernode.py", line 185, in get_cpu_model
[2020-07-15 11:38:50]     with open("/proc/cpuinfo", "r") as f:
[2020-07-15 11:38:50] exception caught by thread run() function: (<type 'exceptions.IOError'>, IOError(2, 'No such file or directory'), <traceback object at 0x7f9ca065ac20>)
[2020-07-15 11:38:50] Traceback (most recent call last):
[2020-07-15 11:38:50]   File "/home/ripper3/boinc/slots/1/pilot2/pilot/common/exception.py", line 434, in run
[2020-07-15 11:38:50]     self._Thread__target(**self._Thread__kwargs)
[2020-07-15 11:38:50]   File "/home/ripper3/boinc/slots/1/pilot2/pilot/control/job.py", line 1785, in queue_monitor
[2020-07-15 11:38:50]     update_server(job, args)
[2020-07-15 11:38:50]   File "/home/ripper3/boinc/slots/1/pilot2/pilot/control/job.py", line 1835, in update_server
[2020-07-15 11:38:50]     send_state(job, args, job.state, xml=dumps(job.fileinfo), metadata=metadata)
[2020-07-15 11:38:50]   File "/home/ripper3/boinc/slots/1/pilot2/pilot/control/job.py", line 244, in send_state
[2020-07-15 11:38:50]     data = get_data_structure(job, state, args, xml=xml, metadata=metadata)
[2020-07-15 11:38:50]   File "/home/ripper3/boinc/slots/1/pilot2/pilot/control/job.py", line 543, in get_data_structure
[2020-07-15 11:38:50]     data['cpuConsumptionUnit'] = job.cpuconsumptionunit + "+" + get_cpu_model()
[2020-07-15 11:38:50]   File "/home/ripper3/boinc/slots/1/pilot2/pilot/util/workernode.py", line 185, in get_cpu_model
[2020-07-15 11:38:50]     with open("/proc/cpuinfo", "r") as f:
[2020-07-15 11:38:50] IOError: [Errno 2] No such file or directory: '/proc/cpuinfo'
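That crash is the pilot assuming /proc/cpuinfo is always readable; when the container's /proc is not mounted, the open() raises an uncaught error. A defensive version of that lookup would look roughly like this (a sketch, not the actual pilot2 code; the "unknown-cpu" fallback value is made up for illustration):

```python
def get_cpu_model(path="/proc/cpuinfo"):
    """Return the first 'model name' entry from a cpuinfo-style file,
    or a placeholder if the file is missing or unreadable (as happens
    when the container's /proc did not get mounted)."""
    try:
        with open(path, "r") as f:
            for line in f:
                if line.startswith("model name"):
                    return line.split(":", 1)[1].strip()
    except (IOError, OSError):
        pass  # /proc missing inside a broken container
    return "unknown-cpu"
```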


Today I dropped any further tests, as the system is not able to mount and the issues keep increasing; adding to the config and installing packages from EPEL does not help. I might need to move to an Intel machine for debugging, as the system also warns that AMD Ryzen is not tested on CentOS 8. It could be that the kernel does not work properly.

In conclusion, I cannot tell you whether it is a dependency or library issue here. During testing I faced more issues the deeper I dug, and it is probably a mix of issues that would require building from source and additional tools to debug. I would probably need to load a stable kernel and check each required package. I am not much help, but I will do a few more tests before giving up.
djoser
Joined: 30 Aug 14
Posts: 145
Credit: 10,847,070
RAC: 0
Message 43079 - Posted: 16 Jul 2020, 21:31:34 UTC - in response to Message 43070.  

CentOS 8.2.2004 system not able to mount

I can confirm that. I have one machine with CentOS 8.2-2004, too. Theory native tasks do run successfully (without Singularity being locally installed on that machine), though.

See here (successful Theory task):
https://lhcathome.cern.ch/lhcathome/result.php?resultid=279650345

and here (failed ATLAS task):
https://lhcathome.cern.ch/lhcathome/result.php?resultid=279646460

This is kind of weird!?!
Greger

Joined: 9 Jan 15
Posts: 151
Credit: 431,596,822
RAC: 0
Message 43080 - Posted: 16 Jul 2020, 22:00:38 UTC - in response to Message 43079.  

Theory does not need Singularity; it uses runc instead, which is added in the same way. Maybe ATLAS could be coded to use runc instead of Singularity?

It is still the same error for both of us... this issue of Singularity not getting the correct permissions needs a check.

I have followed the guide regarding the mount issue on CentOS 8.2.2004 at https://cernvm.cern.ch/portal/filesystem/debugmount, with no success. I have shut down SELinux, and it is the same there.
I am running in VirtualBox for now and making a few tests with CernVM, but for now I have no clue what more I can do. My options are to go back to an older distro, or to get on 20.04 and build the latest versions.
djoser
Joined: 30 Aug 14
Posts: 145
Credit: 10,847,070
RAC: 0
Message 43081 - Posted: 16 Jul 2020, 22:07:02 UTC - in response to Message 43080.  
Last modified: 16 Jul 2020, 22:11:17 UTC

Theory does not need Singularity; it uses runc instead.

You are right about that. I forgot that Theory does not need Singularity.

My SELinux is set to "permissive" on that machine. CVMFS is version 2.7.3.0.
djoser
Joined: 30 Aug 14
Posts: 145
Credit: 10,847,070
RAC: 0
Message 43334 - Posted: 11 Sep 2020, 19:51:27 UTC

Just out of curiosity: has the problem with Singularity not being able to mount /var on the latest Ubuntu and CentOS versions been sorted out, or does it still persist?


©2024 CERN