Message boards : ATLAS application : ATLAS native_mt error while computing after 600 seconds

2BbwZPYG2XaMQQaG8qsvsz4QDGuL

Joined: 16 Aug 10
Posts: 4
Credit: 5,356,122
RAC: 0
Message 37974 - Posted: 11 Feb 2019, 7:39:16 UTC

Recently, ATLAS native_mt jobs have been failing for me with an error, always after precisely 600 seconds.

https://imgur.com/a/ilyLMW8

Does anyone know how to resolve this, or how to disable native_mt? vbox_64_mt seems to work flawlessly.

All the best,
Corne
ID: 37974
gyllic

Joined: 9 Dec 14
Posts: 202
Credit: 2,533,875
RAC: 0
Message 37975 - Posted: 11 Feb 2019, 8:57:57 UTC - in response to Message 37974.  
Last modified: 11 Feb 2019, 9:03:25 UTC

To run ATLAS native, you must have two additional programs installed:
- CVMFS (https://cernvm.cern.ch/portal/filesystem)
- Singularity (https://www.sylabs.io/singularity/)

If you want to build these two programs from source, this guide (written for Debian) explains how: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4840

If you don't want to receive native tasks, uncheck the box "Run test applications?" on the LHC preferences page (https://lhcathome.cern.ch/lhcathome/prefs.php?subset=project)
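
A quick way to see whether both prerequisites are present is to look them up on the PATH. The following is a minimal sketch, not part of the project's tooling: the function name check_tools is made up for illustration, and it only checks that the commands exist, not that they are configured correctly.

```shell
# check_tools: hypothetical helper (not a BOINC or LHC@home tool).
# Reports which of the given commands are missing from PATH and
# returns 0 only if all of them are found.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: found"
        else
            echo "$tool: NOT found"
            missing=1
        fi
    done
    return "$missing"
}

# cvmfs_config is the CVMFS command-line helper; singularity is the
# container runtime.
check_tools cvmfs_config singularity \
    || echo "ATLAS native tasks will likely fail on this host"
```

Finding both commands is necessary but not sufficient; a misconfigured CVMFS or Singularity can still make native tasks fail.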
ID: 37975
2BbwZPYG2XaMQQaG8qsvsz4QDGuL

Joined: 16 Aug 10
Posts: 4
Credit: 5,356,122
RAC: 0
Message 37976 - Posted: 11 Feb 2019, 9:10:56 UTC - in response to Message 37975.  
Last modified: 11 Feb 2019, 9:12:13 UTC

If that's the case, then I really feel there should be an option to exclude native_mt while still participating in the regular vbox64_mt_mcore_atlas, as it is very wasteful of resources to push tasks that will just fail after 600 seconds.

Unchecking test applications is a possible solution, but it would also opt me out of other test applications, which I would like to continue running.
ID: 37976
bronco

Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 37982 - Posted: 11 Feb 2019, 22:32:04 UTC - in response to Message 37976.  

ATLAS VBox is just ATLAS native running in a VBox. If they fail as native tasks then they'll fail inside the VBox as well.
ID: 37982
2BbwZPYG2XaMQQaG8qsvsz4QDGuL

Joined: 16 Aug 10
Posts: 4
Credit: 5,356,122
RAC: 0
Message 37998 - Posted: 12 Feb 2019, 21:10:32 UTC - in response to Message 37982.  

"ATLAS VBox is just ATLAS native running in a VBox. If they fail as native tasks then they'll fail inside the VBox as well."


Sorry, but this does not hold; it is simply not true. The operating system virtualized in the VirtualBox environment can differ substantially from the host, to an extent where it has a significant impact on the ability to run computing tasks.

For example, the kernel can be different, and the virtualized BIOS and its corresponding SMBIOS are almost certainly different. Inside a virtualized environment it is possible to run a completely different operating system, such as Linux on Windows or Windows on Linux.

What is important in this case is that running the computing tasks in a VM provides CVMFS preinstalled, which is a dependency the tasks require to function correctly. One reason to prefer running a computing task natively instead of inside a virtual machine is that virtualization imposes a performance hit.
ID: 37998
Alessio Mereghetti
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 29 Feb 16
Posts: 157
Credit: 2,659,975
RAC: 0
Message 38039 - Posted: 20 Feb 2019, 15:54:52 UTC - in response to Message 37974.  

I have a similar issue. This is the content of the stderr.txt file:
root@pcbe16072:/var/lib/boinc-client/slots# cat 3/stderr.txt
16:35:08 (661691): wrapper (7.7.26015): starting
16:35:08 (661691): wrapper: running run_atlas (--nthreads 4)
singularity image is /cvmfs/atlas.cern.ch/repo/images/singularity/x86_64-slc6.img
sys.argv = ['run_atlas', '--nthreads', '4']
THREADS=4
Checking for CVMFS
CVMFS is installed
OS:cat: /etc/redhat-release: No such file or directory

This is not SLC6, need to run with Singularity....
Checking Singularity...
Singularity is installed
copy /var/lib/boinc-client/slots/3/shared/input.tar.gz
copy /var/lib/boinc-client/slots/3/shared/RTE.tar.gz
copy /var/lib/boinc-client/slots/3/shared/ATLAS.root_0
copy /var/lib/boinc-client/slots/3/shared/start_atlas.sh
export ATHENA_PROC_NUMBER=4;start atlas job with 
Testing the function of Singularity...
check singularity with cmd:singularity exec -B /cvmfs /cvmfs/atlas.cern.ch/repo/images/singularity/x86_64-slc6.img hostname

Singularity isnt working...

running start_atlas return value is 3
tar cvf shared/result.tar.gz 
tar: Cowardly refusing to create an empty archive
Try 'tar --help' or 'tar --usage' for more information.

*****************The last 100 lines of the pilot log******************
tail: cannot open 'pilotlog.txt' for reading: No such file or directory
***************diag file************
cat: '*.diag': No such file or directory
******************************WorkDir***********************
total 370952
drwxrwx--x 3 boinc boinc      4096 Feb 20 16:35 .
drwxrwx--x 7 boinc boinc      4096 Feb 12 15:47 ..
-rw-r--r-- 1 boinc boinc 378669148 Feb 20 16:35 ATLAS.root_0
-rw-r--r-- 1 boinc boinc         0 Feb 20 16:35 boinc_lockfile
-rw-r--r-- 1 boinc boinc      8192 Feb 20 16:35 boinc_mmap_file
-rw-r--r-- 1 boinc boinc       504 Feb 20 16:35 boinc_task_state.xml
-rw-r--r-- 1 boinc boinc      6014 Feb 20 16:35 init_data.xml
-rw-r--r-- 1 boinc boinc   1098065 Feb 20 16:35 input.tar.gz
-rw-r--r-- 1 boinc boinc       105 Feb 20 16:35 job.xml
-rw-r--r-- 1 boinc boinc       606 Feb 20 16:35 RTE.tar.gz
-rwxr-xr-x 1 boinc boinc      8501 Feb 20 16:35 run_atlas
drwxrwx--x 2 boinc boinc      4096 Feb 20 16:35 shared
-rw-r--r-- 1 boinc boinc      8740 Feb 20 16:35 start_atlas.sh
-rw-r--r-- 1 boinc boinc      1298 Feb 20 16:35 stderr.txt
-rw-r--r-- 1 boinc boinc       100 Feb 20 16:35 wrapper_26015_x86_64-pc-linux-gnu
-rw-r--r-- 1 boinc boinc        20 Feb 20 16:35 wrapper_checkpoint.txt
root@pcbe16072:/var/lib/boinc-client/slots# ls -ltrh 3/stderr.txt
-rw-r--r-- 1 boinc boinc 2.4K Feb 20 16:35 3/stderr.txt
root@pcbe16072:/var/lib/boinc-client/slots# tail -f  3/stderr.txt
-rw-r--r-- 1 boinc boinc      6014 Feb 20 16:35 init_data.xml
-rw-r--r-- 1 boinc boinc   1098065 Feb 20 16:35 input.tar.gz
-rw-r--r-- 1 boinc boinc       105 Feb 20 16:35 job.xml
-rw-r--r-- 1 boinc boinc       606 Feb 20 16:35 RTE.tar.gz
-rwxr-xr-x 1 boinc boinc      8501 Feb 20 16:35 run_atlas
drwxrwx--x 2 boinc boinc      4096 Feb 20 16:35 shared
-rw-r--r-- 1 boinc boinc      8740 Feb 20 16:35 start_atlas.sh
-rw-r--r-- 1 boinc boinc      1298 Feb 20 16:35 stderr.txt
-rw-r--r-- 1 boinc boinc       100 Feb 20 16:35 wrapper_26015_x86_64-pc-linux-gnu
-rw-r--r-- 1 boinc boinc        20 Feb 20 16:35 wrapper_checkpoint.txt
parent process exit 3
child process exit 3
16:45:13 (661691): run_atlas exited; CPU time 0.072000
16:45:13 (661691): app exit status: 0x3
16:45:13 (661691): called boinc_finish(195)
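
The wrapper timestamps in the log above bracket the failure, and their difference can be worked out directly. This is a small sketch of that arithmetic, assuming both timestamps fall on the same day; the function names to_seconds and elapsed_seconds are made up for illustration.

```shell
# to_seconds: hypothetical helper; converts HH:MM:SS to seconds since
# midnight. awk reads "08" as decimal 8, which avoids the octal pitfall
# of shell arithmetic.
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# elapsed_seconds: difference between two same-day timestamps.
elapsed_seconds() {
    start=$(to_seconds "$1")
    end=$(to_seconds "$2")
    echo $(( end - start ))
}

# "wrapper ... starting" and "run_atlas exited" lines from the log
elapsed_seconds 16:35:08 16:45:13   # prints 605
```

Roughly ten minutes of wall-clock time against a reported CPU time of 0.072 seconds suggests the task spent nearly all of that time waiting rather than computing, which matches the "after 600 seconds" pattern in the thread title.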


If I run the singularity command in the shell:
pcbe16072 20-02-2019 16:45:46 ~ # singularity exec -B /cvmfs /cvmfs/atlas.cern.ch/repo/images/singularity/x86_64-slc6.img hostname
ERROR  : Could not create /dev/loop8: Permission denied
ABORT  : Retval = 255

This is, of course, without being superuser, which is how the BOINC user should run as well (/dev/ belongs to root).
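
To reproduce the wrapper's Singularity test under the account that actually runs the tasks, the same command can be executed as the BOINC user. This is a minimal sketch under stated assumptions: the tasks run as the user boinc (the owner shown in the slot directory listing), sudo is available to the administrator, and run_check and check_singularity_as_boinc are hypothetical helper names.

```shell
# run_check: hypothetical helper; runs a command, hides its output,
# and prints a one-line verdict including the exit code on failure.
run_check() {
    label=$1
    shift
    status=0
    "$@" >/dev/null 2>&1 || status=$?
    if [ "$status" -eq 0 ]; then
        echo "$label: OK"
    else
        echo "$label: FAILED (exit $status)"
    fi
}

# The test command from the wrapper log, run as the (assumed) boinc user.
# Defined but not called here, since it needs sudo, CVMFS and Singularity.
check_singularity_as_boinc() {
    run_check "singularity as boinc" sudo -n -u boinc \
        singularity exec -B /cvmfs \
        /cvmfs/atlas.cern.ch/repo/images/singularity/x86_64-slc6.img hostname
}
```

Calling check_singularity_as_boinc from an administrator shell gives a single verdict line. If it fails for the boinc user while the same command succeeds for root, that would point towards a permission problem on the host rather than a bad job.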

Is everything fine with the job sent out?
Thanks for the help,
ID: 38039
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert

Joined: 15 Jun 08
Posts: 2386
Credit: 222,981,799
RAC: 136,280
Message 38040 - Posted: 20 Feb 2019, 16:14:56 UTC - in response to Message 38039.  

This line can often be seen when CVMFS is not working correctly:
tar: Cowardly refusing to create an empty archive

Could you ensure that it is configured correctly?
You may try "cvmfs_config probe".
The output should be:
Probing /cvmfs/atlas.cern.ch... OK
Probing /cvmfs/atlas-condb.cern.ch... OK
Probing /cvmfs/grid.cern.ch... OK


If this is OK, the next step would be to check Singularity.
You may find some hints in gyllic's guide.
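
The probe check above can also be scripted so that only problem repositories are reported. This is a minimal sketch that assumes the "Probing <path>... OK" line format shown above and treats any other status as a failure; failed_repos is a hypothetical helper name.

```shell
# failed_repos: hypothetical helper; reads `cvmfs_config probe` output on
# stdin, prints every "Probing ..." line that does not end in "... OK",
# and returns 1 if there was at least one such line.
failed_repos() {
    bad=$(grep '^Probing ' | grep -v '\.\.\. OK$' || true)
    if [ -n "$bad" ]; then
        echo "$bad"
        return 1
    fi
    return 0
}

# Only run the probe where CVMFS is actually installed.
if command -v cvmfs_config >/dev/null 2>&1; then
    cvmfs_config probe | failed_repos && echo "all probed repositories OK"
fi
```

A clean result here only shows that the repositories are mounted and readable; it says nothing about whether Singularity can use them.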
ID: 38040
Alessio Mereghetti
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 29 Feb 16
Posts: 157
Credit: 2,659,975
RAC: 0
Message 38041 - Posted: 20 Feb 2019, 16:41:03 UTC - in response to Message 38040.  

Thanks for the link. The checks are fine:
 > cvmfs_config probe
Probing /cvmfs/atlas.cern.ch... OK
Probing /cvmfs/atlas-condb.cern.ch... OK
Probing /cvmfs/grid.cern.ch... OK
 > singularity --version
2.4.2-dist


Anyway, the linked guide describes how to build Singularity from source (at the time, 3.0.???), which I don't want to do for the moment. I will soon update the desktop PC to Ubuntu 18.04, so I think it is not worth the effort to remove Singularity and re-install it from scratch...

On the other hand, it looks to me as if something is wrong with permissions, but I don't want to give BOINC sudo rights...
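
Since the guide targets a 3.0 release while this host reports 2.4.2-dist, comparing major versions is a quick way to make the mismatch explicit. This is a minimal sketch; major_version is a hypothetical helper name, and the threshold of 3 is taken only from the version the guide mentions, not from any stated project requirement.

```shell
# major_version: hypothetical helper; extracts the first run of digits
# from a version string such as "2.4.2-dist".
major_version() {
    echo "$1" | sed -E 's/^[^0-9]*([0-9]+).*/\1/'
}

installed="2.4.2-dist"   # value reported by `singularity --version` above
if [ "$(major_version "$installed")" -lt 3 ]; then
    echo "Installed Singularity is older than the 3.0 series the guide builds"
fi
```

On a live system, the hard-coded string could be replaced with the output of singularity --version.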
ID: 38041
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert

Joined: 15 Jun 08
Posts: 2386
Credit: 222,981,799
RAC: 136,280
Message 38042 - Posted: 20 Feb 2019, 17:01:02 UTC - in response to Message 38041.  

ID: 38042
