ATLAS native_mt error while computing after 600 seconds
Author | Message |
---|---|
Joined: 16 Aug 10 Posts: 4 Credit: 5,356,122 RAC: 0 |
Recently I have been getting errors on ATLAS native_mt tasks, always after exactly 600 seconds: https://imgur.com/a/ilyLMW8 Does anyone know how to resolve this, or how to disable native_mt? vbox_64_mt seems to work flawlessly. All the best, Corne |
Joined: 9 Dec 14 Posts: 202 Credit: 2,533,875 RAC: 0 |
To run ATLAS native you must install two additional programs:
- CVMFS (https://cernvm.cern.ch/portal/filesystem)
- Singularity (https://www.sylabs.io/singularity/)
If you want to build these two programs from source, there is a guide here (written for Debian): https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4840
If you don't want to get native tasks, uncheck the box "Run test applications?" on the LHC@home preferences page (https://lhcathome.cern.ch/lhcathome/prefs.php?subset=project). |
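A minimal sketch for checking whether the two programs are visible on a host, assuming a POSIX shell. The helper name `missing_native_prereqs` is hypothetical, and it assumes CVMFS counts as installed when its `cvmfs_config` tool is on `PATH`; it does not check that either program is configured correctly.

```shell
# Hypothetical helper: list which of the two programs ATLAS native needs
# (CVMFS and Singularity) cannot be found on PATH.
# Assumption: CVMFS counts as installed if its cvmfs_config tool is on PATH.
missing_native_prereqs() {
    missing=""
    for cmd in cvmfs_config singularity; do
        # command -v succeeds only if the name resolves to something runnable
        if ! command -v "$cmd" >/dev/null 2>&1; then
            missing="$missing $cmd"
        fi
    done
    # Print the missing tools separated by spaces (empty output if none)
    printf '%s\n' "${missing# }"
}
```

Usage: run `missing_native_prereqs` as the account the BOINC client uses; empty output means both tools were found on that account's `PATH`.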
Joined: 16 Aug 10 Posts: 4 Credit: 5,356,122 RAC: 0 |
If that's the case, then I really feel there should be an option to exclude native_mt while still running the regular vbox64_mt_mcore_atlas tasks, since pushing out tasks that will fail after 600 seconds wastes a lot of resources. Unchecking "Run test applications?" is a possible workaround, but it also opts me out of the other test applications, which I would like to keep running. |
Joined: 13 Apr 18 Posts: 443 Credit: 8,438,885 RAC: 0 |
ATLAS VBox is just ATLAS native running in a VBox. If they fail as native tasks then they'll fail inside the VBox as well. |
Joined: 16 Aug 10 Posts: 4 Credit: 5,356,122 RAC: 0 |
"ATLAS VBox is just ATLAS native running in a VBox. If they fail as native tasks then they'll fail inside the VBox as well."
Sorry, but this doesn't hold; it is simply not true. The operating system virtualized in the VirtualBox environment can differ from the host to an extent that significantly affects the ability to run computing tasks. For example, the kernel can be different, and the virtualized BIOS and its corresponding SMBIOS are almost certainly different. Inside a virtualized environment it is possible to run a completely different operating system, such as Linux on Windows or Windows on Linux. What matters in this case is that running the tasks in a VM allows CVMFS to be preinstalled, and CVMFS is a dependency the tasks need in order to function correctly. One reason to prefer running tasks natively rather than inside a virtual machine is that virtualization imposes a performance penalty. |
Joined: 29 Feb 16 Posts: 157 Credit: 2,659,975 RAC: 0 |
I have a similar issue. This is the content of stderr.txt:
root@pcbe16072:/var/lib/boinc-client/slots# cat 3/stderr.txt
16:35:08 (661691): wrapper (7.7.26015): starting
16:35:08 (661691): wrapper: running run_atlas (--nthreads 4)
singularity image is /cvmfs/atlas.cern.ch/repo/images/singularity/x86_64-slc6.img
sys.argv = ['run_atlas', '--nthreads', '4']
THREADS=4
Checking for CVMFS
CVMFS is installed
OS:cat: /etc/redhat-release: No such file or directory
This is not SLC6, need to run with Singularity....
Checking Singularity...
Singularity is installed
copy /var/lib/boinc-client/slots/3/shared/input.tar.gz
copy /var/lib/boinc-client/slots/3/shared/RTE.tar.gz
copy /var/lib/boinc-client/slots/3/shared/ATLAS.root_0
copy /var/lib/boinc-client/slots/3/shared/start_atlas.sh
export ATHENA_PROC_NUMBER=4;start atlas job with
Testing the function of Singularity...
check singularity with cmd:singularity exec -B /cvmfs /cvmfs/atlas.cern.ch/repo/images/singularity/x86_64-slc6.img hostname
Singularity isnt working...
running start_atlas return value is 3
tar cvf shared/result.tar.gz
tar: Cowardly refusing to create an empty archive
Try 'tar --help' or 'tar --usage' for more information.
*****************The last 100 lines of the pilot log******************
tail: cannot open 'pilotlog.txt' for reading: No such file or directory
***************diag file************
cat: '*.diag': No such file or directory
******************************WorkDir***********************
total 370952
drwxrwx--x 3 boinc boinc 4096 Feb 20 16:35 .
drwxrwx--x 7 boinc boinc 4096 Feb 12 15:47 ..
-rw-r--r-- 1 boinc boinc 378669148 Feb 20 16:35 ATLAS.root_0
-rw-r--r-- 1 boinc boinc 0 Feb 20 16:35 boinc_lockfile
-rw-r--r-- 1 boinc boinc 8192 Feb 20 16:35 boinc_mmap_file
-rw-r--r-- 1 boinc boinc 504 Feb 20 16:35 boinc_task_state.xml
-rw-r--r-- 1 boinc boinc 6014 Feb 20 16:35 init_data.xml
-rw-r--r-- 1 boinc boinc 1098065 Feb 20 16:35 input.tar.gz
-rw-r--r-- 1 boinc boinc 105 Feb 20 16:35 job.xml
-rw-r--r-- 1 boinc boinc 606 Feb 20 16:35 RTE.tar.gz
-rwxr-xr-x 1 boinc boinc 8501 Feb 20 16:35 run_atlas
drwxrwx--x 2 boinc boinc 4096 Feb 20 16:35 shared
-rw-r--r-- 1 boinc boinc 8740 Feb 20 16:35 start_atlas.sh
-rw-r--r-- 1 boinc boinc 1298 Feb 20 16:35 stderr.txt
-rw-r--r-- 1 boinc boinc 100 Feb 20 16:35 wrapper_26015_x86_64-pc-linux-gnu
-rw-r--r-- 1 boinc boinc 20 Feb 20 16:35 wrapper_checkpoint.txt
root@pcbe16072:/var/lib/boinc-client/slots#
root@pcbe16072:/var/lib/boinc-client/slots# ls -ltrh 3/stderr.txt
-rw-r--r-- 1 boinc boinc 2.4K Feb 20 16:35 3/stderr.txt
root@pcbe16072:/var/lib/boinc-client/slots# tail -f 3/stderr.txt
-rw-r--r-- 1 boinc boinc 6014 Feb 20 16:35 init_data.xml
-rw-r--r-- 1 boinc boinc 1098065 Feb 20 16:35 input.tar.gz
-rw-r--r-- 1 boinc boinc 105 Feb 20 16:35 job.xml
-rw-r--r-- 1 boinc boinc 606 Feb 20 16:35 RTE.tar.gz
-rwxr-xr-x 1 boinc boinc 8501 Feb 20 16:35 run_atlas
drwxrwx--x 2 boinc boinc 4096 Feb 20 16:35 shared
-rw-r--r-- 1 boinc boinc 8740 Feb 20 16:35 start_atlas.sh
-rw-r--r-- 1 boinc boinc 1298 Feb 20 16:35 stderr.txt
-rw-r--r-- 1 boinc boinc 100 Feb 20 16:35 wrapper_26015_x86_64-pc-linux-gnu
-rw-r--r-- 1 boinc boinc 20 Feb 20 16:35 wrapper_checkpoint.txt
parent process exit 3
child process exit 3
16:45:13 (661691): run_atlas exited; CPU time 0.072000
16:45:13 (661691): app exit status: 0x3
16:45:13 (661691): called boinc_finish(195)
If I run the Singularity command from the shell:
pcbe16072 20-02-2019 16:45:46 ~ # singularity exec -B /cvmfs /cvmfs/atlas.cern.ch/repo/images/singularity/x86_64-slc6.img hostname
ERROR : Could not create /dev/loop8: Permission denied
ABORT : Retval = 255
This is, of course, without superuser rights, just as the BOINC user would run it (/dev/ belongs to root). Is everything fine with the jobs being sent out? Thanks for the help, |
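A minimal sketch for spotting the failure pattern from the stderr.txt quoted above in other slots, assuming a POSIX shell with `grep`. The helper name `diagnose_atlas_stderr` and its hint texts are hypothetical; the marker strings are copied from that log, and other failure modes may produce different messages.

```shell
# Hypothetical helper: scan a slot's stderr.txt for the failure markers seen
# in the log above and print one short hint per marker found.
# The marker strings come from that log; other failures may look different.
diagnose_atlas_stderr() {
    log="$1"
    if grep -Fq "Singularity isnt working" "$log"; then
        echo "singularity: the wrapper's 'singularity exec ... hostname' check failed"
    fi
    if grep -Fq "Cowardly refusing to create an empty archive" "$log"; then
        echo "empty-result: no output files were produced"
    fi
    if grep -Fq "app exit status: 0x3" "$log"; then
        echo "exit-3: run_atlas exited with status 3"
    fi
}
```

Usage: `diagnose_atlas_stderr /var/lib/boinc-client/slots/3/stderr.txt`. To repeat the wrapper's Singularity check under the same account as the BOINC client (assumed here to be `boinc`, the owner shown in the directory listing), one could run `sudo -u boinc singularity exec -B /cvmfs /cvmfs/atlas.cern.ch/repo/images/singularity/x86_64-slc6.img hostname`.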
Joined: 15 Jun 08 Posts: 2401 Credit: 225,366,281 RAC: 123,229 |
This line is often seen when CVMFS is not working correctly:
tar: Cowardly refusing to create an empty archive
Could you make sure CVMFS is configured correctly? You can run "cvmfs_config probe"; the output should be:
Probing /cvmfs/atlas.cern.ch... OK
Probing /cvmfs/atlas-condb.cern.ch... OK
Probing /cvmfs/grid.cern.ch... OK
If this is OK, the next step would be to check Singularity. You may find some hints in gyllic's guide. |
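A minimal sketch for checking the probe result automatically, assuming a POSIX shell and that a healthy probe prints one `Probing /cvmfs/<repo>... OK` line per repository, as in the expected output above. The helper name `check_cvmfs_probe` is hypothetical; it only covers the three repositories listed in this post.

```shell
# Hypothetical helper: read `cvmfs_config probe` output on stdin and check
# that the three repositories listed above are each reported as OK.
# Assumption: a healthy probe prints "Probing /cvmfs/<repo>... OK" per repo.
check_cvmfs_probe() {
    probe_out=$(cat)
    status=0
    for repo in atlas.cern.ch atlas-condb.cern.ch grid.cern.ch; do
        # Fixed-string match, so the dots are not treated as regex wildcards
        if printf '%s\n' "$probe_out" | grep -Fq "Probing /cvmfs/$repo... OK"; then
            echo "$repo OK"
        else
            echo "$repo FAILED"
            status=1
        fi
    done
    return "$status"
}
```

Usage: `cvmfs_config probe 2>&1 | check_cvmfs_probe`; a non-zero exit status means at least one of the three repositories was not reported as OK.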
Joined: 29 Feb 16 Posts: 157 Credit: 2,659,975 RAC: 0 |
Thanks for the link. The checks are fine:
> cvmfs_config probe
Probing /cvmfs/atlas.cern.ch... OK
Probing /cvmfs/atlas-condb.cern.ch... OK
Probing /cvmfs/grid.cern.ch... OK
> singularity --version
2.4.2-dist
However, the guide describes how to build Singularity from source (at the time, 3.0.???), which I don't want to do for the moment. I will soon upgrade this desktop PC to Ubuntu 18.04, so I don't think it is worth the effort to remove everything and reinstall from scratch. On the other hand, it looks to me like something is wrong with permissions, but I don't want to give BOINC sudo rights. |
Joined: 15 Jun 08 Posts: 2401 Credit: 225,366,281 RAC: 123,229 |
Could also be a namespace issue similar to this: https://lhcathomedev.cern.ch/lhcathome-dev/forum_thread.php?id=447&postid=5830 |
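If the namespace issue in question concerns unprivileged user namespaces (an assumption, since the linked thread is not quoted here, and this is only one possible cause), a quick first check is to read the relevant kernel settings. The sketch below, assuming a POSIX shell on Linux, reads `/proc/sys/user/max_user_namespaces` and, where it exists (for example on Debian and Ubuntu kernels), `/proc/sys/kernel/unprivileged_userns_clone`. The helper name `userns_status` is hypothetical.

```shell
# Hypothetical helper: report whether a kernel setting is known to block
# unprivileged user namespaces. Paths default to the usual Linux sysctl files
# and can be overridden (e.g. for testing).
userns_status() {
    max_file="${1:-/proc/sys/user/max_user_namespaces}"
    clone_file="${2:-/proc/sys/kernel/unprivileged_userns_clone}"
    # A limit of 0 means no new user namespaces can be created
    if [ -r "$max_file" ] && [ "$(cat "$max_file")" -eq 0 ]; then
        echo "disabled: user.max_user_namespaces is 0"
        return 1
    fi
    # This file only exists on some kernels (e.g. Debian/Ubuntu builds)
    if [ -r "$clone_file" ] && [ "$(cat "$clone_file")" -eq 0 ]; then
        echo "disabled: kernel.unprivileged_userns_clone is 0"
        return 1
    fi
    echo "no blocking setting found"
    return 0
}
```

Usage: run `userns_status` with no arguments on the affected host. A "no blocking setting found" result does not rule out other namespace problems; it only means neither of these two settings is the cause.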
©2024 CERN