Message boards : ATLAS application : ATLAS native_mt fail

PoppaGeek

Joined: 6 Dec 14
Posts: 3
Credit: 130,968
RAC: 0
Message 35468 - Posted: 9 Jun 2018, 19:36:18 UTC

I cannot for the life of me find where to enable "Show Computers", so I do not know if you can see them. :-/

Why are the tasks failing on this setup?

Thanks!

<core_client_version>7.6.33</core_client_version>
<![CDATA[
<stderr_txt>
14:18:54 (20335): wrapper (7.7.26015): starting
14:18:54 (20335): wrapper: running run_atlas (--nthreads 2)
singularity image is /cvmfs/atlas.cern.ch/repo/images/singularity/x86_64-slc6.img
sys.argv = ['run_atlas', '--nthreads', '2']
THREADS=2
Checking for CVMFS
CVMFS is installed
OS:cat: /etc/redhat-release: No such file or directory

This is not SLC6, need to run with Singularity....
Checking Singularity...
Singularity is installed
copy /var/lib/boinc-client/slots/2/shared/start_atlas.sh
copy /var/lib/boinc-client/slots/2/shared/RTE.tar.gz
copy /var/lib/boinc-client/slots/2/shared/input.tar.gz
copy /var/lib/boinc-client/slots/2/shared/ATLAS.root_0
export ATHENA_PROC_NUMBER=2;start atlas job with PandaID=3957346728
Testing the function of Singularity...
check singularity with cmd:singularity exec -B /cvmfs /cvmfs/atlas.cern.ch/repo/images/singularity/x86_64-slc6.img hostname
Singularity Works...
cmd = singularity exec --pwd /var/lib/boinc-client/slots/2 -B /cvmfs,/var /cvmfs/atlas.cern.ch/repo/images/singularity/x86_64-slc6.img sh start_atlas.sh > runtime_log 2> runtime_log.err
running cmd return value is 0

***********************log_extracts.txt*************************
- Last 10 lines from /var/lib/boinc-client/slots/2/Panda_Pilot_20784_1528571937/PandaJob/athena_stdout.txt -
PyJobTransforms.trfExe.preExecute 2018-06-09 14:19:36,673 INFO Batch/grid running - command outputs will not be echoed. Logs for EVNTtoHITS are in log.EVNTtoHITS
PyJobTransforms.trfExe.preExecute 2018-06-09 14:19:36,675 INFO Now writing wrapper for substep executor EVNTtoHITS
PyJobTransforms.trfExe._writeAthenaWrapper 2018-06-09 14:19:36,676 INFO Valgrind not engaged
PyJobTransforms.trfExe.preExecute 2018-06-09 14:19:36,676 INFO Athena will be executed in a subshell via ['./runwrapper.EVNTtoHITS.sh']
PyJobTransforms.trfExe.execute 2018-06-09 14:19:36,676 INFO Starting execution of EVNTtoHITS (['./runwrapper.EVNTtoHITS.sh'])
PyJobTransforms.trfExe.execute 2018-06-09 14:23:34,791 INFO EVNTtoHITS executor returns 64
PyJobTransforms.trfExe.validate 2018-06-09 14:23:35,700 ERROR Validation of return code failed: Non-zero return code from EVNTtoHITS (64) (Error code 65)
PyJobTransforms.trfExe.validate 2018-06-09 14:23:35,732 INFO Scanning logfile log.EVNTtoHITS for errors
PyJobTransforms.transform.execute 2018-06-09 14:23:36,121 CRITICAL Transform executor raised TransformValidationException: Non-zero return code from EVNTtoHITS (64)
PyJobTransforms.transform.execute 2018-06-09 14:23:39,295 WARNING Transform now exiting early with exit code 65 (Non-zero return code from EVNTtoHITS (64))

- Walltime -
JobRetrival=3, StageIn=10, Execution=273, StageOut=0, CleanUp=14

***********************pilot_error_report.json*********************
{
    "3957346728": {
        "2": [
            {
                "pilotErrorCode": 0,
                "pilotErrorDiag": "Job failed: Non-zero failed job return code: 65"
            }
        ]
    }
}
*****************The last 100 lines of the pilot log******************
    "seopt": "token:ATLASDATADISK:srm://srm.ndgf.org:8443/srm/managerv2?SFN=", 
    "sepath": "/atlas/disk/atlasdatadisk/rucio", 
    "seprodpath": "/atlas/disk/atlasdatadisk/rucio", 
    "setokens": "ATLASDATADISK", 
    "site": "BOINC", 
    "siteid": "BOINC_MCORE", 
    "sitershare": null, 
    "space": 0, 
    "special_par": null, 
    "stageinretry": 2, 
    "stageoutretry": 2, 
    "status": "brokeroff", 
    "statusoverride": "offline", 
    "sysconfig": "manual", 
    "system": "arc", 
    "tags": "arc", 
    "tier": "T3", 
    "timefloor": 0, 
    "tmpdir": null, 
    "transferringlimit": 20000, 
    "tspace": "2070-01-01T00:00:00", 
    "use_newmover": "True", 
    "validatedreleases": "True", 
    "version": null, 
    "wansinklimit": null, 
    "wansourcelimit": null, 
    "wnconnectivity": "full", 
    "wntmpdir": null
}

2018-06-09 19:18:57|20784|SiteInformat| Queuedata was successfully downloaded by pilot wrapper script
2018-06-09 19:18:57|20784|ATLASSiteInf| curl command returned valid queuedata
2018-06-09 19:18:57|20784|ATLASSiteInf| Site BOINC_MCORE is currently in brokeroff mode
2018-06-09 19:18:57|20784|ATLASSiteInf| Job recovery turned off
2018-06-09 19:18:57|20784|ATLASSiteInf| Confirmed correctly formatted rucio sepath
2018-06-09 19:18:57|20784|ATLASSiteInf| Confirmed correctly formatted rucio seprodpath
2018-06-09 19:18:57|20784|SiteInformat| Evaluating queuedata
2018-06-09 19:18:57|20784|SiteInformat| Setting unset pilot variables using queuedata
2018-06-09 19:18:57|20784|SiteInformat| appdir: 
2018-06-09 19:18:57|20784|pUtil.py    | File registration will be done by server
2018-06-09 19:18:57|20784|pUtil.py    | Updated stage-in retry number to 2
2018-06-09 19:18:57|20784|pUtil.py    | Updated stage-out retry number to 2
2018-06-09 19:18:57|20784|pUtil.py    | Detected unset (NULL) release/homepackage string
2018-06-09 19:18:57|20784|ATLASExperim| Application dir confirmed: /var/lib/boinc-client/slots/2/
2018-06-09 19:18:57|20784|pilot.py    | Pilot will serve experiment: Nordugrid-ATLAS
2018-06-09 19:18:57|20784|ATLASExperim| Architecture information:
2018-06-09 19:18:57|20784|ATLASExperim| Excuting command: lsb_release -a
2018-06-09 19:18:57|20784|ATLASExperim| 
sh: lsb_release: command not found
2018-06-09 19:18:57|20784|pUtil.py    | getSiteInformation: got experiment=ATLAS
2018-06-09 19:18:57|20784|ATLASExperim| appdirs = ['/cvmfs/atlas.cern.ch/repo/sw']
2018-06-09 19:18:57|20784|ATLASExperim| head of /cvmfs/atlas.cern.ch/repo/sw/ChangeLog: 
--------------------------------------------------------------------------------
2018-06-09 21:00:23 Alessandro De Salvo
	* + AGISData 20180609210023

2018-06-09 20:01:16 Alessandro De Salvo
  * + GroupData 201806092001

2018-06-09 20:00:27 Alessandro De Salvo
	* + AGISData 20180609200027

2018-06-09 19:00:17 Alessandro De Salvo
--------------------------------------------------------------------------------
2018-06-09 19:18:57|20784|ATLASExperim| ATLAS_PYTHON_PILOT set to /usr/bin/python
2018-06-09 19:18:57|20784|pUtil.py    | getSiteInformation: got experiment=ATLAS
2018-06-09 19:18:57|20784|ATLASExperim| Executing command: export ATLAS_LOCAL_ROOT_BASE=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase;$ATLAS_LOCAL_ROOT_BASE/utilities/checkValidity.sh (time-out: 300)
2018-06-09 19:18:57|20784|pUtil.py    | Executing command: export ATLAS_LOCAL_ROOT_BASE=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase;$ATLAS_LOCAL_ROOT_BASE/utilities/checkValidity.sh (protected by timed_command, timeout: 300 s)
2018-06-09 19:18:58|20784|pUtil.py    | Elapsed time: 0
2018-06-09 19:18:58|20784|ATLASExperim| Diagnostics tool has verified CVMFS
2018-06-09 19:18:58|20784|Node.py     | Collecting machine features
2018-06-09 19:18:58|20784|Node.py     | $MACHINEFEATURES not defined locally
2018-06-09 19:18:58|20784|Node.py     | $JOBFEATURES not defined locally
2018-06-09 19:18:58|20784|Node.py     | Executing command: hostname -i
2018-06-09 19:18:58|20784|Node.py     | IP number of worker node: 127.0.1.1
2018-06-09 19:18:58|20784|pUtil.py    | getSiteInformation: got experiment=Nordugrid-ATLAS
2018-06-09 19:18:58|20784|pilot.py    | Using site information for experiment: Nordugrid-ATLAS
2018-06-09 19:18:58|20784|pilot.py    | Will attempt to create workdir: /var/lib/boinc-client/slots/2/Panda_Pilot_20784_1528571937
2018-06-09 19:18:58|20784|pilot.py    | Creating file: /var/lib/boinc-client/slots/2/CURRENT_SITEWORKDIR
2018-06-09 19:18:58|20784|pUtil.py    | Wrote string "/var/lib/boinc-client/slots/2/Panda_Pilot_20784_1528571937" to file: /var/lib/boinc-client/slots/2/CURRENT_SITEWORKDIR
2018-06-09 19:18:58|20784|ATLASExperim| ATLAS_POOLCOND_PATH not set by wrapper
2018-06-09 19:18:58|20784|pilot.py    | Preparing to execute Cleaner
2018-06-09 19:18:58|20784|pilot.py    | Cleaning /var/lib/boinc-client/slots/2
2018-06-09 19:18:58|20784|Cleaner.py  | Cleaner initialized with clean-up limit: 2 hours
2018-06-09 19:18:58|20784|Cleaner.py  | Cleaner will scan for lost directories in verified path: /var/lib/boinc-client/slots/2
2018-06-09 19:18:58|20784|Cleaner.py  | Executing empty dirs clean-up, stage 1/5
2018-06-09 19:18:58|20784|Cleaner.py  | Purged 0 empty directories
2018-06-09 19:18:58|20784|Cleaner.py  | Executing work dir clean-up, stage 2/5
2018-06-09 19:18:58|20784|Cleaner.py  | Purged 0 single workDirs directories
2018-06-09 19:18:58|20784|Cleaner.py  | Executing maxed-out dirs clean-up, stage 3/5
2018-06-09 19:18:58|20784|Cleaner.py  | Purged 0 empty directories
2018-06-09 19:18:58|20784|Cleaner.py  | Executing AthenaMP clean-up, stage 4/5 <SKIPPED>
2018-06-09 19:18:58|20784|Cleaner.py  | Executing PanDA Pilot dir clean-up, stage 5/5
2018-06-09 19:18:58|20784|Cleaner.py  | Number of found job state files: 0
2018-06-09 19:18:58|20784|Cleaner.py  | No job state files were found, aborting clean-up
2018-06-09 19:18:58|20784|pilot.py    | Update frequencies:
2018-06-09 19:18:58|20784|pilot.py    | ...Processes: 300 s
2018-06-09 19:18:58|20784|pilot.py    | .......Space: 600 s
2018-06-09 19:18:58|20784|pilot.py    | ......Server: 1800 s
2018-06-09 19:18:58|20784|pUtil.py    | Timefloor set to zero in queuedata (multi-jobs disabled)
***************diag file************
runtimeenvironments=APPS/HEP/ATLAS-SITE;
Processors=1
WallTime=411.32s
KernelTime=18.39s
UserTime=252.80s
CPUUsage=65%
MaxResidentMemory=1807372kB
AverageResidentMemory=0kB
AverageTotalMemory=0kB
AverageUnsharedMemory=0kB
AverageUnsharedStack=0kB
AverageSharedMemory=0kB
PageSize=4096B
MajorPageFaults=6937
MinorPageFaults=2270894
Swaps=0
ForcedSwitches=24219
WaitSwitches=487507
Inputs=2706816
Outputs=65056
SocketReceived=0
SocketSent=0
Signals=0

nodename=PoppaGeek@Dev9400
exitcode=0
******************************WorkDir***********************
total 263632
drwxrwx--x 6 boinc boinc      4096 Jun  9 14:25 .
drwxrwx--x 5 boinc boinc      4096 Jun  9 13:30 ..
-rw------- 1 boinc boinc   6739364 Jun  9 14:19 agis_ddmendpoints.cvmfs.json
-rw------- 1 boinc boinc   5359206 Jun  9 14:19 agis_schedconf.cvmfs.json
drwx------ 2 boinc boinc      4096 Jun  9 14:19 .alrb
drwxr-xr-x 3 boinc boinc      4096 Jun  9 14:18 APPS
-rwx------ 1 boinc boinc      2435 Jun  9 10:31 ARCpilot
-rw------- 1 boinc boinc       549 Jun  9 14:19 .asetup
-rw------- 1 boinc boinc     10994 Jun  9 14:19 .asetup.save
-rw-r--r-- 1 boinc boinc         0 Jun  9 14:18 boinc_lockfile
-rw-r--r-- 1 boinc boinc      8192 Jun  9 14:25 boinc_mmap_file
-rw-r--r-- 1 boinc boinc       526 Jun  9 14:23 boinc_task_state.xml
-rw------- 1 boinc boinc        58 Jun  9 14:18 CURRENT_SITEWORKDIR
-rw-r--r-- 1 boinc boinc 256192482 Jun  9 14:18 EVNT.13837267._001172.pool.root.1
-rw-r--r-- 1 boinc boinc      5744 Jun  9 14:18 init_data.xml
-rw-r--r-- 1 boinc boinc   1091389 Jun  9 14:18 input.tar.gz
-rw------- 1 boinc boinc       488 Jun  9 14:25 IUWLDmW1ulsnlyackoJh5iwnABFKDmABFKDmqz7XDmABFKDmOvp3Fm.diag
-rw------- 1 boinc boinc      3467 Jun  9 14:25 jobSmallFiles.tgz
-rw-r--r-- 1 boinc boinc       105 Jun  9 14:18 job.xml
-rw------- 1 boinc boinc    170277 Jun  9 14:25 log.14322886._074314.job.log.1
-rw------- 1 boinc boinc    152071 Jun  9 14:24 log.14322886._074314.job.log.tgz.1
-rw------- 1 boinc boinc      1490 Jun  9 14:24 log_extracts.txt
-rw------- 1 boinc boinc       306 Jun  9 14:23 memory_monitor_summary.json
-rw------- 1 boinc boinc       599 Jun  9 14:25 metadata-surl.xml
-rw------- 1 boinc boinc       241 Jun  9 14:24 output.list
-rw------- 1 boinc boinc        11 Jun  9 14:19 pandaIDs.out
-rw------- 1 boinc boinc      2951 Jun  9 14:19 pandaJobData_1.out
-rw------- 1 boinc boinc      2951 Jun  9 14:18 pandaJobData.out
-rw------- 1 boinc boinc      8158 Jun  9 14:24 panda_node_struct.pickle
-rw------- 1 boinc boinc       203 Jun  9 14:24 pilot_error_report.json
-rw------- 1 boinc boinc        29 Jun  9 14:18 PILOT_INITDIR
-rw------- 1 boinc boinc       139 Jun  9 14:25 pilotlog-last.txt
-rw------- 1 boinc boinc     11387 Jun  9 14:18 pilotlog.txt
drwx------ 3 boinc boinc      4096 Jun  9 14:19 .pki
-rw------- 1 boinc boinc      3751 Jun  9 14:19 queuedata.json
-rw-r--r-- 1 boinc boinc      4376 Jun  9 10:32 queuedata.pilot.json
-rw-r--r-- 1 boinc boinc       606 Jun  9 14:18 RTE.tar.gz
-rwxr-xr-x 1 boinc boinc      8356 Jun  9 14:18 run_atlas
-rw-r--r-- 1 boinc boinc       604 Jun  9 14:25 runtime_log
-rw-r--r-- 1 boinc boinc     10385 Jun  9 14:25 runtime_log.err
drwxrwx--x 2 boinc boinc      4096 Jun  9 14:25 shared
-rw-r--r-- 1 boinc boinc     14425 Jun  9 14:18 start_atlas.sh
-rw------- 1 boinc boinc        19 Jun  9 14:19 START_TIME_3957346728
-rw------- 1 boinc boinc         1 Jun  9 14:18 STATUSCODE
-rw-r--r-- 1 boinc boinc      9737 Jun  9 14:25 stderr.txt
-rw------- 1 boinc boinc        47 Jun  9 14:24 workdir_size-3957346728.json
-rw-r--r-- 1 boinc boinc       100 Jun  9 14:18 wrapper_26015_x86_64-pc-linux-gnu
-rw-r--r-- 1 boinc boinc        24 Jun  9 14:25 wrapper_checkpoint.txt
running start_atlas return value is 0
Parent exit 0
child process exit 0
14:25:47 (20335): run_atlas exited; CPU time 253.180000
14:25:47 (20335): called boinc_finish(0)

</stderr_txt>
]]>
ID: 35468
PoppaGeek

Joined: 6 Dec 14
Posts: 3
Credit: 130,968
RAC: 0
Message 35469 - Posted: 9 Jun 2018, 20:26:32 UTC - in response to Message 35468.  

Found this:

This is the error that we have seen before: "No events to process: 4050 (skipEvents) >= 2000 (inputEvents of EVNT)"

It happens when the WU tries to process events which do not exist in the input file and is a bug in our ATLAS systems. I have changed the validation logic to pass these results so that the real error gets propagated upstream and so the WU does not get retried, since it will never succeed.


https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4179&postid=33433

So case closed?
ID: 35469
Jim1348

Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 35470 - Posted: 9 Jun 2018, 23:17:35 UTC - in response to Message 35468.  

I cannot for the life of me find where to enable "Show Computers", so I do not know if you can see them. :-/

Yes, I can see them.
In your "Primary (default) preferences", there is a checkbox for it (Should LHC@home show your computers on its web site?).
It is probably enabled by default.

I don't see the problem, so I can't comment on it, but it should not last long unless there is something wrong.
ID: 35470
PoppaGeek

Joined: 6 Dec 14
Posts: 3
Credit: 130,968
RAC: 0
Message 35471 - Posted: 9 Jun 2018, 23:23:05 UTC - in response to Message 35470.  
Last modified: 9 Jun 2018, 23:23:34 UTC

PyJobTransforms.trfExe.validate 2018-06-09 14:23:35,700 ERROR Validation of return code failed: Non-zero return code from EVNTtoHITS (64) (Error code 65)




***********************pilot_error_report.json*********************
{
    "3957346728": {
        "2": [
            {
                "pilotErrorCode": 0,
                "pilotErrorDiag": "Job failed: Non-zero failed job return code: 65"
            }
        ]
    }
}
*****************The last 100 lines of the pilot log******************


Six work units have all completed and validated, each with a runtime of less than 500 seconds.
ID: 35471
PHILIPPE

Joined: 24 Jul 16
Posts: 88
Credit: 239,917
RAC: 0
Message 35543 - Posted: 17 Jun 2018, 14:29:40 UTC - in response to Message 35471.  

I see other hosts where the native application fails.

Host 1
ID: 10511353 | 9 | 9,352.80 | 1,304,278 | BOINC 7.8.4
GenuineIntel Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz [Family 6 Model 63 Stepping 2] (32 processors)
Linux CentOS - CentOS Linux release 7.4.1708 (Core) [3.10.0-514.26.2.el7.x86_64]
Last contact: 17 Jun 2018, 14:00:28 UTC


Host 2
ID: 10511351 | 254 | 10,101.40 | 1,276,884 | BOINC 7.8.4
GenuineIntel Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz [Family 6 Model 63 Stepping 2] (32 processors)
Linux CentOS - CentOS Linux release 7.4.1708 (Core) [3.10.0-514.26.2.el7.x86_64]
Last contact: 17 Jun 2018, 13:05:03 UTC


Host 3
ID: 10511349 | 346 | 10,209.53 | 1,243,281 | BOINC 7.8.4
GenuineIntel Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz [Family 6 Model 63 Stepping 2] (32 processors)
Linux CentOS - CentOS Linux release 7.4.1708 (Core) [3.10.0-514.26.2.el7.x86_64]
Last contact: 17 Jun 2018, 12:42:55 UTC


Host 4
ID: 10511348 | 539 | 9,979.68 | 1,237,588 | BOINC 7.8.4
GenuineIntel Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz [Family 6 Model 63 Stepping 2] (32 processors)
Linux CentOS - CentOS Linux release 7.4.1708 (Core) [3.10.0-514.26.2.el7.x86_64]
Last contact: 17 Jun 2018, 11:39:32 UTC


Host 5
ID: 10511352 | 601 | 10,301.15 | 1,275,983 | BOINC 7.8.4
GenuineIntel Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz [Family 6 Model 63 Stepping 2] (32 processors)
Linux CentOS - CentOS Linux release 7.4.1708 (Core) [3.10.0-514.26.2.el7.x86_64]
Last contact: 17 Jun 2018, 11:07:29 UTC


Host 6
ID: 10511347 | 682 | 10,034.29 | 1,281,656 | BOINC 7.8.4
GenuineIntel Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz [Family 6 Model 63 Stepping 2] (32 processors)
Linux CentOS - CentOS Linux release 7.4.1708 (Core) [3.10.0-514.26.2.el7.x86_64]
Last contact: 17 Jun 2018, 10:23:36 UTC


The same lines appear in their logs:

<core_client_version>7.8.4</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
04:02:22 (141976): wrapper (7.7.26015): starting
04:02:22 (141976): wrapper: running run_atlas (--nthreads 12)
singularity image is /cvmfs/atlas.cern.ch/repo/images/singularity/x86_64-slc6.img
sys.argv = ['run_atlas', '--nthreads', '12']
THREADS=12
Checking for CVMFS
CVMFS is installed
OS:CentOS Linux release 7.4.1708 (Core)

This is not SLC6, need to run with Singularity....
Checking Singularity...
Singularity is installed
copy /var/lib/boinc/slots/0/shared/ATLAS.root_0
copy /var/lib/boinc/slots/0/shared/input.tar.gz
copy /var/lib/boinc/slots/0/shared/RTE.tar.gz
copy /var/lib/boinc/slots/0/shared/start_atlas.sh
export ATHENA_PROC_NUMBER=12;start atlas job with PandaID=3961528633
Testing the function of Singularity...
check singularity with cmd:singularity exec -B /cvmfs /cvmfs/atlas.cern.ch/repo/images/singularity/x86_64-slc6.img hostname

Singularity isnt working...

running start_atlas return value is 3
tar czvf shared/result.tar.gz
tar: Cowardly refusing to create an empty archive
Try `tar --help' or `tar --usage' for more information.

*****************The last 100 lines of the pilot log******************
tail: cannot open ‘pilotlog.txt’ for reading: No such file or directory


Other similar hosts work fine; their logs show these lines instead of the failure highlighted above:

<core_client_version>7.8.4</core_client_version>
<![CDATA[
<stderr_txt>
13:07:18 (24412): wrapper (7.7.26015): starting
13:07:18 (24412): wrapper: running run_atlas (--nthreads 12)
singularity image is /cvmfs/atlas.cern.ch/repo/images/singularity/x86_64-slc6.img
sys.argv = ['run_atlas', '--nthreads', '12']
THREADS=12
Checking for CVMFS
CVMFS is installed
OS:CentOS Linux release 7.5.1804 (Core)

This is not SLC6, need to run with Singularity....
Checking Singularity...
Singularity is installed
copy /root/slots/0/shared/ATLAS.root_0
copy /root/slots/0/shared/input.tar.gz
copy /root/slots/0/shared/RTE.tar.gz
copy /root/slots/0/shared/start_atlas.sh
export ATHENA_PROC_NUMBER=12;start atlas job with PandaID=3964135031
Testing the function of Singularity...
check singularity with cmd:singularity exec -B /cvmfs /cvmfs/atlas.cern.ch/repo/images/singularity/x86_64-slc6.img hostname
Singularity Works...
cmd = singularity exec --pwd /root/slots/0 -B /cvmfs,/root /cvmfs/atlas.cern.ch/repo/images/singularity/x86_64-slc6.img sh start_atlas.sh > runtime_log 2> runtime_log.err
running cmd return value is 0


Is that the reason for the constant failures?
ID: 35543
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert

Joined: 15 Jun 08
Posts: 2386
Credit: 222,907,025
RAC: 137,966
Message 35546 - Posted: 17 Jun 2018, 19:21:25 UTC

@PHILIPPE

Regarding the hosts mentioned in your post:
Independent of the error, I highly recommend reducing the number of threads (currently 12), as running 12-thread tasks is extremely inefficient.
The thread count should not exceed 4.
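One way to pin this on the client side is an app_config.xml in the LHC@home project directory. This is only a sketch: the project directory name is derived from the project URL, the app name "ATLAS" and plan class "native_mt" are taken from this thread's title and task logs, and whether the <cmdline> entry overrides the thread count here is an assumption; if the project preferences offer a "Max # CPUs" setting, that is the simpler route.

# write an app_config.xml limiting native ATLAS to 4 threads (adjust the path if your BOINC data dir differs)
sudo tee /var/lib/boinc-client/projects/lhcathome.cern.ch_lhcathome/app_config.xml <<'EOF'
<app_config>
  <app_version>
    <app_name>ATLAS</app_name>
    <plan_class>native_mt</plan_class>
    <avg_ncpus>4</avg_ncpus>
    <cmdline>--nthreads 4</cmdline>
  </app_version>
</app_config>
EOF

Afterwards reload the configuration (e.g. boinccmd --read_cc_config) or simply restart the BOINC client.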



@PHILIPPE
@PoppaGeek

The error will most likely disappear if the owner (Agile Boincers or PoppaGeek) runs "sudo cvmfs_config wipecache" and "cvmfs_config probe" immediately before the next WU starts.
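Spelled out for a typical Debian/Ubuntu-style install (the final stat command is only an optional sanity check, not part of the suggestion above):

# drop the local CVMFS cache, then remount/check the repositories
sudo cvmfs_config wipecache
cvmfs_config probe
# optional: show version, cache and proxy details for the ATLAS repository
cvmfs_config stat -v atlas.cern.ch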
ID: 35546
gemini8

Joined: 4 Dec 15
Posts: 7
Credit: 1,062,894
RAC: 589
Message 35625 - Posted: 22 Jun 2018, 20:18:20 UTC

Got some problems with properties, I think:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=199031637
Tasks crash after ten minutes.
Can I do anything about that?
- - - - - - - - - -
Greetings, Jens
ID: 35625
gyllic

Joined: 9 Dec 14
Posts: 202
Credit: 2,533,875
RAC: 0
Message 35633 - Posted: 24 Jun 2018, 7:28:05 UTC - in response to Message 35625.  

Got some problems with properties, I think:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=199031637
Tasks crash after ten minutes.
Can I do anything about that?
Looks like your CVMFS installation is not correct:
Checking for CVMFS
ls: cannot access '/cvmfs/atlas.cern.ch/repo/sw': No such file or directory
cvmfs_config doesn't exist, check cvmfs with cmd ls /cvmfs/atlas.cern.ch/repo/sw
ls /cvmfs/atlas.cern.ch/repo/sw failed,aborting the jobs

Did you install cvmfs, and if yes, how?

Btw, you also need to install Singularity (if not already done) to run native ATLAS on most Linux distributions.
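If you are unsure what is already on the machine, a quick check (generic commands, nothing LHC@home-specific) is:

# prints the paths of the tools if they are installed
which cvmfs_config singularity
# the same test the ATLAS wrapper runs, as shown in the log above
ls /cvmfs/atlas.cern.ch/repo/sw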
ID: 35633
gemini8

Joined: 4 Dec 15
Posts: 7
Credit: 1,062,894
RAC: 589
Message 35635 - Posted: 24 Jun 2018, 14:40:26 UTC - in response to Message 35633.  

Hi.
Thanks for your answer.
I have neither cvmfs nor singularity (or is it singular?) installed.
Just put some plain Debian onto my machine.
- - - - - - - - - -
Greetings, Jens
ID: 35635
Jim1348

Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 35636 - Posted: 24 Jun 2018, 16:57:07 UTC - in response to Message 35635.  
Last modified: 24 Jun 2018, 17:28:16 UTC

I have neither cvmfs nor singularity (or is it singular?) installed.
Just put some plain Debian onto my machine.

Here is my latest version for Ubuntu 16.04. I suppose it is the same, or similar, for Debian.

CVMFS:
In order to add the apt repository, run
wget https://ecsft.cern.ch/dist/cvmfs/cvmfs-release/cvmfs-release-latest_all.deb
sudo dpkg -i cvmfs-release-latest_all.deb
rm -f cvmfs-release-latest_all.deb
sudo apt update

=================================================
Installation:

Step 1  Install the CernVM-FS packages:
sudo apt install cvmfs cvmfs-config-default

NOTE: If you get an error about unmet dependencies related to "curl", run:
sudo apt remove libcurl3
and then run Step 1 again.
-------------------------------------------------
Step 2 Base setup:
sudo cvmfs_config setup
-------------------------------------------------
Step 3
Create /etc/cvmfs/default.local and open the file for editing.
(e.g., using sudo gedit)
sudo gedit /etc/cvmfs/default.local
-------------------------------------------------
Step 4
Place the desired repositories in "default.local". 
For ATLAS, for instance, set
CVMFS_REPOSITORIES=atlas.cern.ch,atlas-condb.cern.ch,grid.cern.ch
CVMFS_HTTP_PROXY=DIRECT
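
If you would rather skip the editor, the same two lines can be written in one go (this is only an alternative to Steps 3 and 4):

# create /etc/cvmfs/default.local with the ATLAS repositories
sudo tee /etc/cvmfs/default.local <<'EOF'
CVMFS_REPOSITORIES=atlas.cern.ch,atlas-condb.cern.ch,grid.cern.ch
CVMFS_HTTP_PROXY=DIRECT
EOF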
-------------------------------------------------
Step 5
Check if CernVM-FS mounts the specified repositories by: 
sudo cvmfs_config probe 
sudo cvmfs_config chksetup

If the probe fails, try to restart autofs with sudo service autofs restart.
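On a healthy setup the probe typically reports each repository as OK, along these lines (illustrative output, not copied from a real host):

Probing /cvmfs/atlas.cern.ch... OK
Probing /cvmfs/atlas-condb.cern.ch... OK
Probing /cvmfs/grid.cern.ch... OK

If one of them shows Failed, restart autofs as above and probe again.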

Singularity
First, check the latest version on GitHub: https://github.com/singularityware/singularity/releases
And substitute that for "$VERSION"

VERSION=2.5.1 (as of 6 May 2018)

wget https://github.com/singularityware/singularity/releases/download/$VERSION/singularity-$VERSION.tar.gz
=> wget https://github.com/singularityware/singularity/releases/download/2.5.1/singularity-2.5.1.tar.gz

tar xvf singularity-$VERSION.tar.gz
=> tar xvf singularity-2.5.1.tar.gz

cd singularity-$VERSION
=> cd singularity-2.5.1

Install package libarchive
sudo apt install libarchive-dev

./configure --prefix=/usr/local
make
sudo make install

To check the version installed:  singularity --version
To check the usage:  singularity --help
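
Once both CVMFS and Singularity are installed, you can run by hand the same check the run_atlas wrapper performs (the exact command is visible in the task logs earlier in this thread):

# should print the hostname from inside the SLC6 container image
singularity exec -B /cvmfs /cvmfs/atlas.cern.ch/repo/images/singularity/x86_64-slc6.img hostname

If that prints your hostname, the "Testing the function of Singularity" step of a native task should pass as well.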

It should work, though you may have to enable "Run test applications?" in your settings to get the native ATLAS.

EDIT: Also, I should point out that if you have VirtualBox installed, you will get both the VBox and native versions of ATLAS, depending on the whims of the LHC server. If you want only native ATLAS, then you must remove or deactivate VBox.
ID: 35636
gemini8

Joined: 4 Dec 15
Posts: 7
Credit: 1,062,894
RAC: 589
Message 35640 - Posted: 24 Jun 2018, 21:41:36 UTC - in response to Message 35636.  

Thanks a lot for this input.
The most important thing to me might be this:
EDIT: Also, I should point out that if you have VirtualBox installed, you will get both the VBox and native versions of ATLAS, depending on the whims of the LHC server. If you want only native ATLAS, then you must remove or deactivate VBox.

I've been getting mainly or only native ATLAS although I'm running Virtual Box. I thought it was using vbox, so I wondered what was wrong. On my Macs I just have to add the project and vbox, and everything is fine.
If I should get into problems installing those two (Singularity obviously not being the game I found in the Debian packages), I can still configure Boinc to not run native ATLAS, so it will use the vbox version.
Will give all this a try later on. So tired I think I might mess something up.
Thanks again.
- - - - - - - - - -
Greetings, Jens
ID: 35640
