1)
Message boards :
ATLAS application :
ATLAS native_mt fail
(Message 35471)
Posted 9 Jun 2018 by PoppaGeek
Post:
PyJobTransforms.trfExe.validate 2018-06-09 14:23:35,700 ERROR Validation of return code failed: Non-zero return code from EVNTtoHITS (64) (Error code 65)
***********************pilot_error_report.json*********************
{ "3957346728": { "2": [ { "pilotErrorCode": 0, "pilotErrorDiag": "Job failed: Non-zero failed job return code: 65" } ] } }
*****************The last 100 lines of the pilot log******************

6 work units all completed and validated with runtime less than 500 seconds.
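For anyone checking several failed tasks, the return code can be pulled out of the pilot_error_report.json excerpt instead of read by eye. The following is only a minimal sketch: the function name `failed_return_codes` is made up for illustration, the nested "2" key is treated as opaque because the post does not say what it means, and the regular expression assumes the diagnostic text keeps the wording quoted above.

```python
import json
import re

# Report text copied from the post above.
REPORT = (
    '{ "3957346728": { "2": [ { "pilotErrorCode": 0, '
    '"pilotErrorDiag": "Job failed: Non-zero failed job return code: 65" } ] } }'
)


def failed_return_codes(report_text):
    """Map each PandaID to the return codes named in its pilotErrorDiag strings.

    Hypothetical helper: the inner keys (here "2") are iterated without
    interpretation, since their meaning is not stated in the post.
    """
    report = json.loads(report_text)
    codes = {}
    for panda_id, groups in report.items():
        for entries in groups.values():
            for entry in entries:
                match = re.search(r"return code: (\d+)", entry.get("pilotErrorDiag", ""))
                if match:
                    codes.setdefault(panda_id, []).append(int(match.group(1)))
    return codes


print(failed_return_codes(REPORT))
```

Note that pilotErrorCode is 0 in this report even though the job failed, so the diagnostic string is the field that carries the code 65.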
2)
Message boards :
ATLAS application :
ATLAS native_mt fail
(Message 35469)
Posted 9 Jun 2018 by PoppaGeek
Post:
Found this:

This is the error that we have seen before: "No events to process: 4050 (skipEvents) >= 2000 (inputEvents of EVNT)"
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4179&postid=33433

So case closed?
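To check whether a given task log actually contains that message, something like the sketch below could be run over the log text. It is an illustration only: the function name `find_skip_overrun` is invented here, the pattern assumes the message is worded exactly as quoted, and the post does not confirm that this message appears in the failing task's log.EVNTtoHITS.

```python
import re

# Pattern built from the message quoted in the post; the exact wording in a
# real log is an assumption.
NO_EVENTS = re.compile(
    r"No events to process: (\d+) \(skipEvents\) >= (\d+) \(inputEvents of EVNT\)"
)


def find_skip_overrun(log_text):
    """Return (skip_events, input_events) if the message is present, else None."""
    match = NO_EVENTS.search(log_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


quoted = "No events to process: 4050 (skipEvents) >= 2000 (inputEvents of EVNT)"
result = find_skip_overrun(quoted)
if result is not None:
    skip_events, input_events = result
    # The job is asked to skip more events than the input file contains.
    print(f"skipEvents={skip_events}, inputEvents={input_events}")
```

With the quoted numbers, the job is told to skip 4050 events in an input that holds only 2000, so nothing is left to simulate.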
3)
Message boards :
ATLAS application :
ATLAS native_mt fail
(Message 35468)
Posted 9 Jun 2018 by PoppaGeek
Post:
I cannot for the life of me find where to Show Computers, so I do not know if you can see them. :-/

Why are the tasks failing on this setup?

Thanks!

<core_client_version>7.6.33</core_client_version>
<![CDATA[
<stderr_txt>
14:18:54 (20335): wrapper (7.7.26015): starting
14:18:54 (20335): wrapper: running run_atlas (--nthreads 2)
singularity image is /cvmfs/atlas.cern.ch/repo/images/singularity/x86_64-slc6.img
sys.argv = ['run_atlas', '--nthreads', '2']
THREADS=2
Checking for CVMFS
CVMFS is installed
OS:cat: /etc/redhat-release: No such file or directory
This is not SLC6, need to run with Singularity....
Checking Singularity...
Singularity is installed
copy /var/lib/boinc-client/slots/2/shared/start_atlas.sh
copy /var/lib/boinc-client/slots/2/shared/RTE.tar.gz
copy /var/lib/boinc-client/slots/2/shared/input.tar.gz
copy /var/lib/boinc-client/slots/2/shared/ATLAS.root_0
export ATHENA_PROC_NUMBER=2;start atlas job with PandaID=3957346728
Testing the function of Singularity...
check singularity with cmd:singularity exec -B /cvmfs /cvmfs/atlas.cern.ch/repo/images/singularity/x86_64-slc6.img hostname
Singularity Works...
cmd = singularity exec --pwd /var/lib/boinc-client/slots/2 -B /cvmfs,/var /cvmfs/atlas.cern.ch/repo/images/singularity/x86_64-slc6.img sh start_atlas.sh > runtime_log 2> runtime_log.err
running cmd return value is 0
***********************log_extracts.txt*************************
- Last 10 lines from /var/lib/boinc-client/slots/2/Panda_Pilot_20784_1528571937/PandaJob/athena_stdout.txt -
PyJobTransforms.trfExe.preExecute 2018-06-09 14:19:36,673 INFO Batch/grid running - command outputs will not be echoed. Logs for EVNTtoHITS are in log.EVNTtoHITS
PyJobTransforms.trfExe.preExecute 2018-06-09 14:19:36,675 INFO Now writing wrapper for substep executor EVNTtoHITS
PyJobTransforms.trfExe._writeAthenaWrapper 2018-06-09 14:19:36,676 INFO Valgrind not engaged
PyJobTransforms.trfExe.preExecute 2018-06-09 14:19:36,676 INFO Athena will be executed in a subshell via ['./runwrapper.EVNTtoHITS.sh']
PyJobTransforms.trfExe.execute 2018-06-09 14:19:36,676 INFO Starting execution of EVNTtoHITS (['./runwrapper.EVNTtoHITS.sh'])
PyJobTransforms.trfExe.execute 2018-06-09 14:23:34,791 INFO EVNTtoHITS executor returns 64
PyJobTransforms.trfExe.validate 2018-06-09 14:23:35,700 ERROR Validation of return code failed: Non-zero return code from EVNTtoHITS (64) (Error code 65)
PyJobTransforms.trfExe.validate 2018-06-09 14:23:35,732 INFO Scanning logfile log.EVNTtoHITS for errors
PyJobTransforms.transform.execute 2018-06-09 14:23:36,121 CRITICAL Transform executor raised TransformValidationException: Non-zero return code from EVNTtoHITS (64)
PyJobTransforms.transform.execute 2018-06-09 14:23:39,295 WARNING Transform now exiting early with exit code 65 (Non-zero return code from EVNTtoHITS (64))
- Walltime -
JobRetrival=3, StageIn=10, Execution=273, StageOut=0, CleanUp=14
***********************pilot_error_report.json*********************
{ "3957346728": { "2": [ { "pilotErrorCode": 0, "pilotErrorDiag": "Job failed: Non-zero failed job return code: 65" } ] } }
*****************The last 100 lines of the pilot log******************
"seopt": "token:ATLASDATADISK:srm://srm.ndgf.org:8443/srm/managerv2?SFN=", "sepath": "/atlas/disk/atlasdatadisk/rucio", "seprodpath": "/atlas/disk/atlasdatadisk/rucio", "setokens": "ATLASDATADISK", "site": "BOINC", "siteid": "BOINC_MCORE", "sitershare": null, "space": 0, "special_par": null, "stageinretry": 2, "stageoutretry": 2, "status": "brokeroff", "statusoverride": "offline", "sysconfig": "manual", "system": "arc", "tags": "arc", "tier": "T3", "timefloor": 0,
"tmpdir": null, "transferringlimit": 20000, "tspace": "2070-01-01T00:00:00", "use_newmover": "True", "validatedreleases": "True", "version": null, "wansinklimit": null, "wansourcelimit": null, "wnconnectivity": "full", "wntmpdir": null }
2018-06-09 19:18:57|20784|SiteInformat| Queuedata was successfully downloaded by pilot wrapper script
2018-06-09 19:18:57|20784|ATLASSiteInf| curl command returned valid queuedata
2018-06-09 19:18:57|20784|ATLASSiteInf| Site BOINC_MCORE is currently in brokeroff mode
2018-06-09 19:18:57|20784|ATLASSiteInf| Job recovery turned off
2018-06-09 19:18:57|20784|ATLASSiteInf| Confirmed correctly formatted rucio sepath
2018-06-09 19:18:57|20784|ATLASSiteInf| Confirmed correctly formatted rucio seprodpath
2018-06-09 19:18:57|20784|SiteInformat| Evaluating queuedata
2018-06-09 19:18:57|20784|SiteInformat| Setting unset pilot variables using queuedata
2018-06-09 19:18:57|20784|SiteInformat| appdir:
2018-06-09 19:18:57|20784|pUtil.py | File registration will be done by server
2018-06-09 19:18:57|20784|pUtil.py | Updated stage-in retry number to 2
2018-06-09 19:18:57|20784|pUtil.py | Updated stage-out retry number to 2
2018-06-09 19:18:57|20784|pUtil.py | Detected unset (NULL) release/homepackage string
2018-06-09 19:18:57|20784|ATLASExperim| Application dir confirmed: /var/lib/boinc-client/slots/2/
2018-06-09 19:18:57|20784|pilot.py | Pilot will serve experiment: Nordugrid-ATLAS
2018-06-09 19:18:57|20784|ATLASExperim| Architecture information:
2018-06-09 19:18:57|20784|ATLASExperim| Excuting command: lsb_release -a
2018-06-09 19:18:57|20784|ATLASExperim| sh: lsb_release: command not found
2018-06-09 19:18:57|20784|pUtil.py | getSiteInformation: got experiment=ATLAS
2018-06-09 19:18:57|20784|ATLASExperim| appdirs = ['/cvmfs/atlas.cern.ch/repo/sw']
2018-06-09 19:18:57|20784|ATLASExperim| head of /cvmfs/atlas.cern.ch/repo/sw/ChangeLog:
--------------------------------------------------------------------------------
2018-06-09 21:00:23 Alessandro De Salvo
* + AGISData 20180609210023
2018-06-09 20:01:16 Alessandro De Salvo
* + GroupData 201806092001
2018-06-09 20:00:27 Alessandro De Salvo
* + AGISData 20180609200027
2018-06-09 19:00:17 Alessandro De Salvo
--------------------------------------------------------------------------------
2018-06-09 19:18:57|20784|ATLASExperim| ATLAS_PYTHON_PILOT set to /usr/bin/python
2018-06-09 19:18:57|20784|pUtil.py | getSiteInformation: got experiment=ATLAS
2018-06-09 19:18:57|20784|ATLASExperim| Executing command: export ATLAS_LOCAL_ROOT_BASE=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase;$ATLAS_LOCAL_ROOT_BASE/utilities/checkValidity.sh (time-out: 300)
2018-06-09 19:18:57|20784|pUtil.py | Executing command: export ATLAS_LOCAL_ROOT_BASE=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase;$ATLAS_LOCAL_ROOT_BASE/utilities/checkValidity.sh (protected by timed_command, timeout: 300 s)
2018-06-09 19:18:58|20784|pUtil.py | Elapsed time: 0
2018-06-09 19:18:58|20784|ATLASExperim| Diagnostics tool has verified CVMFS
2018-06-09 19:18:58|20784|Node.py | Collecting machine features
2018-06-09 19:18:58|20784|Node.py | $MACHINEFEATURES not defined locally
2018-06-09 19:18:58|20784|Node.py | $JOBFEATURES not defined locally
2018-06-09 19:18:58|20784|Node.py | Executing command: hostname -i
2018-06-09 19:18:58|20784|Node.py | IP number of worker node: 127.0.1.1
2018-06-09 19:18:58|20784|pUtil.py | getSiteInformation: got experiment=Nordugrid-ATLAS
2018-06-09 19:18:58|20784|pilot.py | Using site information for experiment: Nordugrid-ATLAS
2018-06-09 19:18:58|20784|pilot.py | Will attempt to create workdir: /var/lib/boinc-client/slots/2/Panda_Pilot_20784_1528571937
2018-06-09 19:18:58|20784|pilot.py | Creating file: /var/lib/boinc-client/slots/2/CURRENT_SITEWORKDIR
2018-06-09 19:18:58|20784|pUtil.py | Wrote string "/var/lib/boinc-client/slots/2/Panda_Pilot_20784_1528571937" to file: /var/lib/boinc-client/slots/2/CURRENT_SITEWORKDIR
2018-06-09 19:18:58|20784|ATLASExperim| ATLAS_POOLCOND_PATH not set by wrapper
2018-06-09 19:18:58|20784|pilot.py | Preparing to execute Cleaner
2018-06-09 19:18:58|20784|pilot.py | Cleaning /var/lib/boinc-client/slots/2
2018-06-09 19:18:58|20784|Cleaner.py | Cleaner initialized with clean-up limit: 2 hours
2018-06-09 19:18:58|20784|Cleaner.py | Cleaner will scan for lost directories in verified path: /var/lib/boinc-client/slots/2
2018-06-09 19:18:58|20784|Cleaner.py | Executing empty dirs clean-up, stage 1/5
2018-06-09 19:18:58|20784|Cleaner.py | Purged 0 empty directories
2018-06-09 19:18:58|20784|Cleaner.py | Executing work dir clean-up, stage 2/5
2018-06-09 19:18:58|20784|Cleaner.py | Purged 0 single workDirs directories
2018-06-09 19:18:58|20784|Cleaner.py | Executing maxed-out dirs clean-up, stage 3/5
2018-06-09 19:18:58|20784|Cleaner.py | Purged 0 empty directories
2018-06-09 19:18:58|20784|Cleaner.py | Executing AthenaMP clean-up, stage 4/5 <SKIPPED>
2018-06-09 19:18:58|20784|Cleaner.py | Executing PanDA Pilot dir clean-up, stage 5/5
2018-06-09 19:18:58|20784|Cleaner.py | Number of found job state files: 0
2018-06-09 19:18:58|20784|Cleaner.py | No job state files were found, aborting clean-up
2018-06-09 19:18:58|20784|pilot.py | Update frequencies:
2018-06-09 19:18:58|20784|pilot.py | ...Processes: 300 s
2018-06-09 19:18:58|20784|pilot.py | .......Space: 600 s
2018-06-09 19:18:58|20784|pilot.py | ......Server: 1800 s
2018-06-09 19:18:58|20784|pUtil.py | Timefloor set to zero in queuedata (multi-jobs disabled)
***************diag file************
runtimeenvironments=APPS/HEP/ATLAS-SITE;
Processors=1
WallTime=411.32s
KernelTime=18.39s
UserTime=252.80s
CPUUsage=65%
MaxResidentMemory=1807372kB
AverageResidentMemory=0kB
AverageTotalMemory=0kB
AverageUnsharedMemory=0kB
AverageUnsharedStack=0kB
AverageSharedMemory=0kB
PageSize=4096B
MajorPageFaults=6937
MinorPageFaults=2270894
Swaps=0
ForcedSwitches=24219
WaitSwitches=487507
Inputs=2706816
Outputs=65056
SocketReceived=0
SocketSent=0
Signals=0
nodename=PoppaGeek@Dev9400
exitcode=0
******************************WorkDir***********************
total 263632
drwxrwx--x 6 boinc boinc 4096 Jun 9 14:25 .
drwxrwx--x 5 boinc boinc 4096 Jun 9 13:30 ..
-rw------- 1 boinc boinc 6739364 Jun 9 14:19 agis_ddmendpoints.cvmfs.json
-rw------- 1 boinc boinc 5359206 Jun 9 14:19 agis_schedconf.cvmfs.json
drwx------ 2 boinc boinc 4096 Jun 9 14:19 .alrb
drwxr-xr-x 3 boinc boinc 4096 Jun 9 14:18 APPS
-rwx------ 1 boinc boinc 2435 Jun 9 10:31 ARCpilot
-rw------- 1 boinc boinc 549 Jun 9 14:19 .asetup
-rw------- 1 boinc boinc 10994 Jun 9 14:19 .asetup.save
-rw-r--r-- 1 boinc boinc 0 Jun 9 14:18 boinc_lockfile
-rw-r--r-- 1 boinc boinc 8192 Jun 9 14:25 boinc_mmap_file
-rw-r--r-- 1 boinc boinc 526 Jun 9 14:23 boinc_task_state.xml
-rw------- 1 boinc boinc 58 Jun 9 14:18 CURRENT_SITEWORKDIR
-rw-r--r-- 1 boinc boinc 256192482 Jun 9 14:18 EVNT.13837267._001172.pool.root.1
-rw-r--r-- 1 boinc boinc 5744 Jun 9 14:18 init_data.xml
-rw-r--r-- 1 boinc boinc 1091389 Jun 9 14:18 input.tar.gz
-rw------- 1 boinc boinc 488 Jun 9 14:25 IUWLDmW1ulsnlyackoJh5iwnABFKDmABFKDmqz7XDmABFKDmOvp3Fm.diag
-rw------- 1 boinc boinc 3467 Jun 9 14:25 jobSmallFiles.tgz
-rw-r--r-- 1 boinc boinc 105 Jun 9 14:18 job.xml
-rw------- 1 boinc boinc 170277 Jun 9 14:25 log.14322886._074314.job.log.1
-rw------- 1 boinc boinc 152071 Jun 9 14:24 log.14322886._074314.job.log.tgz.1
-rw------- 1 boinc boinc 1490 Jun 9 14:24 log_extracts.txt
-rw------- 1 boinc boinc 306 Jun 9 14:23 memory_monitor_summary.json
-rw------- 1 boinc boinc 599 Jun 9 14:25 metadata-surl.xml
-rw------- 1 boinc boinc 241 Jun 9 14:24 output.list
-rw------- 1 boinc boinc 11 Jun 9 14:19 pandaIDs.out
-rw------- 1 boinc boinc 2951 Jun 9 14:19 pandaJobData_1.out
-rw------- 1 boinc boinc 2951 Jun 9 14:18 pandaJobData.out
-rw------- 1 boinc boinc 8158 Jun 9 14:24 panda_node_struct.pickle
-rw------- 1 boinc boinc 203 Jun 9 14:24 pilot_error_report.json
-rw------- 1 boinc boinc 29 Jun 9 14:18 PILOT_INITDIR
-rw------- 1 boinc boinc 139 Jun 9 14:25 pilotlog-last.txt
-rw------- 1 boinc boinc 11387 Jun 9 14:18 pilotlog.txt
drwx------ 3 boinc boinc 4096 Jun 9 14:19 .pki
-rw------- 1 boinc boinc 3751 Jun 9 14:19 queuedata.json
-rw-r--r-- 1 boinc boinc 4376 Jun 9 10:32 queuedata.pilot.json
-rw-r--r-- 1 boinc boinc 606 Jun 9 14:18 RTE.tar.gz
-rwxr-xr-x 1 boinc boinc 8356 Jun 9 14:18 run_atlas
-rw-r--r-- 1 boinc boinc 604 Jun 9 14:25 runtime_log
-rw-r--r-- 1 boinc boinc 10385 Jun 9 14:25 runtime_log.err
drwxrwx--x 2 boinc boinc 4096 Jun 9 14:25 shared
-rw-r--r-- 1 boinc boinc 14425 Jun 9 14:18 start_atlas.sh
-rw------- 1 boinc boinc 19 Jun 9 14:19 START_TIME_3957346728
-rw------- 1 boinc boinc 1 Jun 9 14:18 STATUSCODE
-rw-r--r-- 1 boinc boinc 9737 Jun 9 14:25 stderr.txt
-rw------- 1 boinc boinc 47 Jun 9 14:24 workdir_size-3957346728.json
-rw-r--r-- 1 boinc boinc 100 Jun 9 14:18 wrapper_26015_x86_64-pc-linux-gnu
-rw-r--r-- 1 boinc boinc 24 Jun 9 14:25 wrapper_checkpoint.txt
running start_atlas return value is 0
Parent exit 0
child process exit 0
14:25:47 (20335): run_atlas exited; CPU time 253.180000
14:25:47 (20335): called boinc_finish(0)
</stderr_txt>
]]>
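The timing figures in the diag file section of the log above can be cross-checked with a few lines of Python. This is a rough sketch, not how the pilot computes its numbers: the helper names `seconds_fields` and `cpu_fraction` are invented, and the formula (kernel plus user time divided by wall time) is an assumption that merely happens to land close to the reported CPUUsage.

```python
import re

# Timing fields copied from the diag file section of the log above.
DIAG_EXCERPT = "Processors=1 WallTime=411.32s KernelTime=18.39s UserTime=252.80s CPUUsage=65%"


def seconds_fields(diag_text):
    """Collect every Key=<number>s field into a dict of floats (hypothetical helper)."""
    return {key: float(value) for key, value in re.findall(r"(\w+)=([\d.]+)s\b", diag_text)}


def cpu_fraction(diag_text):
    """Assumed formula: (KernelTime + UserTime) / WallTime."""
    fields = seconds_fields(diag_text)
    return (fields["KernelTime"] + fields["UserTime"]) / fields["WallTime"]


print(f"CPU time / wall time = {cpu_fraction(DIAG_EXCERPT):.3f}")
```

The ratio comes out just under 0.66, which is in line with the logged CPUUsage=65% if that value is truncated rather than rounded.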