new series of ATLAS tasks - runtime 493 secs ?
Author | Message |
---|---|
Joined: 18 Dec 15 Posts: 1558 Credit: 57,694,356 RAC: 43,567 |
I just had a 2-core ATLAS task with a total runtime of 493.73 seconds and a CPU time of 173.11 seconds, yielding 3.95 credit points. I have never seen such a short ATLAS task before. Is this a new series? |
Joined: 15 Jun 08 Posts: 2139 Credit: 174,693,366 RAC: 98,814 |
Your logfile shows a couple of errors that have to be investigated by the project team. David, are you aware of it? https://lhcathome.cern.ch/lhcathome/result.php?resultid=186673577 2018-04-05 16:25:19 (6668): Guest Log: PyJobTransforms.trfExe.preExecute 2018-04-05 16:20:55,895 INFO Batch/grid running - command outputs will not be echoed. Logs for EVNTtoHITS are in log.EVNTtoHITS 2018-04-05 16:25:19 (6668): Guest Log: PyJobTransforms.trfExe.preExecute 2018-04-05 16:20:55,897 INFO Now writing wrapper for substep executor EVNTtoHITS 2018-04-05 16:25:19 (6668): Guest Log: PyJobTransforms.trfExe._writeAthenaWrapper 2018-04-05 16:20:55,897 INFO Valgrind not engaged 2018-04-05 16:25:19 (6668): Guest Log: PyJobTransforms.trfExe.preExecute 2018-04-05 16:20:55,898 INFO Athena will be executed in a subshell via ['./runwrapper.EVNTtoHITS.sh'] 2018-04-05 16:25:19 (6668): Guest Log: PyJobTransforms.trfExe.execute 2018-04-05 16:20:55,898 INFO Starting execution of EVNTtoHITS (['./runwrapper.EVNTtoHITS.sh']) 2018-04-05 16:25:19 (6668): Guest Log: PyJobTransforms.trfExe.execute 2018-04-05 16:22:50,760 INFO EVNTtoHITS executor returns 139 2018-04-05 16:25:19 (6668): Guest Log: PyJobTransforms.trfExe.validate 2018-04-05 16:22:51,775 ERROR Validation of return code failed: EVNTtoHITS got a SIGSEGV signal (exit code 139) (Error code 65) 2018-04-05 16:25:19 (6668): Guest Log: PyJobTransforms.trfExe.validate 2018-04-05 16:22:51,826 INFO Scanning logfile log.EVNTtoHITS for errors 2018-04-05 16:25:19 (6668): Guest Log: PyJobTransforms.transform.execute 2018-04-05 16:22:51,883 CRITICAL Transform executor raised TransformValidationException: EVNTtoHITS got a SIGSEGV signal (exit code 139) 2018-04-05 16:25:19 (6668): Guest Log: PyJobTransforms.transform.execute 2018-04-05 16:22:55,002 WARNING Transform now exiting early with exit code 65 (EVNTtoHITS got a SIGSEGV signal (exit code 139)) |
Joined: 18 Dec 15 Posts: 1558 Credit: 57,694,356 RAC: 43,567 |
"Your logfile shows a couple of errors that have to be investigated by the project team." Thanks for your efforts. I didn't even bother to look at the stderr output, since the task finished successfully. |
Joined: 9 Dec 14 Posts: 202 Credit: 2,533,875 RAC: 0 |
"Thanks for your efforts - I didn't even bother to look up the stderr, since the task got finished successfully." Successful by BOINC standards, but not by ATLAS standards, since there is no HITS file. |
Joined: 18 Dec 15 Posts: 1558 Credit: 57,694,356 RAC: 43,567 |
"Successful for BOINC standards, not successful for ATLAS standards, since there is no HITS file." Yes, unfortunately :-( |
Joined: 13 May 14 Posts: 379 Credit: 15,227,526 RAC: 6,387 |
I noticed a few failures like this in the new batch of tasks. "EVNTtoHITS got a SIGSEGV signal" means the ATLAS code crashed, so there is nothing you can do about it, I'm afraid. We still give credit for these kinds of failures so that we don't punish volunteers for problems in the ATLAS code, but as gyllic said, no HITS file means the task was unsuccessful for ATLAS. |
Joined: 19 Feb 08 Posts: 707 Credit: 4,335,771 RAC: 30 |
The same thing happens on my Windows 10 PC: ATLAS tasks complete and validate in a very short time but produce no HITS file. Tullio |
Joined: 15 Jun 08 Posts: 2139 Credit: 174,693,366 RAC: 98,814 |
The recent ATLAS batch works on a large input file (>300 MB). This may be a bit too large for your local configuration. You could try increasing the RAM setting for your VMs by a few hundred MB, to 4600-4800 MB. The relevant setting is in your app_config.xml. I hope this helps. |
Joined: 18 Dec 15 Posts: 1558 Credit: 57,694,356 RAC: 43,567 |
Again I had an ATLAS task with a runtime of about 10 minutes: https://lhcathome.cern.ch/lhcathome/result.php?resultid=186994334 |
Joined: 15 Jun 08 Posts: 2139 Credit: 174,693,366 RAC: 98,814 |
"Again I had an ATLAS task with runtime of about 10 minutes:" Very strange. Even your valid tasks show incomplete logs: https://lhcathome.cern.ch/lhcathome/result.php?resultid=186915983 https://lhcathome.cern.ch/lhcathome/result.php?resultid=186900508 https://lhcathome.cern.ch/lhcathome/result.php?resultid=186900791 You may want to stop your BOINC client, clean the VirtualBox environment and do a project reset before you start a fresh task. |
Joined: 19 Feb 08 Posts: 707 Credit: 4,335,771 RAC: 30 |
Single-core ATLAS tasks on my slower Linux laptop complete with a HITS file. Dual-core tasks on the Windows 10 PC complete in a short time with no HITS file. Tullio |
Joined: 17 Sep 04 Posts: 88 Credit: 27,682,121 RAC: 4,354 |
"Again I had an ATLAS task with runtime of about 10 minutes:" Is it this statement that indicates success? "2018-04-11 01:37:31 (7056): Guest Log: Successfully finished the ATLAS job!" Regards, Bob P. |
Joined: 15 Jun 08 Posts: 2139 Credit: 174,693,366 RAC: 98,814 |
"... Is it this statement that indicates success?" It depends on the perspective. "Guest Log: Successfully finished the ATLAS job!" indicates a successful end of the BOINC task, and your host will most likely get credit. From the science perspective, a file like "HITS.*" contains the results and should be included in the upload. This name normally appears in the directory listing included in stderr.txt and can be checked there. |
Joined: 17 Sep 04 Posts: 88 Credit: 27,682,121 RAC: 4,354 |
"... Is it this statement that indicates success?" Thanks! I found it noted in 4 places, including twice here: 2018-04-11 01:37:31 (7056): Guest Log: HITS.13684178._016629.pool.root.1 srm://srm.ndgf.org:8443;autodir=no;spacetoken=ATLASDATADISK/srm/managerv2?SFN=/atlas/disk/atlasdatadisk/rucio/mc16_13TeV/f3/ae/HITS.13684178._016629.pool.root.1:checksumtype=adler32:checksumvalue=b9f476d Regards, Bob P. |
Joined: 9 Dec 14 Posts: 202 Credit: 2,533,875 RAC: 0 |
Does one of the experts know why this task was unsuccessful? https://lhcathome.cern.ch/lhcathome/result.php?resultid=188664284 stderr.txt output: ***********************log_extracts.txt************************* - Last 10 lines from /home/boinc/slots/0/Panda_Pilot_27163_1525262542/PandaJob/athena_stdout.txt - PyJobTransforms.trfValidation.performStandardFileValidation 2018-05-02 14:03:46,676 INFO Validating data type EVNT... PyJobTransforms.trfValidation.performStandardFileValidation 2018-05-02 14:03:46,676 INFO Validating file EVNT.13620802._000441.pool.root.1... PyJobTransforms.trfValidation.performStandardFileValidation 2018-05-02 14:03:46,676 INFO EVNT.13620802._000441.pool.root.1: Testing event count... PyJobTransforms.trfArgClasses._readMetadata 2018-05-02 14:03:46,677 INFO Metadata generator called to obtain nentries for ('EVNT.13620802._000441.pool.root.1',) PyJobTransforms.trfValidation.performStandardFileValidation 2018-05-02 14:03:52,202 INFO Event counting test passed (3000 events). PyJobTransforms.trfValidation.performStandardFileValidation 2018-05-02 14:03:52,203 INFO EVNT.13620802._000441.pool.root.1: Checking if guid exists... PyJobTransforms.trfValidation.performStandardFileValidation 2018-05-02 14:03:52,203 INFO Guid is 01F4D127-47BB-F04E-8417-8E95B617CCA7 PyJobTransforms.trfValidation.performStandardFileValidation 2018-05-02 14:03:52,203 INFO Stopping legacy (serial) file validation PyJobTransforms.transform.execute 2018-05-02 14:03:52,204 CRITICAL Transform executor raised TransformExecutionException: No events to process: 7800 (skipEvents) >= 3000 (inputEvents of EVNT PyJobTransforms.transform.execute 2018-05-02 14:03:55,359 WARNING Transform now exiting early with exit code 15 (No events to process: 7800 (skipEvents) >= 3000 (inputEvents of EVNT) - Walltime - JobRetrival=0, StageIn=13, Execution=42, StageOut=0, CleanUp=17 ***********************pilot_error_report.json********************* { "3915558433": { "2": [ { "pilotErrorCode": 0, "pilotErrorDiag": "Job failed: Non-zero failed job return code: 15" } ] } } |
Joined: 18 Dec 15 Posts: 1558 Credit: 57,694,356 RAC: 43,567 |
"Does one of the experts know why this task was unsuccessful: ..." For me, the key words are "No events to process", so maybe the task was somehow misconfigured. |
Joined: 9 Dec 14 Posts: 202 Credit: 2,533,875 RAC: 0 |
"... so, maybe, the task was somehow mis-configured" I thought the same, but there have not been misconfigured tasks for a long time (which obviously does not mean that it can't happen). |
Joined: 2 May 07 Posts: 1715 Credit: 127,737,006 RAC: 256,791 |
10 minutes of work for the Linux native app: https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=93126686 |
Joined: 2 May 07 Posts: 1715 Credit: 127,737,006 RAC: 256,791 |
BOINC 7.8.3, VirtualBox 5.1.26: https://lhcathome.cern.ch/lhcathome/result.php?resultid=193407107 This ATLAS task finished in a short time because of an unexpected computer shutdown and a restart. No new task has been started on another PC, so this task will never be rerun. Edit: 2018-05-25 09:26:43 (10196): Guest Log: Starting ATLAS job. (PandaID=3939995415 taskID=14154968) |
Joined: 14 Jan 10 Posts: 1154 Credit: 7,096,440 RAC: 1,190 |
"There is no new task starting from a other PC, so this task will never get a new running." The BOINC task itself was not resent, but the job inside it (3939995415) was reported as failed and re-issued as job 3942475539. |
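
For anyone who wants to run the checks from this thread against their own stderr.txt, here is a minimal sketch in Python. It assumes three things taken from the posts above: the line "Successfully finished the ATLAS job!" only marks success at the BOINC level, a result file named like HITS.13684178._016629.pool.root.1 marks success for ATLAS, and an exit code above 128 follows the usual shell convention of 128 plus the signal number (so 139 is signal 11, SIGSEGV). The helper names and the regular expressions are made up for illustration and are not part of BOINC or the ATLAS software.

```python
import re
import signal

# Marker quoted in this thread; it signals BOINC-level success only.
BOINC_SUCCESS_MARKER = "Successfully finished the ATLAS job!"
# Result files look like HITS.13684178._016629.pool.root.1 (name format taken
# from the log line quoted in the thread; the regex itself is an assumption).
HITS_PATTERN = re.compile(r"\bHITS\.\d+\._\d+\.pool\.root\.\d+")
EXIT_CODE_PATTERN = re.compile(r"exit code (\d+)")


def describe_exit_code(code):
    """Turn a shell-style exit code into a short description.

    Shells report a process killed by signal N as exit code 128 + N.
    """
    if code > 128:
        try:
            return f"killed by {signal.Signals(code - 128).name}"
        except ValueError:
            return f"killed by unknown signal {code - 128}"
    return "exited normally" if code == 0 else f"exited with error {code}"


def triage_stderr(text):
    """Summarise the text of a stderr.txt (hypothetical helper)."""
    exit_codes = sorted({int(m) for m in EXIT_CODE_PATTERN.findall(text)})
    return {
        "boinc_success": BOINC_SUCCESS_MARKER in text,
        "hits_files": sorted(set(HITS_PATTERN.findall(text))),
        "exit_codes": {c: describe_exit_code(c) for c in exit_codes},
    }


# Two log lines quoted in this thread (from different tasks), combined
# here purely for illustration.
sample = (
    "PyJobTransforms.trfExe.validate 2018-04-05 16:22:51,775 ERROR "
    "Validation of return code failed: EVNTtoHITS got a SIGSEGV signal "
    "(exit code 139) (Error code 65)\n"
    "2018-04-11 01:37:31 (7056): Guest Log: "
    "Successfully finished the ATLAS job!\n"
)

report = triage_stderr(sample)
for key, value in report.items():
    print(f"{key}: {value}")
```

To check a real task, read the stderr text from the result page into a string and pass it to triage_stderr. An empty hits_files list together with boinc_success set to True is the "valid for BOINC, unsuccessful for ATLAS" case described above.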