Message boards : ATLAS application : new series of ATLAS tasks - runtime 493 secs ?



Erich56

Joined: 18 Dec 15
Posts: 1686
Credit: 100,335,921
RAC: 102,416
Message 34880 - Posted: 5 Apr 2018, 14:33:53 UTC

I just had a 2-core ATLAS task with a total runtime of 493.73 seconds and a CPU time of 173.11 seconds, yielding 3.95 credit points.
I have never seen such a short ATLAS task before - is this a new series now?
ID: 34880 · Report as offensive     Reply Quote
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert

Joined: 15 Jun 08
Posts: 2386
Credit: 222,873,502
RAC: 137,190
Message 34881 - Posted: 5 Apr 2018, 14:50:20 UTC - in response to Message 34880.  

Your logfile shows a couple of errors that have to be investigated by the project team.
David, are you aware of it?
https://lhcathome.cern.ch/lhcathome/result.php?resultid=186673577

2018-04-05 16:25:19 (6668): Guest Log: PyJobTransforms.trfExe.preExecute 2018-04-05 16:20:55,895 INFO Batch/grid running - command outputs will not be echoed. Logs for EVNTtoHITS are in log.EVNTtoHITS
2018-04-05 16:25:19 (6668): Guest Log: PyJobTransforms.trfExe.preExecute 2018-04-05 16:20:55,897 INFO Now writing wrapper for substep executor EVNTtoHITS
2018-04-05 16:25:19 (6668): Guest Log: PyJobTransforms.trfExe._writeAthenaWrapper 2018-04-05 16:20:55,897 INFO Valgrind not engaged
2018-04-05 16:25:19 (6668): Guest Log: PyJobTransforms.trfExe.preExecute 2018-04-05 16:20:55,898 INFO Athena will be executed in a subshell via ['./runwrapper.EVNTtoHITS.sh']
2018-04-05 16:25:19 (6668): Guest Log: PyJobTransforms.trfExe.execute 2018-04-05 16:20:55,898 INFO Starting execution of EVNTtoHITS (['./runwrapper.EVNTtoHITS.sh'])
2018-04-05 16:25:19 (6668): Guest Log: PyJobTransforms.trfExe.execute 2018-04-05 16:22:50,760 INFO EVNTtoHITS executor returns 139
2018-04-05 16:25:19 (6668): Guest Log: PyJobTransforms.trfExe.validate 2018-04-05 16:22:51,775 ERROR Validation of return code failed: EVNTtoHITS got a SIGSEGV signal (exit code 139) (Error code 65)
2018-04-05 16:25:19 (6668): Guest Log: PyJobTransforms.trfExe.validate 2018-04-05 16:22:51,826 INFO Scanning logfile log.EVNTtoHITS for errors
2018-04-05 16:25:19 (6668): Guest Log: PyJobTransforms.transform.execute 2018-04-05 16:22:51,883 CRITICAL Transform executor raised TransformValidationException: EVNTtoHITS got a SIGSEGV signal (exit code 139)
2018-04-05 16:25:19 (6668): Guest Log: PyJobTransforms.transform.execute 2018-04-05 16:22:55,002 WARNING Transform now exiting early with exit code 65 (EVNTtoHITS got a SIGSEGV signal (exit code 139))
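
The "exit code 139" in the log above follows the common shell convention that a process killed by a signal reports 128 plus the signal number, and SIGSEGV is signal 11 on Linux. A minimal sketch of that decoding is shown here; the helper name `decode_exit_code` is made up for illustration and is not part of the ATLAS or BOINC software.

```python
import signal


def decode_exit_code(code: int) -> str:
    """Describe a shell-style exit code.

    Codes above 128 conventionally mean "terminated by signal (code - 128)".
    """
    if code > 128:
        signum = code - 128
        try:
            # Look up the symbolic name of the signal on this platform.
            return signal.Signals(signum).name
        except ValueError:
            return f"unknown signal {signum}"
    return f"normal exit with status {code}"


if __name__ == "__main__":
    # 139 - 128 = 11, which is SIGSEGV on Linux.
    print(decode_exit_code(139))
```

The transform's own exit code 65 is below 128, so it is an ordinary exit status set by the transform rather than a signal.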
ID: 34881 · Report as offensive     Reply Quote
Erich56

Joined: 18 Dec 15
Posts: 1686
Credit: 100,335,921
RAC: 102,416
Message 34884 - Posted: 5 Apr 2018, 15:44:11 UTC - in response to Message 34881.  

Your logfile shows a couple of errors that have to be investigated by the project team.
Thanks for your efforts - I didn't even bother to look at the stderr, since the task finished successfully.
ID: 34884 · Report as offensive     Reply Quote
gyllic

Joined: 9 Dec 14
Posts: 202
Credit: 2,533,875
RAC: 0
Message 34887 - Posted: 5 Apr 2018, 18:21:53 UTC - in response to Message 34884.  

Thanks for your efforts - I didn't even bother to look at the stderr, since the task finished successfully.
Successful by BOINC standards, but not successful by ATLAS standards, since there is no HITS file.
ID: 34887 · Report as offensive     Reply Quote
Erich56

Joined: 18 Dec 15
Posts: 1686
Credit: 100,335,921
RAC: 102,416
Message 34888 - Posted: 5 Apr 2018, 19:16:23 UTC - in response to Message 34887.  

Successful by BOINC standards, but not successful by ATLAS standards, since there is no HITS file.
yes, unfortunately :-(
ID: 34888 · Report as offensive     Reply Quote
David Cameron
Project administrator
Project developer
Project scientist

Joined: 13 May 14
Posts: 387
Credit: 15,314,184
RAC: 0
Message 34889 - Posted: 5 Apr 2018, 20:21:09 UTC

I noticed a few failures like this in the new batch of tasks - "EVNTtoHITS got a SIGSEGV signal" means a crash of the ATLAS code, so there is nothing you can do about it, I'm afraid.

We still give credit for this kind of failure so we don't punish volunteers for problems in the ATLAS code, but as gyllic said, no HITS file means it was unsuccessful for ATLAS.
ID: 34889 · Report as offensive     Reply Quote
tullio

Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 34895 - Posted: 6 Apr 2018, 17:24:45 UTC
Last modified: 6 Apr 2018, 17:25:52 UTC

The same thing happens to me on my Windows 10 PC: ATLAS tasks complete and validate in a very short time, but there is no HITS file.
Tullio
ID: 34895 · Report as offensive     Reply Quote
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert

Joined: 15 Jun 08
Posts: 2386
Credit: 222,873,502
RAC: 137,190
Message 34896 - Posted: 6 Apr 2018, 18:06:30 UTC - in response to Message 34895.  

The recent ATLAS batch works on a large input file (>300MB).
This may be a bit too large for your local configuration.

You may try to increase the RAM setting for your VMs by a few hundred MB, to 4600-4800 MB.
The relevant setting is located in your app_config.xml.
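
As a rough illustration of what such a setting might look like, the sketch below generates an app_config.xml with a memory override. Everything specific in it is an assumption that is not stated in this thread: the app name `ATLAS`, the plan class `vbox64_mt_mcore_atlas`, and the `--memory_size_mb` command-line option. Please check these against your own client_state.xml and the project documentation before using them.

```python
import xml.etree.ElementTree as ET


def build_app_config(memory_mb: int, ncpus: int = 2,
                     app_name: str = "ATLAS",
                     plan_class: str = "vbox64_mt_mcore_atlas") -> str:
    """Build an app_config.xml string with a VM memory override.

    app_name, plan_class and the --memory_size_mb option are assumed
    values for illustration; verify them for your own installation.
    """
    root = ET.Element("app_config")
    app_version = ET.SubElement(root, "app_version")
    ET.SubElement(app_version, "app_name").text = app_name
    ET.SubElement(app_version, "plan_class").text = plan_class
    ET.SubElement(app_version, "avg_ncpus").text = str(ncpus)
    ET.SubElement(app_version, "cmdline").text = f"--memory_size_mb {memory_mb}"
    # Pretty-print where available (ElementTree.indent needs Python 3.9+).
    if hasattr(ET, "indent"):
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # 4800 MB is the upper end of the range suggested above.
    print(build_app_config(4800))
```

The generated file would go into the LHC@home project directory of the BOINC data folder, and the client has to re-read its config files (or be restarted) before the change applies to newly started tasks.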

Hope this will be successful.
ID: 34896 · Report as offensive     Reply Quote
Erich56

Joined: 18 Dec 15
Posts: 1686
Credit: 100,335,921
RAC: 102,416
Message 34951 - Posted: 10 Apr 2018, 16:46:36 UTC

again I had an ATLAS task with runtime of about 10 minutes:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=186994334
ID: 34951 · Report as offensive     Reply Quote
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert

Joined: 15 Jun 08
Posts: 2386
Credit: 222,873,502
RAC: 137,190
Message 34952 - Posted: 10 Apr 2018, 18:00:19 UTC - in response to Message 34951.  

again I had an ATLAS task with runtime of about 10 minutes:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=186994334

Very strange.
Even your valid tasks show incomplete logs:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=186915983
https://lhcathome.cern.ch/lhcathome/result.php?resultid=186900508
https://lhcathome.cern.ch/lhcathome/result.php?resultid=186900791

You may want to stop your BOINC client, clean the VirtualBox environment, and do a project reset before you start a fresh task.
ID: 34952 · Report as offensive     Reply Quote
tullio

Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 34953 - Posted: 11 Apr 2018, 2:42:11 UTC

Single-core ATLAS tasks on my slower Linux laptop complete with a HITS file. Dual-core tasks on the Windows 10 PC complete in a short time with no HITS file.
Tullio
ID: 34953 · Report as offensive     Reply Quote
rbpeake

Joined: 17 Sep 04
Posts: 99
Credit: 30,618,118
RAC: 3,938
Message 34955 - Posted: 11 Apr 2018, 11:47:11 UTC - in response to Message 34952.  

again I had an ATLAS task with runtime of about 10 minutes:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=186994334

Very strange.
Even your valid tasks show incomplete logs:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=186915983
https://lhcathome.cern.ch/lhcathome/result.php?resultid=186900508
https://lhcathome.cern.ch/lhcathome/result.php?resultid=186900791

You may want to stop your BOINC client, clean the VirtualBox environment, and do a project reset before you start a fresh task.

Is it this statement that indicates success?
2018-04-11 01:37:31 (7056): Guest Log: Successfully finished the ATLAS job!
Regards,
Bob P.
ID: 34955 · Report as offensive     Reply Quote
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert

Joined: 15 Jun 08
Posts: 2386
Credit: 222,873,502
RAC: 137,190
Message 34956 - Posted: 11 Apr 2018, 12:03:01 UTC - in response to Message 34955.  

... Is it this statement that indicates success?
2018-04-11 01:37:31 (7056): Guest Log: Successfully finished the ATLAS job!

It depends on the perspective.
"Guest Log: Successfully finished the ATLAS job!" indicates a successful end of the BOINC task and your host will most likely get credits.

From the science perspective a file like "HITS.*" contains the results and should be included in the upload.
This name normally appears in the directory listing that is included in the stderr.txt and can be checked there.
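
A quick way to do this check is to search the stderr.txt text for a HITS file name. The sketch below assumes the names follow the pattern seen in this project's logs (`HITS.<digits>._<digits>.pool.root.<digits>`); the function name `find_hits_files` is only illustrative.

```python
import re
from typing import List

# Assumed naming pattern, e.g. HITS.13684178._016629.pool.root.1
HITS_PATTERN = re.compile(r"\bHITS\.\d+\._\d+\.pool\.root\.\d+")


def find_hits_files(stderr_text: str) -> List[str]:
    """Return the distinct HITS file names mentioned in a stderr.txt dump."""
    return sorted(set(HITS_PATTERN.findall(stderr_text)))


if __name__ == "__main__":
    sample = (
        "2018-04-11 01:37:31 (7056): Guest Log: "
        "HITS.13684178._016629.pool.root.1 srm://srm.ndgf.org:8443;autodir=no"
    )
    hits = find_hits_files(sample)
    # An empty list would suggest the task produced no science output.
    print(hits if hits else "no HITS file found")
```

A message such as "EVNTtoHITS got a SIGSEGV signal" does not match the pattern, so a crashed task is not mistaken for a successful one.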
ID: 34956 · Report as offensive     Reply Quote
rbpeake

Joined: 17 Sep 04
Posts: 99
Credit: 30,618,118
RAC: 3,938
Message 34958 - Posted: 11 Apr 2018, 13:26:14 UTC - in response to Message 34956.  
Last modified: 11 Apr 2018, 13:27:00 UTC

... Is it this statement that indicates success?
2018-04-11 01:37:31 (7056): Guest Log: Successfully finished the ATLAS job!

It depends on the perspective.
"Guest Log: Successfully finished the ATLAS job!" indicates a successful end of the BOINC task and your host will most likely get credits.

From the science perspective a file like "HITS.*" contains the results and should be included in the upload.
This name normally appears in the directory listing that is included in the stderr.txt and can be checked there.

Thanks! Found it noted in 4 places, including twice here:
2018-04-11 01:37:31 (7056): Guest Log: HITS.13684178._016629.pool.root.1 srm://srm.ndgf.org:8443;autodir=no;spacetoken=ATLASDATADISK/srm/managerv2?SFN=/atlas/disk/atlasdatadisk/rucio/mc16_13TeV/f3/ae/HITS.13684178._016629.pool.root.1:checksumtype=adler32:checksumvalue=b9f476d

Regards,
Bob P.
ID: 34958 · Report as offensive     Reply Quote
gyllic

Joined: 9 Dec 14
Posts: 202
Credit: 2,533,875
RAC: 0
Message 35138 - Posted: 2 May 2018, 16:26:02 UTC

Does one of the experts know why this task was unsuccessful?
https://lhcathome.cern.ch/lhcathome/result.php?resultid=188664284

stderr.txt Output:
***********************log_extracts.txt*************************
- Last 10 lines from /home/boinc/slots/0/Panda_Pilot_27163_1525262542/PandaJob/athena_stdout.txt -
PyJobTransforms.trfValidation.performStandardFileValidation 2018-05-02 14:03:46,676 INFO Validating data type EVNT...
PyJobTransforms.trfValidation.performStandardFileValidation 2018-05-02 14:03:46,676 INFO Validating file EVNT.13620802._000441.pool.root.1...
PyJobTransforms.trfValidation.performStandardFileValidation 2018-05-02 14:03:46,676 INFO EVNT.13620802._000441.pool.root.1: Testing event count...
PyJobTransforms.trfArgClasses._readMetadata 2018-05-02 14:03:46,677 INFO Metadata generator called to obtain nentries for ('EVNT.13620802._000441.pool.root.1',)
PyJobTransforms.trfValidation.performStandardFileValidation 2018-05-02 14:03:52,202 INFO Event counting test passed (3000 events).
PyJobTransforms.trfValidation.performStandardFileValidation 2018-05-02 14:03:52,203 INFO EVNT.13620802._000441.pool.root.1: Checking if guid exists...
PyJobTransforms.trfValidation.performStandardFileValidation 2018-05-02 14:03:52,203 INFO Guid is 01F4D127-47BB-F04E-8417-8E95B617CCA7
PyJobTransforms.trfValidation.performStandardFileValidation 2018-05-02 14:03:52,203 INFO Stopping legacy (serial) file validation
PyJobTransforms.transform.execute 2018-05-02 14:03:52,204 CRITICAL Transform executor raised TransformExecutionException: No events to process: 7800 (skipEvents) >= 3000 (inputEvents of EVNT
PyJobTransforms.transform.execute 2018-05-02 14:03:55,359 WARNING Transform now exiting early with exit code 15 (No events to process: 7800 (skipEvents) >= 3000 (inputEvents of EVNT)

- Walltime -
JobRetrival=0, StageIn=13, Execution=42, StageOut=0, CleanUp=17

***********************pilot_error_report.json*********************
{
    "3915558433": {
        "2": [
            {
                "pilotErrorCode": 0,
                "pilotErrorDiag": "Job failed: Non-zero failed job return code: 15"
            }
        ]
    }
}
ID: 35138 · Report as offensive     Reply Quote
Erich56

Joined: 18 Dec 15
Posts: 1686
Credit: 100,335,921
RAC: 102,416
Message 35139 - Posted: 2 May 2018, 17:29:59 UTC - in response to Message 35138.  

Does one of the experts know why this task was unsuccessful: ...
For me, the key words are "No events to process" - so maybe the task was somehow misconfigured.
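
The numbers in that message can be read directly: the transform was told to skip 7800 events, but the event count test a few lines earlier found only 3000 events in the input file, so nothing is left to simulate. A small sketch that pulls the two numbers out of such a log line is shown below; the regular expression and the function name `parse_no_events` are written for this one message format and are assumptions, not part of PyJobTransforms.

```python
import re
from typing import Optional, Tuple

# Matches e.g. "No events to process: 7800 (skipEvents) >= 3000 (inputEvents of EVNT"
NO_EVENTS = re.compile(
    r"No events to process: (\d+) \(skipEvents\) >= (\d+) \(inputEvents"
)


def parse_no_events(line: str) -> Optional[Tuple[int, int]]:
    """Return (skip_events, input_events) if the line reports this error."""
    match = NO_EVENTS.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    log_line = (
        "WARNING Transform now exiting early with exit code 15 "
        "(No events to process: 7800 (skipEvents) >= 3000 (inputEvents of EVNT)"
    )
    result = parse_no_events(log_line)
    if result is not None:
        skip, available = result
        # Skipping at least as many events as exist leaves nothing to process.
        print(f"skipEvents={skip}, inputEvents={available}, "
              f"events left={max(available - skip, 0)}")
```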
ID: 35139 · Report as offensive     Reply Quote
gyllic

Joined: 9 Dec 14
Posts: 202
Credit: 2,533,875
RAC: 0
Message 35161 - Posted: 4 May 2018, 12:04:41 UTC - in response to Message 35139.  

... so, maybe, the task was somehow mis-configured
I thought the same, but there have not been misconfigured tasks for a long time (which obviously does not mean that it can't happen).
ID: 35161 · Report as offensive     Reply Quote
maeax

Joined: 2 May 07
Posts: 2071
Credit: 156,083,677
RAC: 105,711
Message 35221 - Posted: 10 May 2018, 14:20:17 UTC

ID: 35221 · Report as offensive     Reply Quote
maeax

Joined: 2 May 07
Posts: 2071
Credit: 156,083,677
RAC: 105,711
Message 35369 - Posted: 25 May 2018, 7:47:34 UTC
Last modified: 25 May 2018, 7:49:21 UTC

Boinc 7.8.3 Virtualbox 5.1.26
https://lhcathome.cern.ch/lhcathome/result.php?resultid=193407107
This ATLAS task finished in a short time because of an unexpected computer shutdown followed by a restart of the task.
No new task has been started on another PC, so this task will never be run again.
Edit: 2018-05-25 09:26:43 (10196): Guest Log: Starting ATLAS job. (PandaID=3939995415 taskID=14154968)
ID: 35369 · Report as offensive     Reply Quote
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,616
RAC: 2,139
Message 35373 - Posted: 25 May 2018, 13:37:12 UTC - in response to Message 35369.  

No new task has been started on another PC, so this task will never be run again.
Edit: 2018-05-25 09:26:43 (10196): Guest Log: Starting ATLAS job. (PandaID=3939995415 taskID=14154968)

This BOINC task was not resent, but the job inside it (3939995415) was reported as failed and re-issued as job 3942475539.
ID: 35373 · Report as offensive     Reply Quote


