1) Message boards : Number crunching : Anonymous Platform - Scheduler request failed: HTTP internal server error (Message 38891)
Posted 18 May 2019 by tazzduke
Post:
Moved
2) Message boards : Number crunching : Checklist Version 3 for Atlas@Home (and other VM-based Projects) on your PC (Message 38582)
Posted 19 Apr 2019 by tazzduke
Post:
Greetings Yeti

Thank you for your updated checklist and for all the effort from you and your team.

I have now managed to complete two ATLAS jobs (multi-threaded, 4 cores per job).

I believe my earlier (okay, way earlier) troubles were caused by insufficient RAM: the computer had 8 GB and now has 16 GB.

Yes, both have the HITS file entry in the output.

Cheers
3) Message boards : ATLAS application : Non-zero return code from EVNTtoHITS (65) (Error code 65) (Message 36105)
Posted 28 Jul 2018 by tazzduke
Post:
Greetings Yeti

Thank you, you may have given me a lightbulb moment. I think I need to change to a different PC to handle this.

The PC currently running these tasks also has programs running in the background to keep it child safe.

So I will switch over to a spare quad-core Win 7 x64 system that just runs BOINC and does nothing else.

Then I will reread step 10 and the how-tos on opening up said ports.

Cheers.
4) Message boards : ATLAS application : Non-zero return code from EVNTtoHITS (65) (Error code 65) (Message 36102)
Posted 28 Jul 2018 by tazzduke
Post:
Greetings

Just to update, I did a complete reinstall of VB and BOINC.

I did a reset of LHC and then downloaded one workunit to try.

As I was watching it, I noticed two things.

1st thing - Nil CPU usage was indicated whilst the workunit was running.

2nd thing - I was looking at the events log and nil entries were being recorded, i.e. when you show the VM and press ALT+F2 on a workunit.

So then I checked the stderr of the workunit, and it was not a normal valid workunit with a HITS file reference.

https://lhcathome.cern.ch/lhcathome/result.php?resultid=203310529
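
The stderr check described above (is there a HITS file reference or not?) can be roughed out in Python. This is only a sketch under assumptions: the function name `summarize_stderr` is made up for illustration, and the exact form of the HITS entry in a good result is not something I can confirm, so the code just looks for a standalone `HITS` token that is not part of the step name `EVNTtoHITS`.

```python
import re

# Assumption: a good result mentions "HITS" as its own token
# (for example in a file name), while a failed one only contains
# it inside the step name "EVNTtoHITS".
HITS_TOKEN = re.compile(r"(?<![A-Za-z])HITS\b")

# Log levels that usually indicate trouble in the guest log.
PROBLEM_LEVEL = re.compile(r"\b(?:ERROR|CRITICAL)\b")


def summarize_stderr(stderr_text):
    """Return a small summary of a task's stderr text."""
    lines = stderr_text.splitlines()
    return {
        # True if any line has a standalone HITS token.
        "mentions_hits": any(HITS_TOKEN.search(line) for line in lines),
        # All lines logged at ERROR or CRITICAL level.
        "problem_lines": [line for line in lines if PROBLEM_LEVEL.search(line)],
    }


if __name__ == "__main__":
    sample = (
        "INFO EVNTtoHITS executor returns 139\n"
        "ERROR Validation of return code failed (Error code 65)\n"
    )
    print(summarize_stderr(sample))
```

The stderr text itself would have to be copied by hand from the result page; nothing here downloads it.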

Cheers
5) Message boards : ATLAS application : Non-zero return code from EVNTtoHITS (65) (Error code 65) (Message 36088)
Posted 27 Jul 2018 by tazzduke
Post:
Hi bronco

Yes, BOINC finds it and reads it correctly. I was helped out by user Computezrmle in the Number crunching thread.

I have used an app_config file on other projects, e.g. SETI, Primegrid. :-)

I am waiting for my machine to empty its work cache and will then start with a fresh install of VB and BOINC.

Then we will see how that goes.

Regards
6) Message boards : ATLAS application : Non-zero return code from EVNTtoHITS (65) (Error code 65) (Message 36087)
Posted 27 Jul 2018 by tazzduke
Post:
Hi Yeti

Thank you. I am currently emptying my work cache and will then start all over again.

Regards
7) Message boards : ATLAS application : Non-zero return code from EVNTtoHITS (65) (Error code 65) (Message 36082)
Posted 27 Jul 2018 by tazzduke
Post:
Hi All

It seems I am not the only one completing workunits that are marked as valid (at BOINC level) but have no HITS file present.

This is an extract from my last workunit - https://lhcathome.cern.ch/lhcathome/result.php?resultid=200161943

2018-07-16 05:50:28 (7532): Guest Log: Starting ATLAS job. (PandaID=3983550564 taskID=14530897)
2018-07-16 06:08:08 (7532): Guest Log: log_extracts:
2018-07-16 06:08:08 (7532): Guest Log: - Last 10 lines from /home/atlas01/RunAtlas/Panda_Pilot_3444_1531691438/PandaJob/athena_stdout.txt -
2018-07-16 06:08:08 (7532): Guest Log: PyJobTransforms.trfExe.preExecute 2018-07-15 23:57:31,806 INFO Batch/grid running - command outputs will not be echoed. Logs for EVNTtoHITS are in log.EVNTtoHITS
2018-07-16 06:08:08 (7532): Guest Log: PyJobTransforms.trfExe.preExecute 2018-07-15 23:57:31,808 INFO Now writing wrapper for substep executor EVNTtoHITS
2018-07-16 06:08:08 (7532): Guest Log: PyJobTransforms.trfExe._writeAthenaWrapper 2018-07-15 23:57:31,808 INFO Valgrind not engaged
2018-07-16 06:08:08 (7532): Guest Log: PyJobTransforms.trfExe.preExecute 2018-07-15 23:57:31,808 INFO Athena will be executed in a subshell via ['./runwrapper.EVNTtoHITS.sh']
2018-07-16 06:08:08 (7532): Guest Log: PyJobTransforms.trfExe.execute 2018-07-15 23:57:31,808 INFO Starting execution of EVNTtoHITS (['./runwrapper.EVNTtoHITS.sh'])
2018-07-16 06:08:08 (7532): Guest Log: PyJobTransforms.trfExe.execute 2018-07-16 00:05:00,192 INFO EVNTtoHITS executor returns 139
2018-07-16 06:08:08 (7532): Guest Log: PyJobTransforms.trfExe.validate 2018-07-16 00:05:01,628 ERROR Validation of return code failed: EVNTtoHITS got a SIGSEGV signal (exit code 139) (Error code 65)
2018-07-16 06:08:08 (7532): Guest Log: PyJobTransforms.trfExe.validate 2018-07-16 00:05:01,679 INFO Scanning logfile log.EVNTtoHITS for errors
2018-07-16 06:08:08 (7532): Guest Log: PyJobTransforms.transform.execute 2018-07-16 00:05:01,724 CRITICAL Transform executor raised TransformValidationException: EVNTtoHITS got a SIGSEGV signal (exit code 139); Long ERROR message at line 1783 (see jobReport for further details)
2018-07-16 06:08:08 (7532): Guest Log: PyJobTransforms.transform.execute 2018-07-16 00:05:05,645 WARNING Transform now exiting early with exit code 65 (EVNTtoHITS got a SIGSEGV signal (exit code 139); Long ERROR message at line 1783 (see jobReport for further details))
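
For anyone puzzling over the two numbers in the extract above: exit code 139 follows the usual shell convention of 128 plus the signal number, and 139 - 128 = 11, which is SIGSEGV on Linux. The 65 is the transform's own exit code reported on top of that. A minimal Python sketch of the decoding, where `describe_exit_code` is a name made up for illustration:

```python
import signal


def describe_exit_code(code):
    """Interpret a shell-style exit status.

    By convention a status above 128 means the process was
    terminated by signal number (status - 128).
    """
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = "unknown signal"
        return f"terminated by {name} (signal {signum})"
    return f"exited with status {code}"


if __name__ == "__main__":
    print(describe_exit_code(139))  # the EVNTtoHITS executor status
    print(describe_exit_code(65))   # the transform's exit code
```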

I have reset my preferences to MAX# Jobs=1 and MAX# Cores=2, and I also have an app_config.xml file setting 4800 MB for my 2-core workunit.
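
As a rough illustration of that kind of app_config.xml, here is a Python sketch that writes one out. The element names (`app_version`, `app_name`, `plan_class`, `avg_ncpus`, `cmdline`) follow the general BOINC app_config.xml layout and `--memory_size_mb` is a vboxwrapper option, but the app name `ATLAS` and the plan class `vbox64_mt_mcore_atlas` are assumptions here and should be checked against what your own client reports. `build_app_config` is a made-up helper name.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, plan_class, ncpus, memory_mb):
    """Build an app_config.xml string for one multi-core VM app."""
    root = ET.Element("app_config")
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "plan_class").text = plan_class
    # Number of cores BOINC should budget for each job.
    ET.SubElement(version, "avg_ncpus").text = str(ncpus)
    # Memory given to the VM, passed through to the wrapper.
    ET.SubElement(version, "cmdline").text = f"--memory_size_mb {memory_mb}"
    if hasattr(ET, "indent"):  # pretty-print on Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Assumed names; 2 cores and 4800 MB as in the post above.
    print(build_app_config("ATLAS", "vbox64_mt_mcore_atlas", 2, 4800))
```

The output would go into app_config.xml in the project's folder, and BOINC needs to reread its config files before the change takes effect.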

I might try to start again by first finding another user on Win 7 x64 who is validating with a HITS file, and seeing which versions of VB and BOINC they are using.

I might also need to do some rereading.

Regards
8) Message boards : Number crunching : Checklist Version 3 for Atlas@Home (and other VM-based Projects) on your PC (Message 35934)
Posted 16 Jul 2018 by tazzduke
Post:
Understood

Regards
9) Message boards : Number crunching : Checklist Version 3 for Atlas@Home (and other VM-based Projects) on your PC (Message 35931)
Posted 16 Jul 2018 by tazzduke
Post:
Hi All

It seems I am not the only one completing workunits that are marked as valid (at BOINC level) but have no HITS file present.

As per the following workunits

https://lhcathome.cern.ch/lhcathome/result.php?resultid=200171936
https://lhcathome.cern.ch/lhcathome/result.php?resultid=200168456
https://lhcathome.cern.ch/lhcathome/result.php?resultid=200077351

I am going to try a later version of VB. Also, this is an extract from my last workunit - https://lhcathome.cern.ch/lhcathome/result.php?resultid=200161943

2018-07-16 05:50:28 (7532): Guest Log: Starting ATLAS job. (PandaID=3983550564 taskID=14530897)
2018-07-16 06:08:08 (7532): Guest Log: log_extracts:
2018-07-16 06:08:08 (7532): Guest Log: - Last 10 lines from /home/atlas01/RunAtlas/Panda_Pilot_3444_1531691438/PandaJob/athena_stdout.txt -
2018-07-16 06:08:08 (7532): Guest Log: PyJobTransforms.trfExe.preExecute 2018-07-15 23:57:31,806 INFO Batch/grid running - command outputs will not be echoed. Logs for EVNTtoHITS are in log.EVNTtoHITS
2018-07-16 06:08:08 (7532): Guest Log: PyJobTransforms.trfExe.preExecute 2018-07-15 23:57:31,808 INFO Now writing wrapper for substep executor EVNTtoHITS
2018-07-16 06:08:08 (7532): Guest Log: PyJobTransforms.trfExe._writeAthenaWrapper 2018-07-15 23:57:31,808 INFO Valgrind not engaged
2018-07-16 06:08:08 (7532): Guest Log: PyJobTransforms.trfExe.preExecute 2018-07-15 23:57:31,808 INFO Athena will be executed in a subshell via ['./runwrapper.EVNTtoHITS.sh']
2018-07-16 06:08:08 (7532): Guest Log: PyJobTransforms.trfExe.execute 2018-07-15 23:57:31,808 INFO Starting execution of EVNTtoHITS (['./runwrapper.EVNTtoHITS.sh'])
2018-07-16 06:08:08 (7532): Guest Log: PyJobTransforms.trfExe.execute 2018-07-16 00:05:00,192 INFO EVNTtoHITS executor returns 139
2018-07-16 06:08:08 (7532): Guest Log: PyJobTransforms.trfExe.validate 2018-07-16 00:05:01,628 ERROR Validation of return code failed: EVNTtoHITS got a SIGSEGV signal (exit code 139) (Error code 65)
2018-07-16 06:08:08 (7532): Guest Log: PyJobTransforms.trfExe.validate 2018-07-16 00:05:01,679 INFO Scanning logfile log.EVNTtoHITS for errors
2018-07-16 06:08:08 (7532): Guest Log: PyJobTransforms.transform.execute 2018-07-16 00:05:01,724 CRITICAL Transform executor raised TransformValidationException: EVNTtoHITS got a SIGSEGV signal (exit code 139); Long ERROR message at line 1783 (see jobReport for further details)
2018-07-16 06:08:08 (7532): Guest Log: PyJobTransforms.transform.execute 2018-07-16 00:05:05,645 WARNING Transform now exiting early with exit code 65 (EVNTtoHITS got a SIGSEGV signal (exit code 139); Long ERROR message at line 1783 (see jobReport for further details))

Regards
10) Message boards : Number crunching : Checklist Version 3 for Atlas@Home (and other VM-based Projects) on your PC (Message 35926)
Posted 16 Jul 2018 by tazzduke
Post:
Hi All
I found some very peculiar lines in the stderr output files for the latest two tasks after making the above changes.
When I get access to my machine I will provide an update.
Still no HITS file, so the investigation continues.
Regards.
11) Message boards : Number crunching : Checklist Version 3 for Atlas@Home (and other VM-based Projects) on your PC (Message 35922)
Posted 15 Jul 2018 by tazzduke
Post:
Greetings All

Thank you for the feedback. I changed my app_config file to the one in the previous reply from Computezrmle.

After the changes, I only downloaded two more workunits to see if I am on the right track.

Regards
12) Message boards : Number crunching : Checklist Version 3 for Atlas@Home (and other VM-based Projects) on your PC (Message 35918)
Posted 15 Jul 2018 by tazzduke
Post:
If they are, as you say, failing, why are they not marked invalid?

I had a hiccup or two at the start, but have set up my project preferences for max CPU 2 and max jobs 2.

I have an app_config file that says to use 2 cores per job.

BOINC indicates I am using 2 cores for each job.

I only run one job at a time.

The latest valid results state in the task file that they completed successfully.

A question for the admins: are the workunits I am returning and getting marked valid really valid, given the previous reply?

Regards
13) Message boards : Number crunching : Checklist Version 3 for Atlas@Home (and other VM-based Projects) on your PC (Message 35903)
Posted 15 Jul 2018 by tazzduke
Post:
Greetings All

Thank you for the checklist, Yeti. I have worked through it and have three PCs successfully crunching ATLAS workunits.

On my machines I am running multi-threaded with 2 cores per job and only doing one job at a time, due to RAM limitations on my PCs.

I am working on the 4th machine at the moment.

I do apologize for aborting/erroring some workunits; I misread one of the steps and also had an incomplete/corrupted download.

Cheers
14) Message boards : News : Status and Plans, Saturday 29th September, 2012 (Message 24889)
Posted 10 Oct 2012 by tazzduke
Post:
I would love to get my hands on some of these monster WUs; I am with Snow Crash on this one.

Regards
Tazzduke
15) Message boards : News : Very long jobs (Message 24716)
Posted 25 Aug 2012 by tazzduke
Post:
To Eric and CERN team

All I can say is keep up the good work

Also, I got some of those wlxu2 workunits, which are not a problem for my machine, as I crunch one project at a time. Considering the workunits available, I will be crunching for a while :-)

Regards
Tazzduke
16) Message boards : News : Very long jobs (Message 24697)
Posted 23 Aug 2012 by tazzduke
Post:
Further to my last post, I have set NNT (no new tasks) until I have crunched these 7 long workunits.

Regards
Tazzduke
17) Message boards : News : Very long jobs (Message 24696)
Posted 23 Aug 2012 by tazzduke
Post:
Greetings

I have picked up 7 long ones and am crunching at a good pace. Glad to see the work has picked up.

Regards
Tazzduke
18) Message boards : Number crunching : No start tag! (Message 23031)
Posted 15 Sep 2011 by tazzduke
Post:
Greetings, I did the detach and reattach and am now receiving work. Thank you for the prompt action on this problem.

Regards
19) Message boards : Number crunching : No start tag! (Message 23028)
Posted 15 Sep 2011 by tazzduke
Post:
Thanks for the update :-)

Regards
20) Message boards : Number crunching : No start tag! (Message 23025)
Posted 15 Sep 2011 by tazzduke
Post:
Good Evening

I saw that there was work available, so I fired up BOINC and connected using the URL on the home page. Below are the messages I get.

Hope there is a solution

Regards


Thu 15 Sep 2011 06:34:48 PM WST | LHC@home 1.0 | Sending scheduler request: Project initialization.
Thu 15 Sep 2011 06:34:48 PM WST | LHC@home 1.0 | Requesting new tasks for CPU
Thu 15 Sep 2011 06:34:51 PM WST | LHC@home 1.0 | Scheduler request completed: got 0 new tasks
Thu 15 Sep 2011 06:34:51 PM WST | LHC@home 1.0 | Error in request message: fgets() failed


