Message boards :
ATLAS application :
New app version 1.01
Author | Message |
---|---|
Joined: 13 May 14 Posts: 387 Credit: 15,314,184 RAC: 0 |
We just released a new app version and are submitting entirely new tasks for ATLAS. These tasks use a new ATLAS software version (version 21 instead of version 19, for those interested), and the WUs that will run here are part of a huge simulation campaign running over the whole ATLAS Grid for the next several months. These tasks process 100 events instead of the 50 we had before, but the input and output sizes as well as the running time are roughly the same as before. This is due to the different types of events in these tasks. As stated earlier, we will run these new tasks only on LHC@Home and will not submit any more to ATLAS@Home. Once we have a few results and they look OK, I will move the app out of beta and encourage everyone to move here. Have a nice weekend! |
Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
I've gone directly to Atlas@home to avoid loading tasks which do not run on my Windows 10 PC. The 2 single core are version 2.01 and the 2 double core are version 1.04. Tullio |
Joined: 2 Sep 04 Posts: 452 Credit: 179,874,697 RAC: 66,475 |
|
Joined: 21 Jun 10 Posts: 39 Credit: 9,610,352 RAC: 1,859 |
Just tried one on Linux. Task ran for about 20 minutes then got this:

2017-03-10 12:49:18 (8776): Guest Log: - Last 10 lines from /home/atlas01/RunAtlas/Panda_Pilot_5904_1489171051/PandaJob_3273309522_1489171055/athena_stdout.txt -
2017-03-10 12:49:18 (8776): Guest Log: PyJobTransforms.trfExe.preExecute 2017-03-10 12:38:27,950 INFO Batch/grid running - command outputs will not be echoed. Logs for EVNTtoHITS are in log.EVNTtoHITS
2017-03-10 12:49:18 (8776): Guest Log: PyJobTransforms.trfExe.preExecute 2017-03-10 12:38:27,952 INFO Now writing wrapper for substep executor EVNTtoHITS
2017-03-10 12:49:18 (8776): Guest Log: PyJobTransforms.trfExe._writeAthenaWrapper 2017-03-10 12:38:27,952 INFO Valgrind not engaged
2017-03-10 12:49:18 (8776): Guest Log: PyJobTransforms.trfExe.preExecute 2017-03-10 12:38:27,952 INFO Athena will be executed in a subshell via ['./runwrapper.EVNTtoHITS.sh']
2017-03-10 12:49:18 (8776): Guest Log: PyJobTransforms.trfExe.execute 2017-03-10 12:38:27,952 INFO Starting execution of EVNTtoHITS (['./runwrapper.EVNTtoHITS.sh'])
2017-03-10 12:49:18 (8776): Guest Log: PyJobTransforms.trfExe.execute 2017-03-10 12:46:25,442 INFO EVNTtoHITS executor returns 65
2017-03-10 12:49:18 (8776): Guest Log: PyJobTransforms.trfExe.validate 2017-03-10 12:46:26,351 ERROR Validation of return code failed: Non-zero return code from EVNTtoHITS (65) (Error code 65)
2017-03-10 12:49:18 (8776): Guest Log: PyJobTransforms.trfExe.validate 2017-03-10 12:46:26,365 INFO Scanning logfile log.EVNTtoHITS for errors
2017-03-10 12:49:18 (8776): Guest Log: PyJobTransforms.transform.execute 2017-03-10 12:46:26,588 CRITICAL Transform executor raised TransformValidationException: Non-zero return code from EVNTtoHITS (65); Logfile error in log.EVNTtoHITS: "AthMpEvtLoopMgr FATAL makePool failed for AthMpEvtLoopMgr.SharedEvtQueueProvider"
2017-03-10 12:49:18 (8776): Guest Log: PyJobTransforms.transform.execute 2017-03-10 12:46:29,792 WARNING Transform now exiting early with exit code 65 (Non-zero return code from EVNTtoHITS (65); Logfile error in log.EVNTtoHITS: "AthMpEvtLoopMgr FATAL makePool failed for AthMpEvtLoopMgr.SharedEvtQueueProvider")

Task number 124796186. Let me know if you need more info. |
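For anyone comparing failed tasks, here is a minimal sketch of how the relevant facts could be pulled out of a guest log like the one above. The function name `summarize_transform_failure` and both regular expressions are illustrative assumptions based only on the wording of the lines quoted in this post; other transform versions may phrase these messages differently.

```python
import re

# Assumed patterns, derived only from the log lines quoted in the post above.
EXIT_RE = re.compile(r"exiting early with exit code (\d+)")
LOGFILE_ERROR_RE = re.compile(r'Logfile error in (\S+): "([^"]+)"')


def summarize_transform_failure(log_text):
    """Return the transform exit code and the first quoted logfile error, if present."""
    exit_match = EXIT_RE.search(log_text)
    error_match = LOGFILE_ERROR_RE.search(log_text)
    return {
        "exit_code": int(exit_match.group(1)) if exit_match else None,
        "logfile": error_match.group(1) if error_match else None,
        "error": error_match.group(2) if error_match else None,
    }


# Two lines taken from the log quoted above (BOINC prefix omitted).
sample = (
    "PyJobTransforms.transform.execute 2017-03-10 12:46:26,588 CRITICAL "
    "Transform executor raised TransformValidationException: Non-zero return "
    "code from EVNTtoHITS (65); Logfile error in log.EVNTtoHITS: "
    '"AthMpEvtLoopMgr FATAL makePool failed for AthMpEvtLoopMgr.SharedEvtQueueProvider"\n'
    "PyJobTransforms.transform.execute 2017-03-10 12:46:29,792 WARNING "
    "Transform now exiting early with exit code 65"
)
summary = summarize_transform_failure(sample)
print(summary)
```

Running this over the stderr output of several failed results would show quickly whether they all share the same exit code and the same `makePool failed` message.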
Joined: 18 Dec 16 Posts: 123 Credit: 37,495,365 RAC: 0 |
Same exception here. Here are sample tasks: https://lhcathome.cern.ch/lhcathome/result.php?resultid=124785531 https://lhcathome.cern.ch/lhcathome/result.php?resultid=124785164 https://lhcathome.cern.ch/lhcathome/result.php?resultid=124785219 We are the product of random evolution. |
Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
Atlas tasks and SixTrack tasks are the only ones running under the LHC umbrella. Tullio I've been running Test4Theory@home tasks since November 2010. My ID was 10 |
Joined: 18 Dec 16 Posts: 123 Credit: 37,495,365 RAC: 0 |
"Atlas tasks and SixTrack tasks are the only ones running under the LHC umbrella." On one of my machines, I have been trying CMS, Theory and LHCb over the past few days, and none of them work. Only SixTrack runs OK. And the new ATLAS 1.01 systematically gives "Validation error" due to the exception mentioned above. Strangely enough, on my other machine, Theory works fine, but I have not managed to understand why. We are the product of random evolution. |
Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
I am glad I am not the only one to have problems. I used to run Theory tasks on my 32-bit Linux laptop; now they all fail with a kernel panic. Tullio |
Joined: 2 Sep 04 Posts: 452 Credit: 179,874,697 RAC: 66,475 |
|
Joined: 14 Jan 10 Posts: 1178 Credit: 7,523,740 RAC: 5,910 |
"For me (only Windows-PCs) all looks fine. So far no errors, no problems" As far as I have checked, your valid tasks were all still version 1.00 and not the new version 1.01. |
Joined: 2 Sep 04 Posts: 452 Credit: 179,874,697 RAC: 66,475 |
"For me (only Windows-PCs) all looks fine. So far no errors, no problems" Nope, there are already several 1.01 tasks among them; it is just a little tricky to find them among all the 1.00 ones. Can you see this: https://lhcathome.cern.ch/lhcathome/results.php?hostid=10359162 Otherwise take this: https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=60042972 Supporting BOINC, a great concept ! |
Joined: 13 May 14 Posts: 387 Credit: 15,314,184 RAC: 0 |
I also have no problems running these tasks (on Linux). But it seems that someone has cancelled these tasks upstream, and that's why there are no more in the queue. I'm not sure why, but it's unlikely there will be new tasks before Monday. |
Joined: 18 Dec 16 Posts: 123 Credit: 37,495,365 RAC: 0 |
I just got another ATLAS Simulation 1.01 task today. If you look at the WU itself, you will see that it has failed on 4 different computers, most recently on mine: https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=60026571 All 4 failed with the same exception. |
Joined: 14 Jan 10 Posts: 1178 Credit: 7,523,740 RAC: 5,910 |
"For me (only Windows-PCs) all looks fine. So far no errors, no problems" That's good news, Yeti, that at least some people have valid tasks for the new version 1.01. But maybe your machines are so well trained at doing ATLAS jobs that they even crunch them in their sleep ;) When new ATLAS tasks are available I'll give it another try. |
Joined: 2 Sep 04 Posts: 452 Credit: 179,874,697 RAC: 66,475 |
"that they even crunch them asleep ;)" They are never allowed to sleep! "When new ATLAS-tasks are available I'll give it another try." Perhaps you should work through my "old" checklist. It was designed for ATLAS as a stand-alone project, but it could help you get it working. If not, you are welcome to ask for help. Supporting BOINC, a great concept ! |
Joined: 18 Dec 16 Posts: 123 Credit: 37,495,365 RAC: 0 |
I saw that a new set of tasks has been added to ATLAS@Home, contrary to what I understood from David's statement at the beginning of this thread: "we will run these new tasks only on LHC@Home and will not submit any more to ATLAS@Home". So maybe the team at LHC realised that version 1.01 is not sufficiently stable yet. |
Joined: 14 Jan 10 Posts: 1178 Credit: 7,523,740 RAC: 5,910 |
"Perhaps you should work through my 'old' checklist. It was designed for Atlas as a Stand-Alone project but it could help you to get it working" I know your detailed checklist. I have never had a problem getting any VM-based BOINC project running, including ATLAS version 1.00. It's just that version 1.01 tasks complete successfully as far as BOINC is concerned, but then the validator turns them into errors, which is OK because they run for a very short time (~10-20 minutes or so). Maybe those were just corrupt workunits, but this error doesn't look good: ERROR Validation of return code failed: EVNTtoHITS got a SIGKILL signal (exit code 137) (Error code 65) |
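As a side note on the numbers in that error: by shell convention, an exit status above 128 means the process was ended by a signal, with the signal number equal to the status minus 128. So 137 = 128 + 9, and signal 9 is SIGKILL on Linux, which matches the message. The sketch below decodes such a status; the helper name `describe_exit_code` is made up for illustration, and it assumes a POSIX system (on Windows the `signal` module does not define SIGKILL).

```python
import signal


def describe_exit_code(code):
    """Describe a shell-style exit status; values above 128 conventionally mean 128 + signal number."""
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # The signal number is not defined on this platform.
            return f"terminated by unknown signal {signum}"
        return f"terminated by {name} (signal {signum})"
    return f"exited with status {code}"


print(describe_exit_code(137))
print(describe_exit_code(65))
```

One common source of SIGKILL on Linux is the kernel's out-of-memory killer, but the log quoted here does not say what sent the signal.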
Joined: 2 Sep 04 Posts: 452 Credit: 179,874,697 RAC: 66,475 |
"I never have a problem with any VM-based BOINC project to get it running, also not ATLAS version 1.00." A runtime of 10 to 20 minutes is the timeframe you get when an external web service (at CERN) cannot be reached from inside the VM. Most of the time the cause is a company or personal firewall; one or more ports may be blocked. Please check here: http://lhcathome.web.cern.ch/test4theory/my-firewall-complaining-which-ports-does-project-use or number 10 in my checklist. Supporting BOINC, a great concept ! |
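A quick way to test for a blocked port is to try opening a TCP connection to it. The sketch below is a generic check, not a project tool: `can_connect` and `check_endpoints` are hypothetical helper names, and the actual hosts and ports have to be taken from the firewall page linked above, since none are listed in this thread. It also runs on the host rather than inside the VM, so it only approximates what the VM sees.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS lookup failures.
        return False


def check_endpoints(endpoints, timeout=5.0):
    """Map each (host, port) pair to True (reachable) or False (blocked or unreachable)."""
    return {(host, port): can_connect(host, port, timeout) for host, port in endpoints}
```

For example, `check_endpoints([("server.example", 80)])` with a host and port from the project's list (the name here is a placeholder) returns a dictionary showing which endpoints could not be reached.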
Joined: 2 Sep 04 Posts: 452 Credit: 179,874,697 RAC: 66,475 |
By the way, the other side is also a possible point of failure: if one of the external servers has a problem, this can also result in a crunching time of 10 to 20 minutes. I just checked my results again: I see 6 WUs that finished after 10 to 20 minutes (and were not validated), but more than 60 WUs ran fine and were validated. Supporting BOINC, a great concept ! |
Joined: 18 Dec 16 Posts: 123 Credit: 37,495,365 RAC: 0 |
I have started to get the TransformValidationException on ATLAS@Home as well, so maybe the problem is not with the new version 1.01 of ATLAS Simulator, but with the data or a new server used by that data. But in the logs I cannot see any failed ping or new port being used. The same problem occurred on my 2 machines: http://atlasathome.cern.ch/result.php?resultid=8503011 http://atlasathome.cern.ch/result.php?resultid=8500584 Yeti, you could check this task on one of your computers: http://atlasathome.cern.ch/result.php?resultid=8503848 We are the product of random evolution. |