Message boards : ATLAS application : New app version 1.01

David Cameron
Project administrator
Project developer
Project scientist
Joined: 13 May 14
Posts: 387
Credit: 15,314,184
RAC: 0
Message 29171 - Posted: 10 Mar 2017, 15:38:42 UTC

We just released a new app version and are submitting entirely new tasks for ATLAS. These tasks use a new ATLAS software version (version 21 instead of version 19, for those interested), and the WUs that will run here are part of a huge simulation campaign running over the whole ATLAS Grid for the next several months.

These tasks process 100 events instead of the 50 we had before but the input and output sizes as well as the running time are roughly the same as before. This is due to the different types of events in these tasks.

As stated earlier, we will run these new tasks only on LHC@Home and will not submit any more to ATLAS@Home. Once we have a few results and they look OK, I will move the app out of beta and encourage everyone to move here.

Have a nice weekend!
ID: 29171
tullio
Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 29173 - Posted: 10 Mar 2017, 17:16:15 UTC

I've gone directly to Atlas@home to avoid loading tasks which do not run on my Windows 10 PC. The 2 single-core tasks are version 2.01 and the 2 double-core tasks are version 1.04.
Tullio
ID: 29173
Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 453
Credit: 193,369,412
RAC: 10,065
Message 29174 - Posted: 10 Mar 2017, 17:19:40 UTC - in response to Message 29173.  

I've gone directly to Atlas@home to avoid loading tasks which do not run on my Windows 10 PC. The 2 single core are version 2.01 and the 2 double core are version 1.04.
Tullio

Perhaps you should consider disabling the "Beta-Applications" setting.


Supporting BOINC, a great concept !
ID: 29174
captainjack
Joined: 21 Jun 10
Posts: 40
Credit: 10,587,045
RAC: 9,114
Message 29178 - Posted: 10 Mar 2017, 18:55:38 UTC

Just tried one on Linux. The task ran for about 20 minutes, then I got this:

2017-03-10 12:49:18 (8776): Guest Log: - Last 10 lines from /home/atlas01/RunAtlas/Panda_Pilot_5904_1489171051/PandaJob_3273309522_1489171055/athena_stdout.txt -
2017-03-10 12:49:18 (8776): Guest Log: PyJobTransforms.trfExe.preExecute 2017-03-10 12:38:27,950 INFO Batch/grid running - command outputs will not be echoed. Logs for EVNTtoHITS are in log.EVNTtoHITS
2017-03-10 12:49:18 (8776): Guest Log: PyJobTransforms.trfExe.preExecute 2017-03-10 12:38:27,952 INFO Now writing wrapper for substep executor EVNTtoHITS
2017-03-10 12:49:18 (8776): Guest Log: PyJobTransforms.trfExe._writeAthenaWrapper 2017-03-10 12:38:27,952 INFO Valgrind not engaged
2017-03-10 12:49:18 (8776): Guest Log: PyJobTransforms.trfExe.preExecute 2017-03-10 12:38:27,952 INFO Athena will be executed in a subshell via ['./runwrapper.EVNTtoHITS.sh']
2017-03-10 12:49:18 (8776): Guest Log: PyJobTransforms.trfExe.execute 2017-03-10 12:38:27,952 INFO Starting execution of EVNTtoHITS (['./runwrapper.EVNTtoHITS.sh'])
2017-03-10 12:49:18 (8776): Guest Log: PyJobTransforms.trfExe.execute 2017-03-10 12:46:25,442 INFO EVNTtoHITS executor returns 65
2017-03-10 12:49:18 (8776): Guest Log: PyJobTransforms.trfExe.validate 2017-03-10 12:46:26,351 ERROR Validation of return code failed: Non-zero return code from EVNTtoHITS (65) (Error code 65)
2017-03-10 12:49:18 (8776): Guest Log: PyJobTransforms.trfExe.validate 2017-03-10 12:46:26,365 INFO Scanning logfile log.EVNTtoHITS for errors
2017-03-10 12:49:18 (8776): Guest Log: PyJobTransforms.transform.execute 2017-03-10 12:46:26,588 CRITICAL Transform executor raised TransformValidationException: Non-zero return code from EVNTtoHITS (65); Logfile error in log.EVNTtoHITS: "AthMpEvtLoopMgr FATAL makePool failed for AthMpEvtLoopMgr.SharedEvtQueueProvider"
2017-03-10 12:49:18 (8776): Guest Log: PyJobTransforms.transform.execute 2017-03-10 12:46:29,792 WARNING Transform now exiting early with exit code 65 (Non-zero return code from EVNTtoHITS (65); Logfile error in log.EVNTtoHITS: "AthMpEvtLoopMgr FATAL makePool failed for AthMpEvtLoopMgr.SharedEvtQueueProvider")
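For anyone who wants to pull the key facts out of a log like this one, here is a minimal Python sketch. It assumes the message wording matches the lines quoted above exactly; the function name `summarize_guest_log` and the two regular expressions are illustrative only and are not part of BOINC or the ATLAS transforms.

```python
import re

# Matches e.g. "Transform now exiting early with exit code 65 (...)"
EXIT_RE = re.compile(r"exiting early with exit code (\d+)")
# Matches e.g. 'Logfile error in log.EVNTtoHITS: "<message>"'
LOGFILE_ERR_RE = re.compile(r'Logfile error in (\S+): "([^"]+)"')


def summarize_guest_log(lines):
    """Return (exit_code, logfile, message) found in the given log lines.

    Each element is None if the corresponding pattern never appears.
    If a pattern appears more than once, the last occurrence wins.
    """
    exit_code = logfile = message = None
    for line in lines:
        m = EXIT_RE.search(line)
        if m:
            exit_code = int(m.group(1))
        m = LOGFILE_ERR_RE.search(line)
        if m:
            logfile, message = m.group(1), m.group(2)
    return exit_code, logfile, message


# The last two lines of the log quoted above, copied verbatim.
sample = [
    '2017-03-10 12:49:18 (8776): Guest Log: PyJobTransforms.transform.execute 2017-03-10 12:46:26,588 CRITICAL Transform executor raised TransformValidationException: Non-zero return code from EVNTtoHITS (65); Logfile error in log.EVNTtoHITS: "AthMpEvtLoopMgr FATAL makePool failed for AthMpEvtLoopMgr.SharedEvtQueueProvider"',
    '2017-03-10 12:49:18 (8776): Guest Log: PyJobTransforms.transform.execute 2017-03-10 12:46:29,792 WARNING Transform now exiting early with exit code 65 (Non-zero return code from EVNTtoHITS (65); Logfile error in log.EVNTtoHITS: "AthMpEvtLoopMgr FATAL makePool failed for AthMpEvtLoopMgr.SharedEvtQueueProvider")',
]

print(summarize_guest_log(sample))
```

Run against a full stderr dump, this makes it easier to compare failures across hosts, since the same exit code and logfile error can be spotted at a glance.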

Task number 124796186

Let me know if you need more info.
ID: 29178
HerveUAE
Joined: 18 Dec 16
Posts: 123
Credit: 37,495,365
RAC: 0
Message 29179 - Posted: 10 Mar 2017, 19:19:14 UTC

ID: 29179
tullio
Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 29180 - Posted: 10 Mar 2017, 20:05:11 UTC - in response to Message 29174.  
Last modified: 10 Mar 2017, 20:10:11 UTC


Perhaps you should re-think to disable the "Beta-Applications"

Atlas tasks and SixTrack tasks are the only ones running under the LHC umbrella.
Tullio
I've been running Test4Theory@home tasks since November 2010. My ID was 10
ID: 29180
HerveUAE
Joined: 18 Dec 16
Posts: 123
Credit: 37,495,365
RAC: 0
Message 29181 - Posted: 10 Mar 2017, 20:18:08 UTC

Atlas tasks and SixTrack tasks are the only one running under the LHC umbrella.

On one of my machines, I have been trying CMS, Theory and LHCb over the past few days, and none are working. Only SixTrack runs OK. And the new ATLAS 1.01 systematically gives "Validation error" due to the Exception mentioned above.
Strangely enough, on my other machine, Theory works fine, but I have not managed to understand why.
We are the product of random evolution.
ID: 29181
tullio
Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 29182 - Posted: 11 Mar 2017, 1:18:51 UTC - in response to Message 29181.  

I am glad I am not the only one to have problems. I used to run Theory tasks on my 32-bit Linux laptop; now they all fail with a kernel panic.
Tullio
ID: 29182
Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 453
Credit: 193,369,412
RAC: 10,065
Message 29187 - Posted: 11 Mar 2017, 11:23:30 UTC

For me (only Windows-PCs) all looks fine. So far no errors, no problems


Supporting BOINC, a great concept !
ID: 29187
Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,616
RAC: 2,139
Message 29188 - Posted: 11 Mar 2017, 14:49:29 UTC - in response to Message 29187.  

For me (only Windows-PCs) all looks fine. So far no errors, no problems


As far as I have checked your valid tasks were still all version 1.00 and not the new version 1.01.
ID: 29188
Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 453
Credit: 193,369,412
RAC: 10,065
Message 29189 - Posted: 11 Mar 2017, 14:52:44 UTC - in response to Message 29188.  

For me (only Windows-PCs) all looks fine. So far no errors, no problems


As far as I have checked your valid tasks were still all version 1.00 and not the new version 1.01.

Nope, there are already several 1.01 tasks among them; it is just a little tricky to find them among all the 1.00 ones.

Can you see this: https://lhcathome.cern.ch/lhcathome/results.php?hostid=10359162

otherwise take this: https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=60042972


Supporting BOINC, a great concept !
ID: 29189
David Cameron
Project administrator
Project developer
Project scientist
Joined: 13 May 14
Posts: 387
Credit: 15,314,184
RAC: 0
Message 29190 - Posted: 11 Mar 2017, 15:19:22 UTC

I also have no problems running these tasks (on Linux). But it seems that someone has cancelled these tasks upstream, and that's why there are no more in the queue. I'm not sure why, but it's unlikely there will be new tasks before Monday.
ID: 29190
HerveUAE
Joined: 18 Dec 16
Posts: 123
Credit: 37,495,365
RAC: 0
Message 29191 - Posted: 11 Mar 2017, 16:57:32 UTC
Last modified: 11 Mar 2017, 16:59:09 UTC

I just got another ATLAS Simulation 1.01 task today. If you look at the WU itself, you will see that it has failed on 4 different computers, and lastly on mine:
https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=60026571
All 4 failed with the same exception.
ID: 29191
Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,616
RAC: 2,139
Message 29192 - Posted: 11 Mar 2017, 18:55:25 UTC - in response to Message 29189.  

For me (only Windows-PCs) all looks fine. So far no errors, no problems


As far as I have checked your valid tasks were still all version 1.00 and not the new version 1.01.

Nope, there are already several 1.01 tasks between, it is only a little bit tricky to find them between all these 1.00

That's good news, Yeti, that at least some people have valid tasks for the new version 1.01,
but maybe your machines are so well trained at doing ATLAS jobs that they even crunch them while asleep ;)

When new ATLAS-tasks are available I'll give it another try.
ID: 29192
Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 453
Credit: 193,369,412
RAC: 10,065
Message 29193 - Posted: 11 Mar 2017, 19:14:15 UTC - in response to Message 29192.  

that they even crunch them asleep ;)
They are never allowed to sleep !

When new ATLAS-tasks are available I'll give it another try.

Perhaps you should work through my "old" checklist. It was designed for Atlas as a Stand-Alone project but it could help you to get it working

If not, you are welcome to ask for help


Supporting BOINC, a great concept !
ID: 29193
HerveUAE
Joined: 18 Dec 16
Posts: 123
Credit: 37,495,365
RAC: 0
Message 29194 - Posted: 11 Mar 2017, 19:28:51 UTC

I saw that a new set of tasks has been added to ATLAS@Home, contrary to what I understood from David's statement at the beginning of this thread:
"we will run these new tasks only on LHC@Home and will not submit any more to ATLAS@Home"

So maybe the team at LHC realised that version 1.01 is not sufficiently stable yet.
ID: 29194
Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,616
RAC: 2,139
Message 29195 - Posted: 11 Mar 2017, 21:49:35 UTC - in response to Message 29193.  

Perhaps you should work through my "old" checklist. It was designed for Atlas as a Stand-Alone project but it could help you to get it working

If not, you are welcome to ask for help

I know your "ausführlicher" (detailed) checklist. I never have a problem getting any VM-based BOINC project running, ATLAS version 1.00 included.
It's just that version 1.01 tasks complete successfully as far as BOINC is concerned, but then the validator turns them into an error, which is OK, because they only run for a very short time (~10-20 minutes or so).

Maybe those were just corrupt workunits, but this error doesn't look good:

ERROR Validation of return code failed: EVNTtoHITS got a SIGKILL signal (exit code 137) (Error code 65)
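A side note on reading that line: by shell convention, an exit status above 128 usually means the process was terminated by signal number (status - 128), so 137 corresponds to signal 9, SIGKILL, which matches the message. The Python sketch below shows that arithmetic. `describe_exit_status` is a made-up helper name, and the signal names come from Python's `signal` module on the machine where it runs, so this assumes a Linux or other POSIX host. The log itself does not say what sent the signal.

```python
import signal


def describe_exit_status(status):
    """Describe a shell-style exit status (0-255).

    Statuses above 128 are interpreted as "terminated by signal
    (status - 128)", following the usual shell convention.
    """
    if status == 0:
        return "success"
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = "unknown signal"
        return f"terminated by {name} (signal {signum})"
    return f"exited with error code {status}"


for status in (0, 65, 137):
    print(status, "->", describe_exit_status(status))
```

Read this way, the 137 belongs to the killed EVNTtoHITS step, while the 65 at the end of the line is the error code reported by the transform.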
ID: 29195
Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 453
Credit: 193,369,412
RAC: 10,065
Message 29196 - Posted: 11 Mar 2017, 23:28:37 UTC - in response to Message 29195.  

I never have a problem with any VM-based BOINC project to get it running, also not ATLAS version 1.00.
It's just version 1.01 completing successful for BOINC, but then the validator turns them into an error, what's OK, cause they run very short (~10-20 minutes or so).

A runtime of 10 to 20 minutes usually means that an external web service (at CERN) cannot be reached from inside the VM. Most of the time the cause is a company or personal firewall: one or more ports may be blocked.

Please check here: http://lhcathome.web.cern.ch/test4theory/my-firewall-complaining-which-ports-does-project-use or Number 10 from my checklist
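A quick way to test for blocked ports from the host is to attempt a plain TCP connection to each server and port. The Python sketch below does that. The helper name `check_endpoints` is invented for illustration, and the hosts and ports in the commented usage are placeholders, not the project's real endpoints; take the actual list from the firewall page linked above. A successful connection from the host also does not prove the VM can reach the same service, so treat the result as a first hint only.

```python
import socket


def check_endpoints(endpoints, timeout=5.0, connect=socket.create_connection):
    """Try a TCP connection to each (host, port) pair.

    Returns a dict mapping (host, port) to True if the connection
    succeeded, or False if it was refused, timed out, or the host name
    could not be resolved. The `connect` parameter exists so the
    function can be exercised without real network access.
    """
    results = {}
    for host, port in endpoints:
        try:
            conn = connect((host, port), timeout=timeout)
        except OSError:
            # Covers refused connections, timeouts, and DNS failures.
            results[(host, port)] = False
        else:
            conn.close()
            results[(host, port)] = True
    return results


# Example usage (placeholder targets, replace with the real ones):
# targets = [("example.org", 80), ("example.org", 443)]
# for (host, port), ok in check_endpoints(targets).items():
#     print(f"{host}:{port} ->", "reachable" if ok else "blocked or unreachable")
```

If a port shows as unreachable here but works from another network, a local or company firewall is the likely culprit.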


Supporting BOINC, a great concept !
ID: 29196
Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 453
Credit: 193,369,412
RAC: 10,065
Message 29197 - Posted: 11 Mar 2017, 23:42:48 UTC - in response to Message 29196.  
Last modified: 11 Mar 2017, 23:43:22 UTC

By the way, the other side is also a possible point of failure: if one of the external servers has a problem, this can also result in a crunching time of 10 to 20 minutes.

I just checked my results again: I see 6 WUs that finished after 10 to 20 minutes (and were not validated), but more than 60 WUs ran fine and were validated.


Supporting BOINC, a great concept !
ID: 29197
HerveUAE
Joined: 18 Dec 16
Posts: 123
Credit: 37,495,365
RAC: 0
Message 29198 - Posted: 12 Mar 2017, 3:29:53 UTC
Last modified: 12 Mar 2017, 3:44:51 UTC

I have started to get the TransformValidationException on ATLAS@Home as well, so maybe the problem is not with the new version 1.01 of ATLAS Simulation, but with the data or a new server used for that data. But in the logs I cannot see any failed ping or new port being used.
The same problem occurred on my 2 machines:
http://atlasathome.cern.ch/result.php?resultid=8503011
http://atlasathome.cern.ch/result.php?resultid=8500584

Yeti, you could check this task on one of your computers: http://atlasathome.cern.ch/result.php?resultid=8503848
We are the product of random evolution.
ID: 29198