All tasks in error
Author | Message |
---|---|
Send message Joined: 21 Sep 14 Posts: 25 Credit: 723,818 RAC: 0 |
Everything was working fine, but for the past 2 days all tasks have ended in error fairly quickly, even though nothing changed on the machine (a Mac). 2017-04-03 08:28:06 (1881): Guest Log: log_extracts: 2017-04-03 08:28:06 (1881): Guest Log: - Last 10 lines from /home/atlas01/RunAtlas/Panda_Pilot_5966_1491200476/PandaJob_3312154893_1491200483/athena_stdout.txt - 2017-04-03 08:28:06 (1881): Guest Log: PyJobTransforms.trfExe.preExecute 2017-04-03 08:23:16,519 INFO Batch/grid running - command outputs will not be echoed. Logs for EVNTtoHITS are in log.EVNTtoHITS 2017-04-03 08:28:06 (1881): Guest Log: PyJobTransforms.trfExe.preExecute 2017-04-03 08:23:16,521 INFO Now writing wrapper for substep executor EVNTtoHITS 2017-04-03 08:28:06 (1881): Guest Log: PyJobTransforms.trfExe._writeAthenaWrapper 2017-04-03 08:23:16,521 INFO Valgrind not engaged 2017-04-03 08:28:06 (1881): Guest Log: PyJobTransforms.trfExe.preExecute 2017-04-03 08:23:16,521 INFO Athena will be executed in a subshell via ['./runwrapper.EVNTtoHITS.sh'] 2017-04-03 08:28:06 (1881): Guest Log: PyJobTransforms.trfExe.execute 2017-04-03 08:23:16,521 INFO Starting execution of EVNTtoHITS (['./runwrapper.EVNTtoHITS.sh']) 2017-04-03 08:28:06 (1881): Guest Log: PyJobTransforms.trfExe.execute 2017-04-03 08:25:45,366 INFO EVNTtoHITS executor returns 33 2017-04-03 08:28:06 (1881): Guest Log: PyJobTransforms.trfExe.validate 2017-04-03 08:25:46,379 ERROR Validation of return code failed: Non-zero return code from EVNTtoHITS (33) (Error code 65) 2017-04-03 08:28:06 (1881): Guest Log: PyJobTransforms.trfExe.validate 2017-04-03 08:25:46,423 INFO Scanning logfile log.EVNTtoHITS for errors 2017-04-03 08:28:06 (1881): Guest Log: PyJobTransforms.transform.execute 2017-04-03 08:25:46,450 CRITICAL Transform executor raised TransformValidationException: Non-zero return code from EVNTtoHITS (33); Logfile error in log.EVNTtoHITS: "IOVDbSvc FATAL Conditions database connection COOLOFL_TRT/OFLP200 cannot be opened - STOP" 2017-04-03 08:28:06 (1881): Guest Log: PyJobTransforms.transform.execute 2017-04-03 08:25:49,655 WARNING Transform now exiting early with exit code 65 (Non-zero return code from EVNTtoHITS (33); Logfile error in log.EVNTtoHITS: "IOVDbSvc FATAL Conditions database connection COOLOFL_TRT/OFLP200 cannot be opened - STOP") 2017-04-03 08:28:06 (1881): Guest Log: - Walltime - Any idea? |
Send message Joined: 13 May 14 Posts: 387 Credit: 15,314,184 RAC: 0 |
The problem is that the database servers used by all ATLAS tasks (on the whole ATLAS grid, not just ATLAS@Home) are overloaded, so tasks are failing to connect to them to download the information they need. This happened from time to time on the old project, and this is the first time it has happened since the consolidation to LHC. The experts are working on fixing it; more news soon. |
Send message Joined: 21 Sep 14 Posts: 25 Credit: 723,818 RAC: 0 |
Thanks for your quick reply :) |
Send message Joined: 18 Dec 15 Posts: 1688 Credit: 103,752,589 RAC: 122,071 |
After several WUs had run well this afternoon, there was one which again failed after 10 minutes: https://lhcathome.cern.ch/lhcathome/result.php?resultid=131789181 The next one was okay. |
Send message Joined: 18 Dec 16 Posts: 123 Credit: 37,495,365 RAC: 0 |
"there was one which again failed after 10 minutes" It looks like a different issue, though. Those that failed starting yesterday had the error: "IOVDbSvc FATAL Conditions database connection COOLOFL_TRT/OFLP200 cannot be opened - STOP". The task you mentioned failed with this error: "DetectorStore FATAL in sysInitialize(): standard std::exception is caught". We are the product of random evolution. |
Send message Joined: 18 Dec 15 Posts: 1688 Credit: 103,752,589 RAC: 122,071 |
Ah, thanks for the hint. I didn't catch that (being in a hurry, I did not read the log carefully enough). From my experience with ATLAS so far, there seem to be 100 or 1000 different reasons which can cause a task to fail :-( |
Send message Joined: 18 Dec 16 Posts: 123 Credit: 37,495,365 RAC: 0 |
"From my experience with ATLAS so far, there seem to be 100 or 1000 different reasons which can cause a task to fail :-(" Yep, when I joined in December last year I did not expect that volunteering for ATLAS and LHC would be so time-consuming... But I won't give up for a penny. Instead I am planning on buying a 6950X soon ;). Well, I must admit that I'll be using those machines for my own calculations as well. We are the product of random evolution. |
Send message Joined: 27 Sep 08 Posts: 807 Credit: 652,242,820 RAC: 285,643 |
Herve, Xeons get a better RAC for a lower price, if performance on workloads that use only a few cores outside BOINC is not a consideration. https://lhcathome.cern.ch/lhcathome/hosts_user.php?userid=129087 For the same price, the E5-2680v4 has 14 cores with clocks of 2.4-2.9-3.3 GHz, and you can max out this project with 24 tasks at once. I would expect around 50% higher RAC based on my computers. My old E5-2683 @ 2-2.5-3 GHz scores better. I would recommend 64 GB; I don't see problems with 24 concurrent WUs on my computers running only single-core tasks. |
Send message Joined: 18 Dec 16 Posts: 123 Credit: 37,495,365 RAC: 0 |
"For the same price E5-2680v4 has 14 Cores with clocks of 2.4-2.9-3.3Ghz, you can max this project with 24 tasks at once." Thanks, Toby, for the suggestion. I remember reading a thread on ATLAS@Home where the performance of various CPUs was compared, and I thought of going for a Xeon as well. Then I checked the specialised shops in Dubai, and nobody sells Xeon processors around here, not to mention motherboards. So I made up my mind to go for the Core i7-6950X, which can easily be found in Dubai. We are the product of random evolution. |
Send message Joined: 27 Sep 08 Posts: 807 Credit: 652,242,820 RAC: 285,643 |
The boards are easy: all(?) X99 boards support them. Finding them is trickier; I got some of mine from general computer stores online. I'm sure you'll like it; I would say it is the most Xeon-like of all consumer processors, albeit at a crazy price. |
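
As noted earlier in the thread, two tasks that both fail after a few minutes can have different causes, and the FATAL line in the log is what tells them apart. A minimal Python sketch of that check follows, assuming the log has already been saved as plain text. The function name and the category labels are hypothetical, made up for illustration; only the two signature strings come from the messages quoted above.

```python
# Signatures copied from the error messages quoted in this thread.
# The labels on the left are hypothetical names chosen for this sketch.
KNOWN_SIGNATURES = {
    "conditions_db_unreachable": "IOVDbSvc FATAL Conditions database connection",
    "detector_store_exception": "DetectorStore FATAL in sysInitialize()",
}


def classify_failure(log_text):
    """Return a label for the first known FATAL signature found in log_text."""
    for label, signature in KNOWN_SIGNATURES.items():
        if signature in log_text:
            return label
    # A FATAL line we have no signature for: worth reading by hand.
    if " FATAL " in log_text:
        return "unknown_fatal"
    return "no_fatal_found"


if __name__ == "__main__":
    sample = ('Logfile error in log.EVNTtoHITS: "IOVDbSvc FATAL Conditions '
              'database connection COOLOFL_TRT/OFLP200 cannot be opened - STOP"')
    print(classify_failure(sample))
```

A task whose log matches the first signature is probably part of the database overload described above, while anything else needs a separate look. This is only a rough triage aid, not a list of every way an ATLAS task can fail.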