Validate error on all tasks, and short run time with 1 core only
Author | Message |
---|---|
Joined: 12 Aug 06 Posts: 430 Credit: 11,643,622 RAC: 12,397 |
Atlas used to work on all my computers. I just upgraded two of them with extra RAM. They're dual Xeon machines which now have 36GB instead of 20GB of RAM. But I'm getting this: https://lhcathome.cern.ch/lhcathome/results.php?userid=55945&offset=0&show_names=0&state=5&appid=14 - all the tasks are only using 1 CPU core instead of 8, and finishing within 15 minutes, then causing a validate error. Any way to find out what's wrong? All I changed was adding more RAM (which has been tested by Memtest). Looking at one of the task logs, I see this from https://lhcathome.cern.ch/lhcathome/result.php?resultid=275499071: ***** 2020-05-27 23:12:03 (3224): Guest Log: *** Error codes and diagnostics *** 2020-05-27 23:12:03 (3224): Guest Log: "exeErrorCode": 65, 2020-05-27 23:12:03 (3224): Guest Log: "exeErrorDiag": "Non-zero return code from EVNTtoHITS (33); Logfile error in log.EVNTtoHITS: \"DetectorStore FATAL in sysInitialize(): standard std::exception is caught\"", 2020-05-27 23:12:03 (3224): Guest Log: "pilotErrorCode": 1165, 2020-05-27 23:12:03 (3224): Guest Log: "pilotErrorDiag": "Local output file is missing", ***** |
Joined: 12 Aug 06 Posts: 430 Credit: 11,643,622 RAC: 12,397 |
Same problem on my other computer, which has not changed apart from upgrading VirtualBox and its extensions to the latest version. I shall cease Atlas tasks until someone tells me what's happened. |
Joined: 15 Jun 08 Posts: 2608 Credit: 262,902,612 RAC: 142,374 |
"... until someone tells me what's happened." Just look around and read other posts. The answer might already be there: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5438&postid=42630 |
Joined: 13 May 14 Posts: 387 Credit: 15,314,184 RAC: 0 |
As you can read in the above link, there was a major database outage at CERN yesterday evening which affected BOINC servers and pretty much all of ATLAS' distributed computing services. Unfortunately one of the last things to come back were the Frontier database servers which the ATLAS tasks read data from as they are running. So although we were able to submit tasks here, they would all fail straight away. Now things should be working ok, sorry for the inconvenience. |
Joined: 12 Aug 06 Posts: 430 Credit: 11,643,622 RAC: 12,397 |
"... until someone tells me what's happened." I did search first, but the search function in these forums isn't the best, and I was looking for a specific Atlas problem. It never dawned on me to link this problem to the outage I noticed when I couldn't even get on the forums yesterday. "As you can read in the above link, there was a major database outage at CERN yesterday evening which affected BOINC servers and pretty much all of ATLAS' distributed computing services. Unfortunately one of the last things to come back were the Frontier database servers which the ATLAS tasks read data from as they are running. So although we were able to submit tasks here, they would all fail straight away. Now things should be working ok, sorry for the inconvenience." No problem. I just thought I was doing something wrong, and I didn't want to send back hundreds of useless results. Tasks are now processing with 8 cores per Atlas task, so I assume all is OK. |
Joined: 28 Sep 04 Posts: 748 Credit: 52,185,701 RAC: 38,167 |
During the night and this morning I've had quite a lot of invalid tasks (>25). All of those tasks have failed on other hosts as well. The error for all of them seems to be 'Error: Service 'control' failed to initialize: VERR_INVALID_PARAMETER'. Here's one of them: https://lhcathome.cern.ch/lhcathome/result.php?resultid=277054761 |
Joined: 30 Aug 14 Posts: 145 Credit: 10,847,070 RAC: 0 |
For the past 2 hours I have been getting nothing but validation errors on ATLAS native tasks! The problem seems to be: "pilotErrorDiag": "Transform not found:/bin/bash: Sim_tf.py: command not found\n" There are a lot of other hosts out there with failing tasks! Why mine when you can research? - GRIDCOIN - Real cryptocurrency without wasting hashes! https://gridcoin.us |
Joined: 28 Sep 04 Posts: 748 Credit: 52,185,701 RAC: 38,167 |
Now all my Atlas tasks (Windows, VirtualBox) are failing with the error I posted before. I have put Atlas on hold for a while. |
Joined: 30 Aug 14 Posts: 145 Credit: 10,847,070 RAC: 0 |
In addition to the validation errors on ATLAS, I now have trouble getting other LHC workunits. BOINC tells me: Fr 12 Jun 2020 20:04:56 CEST | LHC@home | Scheduler request failed: HTTP gateway timeout Uploading results seems to be fine. |
Joined: 12 Jun 18 Posts: 126 Credit: 53,906,164 RAC: 0 |
"In addition to the validation errors on ATLAS, I now have trouble getting other LHC workunits." I had trouble too, until I read the LHC BOINC messages and saw that no CPU tasks were being requested because the queue was full and none were needed. I suspended the unstarted WUs and LHC immediately downloaded a boatload. Hopefully this batch won't fail instantly. Edit: Not looking good: Valids zero, Invalids 73. Validation error. |
Joined: 11 Sep 05 Posts: 2 Credit: 275,738 RAC: 0 |
Validate error on all my ATLAS tasks today. |
Joined: 18 Dec 15 Posts: 1843 Credit: 126,992,948 RAC: 132,892 |
same here, on 2 machines so far: 2020-06-13 12:00:42 (10588): Guest Log: "pilotErrorDiag": "Transform not found:/bin/bash: Sim_tf.py: command not found\n" for more information: https://lhcathome.cern.ch/lhcathome/result.php?resultid=277530075 |
Joined: 18 Dec 15 Posts: 1843 Credit: 126,992,948 RAC: 132,892 |
"same here, on 2 machines so far:" I have now tried ATLAS on a third computer, with the same problem: the task failed after 12 minutes :-( See here: https://lhcathome.cern.ch/lhcathome/result.php?resultid=277581917 Can anyone tell me what causes this problem? |
Joined: 18 Nov 17 Posts: 131 Credit: 56,774,657 RAC: 7,836 |
Hello. Is ATLAS running fine now? |
Joined: 14 Jan 10 Posts: 1441 Credit: 9,664,131 RAC: 1,300 |
The last time I tried ATLAS was 5 days ago and that task ran fine. |
Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
I have six Atlas tasks running on a new PC with an Intel i5 CPU. I had ordered an HP desktop with an AMD Ryzen 5 3500 CPU, which would have given me 8 cores, but they sent me an Intel CPU. It's a long way from the Intel PII Deschutes I used in the Nineties. It came with 8 GB of RAM, and I brought it to 12 GB by putting a second 4 GB module in the second slot. It has a 128 GB SSD plus a 1 TB hard disk. Wait and see. Tullio |
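For anyone triaging similar failures: the diagnostic lines quoted earlier in this thread all follow the same shape, a `Guest Log:` prefix followed by a `"key": value` pair. Below is a minimal Python sketch that pulls those four fields out of a saved task log, assuming the log has been copied to a text file with one log entry per line. The function name `extract_error_fields` and the pattern are illustrative only; they are not part of BOINC or the ATLAS pilot, and real logs may contain other fields or layouts.

```python
import json
import re

# Matches lines such as (format taken from the posts above):
#   2020-05-27 23:12:03 (3224): Guest Log: "pilotErrorCode": 1165,
# Only the four field names seen in this thread are recognised (an assumption).
GUEST_LOG_FIELD = re.compile(
    r'Guest Log:\s*"(?P<key>exeErrorCode|exeErrorDiag|pilotErrorCode|pilotErrorDiag)"'
    r'\s*:\s*(?P<value>.+?),?\s*$'
)


def extract_error_fields(log_text):
    """Return a dict of error fields found in the log text.

    If a field appears more than once, the last occurrence wins.
    """
    fields = {}
    for line in log_text.splitlines():
        match = GUEST_LOG_FIELD.search(line)
        if not match:
            continue
        raw_value = match.group("value")
        try:
            # The values look like JSON scalars (numbers or quoted strings).
            fields[match.group("key")] = json.loads(raw_value)
        except json.JSONDecodeError:
            # Fall back to the raw text if the value is not valid JSON.
            fields[match.group("key")] = raw_value
    return fields


if __name__ == "__main__":
    import sys

    # Usage: python this_script.py saved_task_log.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for key, value in extract_error_fields(handle.read()).items():
            print(f"{key}: {value}")
```

A `pilotErrorDiag` such as "Local output file is missing" or "Transform not found" says where the task stopped, not why; in this thread the underlying causes were server-side, so comparing the extracted fields with what other hosts report is still the quickest check.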
©2025 CERN