Message boards :
ATLAS application :
Last days a lot of validate errors or No Hits file produced
Author | Message |
---|---|
Joined: 28 Dec 08 Posts: 341 Credit: 4,924,084 RAC: 1,303 |
I am surprised that no one over there has noticed by now that all the tasks sent out recently are faulty. How can this be? And again yesterday and today. I had a page's worth of ATLAS tasks all come up as invalid. Here is an example: https://lhcathome.cern.ch/lhcathome/result.php?resultid=417650730 This was not resent to anyone. I also notice the quorum is set to 1 on this and all the rest. I also had a page full of tasks crash on me, but the quorum on those was set to 1 as well. I can't make out whether it was my system or the task that crashed. But I do know that for the past few days I have been getting a dead computer (still running, but with a black screen). I'm not sure whether this was causing the problem or something in Windows. In any case I uninstalled VirtualBox and installed it fresh. I will see if that helps. |
Joined: 14 Jan 10 Posts: 1439 Credit: 9,618,700 RAC: 2,133 |
I still have the same issue, also with the current batch of tasks: only validate errors, or valid results with no HITS file produced. Most of the time no events are processed, but very rarely a task manages to start event processing, like this one: https://lhcathome.cern.ch/lhcathome/result.php?resultid=417953594 But after about 100 of the 250 events, the task suddenly stopped. |
Joined: 18 Dec 15 Posts: 1838 Credit: 121,990,104 RAC: 92,614 |
I have now checked the tasks from my hosts that have run ATLAS in the past few days (some other hosts are crunching Theory). In most cases a HITS file was produced; there were just 2 or 3 where it said "no HITS file produced". |
Joined: 3 Nov 12 Posts: 61 Credit: 145,649,139 RAC: 117,285 |
Atlas native has been going well since December 8th... nonstop. |
Joined: 17 Oct 06 Posts: 89 Credit: 57,479,053 RAC: 9,771 |
Has anyone else been able to run more than one ATLAS task at a time? The instant I do, I end up with the yellow hard drive triangle in VirtualBox, and all my other ATLAS tasks fail with computation errors until I clean it up. |
Joined: 18 Dec 15 Posts: 1838 Credit: 121,990,104 RAC: 92,614 |
CloverField wrote: "Has anyone else been able to run more than one atlas task at a time? The instant I do I end up with the yellow hard drive triangle in virtual box and all my other atlas tasks fail with computation errors until I clean it up." On several of my hosts I run more than one ATLAS task at a time. So I have no idea what exactly the problem on your host might be, but it looks like some misconfiguration of VirtualBox. Which version are you running? Maybe an update to a newer one would help. |
Joined: 18 Dec 15 Posts: 1838 Credit: 121,990,104 RAC: 92,614 |
"Atlas native has been going well since December 8th... nonstop." What I also notice with the latest ATLAS tasks: consoles 2 and 3 are working again the same way they used to a long time ago. |
Joined: 15 Jun 08 Posts: 2567 Credit: 258,574,994 RAC: 119,363 |
CloverField wrote: "Has anyone else been able to run more than one atlas task at a time? ..." Unlike CMS/Theory, ATLAS vbox still runs an older vboxwrapper in which this bug is not fixed. The CERN BOINC team is aware of it, but it looks like nobody from the ATLAS team wants to create a fresh app_version. Workaround: (1) clean the VirtualBox media registry; (2) start a single ATLAS task; (3) once the vdi is registered, start the other ATLAS tasks. Erich56 wrote: "... console 2 and 3 are working again ..." They work for "run 2" tasks but not for "run 3" tasks, since the logfile structure has changed. So far the recent tasks are "run 2" tasks. "Run 3" was the last major change before David Cameron left CERN. The required changes are not too complicated, but ... (same as above). |
Joined: 17 Oct 06 Posts: 89 Credit: 57,479,053 RAC: 9,771 |
Erich56 wrote: "on several of my hosts I run more than one Atlas task at a time. So no idea what exactly might be the problem on your host, but it looks like some misconfiguration of the VirtualBox. Which version are you running? Maybe an update to a newer one might help." I updated VirtualBox a couple of weeks ago to see if it would fix the issue. I'm running VirtualBox 7.1.4. I'm just baffled, because it would run multiple tasks happily until about a month ago. |
Joined: 15 Jun 08 Posts: 2567 Credit: 258,574,994 RAC: 119,363 |
CloverField wrote: "... I'm running virtual box 7.1.4 ..." This version runs fine. The reason it sometimes fails is a race condition that occurs when you have no ATLAS tasks running and then start at least 2 of them concurrently. If you are lucky, the timings do not trigger the race condition and everything works fine. Otherwise the media registry gets corrupted and stays corrupted until you clean it manually. Vboxwrapper 26208 includes a patch that avoids the race condition. CMS/Theory use a beta version that already includes that patch: https://github.com/BOINC/boinc/pull/5571 |
Joined: 17 Oct 06 Posts: 89 Credit: 57,479,053 RAC: 9,771 |
"... I'm running virtual box 7.1.4 ..." Thanks for the explanation. I'll limit ATLAS to one task at a time until it gets the patch. |
Joined: 14 Jan 10 Posts: 1439 Credit: 9,618,700 RAC: 2,133 |
I was very hopeful that I could finally return a valid task with the HITS-file, because all 50 of 50 events were processed and the console showed the HITS-file being processed, but after returning the result no HITS-file was reported: https://lhcathome.cern.ch/lhcathome/result.php?resultid=418119537 From the result: 2024-12-12 17:13:47 (12592): Guest Log: *** Error codes and diagnostics *** 2024-12-12 17:13:47 (12592): Guest Log: "exeErrorCode": 68, 2024-12-12 17:13:47 (12592): Guest Log: "exeErrorDiag": "Fatal error in athena logfile: \"Long ERROR message at line 2945 (see jobReport for further details)\"", 2024-12-12 17:13:47 (12592): Guest Log: "pilotErrorCode": 1305, 2024-12-12 17:13:47 (12592): Guest Log: "pilotErrorDiag": "Failed to execute payload:PyJobTransforms.transform.execute 2024-12-12 16:11:09,915 CRITICAL Transform executor raised TransformLogfileErrorException: Fatal error in athena logfile: \"Long ERROR message at line 2945 (see jobReport for further details)\"", The next one showed no event processing at all: https://lhcathome.cern.ch/lhcathome/result.php?resultid=418133093 2024-12-12 17:39:35 (8864): Guest Log: *** Error codes and diagnostics *** 2024-12-12 17:39:35 (8864): Guest Log: "exeErrorCode": 65, 2024-12-12 17:39:35 (8864): Guest Log: "exeErrorDiag": "Non-zero return code from EVNTtoHITS (1); Logfile error in log.EVNTtoHITS: \"GeoModelSvc FATAL in sysInitialize(): standard std::exception is caught\"", 2024-12-12 17:39:35 (8864): Guest Log: "pilotErrorCode": 1305, 2024-12-12 17:39:35 (8864): Guest Log: "pilotErrorDiag": "Failed to execute payload:PyJobTransforms.transform.execute 2024-12-12 16:36:58,835 CRITICAL Transform executor raised TransformValidationException: Non-zero return code from EVNTtoHITS (1); Logfile error in log.EVNTtoHITS: \"GeoModelSvc FATAL in sysInitia", |
Joined: 29 Dec 17 Posts: 1 Credit: 3,585,756 RAC: 2,019 |
CloverField wrote: "Has anyone else been able to run more than one atlas task at a time? ..." How do you clean the VirtualBox media registry? |
Joined: 15 Jun 08 Posts: 2567 Credit: 258,574,994 RAC: 119,363 |
"How do you clean VB media registry?" This has been explained a couple of times, e.g. here for CMS: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=6127&postid=49796 If your ATLAS vdi is affected, remove that entry from the list. |
Joined: 15 Jun 08 Posts: 2567 Credit: 258,574,994 RAC: 119,363 |
Just to make sure the errors are not caused by something unexpected: did you recently check the health of the disk (hardware), the filesystem, and the ATLAS vdi file? |
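
For readers who asked how to find the affected entry in the VirtualBox media registry, here is a minimal Python sketch. It is not the procedure from the linked CMS post. It only parses the text printed by `VBoxManage list hdds` and prints a `VBoxManage closemedium disk <UUID>` command for each disk whose path contains a name hint. The sample output, the UUIDs, the paths, and the `"ATLAS"` hint are illustrative assumptions; the real image name on your host may differ.

```python
import subprocess

# Illustrative sample only: the layout mimics `VBoxManage list hdds`,
# but the UUIDs and paths are made up for this sketch.
SAMPLE = """\
UUID:           11111111-1111-1111-1111-111111111111
Parent UUID:    base
State:          inaccessible
Type:           normal (base)
Location:       C:\\ProgramData\\BOINC\\projects\\example\\ATLAS_example.vdi
Storage format: VDI

UUID:           22222222-2222-2222-2222-222222222222
Parent UUID:    base
State:          created
Type:           normal (base)
Location:       C:\\VMs\\other.vdi
Storage format: VDI
"""


def list_hdds():
    """Return the raw text of `VBoxManage list hdds` (VBoxManage must be on PATH)."""
    result = subprocess.run(
        ["VBoxManage", "list", "hdds"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_hdd_list(text):
    """Split the listing into one dict per disk; entries are separated by blank lines."""
    entries, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            if current:
                entries.append(current)
                current = {}
            continue
        # Split at the first colon only, so Windows paths like C:\... stay intact.
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        entries.append(current)
    return entries


def find_disks(entries, name_hint="ATLAS"):
    """Return entries whose Location contains name_hint (an assumed naming pattern)."""
    hint = name_hint.lower()
    return [e for e in entries if hint in e.get("Location", "").lower()]


# Demo on the sample text; nothing is removed, the command is only printed.
for disk in find_disks(parse_hdd_list(SAMPLE)):
    print(disk["State"], disk["Location"])
    print("  to unregister: VBoxManage closemedium disk", disk["UUID"])
```

On a real host you would call `find_disks(parse_hdd_list(list_hdds()))` instead of using `SAMPLE`. It is probably safest to do this while no ATLAS VM is running, and VirtualBox may refuse to close a disk that still has child images attached.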
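
When comparing many failed results, it can help to pull the numeric error codes out of the "Error codes and diagnostics" section of the task's stderr output. The following is a rough Python sketch that assumes the `Guest Log:` line format shown in the results quoted in this thread; the function name `extract_error_codes` is hypothetical, and the sample lines are copied from the second result quoted above.

```python
import re

# Lines copied from the result output quoted in this thread.
SAMPLE_LOG = r'''
2024-12-12 17:39:35 (8864): Guest Log: *** Error codes and diagnostics ***
2024-12-12 17:39:35 (8864): Guest Log: "exeErrorCode": 65,
2024-12-12 17:39:35 (8864): Guest Log: "pilotErrorCode": 1305,
'''

# Matches e.g.  Guest Log: "exeErrorCode": 65,
CODE_RE = re.compile(r'Guest Log:\s+"(\w+ErrorCode)":\s*(\d+)')


def extract_error_codes(log_text):
    """Return a dict such as {"exeErrorCode": 65} for every *ErrorCode line found."""
    codes = {}
    for line in log_text.splitlines():
        match = CODE_RE.search(line)
        if match:
            codes[match.group(1)] = int(match.group(2))
    return codes


print(extract_error_codes(SAMPLE_LOG))
```

The sketch deliberately ignores the `exeErrorDiag` and `pilotErrorDiag` strings, since they contain escaped quotes and can be truncated in the log.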
©2025 CERN