Message boards : ATLAS application : Atlas tasks have started to fail.

kotenok2000
Joined: 21 Feb 11
Posts: 86
Credit: 578,973
RAC: 0
Message 51852 - Posted: 20 Apr 2025, 22:05:00 UTC
Last modified: 20 Apr 2025, 22:06:58 UTC

Atlas tasks have started to fail.
When I look at the workunit data, I see that they fail on other computers too.
https://lhcathome.cern.ch/lhcathome/results.php?userid=212324&offset=0&show_names=0&state=0&appid=14
ID: 51852
cuphi

Joined: 17 Jun 21
Posts: 14
Credit: 3,426,865
RAC: 0
Message 51853 - Posted: 21 Apr 2025, 17:17:08 UTC - in response to Message 51852.  

Yes, I also had a small number of tasks start to fail a few days ago, but the queue emptied today. I have had over 100 fail.
ID: 51853
Steve Dodd

Joined: 1 Sep 08
Posts: 6
Credit: 7,789,228
RAC: 16,406
Message 51866 - Posted: 28 Apr 2025, 22:43:00 UTC

Same here. I've noticed that they fail (for me) after 15-30 seconds. It seems to me like the WUs can't create a workspace (or whatever the proper terminology is) in VirtualBox.

Steve
ID: 51866
Dave

Joined: 3 Aug 17
Posts: 13
Credit: 160,257
RAC: 0
Message 51912 - Posted: 26 May 2025, 8:31:14 UTC

Kernel 6.12.x loads a module that stops VirtualBox (VB) from starting. See this post for details.
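
For anyone who wants to check their own host, here is a minimal sketch. It assumes the module in question is one of the KVM modules (`kvm`, `kvm_intel`, `kvm_amd`); the post above does not name it, so treat that list and the helper name `loaded_kvm_modules` as my assumptions. It only reads `/proc/modules` and reports what is loaded.

```python
from pathlib import Path

# Assumption: the conflicting modules are the KVM ones; the post does not name them.
KVM_MODULES = {"kvm", "kvm_intel", "kvm_amd"}


def loaded_kvm_modules(proc_modules_text: str) -> list[str]:
    """Return the KVM-related module names found in /proc/modules-style text."""
    found = []
    for line in proc_modules_text.splitlines():
        fields = line.split()
        # The first whitespace-separated field of each line is the module name.
        if fields and fields[0] in KVM_MODULES:
            found.append(fields[0])
    return sorted(found)


if __name__ == "__main__":
    path = Path("/proc/modules")
    if path.exists():
        modules = loaded_kvm_modules(path.read_text())
        print("KVM modules loaded:", ", ".join(modules) if modules else "none")
    else:
        print("/proc/modules not found (not a Linux host?)")
```

If this lists any KVM modules on a 6.12.x kernel, that may be the conflict described here. A commonly reported workaround is booting with the kernel parameter `kvm.enable_virt_at_load=0` or unloading the KVM modules, but please verify that against the linked post before changing anything.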
ID: 51912
Garrulus glandarius

Joined: 5 Apr 25
Posts: 51
Credit: 907,191
RAC: 23,035
Message 51979 - Posted: 27 Jun 2025, 8:40:37 UTC
Last modified: 27 Jun 2025, 8:45:40 UTC

I'm getting a lot of invalid tasks and it seems the problem isn't on my end. Out of 20 tasks crunched today only 3 got validated (and all 3 had 1-2 invalid results before they got resent to me). Some of the invalid ones have already hit the error threshold of 4. As far as I can tell, CPU type/class doesn't matter. I see Xeons, general use i7s, EPYC monsters and old Atoms among the failed tasks.

Most failures take under 10 minutes (quad-core tasks running on an old i7-5600U); the valid ones take 20-30 minutes.
ID: 51979
Harri Liljeroos
Joined: 28 Sep 04
Posts: 780
Credit: 59,707,337
RAC: 45,643
Message 51981 - Posted: 27 Jun 2025, 11:08:23 UTC - in response to Message 51979.  

Yep, all invalids seem to have this error:
2025-06-27 14:05:29 (171576): Guest Log:     "pilotErrorDiag": "Failed to execute payload:/bin/bash: Sim_tf.py: command not found",
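
To check whether your own invalid results carry the same message, a rough sketch like the following can pull the `pilotErrorDiag` text out of a task's stderr output. The function name `pilot_error_diags` is just illustrative, and the pattern assumes the field always looks like the line quoted above; other log layouts may need a different expression.

```python
import re

# Matches the "pilotErrorDiag": "..." field as it appears in the quoted Guest Log line.
PILOT_DIAG = re.compile(r'"pilotErrorDiag":\s*"(.*)"')


def pilot_error_diags(log_text: str) -> list[str]:
    """Return every pilotErrorDiag message found in the log text."""
    diags = []
    for line in log_text.splitlines():
        match = PILOT_DIAG.search(line)
        if match:
            diags.append(match.group(1))
    return diags
```

Paste the stderr text of a result into `pilot_error_diags(...)` and compare the returned messages across several failed tasks.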

ID: 51981



©2025 CERN