Message boards : ATLAS application : Atlas tasks have started to fail.

kotenok2000
Joined: 21 Feb 11
Posts: 83
Credit: 577,613
RAC: 0
Message 51852 - Posted: 20 Apr 2025, 22:05:00 UTC
Last modified: 20 Apr 2025, 22:06:58 UTC

Atlas tasks have started to fail.
When I look at the workunit data, I see that they fail on other computers too.
https://lhcathome.cern.ch/lhcathome/results.php?userid=212324&offset=0&show_names=0&state=0&appid=14
ID: 51852
cuphi
Joined: 17 Jun 21
Posts: 14
Credit: 3,426,865
RAC: 26
Message 51853 - Posted: 21 Apr 2025, 17:17:08 UTC - in response to Message 51852.  

Yes, I also had a small number of tasks start to fail a few days ago, but the queue emptied today and I have had over 100 fail.
ID: 51853
Steve Dodd
Joined: 1 Sep 08
Posts: 6
Credit: 7,046,773
RAC: 6,512
Message 51866 - Posted: 28 Apr 2025, 22:43:00 UTC

Same here. I've noticed that they fail (for me) after 15-30 seconds. It seems to me that the WUs can't create a workspace (or whatever the proper term is) in VirtualBox.

Steve
ID: 51866
Dave
Joined: 3 Aug 17
Posts: 13
Credit: 160,257
RAC: 8
Message 51912 - Posted: 26 May 2025, 8:31:14 UTC

Kernel 6.12.x loads a module that stops VirtualBox (VB) from starting. See this post for details.
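
A quick way to check a Linux host for this is to list the loaded kernel modules. The sketch below assumes the module in question is KVM (`kvm`, `kvm_intel`, `kvm_amd`); the post does not name it, so treat that as a guess. The helper name `loaded_kvm_modules` is made up for illustration.

```python
def loaded_kvm_modules(proc_modules_text):
    """Return the names of loaded modules whose name starts with 'kvm'.

    Expects text in the /proc/modules format, where the module name is
    the first whitespace-separated field on each line.
    ASSUMPTION: KVM is the conflicting module; the post does not say so.
    """
    names = []
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("kvm"):
            names.append(fields[0])
    return names


if __name__ == "__main__":
    try:
        with open("/proc/modules") as handle:
            text = handle.read()
    except OSError:
        # Not a Linux host, or /proc is unavailable.
        text = ""
    print(loaded_kvm_modules(text) or "no kvm modules loaded")
```

If the list is non-empty on a 6.12.x kernel and VirtualBox tasks die within seconds, that would be consistent with the explanation above, though it does not prove it.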
ID: 51912
Lanius collurio
Joined: 5 Apr 25
Posts: 6
Credit: 138,278
RAC: 4,157
Message 51979 - Posted: 27 Jun 2025, 8:40:37 UTC
Last modified: 27 Jun 2025, 8:45:40 UTC

I'm getting a lot of invalid tasks and it seems the problem isn't on my end. Out of 20 tasks crunched today, only 3 got validated (and all 3 had 1-2 invalid results before they got resent to me). Some of the invalid ones have already hit the error threshold of 4. As far as I can tell, CPU type/class doesn't matter. I see Xeons, general-use i7s, EPYC monsters and old Atoms among the failed tasks.

Most failures take under 10 minutes (quad-core tasks running on an old i7-5600U); the valid ones take 20-30 minutes.
ID: 51979
Harri Liljeroos
Joined: 28 Sep 04
Posts: 765
Credit: 56,837,206
RAC: 26,824
Message 51981 - Posted: 27 Jun 2025, 11:08:23 UTC - in response to Message 51979.  

Yep, all invalids seem to have this error:
2025-06-27 14:05:29 (171576): Guest Log:     "pilotErrorDiag": "Failed to execute payload:/bin/bash: Sim_tf.py: command not found",
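
To tally this error across several task logs, a minimal sketch could pull the `pilotErrorDiag` value out of each line. The function name `pilot_error_diags` and the regular expression are illustrative and based only on the single line quoted above, not on any documented log format.

```python
import re

# Matches the quoted value after "pilotErrorDiag": up to the closing
# quote at the end of the line (an optional trailing comma is allowed).
PILOT_DIAG = re.compile(r'"pilotErrorDiag":\s*"(.*)"\s*,?\s*$')


def pilot_error_diags(log_text):
    """Return every pilotErrorDiag message found in a task's log text."""
    diags = []
    for line in log_text.splitlines():
        match = PILOT_DIAG.search(line)
        if match:
            diags.append(match.group(1))
    return diags


sample = (
    '2025-06-27 14:05:29 (171576): Guest Log:     "pilotErrorDiag": '
    '"Failed to execute payload:/bin/bash: Sim_tf.py: command not found",'
)
print(pilot_error_diags(sample))
```

Counting how often each message appears (for example with `collections.Counter`) would show whether the `Sim_tf.py: command not found` failure accounts for all of the invalid results or only some of them.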

ID: 51981



©2025 CERN