Message boards : ATLAS application : ATLAS tasks fail after 10 min

computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2724
Credit: 299,005,405
RAC: 3,702
Message 42008 - Posted: 30 Mar 2020, 14:53:10 UTC

Since 12:36 UTC today, I've been getting a mix of valid and invalid ATLAS native tasks on different BOINC clients.
All of the invalid ones fail after about 10 minutes of runtime.

Examples:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=270323978
https://lhcathome.cern.ch/lhcathome/result.php?resultid=270328004
https://lhcathome.cern.ch/lhcathome/result.php?resultid=270324483
maeax
Joined: 2 May 07
Posts: 2285
Credit: 178,823,324
RAC: 1,040
Message 42009 - Posted: 30 Mar 2020, 17:31:00 UTC
Last modified: 30 Mar 2020, 17:46:58 UTC

No problems so far today with CentOS (native) and Windows, both with NO proxy.
Edit: RDP is not shown in Windows.
Richie_unstable
Joined: 26 Oct 18
Posts: 111
Credit: 5,308,404
RAC: 0
Message 42010 - Posted: 30 Mar 2020, 18:01:56 UTC
Last modified: 30 Mar 2020, 18:02:53 UTC

At least three of my computers running Windows and VirtualBox tasks have suddenly run into validate errors today. The tasks fail after about 6 minutes.

Examples:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=270325128
https://lhcathome.cern.ch/lhcathome/result.php?resultid=270340255
https://lhcathome.cern.ch/lhcathome/result.php?resultid=270358724
maeax
Joined: 2 May 07
Posts: 2285
Credit: 178,823,324
RAC: 1,040
Message 42011 - Posted: 30 Mar 2020, 20:40:25 UTC
Last modified: 30 Mar 2020, 20:49:39 UTC

Now I also have three native tasks, from 18:45 UTC, that stop after 10 minutes.
[2020-03-30 20:24:54] Starting ATLAS job with PandaID=4687202094
[2020-03-30 20:24:54] Running command: /usr/bin/singularity exec --pwd /var/lib/boinc/slots/0 -B /cvmfs,/var /cvmfs/atlas.cern.ch/repo/containers/images/singularity/x86_64-centos7.img sh start_atlas.sh
[2020-03-30 20:24:55] Job failed
[2020-03-30 20:24:55] ++ pwd
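When a native task fails this early, it can help to re-run the container step by hand. The Python sketch below is purely illustrative: it splits the logged "Running command:" line into an argument list and picks out the working directory, bind mounts and image, assuming the `singularity exec --pwd DIR -B BINDS IMAGE CMD...` layout shown in the log. The function name `parse_run_command` is hypothetical, and actually re-running the command needs Singularity installed and CVMFS mounted on the host.

```python
import shlex
from typing import Dict, List

PREFIX = "Running command: "


def parse_run_command(log_line: str) -> Dict[str, object]:
    """Split a '[timestamp] Running command: ...' log line into its parts.

    Hypothetical helper; assumes the layout
    'singularity exec --pwd DIR -B BINDS IMAGE CMD...' seen in the ATLAS log.
    """
    start = log_line.find(PREFIX)
    if start < 0:
        raise ValueError("not a 'Running command' line")
    argv: List[str] = shlex.split(log_line[start + len(PREFIX):])
    bind_idx = argv.index("-B") + 1  # the value after -B is a comma-separated list
    return {
        "argv": argv,
        "pwd": argv[argv.index("--pwd") + 1],
        "binds": argv[bind_idx].split(","),
        "image": argv[bind_idx + 1],
        "inner": argv[bind_idx + 2:],  # command run inside the container
    }


line = ("[2020-03-30 20:24:54] Running command: /usr/bin/singularity exec "
        "--pwd /var/lib/boinc/slots/0 -B /cvmfs,/var "
        "/cvmfs/atlas.cern.ch/repo/containers/images/singularity/x86_64-centos7.img "
        "sh start_atlas.sh")
parts = parse_run_command(line)
print(parts["binds"])  # ['/cvmfs', '/var']
print(parts["inner"])  # ['sh', 'start_atlas.sh']
```

`parts["argv"]` could then be passed to `subprocess.run` from the task's slot directory; whether the job behaves the same outside the BOINC wrapper is not guaranteed.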
David Cameron
Project administrator
Project developer
Project scientist
Joined: 13 May 14
Posts: 387
Credit: 15,314,184
RAC: 0
Message 42012 - Posted: 31 Mar 2020, 7:09:21 UTC - in response to Message 42011.  

Hi,

A bunch of bad tasks were submitted by accident yesterday. They were submitted over a period of 2 hours yesterday afternoon, so hopefully they will be flushed out of the system soon.
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2724
Credit: 299,005,405
RAC: 3,702
Message 42014 - Posted: 31 Mar 2020, 10:04:33 UTC - in response to Message 42012.  

Thanks.
So far all ATLAS tasks I got today are running fine.
Richie_unstable
Joined: 26 Oct 18
Posts: 111
Credit: 5,308,404
RAC: 0
Message 42018 - Posted: 1 Apr 2020, 2:33:45 UTC

I've still been getting some. One was sent on 1 Apr 2020, 2:04:13 UTC. It doesn't bother me; only a small amount of time is wasted.
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2724
Credit: 299,005,405
RAC: 3,702
Message 42019 - Posted: 1 Apr 2020, 13:43:14 UTC - in response to Message 42012.  

David Cameron wrote:
"Hi,
A bunch of bad tasks were submitted by accident yesterday. They were submitted over a period of 2 hours yesterday afternoon, so hopefully they will be flushed out of the system soon."

Most tasks from the faulty bunch were first sent out 2 days ago.
They will disappear once their workunits have collected enough errors, for example:
https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=137171081


This afternoon I got a faulty task that was generated today.
Is it part of the same old bunch or part of another faulty one?
https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=137380248
Penguin
Joined: 2 Apr 12
Posts: 6
Credit: 334,298
RAC: 0
Message 42187 - Posted: 16 Apr 2020, 1:21:57 UTC

I too am seeing ATLAS tasks fail after approximately 10 minutes; I just saw more than half a dozen do that.

Looking at the task details, I found this:

"Stderr output

<core_client_version>7.9.3</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
20:53:25 (20061): wrapper (7.7.26015): starting
20:53:25 (20061): wrapper: running run_atlas (--nthreads 12)
awk: line 2: function strftime never defined
21:03:26 (20061): run_atlas exited; CPU time 0.007836
21:03:26 (20061): app exit status: 0x1
21:03:26 (20061): called boinc_finish(195)

</stderr_txt>
]]>
"



function strftime never defined error
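Two numbers in this stderr are worth decoding. The client prints the exit code three ways, and 195 (0xc3) read as a signed 8-bit value is 195 - 256 = -61. The wrapper timestamps also show that run_atlas ran for about 10 minutes of wall-clock time while using under 0.01 s of CPU, so the task was waiting rather than computing before it gave up. A small Python sketch of both calculations follows; the signed-byte reading of the third number is an assumption that matches the arithmetic, not something taken from BOINC documentation, and the helper names are hypothetical.

```python
from datetime import datetime


def describe_exit_code(code: int) -> str:
    """Format an exit code as decimal, hex and signed 8-bit value.

    Assumption: the third number in 'code 195 (0xc3, -61)' is the low byte
    reinterpreted as a signed char.
    """
    low = code & 0xFF
    signed = low - 256 if low >= 0x80 else low
    return f"{code} ({hex(code)}, {signed})"


def wall_seconds(start: str, end: str) -> float:
    """Elapsed seconds between two HH:MM:SS stamps on the same day (no midnight wrap)."""
    fmt = "%H:%M:%S"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()


print(describe_exit_code(195))  # 195 (0xc3, -61)
elapsed = wall_seconds("20:53:25", "21:03:26")
cpu = 0.007836  # from "run_atlas exited; CPU time 0.007836"
print(f"{elapsed:.0f} s wall, {cpu} s CPU")  # 601 s wall, 0.007836 s CPU
```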
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2724
Credit: 299,005,405
RAC: 3,702
Message 42188 - Posted: 16 Apr 2020, 5:18:52 UTC - in response to Message 42187.  

Searching for "strftime" at the top of this page will deliver the answer.
Penguin
Joined: 2 Apr 12
Posts: 6
Credit: 334,298
RAC: 0
Message 42206 - Posted: 16 Apr 2020, 21:48:30 UTC - in response to Message 42188.  

OK, did that...

https://lhcathome.cern.ch/lhcathome/result.php?resultid=271425574

awk: line 2: function strftime never defined

"The scripts used to control LHC native tasks expect gawk to be used.
It looks like this computer is using mawk instead.
Solution: install gawk and make sure the awk link points to gawk."

What is "awk", and what are gawk and mawk?

If I disable native in my preferences, I also get near-immediate computation errors...
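To answer the question above briefly: awk is a small text-processing language that shell scripts use for jobs such as formatting timestamps; gawk (GNU awk) and mawk are two separate implementations of it, and the `awk` command on a Linux system is usually a link to one of them. The error means the installed awk has no `strftime()` function. Rather than guessing from the binary's name, one can simply try the function. The Python sketch below (hypothetical helper `inspect_awk`, not part of any LHC@home tooling) resolves which binary `awk` points to and runs a one-line awk program that calls `strftime`.

```python
import os
import shutil
import subprocess
from typing import Optional, Tuple


def inspect_awk(awk_cmd: str = "awk") -> Tuple[Optional[str], bool]:
    """Return (resolved binary path, strftime supported?) for an awk command."""
    path = shutil.which(awk_cmd)
    if path is None:
        return None, False
    real_path = os.path.realpath(path)  # follow links such as /etc/alternatives/awk
    probe = subprocess.run(
        [path, 'BEGIN { print strftime("%Y") }'],  # fails if strftime is undefined
        capture_output=True,
        text=True,
    )
    return real_path, probe.returncode == 0


if __name__ == "__main__":
    binary, ok = inspect_awk()
    if binary is None:
        print("no awk found on PATH")
    else:
        print(f"awk resolves to {binary}; strftime supported: {ok}")
```

On Debian and Ubuntu the `awk` link is managed by the alternatives system, so after installing gawk, `update-alternatives --config awk` is one way to point it at gawk.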
maeax
Joined: 2 May 07
Posts: 2285
Credit: 178,823,324
RAC: 1,040
Message 42575 - Posted: 22 May 2020, 19:10:19 UTC

https://lhcathome.cern.ch/lhcathome/result.php?resultid=274314793
[2020-05-22 20:26:23] *** Error codes and diagnostics ***
[2020-05-22 20:26:23] "exeErrorCode": 0,
[2020-05-22 20:26:23] "exeErrorDiag": "",
[2020-05-22 20:26:23] "pilotErrorCode": 1346,
[2020-05-22 20:26:23] "pilotErrorDiag": "Transform not found:/bin/bash: Sim_tf.py: command not found\n",
[2020-05-22 20:26:23] *** Listing of results directory ***
[2020-05-22 20:26:23] total 252860
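The diagnostics block in this log is a series of JSON-style `"key": value,` pairs behind a timestamp, so it can be pulled out mechanically when comparing many failed tasks. Here pilotErrorCode 1346 comes with the diagnostic that the `Sim_tf.py` transform was not found. A minimal Python sketch, assuming every diagnostic line has the `[YYYY-MM-DD HH:MM:SS] "key": value,` shape shown above; `parse_diagnostics` is a hypothetical name.

```python
import json
import re
from typing import Any, Dict

# Timestamp prefix as printed in the ATLAS task log, e.g. "[2020-05-22 20:26:23] "
TIMESTAMP = re.compile(r"^\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\]\s*")


def parse_diagnostics(log_text: str) -> Dict[str, Any]:
    """Collect '"key": value,' lines from a task log into a dict."""
    found: Dict[str, Any] = {}
    for raw in log_text.splitlines():
        line = TIMESTAMP.sub("", raw).strip()
        if not line.startswith('"'):
            continue  # skip headers such as '*** Error codes and diagnostics ***'
        try:
            found.update(json.loads("{" + line.rstrip(",") + "}"))
        except json.JSONDecodeError:
            continue  # not a JSON key/value line
    return found


sample = r'''[2020-05-22 20:26:23] *** Error codes and diagnostics ***
[2020-05-22 20:26:23] "exeErrorCode": 0,
[2020-05-22 20:26:23] "exeErrorDiag": "",
[2020-05-22 20:26:23] "pilotErrorCode": 1346,
[2020-05-22 20:26:23] "pilotErrorDiag": "Transform not found:/bin/bash: Sim_tf.py: command not found\n",
'''
diag = parse_diagnostics(sample)
print(diag["pilotErrorCode"])  # 1346
```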
Erich56
Joined: 18 Dec 15
Posts: 1956
Credit: 158,654,011
RAC: 59,982
Message 42842 - Posted: 12 Jun 2020, 4:24:14 UTC

Since yesterday evening, all my tasks have been failing after about 14 minutes of runtime and about 3-4 minutes of CPU time.
Examples:

https://lhcathome.cern.ch/lhcathome/result.php?resultid=277285377
https://lhcathome.cern.ch/lhcathome/result.php?resultid=277286741
https://lhcathome.cern.ch/lhcathome/result.php?resultid=277286884

What's going wrong?
Klaus
Joined: 27 Aug 15
Posts: 28
Credit: 29,189,592
RAC: 39,541
Message 42843 - Posted: 12 Jun 2020, 7:14:45 UTC

Tasks are failing after about 14 minutes of runtime and about 3-4 minutes of CPU time.

Examples:

https://lhcathome.cern.ch/lhcathome/result.php?resultid=277284464
https://lhcathome.cern.ch/lhcathome/result.php?resultid=277285287
https://lhcathome.cern.ch/lhcathome/result.php?resultid=277286589
https://lhcathome.cern.ch/lhcathome/result.php?resultid=277286071
djoser
Joined: 30 Aug 14
Posts: 145
Credit: 10,847,070
RAC: 0
Message 42844 - Posted: 12 Jun 2020, 7:55:55 UTC - in response to Message 42575.  

"pilotErrorDiag": "Transform not found:/bin/bash: Sim_tf.py: command not found\n",

I have the same problem with ATLAS native since yesterday.
Erich56
Joined: 18 Dec 15
Posts: 1956
Credit: 158,654,011
RAC: 59,982
Message 42845 - Posted: 12 Jun 2020, 11:12:59 UTC - in response to Message 42844.  
Last modified: 12 Jun 2020, 11:13:49 UTC

djoser wrote:
"pilotErrorDiag": "Transform not found:/bin/bash: Sim_tf.py: command not found\n",
I have the same problem with ATLAS native since yesterday.

Ah, okay, thanks for the information.
So there might be misconfigured tasks around, for VM as well as for native.
I was wondering whether the Windows 10 update to version 2004, installed on this computer 2 days ago, had something to do with the problem (because ATLAS tasks did NOT fail before). So that is probably not the case.

I have now switched to CMS, and they run well.
Jim1348
Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 42846 - Posted: 12 Jun 2020, 13:49:32 UTC - in response to Message 42845.  
Last modified: 12 Jun 2020, 13:49:45 UTC

Erich56 wrote:
"I have now switched to CMS, and they run well."

They follow the Law of Conservation of Working Projects at CERN.
When one thing starts working, another fails.
Ivan saved us.
Aurum
Joined: 12 Jun 18
Posts: 140
Credit: 57,420,502
RAC: 6,017
Message 42865 - Posted: 13 Jun 2020, 21:44:57 UTC - in response to Message 42188.  

computezrmle wrote:
"Searching for "strftime" at the top of this page will deliver the answer."
Hmm, was that answer retracted???
"Sorry, couldn't find anything matching your search query."
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2724
Credit: 299,005,405
RAC: 3,702
Message 42869 - Posted: 14 Jun 2020, 7:05:55 UTC - in response to Message 42865.  

Extend the default search period (30 days in the past) and you will get a couple of hits.
Among them (posted 2020-04-12):
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5391&postid=42138
maeax
Joined: 2 May 07
Posts: 2285
Credit: 178,823,324
RAC: 1,040
Message 42870 - Posted: 14 Jun 2020, 9:30:28 UTC
Last modified: 14 Jun 2020, 9:32:03 UTC

The server status page currently shows about 5k running ATLAS tasks, where it was mostly 10k in the past.
There must be something wrong with the BOINC tasks from CERN.
The number isn't growing:
https://lhcathome.cern.ch/lhcathome/img/progresschart.png


©2026 CERN