Message boards : ATLAS application : ATLAS tasks fail after 10 min

computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 2100
Credit: 163,101,254
RAC: 117,629
Message 42008 - Posted: 30 Mar 2020, 14:53:10 UTC

Since 12:36 UTC today I've been getting a mix of valid and invalid ATLAS native tasks on different BOINC clients.
All the invalid ones fail after about 10 min of runtime.

Examples:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=270323978
https://lhcathome.cern.ch/lhcathome/result.php?resultid=270328004
https://lhcathome.cern.ch/lhcathome/result.php?resultid=270324483
ID: 42008
maeax
Joined: 2 May 07
Posts: 1669
Credit: 97,950,596
RAC: 305,784
Message 42009 - Posted: 30 Mar 2020, 17:31:00 UTC
Last modified: 30 Mar 2020, 17:46:58 UTC

Running native on CentOS and on Windows, with NO proxy, I've had no problems so far today.
Edit: RDP is not shown in Windows.
ID: 42009
Richie_unstable
Joined: 26 Oct 18
Posts: 85
Credit: 4,186,121
RAC: 0
Message 42010 - Posted: 30 Mar 2020, 18:01:56 UTC
Last modified: 30 Mar 2020, 18:02:53 UTC

At least three of my computers running Windows and vbox tasks have suddenly run into validation errors today. The tasks fail after about 6 minutes.

Examples:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=270325128
https://lhcathome.cern.ch/lhcathome/result.php?resultid=270340255
https://lhcathome.cern.ch/lhcathome/result.php?resultid=270358724
ID: 42010
maeax
Joined: 2 May 07
Posts: 1669
Credit: 97,950,596
RAC: 305,784
Message 42011 - Posted: 30 Mar 2020, 20:40:25 UTC
Last modified: 30 Mar 2020, 20:49:39 UTC

Now I also have three native tasks, at 18:45 UTC, that stopped after 10 min.
[2020-03-30 20:24:54] Starting ATLAS job with PandaID=4687202094
[2020-03-30 20:24:54] Running command: /usr/bin/singularity exec --pwd /var/lib/boinc/slots/0 -B /cvmfs,/var /cvmfs/atlas.cern.ch/repo/containers/images/singularity/x86_64-centos7.img sh start_atlas.sh
[2020-03-30 20:24:55] Job failed
[2020-03-30 20:24:55] ++ pwd
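The log line above shows how the native app launches the job: Singularity runs `start_atlas.sh` inside the CentOS 7 image from CVMFS, with the BOINC slot as the working directory. Below is a minimal Python sketch that rebuilds the same argument list and checks the obvious local prerequisites. It assumes the paths printed in that log line (your BOINC data directory and Singularity location may differ), and `build_command` and `preflight` are hypothetical helper names. A clean preflight cannot detect problems inside the task itself, such as a faulty batch of tasks.

```python
import os
from pathlib import Path

# Paths copied from the log line above; adjust for your own host (assumption).
SINGULARITY = "/usr/bin/singularity"
IMAGE = "/cvmfs/atlas.cern.ch/repo/containers/images/singularity/x86_64-centos7.img"


def build_command(slot_dir: str, image: str = IMAGE, singularity: str = SINGULARITY) -> list[str]:
    """Rebuild the argument list shown in the log (hypothetical helper)."""
    return [singularity, "exec", "--pwd", slot_dir, "-B", "/cvmfs,/var",
            image, "sh", "start_atlas.sh"]


def preflight(slot_dir: str, image: str = IMAGE, singularity: str = SINGULARITY) -> list[str]:
    """Return a list of local problems; an empty list means nothing obvious is missing."""
    problems = []
    if not os.access(singularity, os.X_OK):
        problems.append(f"singularity not executable: {singularity}")
    if not Path(image).is_file():
        problems.append(f"container image not found (is CVMFS mounted?): {image}")
    if not (Path(slot_dir) / "start_atlas.sh").is_file():
        problems.append(f"start_atlas.sh missing in slot: {slot_dir}")
    return problems


if __name__ == "__main__":
    slot = "/var/lib/boinc/slots/0"
    print(" ".join(build_command(slot)))
    for problem in preflight(slot):
        print("problem:", problem)
```

Run it while a task occupies the slot; otherwise `start_atlas.sh` will naturally be reported as missing.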
ID: 42011
David Cameron
Project administrator
Project developer
Project scientist
Joined: 13 May 14
Posts: 378
Credit: 14,405,403
RAC: 6,745
Message 42012 - Posted: 31 Mar 2020, 7:09:21 UTC - in response to Message 42011.  

Hi,

A bunch of bad tasks were submitted by accident yesterday, over a period of 2 hours in the afternoon, so hopefully they will be flushed out of the system soon.
ID: 42012
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 2100
Credit: 163,101,254
RAC: 117,629
Message 42014 - Posted: 31 Mar 2020, 10:04:33 UTC - in response to Message 42012.  

Thanks.
So far all ATLAS tasks I got today are running fine.
ID: 42014
Richie_unstable
Posts: 85
Credit: 4,186,121
RAC: 0
Message 42018 - Posted: 1 Apr 2020, 2:33:45 UTC

I've still been getting some. One was sent on 1 Apr 2020, 2:04:13 UTC. It doesn't bother me; only a small amount of time is wasted.
ID: 42018
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Posts: 2100
Credit: 163,101,254
RAC: 117,629
Message 42019 - Posted: 1 Apr 2020, 13:43:14 UTC - in response to Message 42012.  

David Cameron wrote:
Hi,

A bunch of bad tasks were submitted by accident yesterday, over a period of 2 hours in the afternoon, so hopefully they will be flushed out of the system soon.

Most tasks from the faulty bunch were sent out for the first time 2 days ago.
They will disappear once the number of errors per workunit is high enough:
https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=137171081
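Why a faulty workunit eventually disappears: the server keeps reissuing it until a per-workunit limit is reached. A minimal Python sketch, assuming limits named after BOINC's `max_error_results` and `max_total_results` workunit settings; the exact comparisons the server applies and the values LHC@home configures are assumptions, and `Workunit` and `tasks_burned` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Workunit:
    # Hypothetical model; the limit names mirror BOINC workunit settings,
    # but the comparison rules below are assumptions, not BOINC's code.
    max_error_results: int
    max_total_results: int
    outcomes: list[str] = field(default_factory=list)  # "error" or "success" per returned task

    def errors(self) -> int:
        return sum(1 for o in self.outcomes if o == "error")

    def is_abandoned(self) -> bool:
        # Assumed rule: give up once errors exceed the error limit,
        # or once the total number of issued tasks reaches the total limit.
        return (self.errors() > self.max_error_results
                or len(self.outcomes) >= self.max_total_results)


def tasks_burned(wu: Workunit) -> int:
    """Count the tasks a broken workunit consumes when every copy fails."""
    while not wu.is_abandoned():
        wu.outcomes.append("error")
    return len(wu.outcomes)


if __name__ == "__main__":
    wu = Workunit(max_error_results=3, max_total_results=10)
    print(tasks_burned(wu))  # 4
```

With illustrative limits like these, each bad workunit costs a few volunteer tasks before it is dropped, which is why the faulty batch kept showing up for a couple of days.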


This afternoon I got a faulty task that was generated today.
Is it part of the same old bunch or part of another faulty one?
https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=137380248
ID: 42019
Penguin
Joined: 2 Apr 12
Posts: 6
Credit: 334,298
RAC: 0
Message 42187 - Posted: 16 Apr 2020, 1:21:57 UTC

I too am experiencing ATLAS tasks failing after approximately 10 minutes. I just saw more than half a dozen do that.

Looking at the task details, I found this:

"Stderr output

<core_client_version>7.9.3</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
20:53:25 (20061): wrapper (7.7.26015): starting
20:53:25 (20061): wrapper: running run_atlas (--nthreads 12)
awk: line 2: function strftime never defined
21:03:26 (20061): run_atlas exited; CPU time 0.007836
21:03:26 (20061): app exit status: 0x1
21:03:26 (20061): called boinc_finish(195)

</stderr_txt>
]]>
"



Note the "function strftime never defined" error.
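Two details in the stderr output above can be read programmatically: the exit code is printed in decimal, hex and as a signed byte (195 = 0xc3, and 0xc3 read as a signed 8-bit value is -61), and the awk message names the missing function. A small Python sketch; the regular expressions are assumptions fitted to this one sample, not a documented BOINC format, and `summarize` is a hypothetical helper.

```python
import re

# Patterns fitted to the stderr sample above (assumed format).
EXIT_RE = re.compile(r"process exited with code (\d+) \((0x[0-9a-f]+), (-?\d+)\)")
AWK_RE = re.compile(r"function (\w+) never defined")


def as_signed_byte(code: int) -> int:
    """Interpret the low 8 bits of an exit code as a signed byte."""
    low = code & 0xFF
    return low - 256 if low >= 128 else low


def summarize(stderr: str) -> dict:
    summary = {}
    m = EXIT_RE.search(stderr)
    if m:
        code = int(m.group(1))
        summary["exit_code"] = code
        # Cross-check the three printed forms against each other.
        summary["consistent"] = (hex(code) == m.group(2)
                                 and as_signed_byte(code) == int(m.group(3)))
    summary["undefined_awk_functions"] = AWK_RE.findall(stderr)
    return summary


if __name__ == "__main__":
    sample = """process exited with code 195 (0xc3, -61)
awk: line 2: function strftime never defined"""
    print(summarize(sample))
    # {'exit_code': 195, 'consistent': True, 'undefined_awk_functions': ['strftime']}
```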
ID: 42187
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 2100
Credit: 163,101,254
RAC: 117,629
Message 42188 - Posted: 16 Apr 2020, 5:18:52 UTC - in response to Message 42187.  

Searching for "strftime" at the top of this page will deliver the answer.
ID: 42188
Penguin
Joined: 2 Apr 12
Posts: 6
Credit: 334,298
RAC: 0
Message 42206 - Posted: 16 Apr 2020, 21:48:30 UTC - in response to Message 42188.  

OK, did that...

https://lhcathome.cern.ch/lhcathome/result.php?resultid=271425574

awk: line 2: function strftime never defined


The scripts used to control LHC native tasks expect gawk to be used.
It looks like this computer is using mawk instead.
Solution: install gawk and make sure the awk link points to gawk.




What is awk, and what are gawk and mawk?

If I disable native in my preferences, I also get near-immediate computation errors...
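For context, gawk (GNU awk) and mawk are two implementations of the awk text-processing language, and the quoted fix assumes that `awk` resolves to gawk. A minimal Python sketch for checking that on a Linux host: it follows the `awk` symlink chain (on Debian-style systems usually via /etc/alternatives) and, more usefully, runs a one-line `strftime` test, since support differs between awk builds (older mawk releases, for example, lack it). The helper names are hypothetical; run it as the user the BOINC client runs as, because that user's PATH may differ from yours.

```python
import os
import shutil
import subprocess

KNOWN = ("gawk", "mawk", "busybox", "original-awk", "nawk")


def awk_flavour(resolved_path: str) -> str:
    """Guess the implementation from the final path of the awk symlink chain."""
    name = os.path.basename(resolved_path).lower()
    for flavour in KNOWN:
        if flavour in name:
            return flavour
    return "unknown"


def awk_supports_strftime(awk: str = "awk") -> bool:
    """Run a tiny program that calls strftime(); False if awk is missing or it fails."""
    exe = shutil.which(awk)
    if exe is None:
        return False
    try:
        proc = subprocess.run([exe, 'BEGIN { print strftime("%Y", 0) }'],
                              capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return proc.returncode == 0 and proc.stdout.strip().isdigit()


if __name__ == "__main__":
    exe = shutil.which("awk")
    if exe is None:
        print("no awk on PATH")
    else:
        real = os.path.realpath(exe)
        print(f"awk -> {real} ({awk_flavour(real)})")
        print("strftime works:", awk_supports_strftime())
```

Testing the function directly is more reliable than trusting the binary's name, because the name alone does not tell you which version or build is installed.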
ID: 42206
maeax
Joined: 2 May 07
Posts: 1669
Credit: 97,950,596
RAC: 305,784
Message 42575 - Posted: 22 May 2020, 19:10:19 UTC

https://lhcathome.cern.ch/lhcathome/result.php?resultid=274314793
[2020-05-22 20:26:23] *** Error codes and diagnostics ***
[2020-05-22 20:26:23] "exeErrorCode": 0,
[2020-05-22 20:26:23] "exeErrorDiag": "",
[2020-05-22 20:26:23] "pilotErrorCode": 1346,
[2020-05-22 20:26:23] "pilotErrorDiag": "Transform not found:/bin/bash: Sim_tf.py: command not found\n",
[2020-05-22 20:26:23] *** Listing of results directory ***
[2020-05-22 20:26:23] total 252860
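The useful part of this log is the block of error codes. A minimal Python sketch that pulls the `"key": value` lines out of such a log into a dictionary, assuming every diagnostic line carries the timestamp prefix shown above and a JSON-encoded value; lines that do not fit that pattern (such as the header and the directory-listing line) are skipped. `parse_diagnostics` is a hypothetical name.

```python
import json
import re

# "[timestamp] "key": value," as seen in the pilot log above (format assumed from this sample).
LINE_RE = re.compile(r'^\[[^\]]+\]\s+"(\w+)":\s*(.*?),?\s*$')


def parse_diagnostics(log: str) -> dict:
    result = {}
    for line in log.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        key, raw = m.groups()
        try:
            result[key] = json.loads(raw)  # handles 0, "" and escaped strings like "...\n"
        except json.JSONDecodeError:
            result[key] = raw  # keep anything non-JSON verbatim
    return result


sample = r'''[2020-05-22 20:26:23] *** Error codes and diagnostics ***
[2020-05-22 20:26:23] "exeErrorCode": 0,
[2020-05-22 20:26:23] "exeErrorDiag": "",
[2020-05-22 20:26:23] "pilotErrorCode": 1346,
[2020-05-22 20:26:23] "pilotErrorDiag": "Transform not found:/bin/bash: Sim_tf.py: command not found\n",'''

diag = parse_diagnostics(sample)
print(diag["pilotErrorCode"], diag["pilotErrorDiag"].strip())
```

A non-zero `pilotErrorCode` next to an empty `exeErrorDiag`, as here, suggests the pilot gave up before the payload ran, consistent with `Sim_tf.py` not being found.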
ID: 42575
Erich56
Posts: 1543
Credit: 52,425,987
RAC: 35,586
Message 42842 - Posted: 12 Jun 2020, 4:24:14 UTC

Since yesterday evening, all tasks have been failing after about 14 minutes of runtime and about 3-4 minutes of CPU time.
Examples:

https://lhcathome.cern.ch/lhcathome/result.php?resultid=277285377
https://lhcathome.cern.ch/lhcathome/result.php?resultid=277286741
https://lhcathome.cern.ch/lhcathome/result.php?resultid=277286884

What's going wrong?
ID: 42842
Klaus
Joined: 27 Aug 15
Posts: 27
Credit: 7,793,603
RAC: 3,721
Message 42843 - Posted: 12 Jun 2020, 7:14:45 UTC

Tasks are failing after about 14 minutes of runtime and about 3-4 minutes of CPU time.

Examples:

https://lhcathome.cern.ch/lhcathome/result.php?resultid=277284464
https://lhcathome.cern.ch/lhcathome/result.php?resultid=277285287
https://lhcathome.cern.ch/lhcathome/result.php?resultid=277286589
https://lhcathome.cern.ch/lhcathome/result.php?resultid=277286071
ID: 42843
djoser
Joined: 30 Aug 14
Posts: 145
Credit: 10,847,070
RAC: 0
Message 42844 - Posted: 12 Jun 2020, 7:55:55 UTC - in response to Message 42575.  

maeax wrote:
"pilotErrorDiag": "Transform not found:/bin/bash: Sim_tf.py: command not found\n",

I have had the same problem with ATLAS native since yesterday.
ID: 42844
Erich56
Joined: 18 Dec 15
Posts: 1543
Credit: 52,425,987
RAC: 35,586
Message 42845 - Posted: 12 Jun 2020, 11:12:59 UTC - in response to Message 42844.  
Last modified: 12 Jun 2020, 11:13:49 UTC

djoser wrote:
"pilotErrorDiag": "Transform not found:/bin/bash: Sim_tf.py: command not found\n",

I have had the same problem with ATLAS native since yesterday.

Ah, okay, thanks for the information.
So there may be misconfigured tasks around, for VM as well as for native.
I was wondering whether the Windows 10 update to version 2004, installed on this computer 2 days ago, had something to do with the problem (because before that, ATLAS tasks did NOT fail). So that is probably not the case.

I have now switched to CMS, and those run well.
ID: 42845
Jim1348
Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 2,235
Message 42846 - Posted: 12 Jun 2020, 13:49:32 UTC - in response to Message 42845.  
Last modified: 12 Jun 2020, 13:49:45 UTC

Erich56 wrote:
I have now switched to CMS, and those run well.

They follow the Law of Conservation of Working Projects at CERN.
When one thing starts working, another fails.
Ivan saved us.
ID: 42846
Aurum
Joined: 12 Jun 18
Posts: 116
Credit: 39,222,833
RAC: 1,629
Message 42865 - Posted: 13 Jun 2020, 21:44:57 UTC - in response to Message 42188.  

computezrmle wrote:
Searching for "strftime" at the top of this page will deliver the answer.

Hmm, was that answer retracted??? The search only returns: "Sorry, couldn't find anything matching your search query."
ID: 42865
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 2100
Credit: 163,101,254
RAC: 117,629
Message 42869 - Posted: 14 Jun 2020, 7:05:55 UTC - in response to Message 42865.  

Extend the search period beyond the default (the last 30 days) and you will get a couple of hits.
Among them (posted 2020-04-12):
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5391&postid=42138
ID: 42869
maeax
Joined: 2 May 07
Posts: 1669
Credit: 97,950,596
RAC: 305,784
Message 42870 - Posted: 14 Jun 2020, 9:30:28 UTC
Last modified: 14 Jun 2020, 9:32:03 UTC

The server status page currently shows about 5k running ATLAS tasks, compared with about 10k most of the time in the past.
Something must be wrong with the BOINC tasks from CERN.
The chart isn't growing:
https://lhcathome.cern.ch/lhcathome/img/progresschart.png
ID: 42870

©2022 CERN