ATLAS tasks fail after 10 min
Author | Message |
---|---|
Send message Joined: 15 Jun 08 Posts: 2500 Credit: 248,476,070 RAC: 126,714 |
I've been getting a mix of valid and invalid ATLAS native tasks on different BOINC clients since 12:36 UTC today. All invalid tasks fail after about 10 min of runtime. Examples: https://lhcathome.cern.ch/lhcathome/result.php?resultid=270323978 https://lhcathome.cern.ch/lhcathome/result.php?resultid=270328004 https://lhcathome.cern.ch/lhcathome/result.php?resultid=270324483 |
Send message Joined: 2 May 07 Posts: 2189 Credit: 173,308,789 RAC: 66,579 |
With CentOS native and with Windows, both without a proxy, I have had no problems so far today. Edit: RDP is not shown in Windows. |
Send message Joined: 26 Oct 18 Posts: 95 Credit: 4,188,598 RAC: 0 |
At least three of my computers running Windows and vbox tasks have suddenly run into validate errors today. The tasks fail after about 6 minutes. Examples: https://lhcathome.cern.ch/lhcathome/result.php?resultid=270325128 https://lhcathome.cern.ch/lhcathome/result.php?resultid=270340255 https://lhcathome.cern.ch/lhcathome/result.php?resultid=270358724 |
Send message Joined: 2 May 07 Posts: 2189 Credit: 173,308,789 RAC: 66,579 |
As of 18:45 UTC I now also have three native tasks that stopped after 10 min. [2020-03-30 20:24:54] Starting ATLAS job with PandaID=4687202094 [2020-03-30 20:24:54] Running command: /usr/bin/singularity exec --pwd /var/lib/boinc/slots/0 -B /cvmfs,/var /cvmfs/atlas.cern.ch/repo/containers/images/singularity/x86_64-centos7.img sh start_atlas.sh [2020-03-30 20:24:55] Job failed [2020-03-30 20:24:55] ++ pwd |
Send message Joined: 13 May 14 Posts: 387 Credit: 15,314,184 RAC: 0 |
Hi, a batch of bad tasks was submitted yesterday by accident. They were submitted over a period of 2 hours yesterday afternoon, so hopefully they will be flushed out of the system soon. |
Send message Joined: 15 Jun 08 Posts: 2500 Credit: 248,476,070 RAC: 126,714 |
Thanks. So far all ATLAS tasks I got today are running fine. |
Send message Joined: 26 Oct 18 Posts: 95 Credit: 4,188,598 RAC: 0 |
I've still been getting some. One was sent on 1 Apr 2020, 2:04:13 UTC. It doesn't bother me; only a small amount of time is wasted. |
Send message Joined: 15 Jun 08 Posts: 2500 Credit: 248,476,070 RAC: 126,714 |
Hi, most tasks from the faulty batch were sent out for the first time 2 days ago. They will disappear once the number of errors is high enough: https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=137171081 This afternoon I got a faulty task that was generated today. Is it part of the same old batch or part of another faulty one? https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=137380248 |
Send message Joined: 2 Apr 12 Posts: 6 Credit: 334,298 RAC: 0 |
I too am experiencing ATLAS tasks failing after approximately 10 minutes. I just saw over half a dozen that did that. Looking at the task details, I found this in the stderr output: <core_client_version>7.9.3</core_client_version> <![CDATA[ <message> process exited with code 195 (0xc3, -61)</message> <stderr_txt> 20:53:25 (20061): wrapper (7.7.26015): starting 20:53:25 (20061): wrapper: running run_atlas (--nthreads 12) awk: line 2: function strftime never defined 21:03:26 (20061): run_atlas exited; CPU time 0.007836 21:03:26 (20061): app exit status: 0x1 21:03:26 (20061): called boinc_finish(195) </stderr_txt> ]]> The line that stands out is the "function strftime never defined" error. |
Send message Joined: 15 Jun 08 Posts: 2500 Credit: 248,476,070 RAC: 126,714 |
Searching for "strftime" at the top of this page will deliver the answer. |
Send message Joined: 2 Apr 12 Posts: 6 Credit: 334,298 RAC: 0 |
OK, did that... https://lhcathome.cern.ch/lhcathome/result.php?resultid=271425574 What is 'awk', and what are 'gawk' and 'mawk'? If I disable native in my preferences, I also get near-immediate computation errors... |
Send message Joined: 2 May 07 Posts: 2189 Credit: 173,308,789 RAC: 66,579 |
https://lhcathome.cern.ch/lhcathome/result.php?resultid=274314793 [2020-05-22 20:26:23] *** Error codes and diagnostics *** [2020-05-22 20:26:23] "exeErrorCode": 0, [2020-05-22 20:26:23] "exeErrorDiag": "", [2020-05-22 20:26:23] "pilotErrorCode": 1346, [2020-05-22 20:26:23] "pilotErrorDiag": "Transform not found:/bin/bash: Sim_tf.py: command not found\n", [2020-05-22 20:26:23] *** Listing of results directory *** [2020-05-22 20:26:23] insgesamt 252860 |
Send message Joined: 18 Dec 15 Posts: 1749 Credit: 115,410,210 RAC: 89,031 |
Since yesterday evening, all tasks have been failing after about 14 minutes of runtime and about 3-4 minutes of CPU time. Examples: https://lhcathome.cern.ch/lhcathome/result.php?resultid=277285377 https://lhcathome.cern.ch/lhcathome/result.php?resultid=277286741 https://lhcathome.cern.ch/lhcathome/result.php?resultid=277286884 What's going wrong? |
Send message Joined: 27 Aug 15 Posts: 27 Credit: 11,599,378 RAC: 25,029 |
Same here: tasks are failing after about 14 minutes of runtime and about 3-4 minutes of CPU time. Examples: https://lhcathome.cern.ch/lhcathome/result.php?resultid=277284464 https://lhcathome.cern.ch/lhcathome/result.php?resultid=277285287 https://lhcathome.cern.ch/lhcathome/result.php?resultid=277286589 https://lhcathome.cern.ch/lhcathome/result.php?resultid=277286071 |
Send message Joined: 30 Aug 14 Posts: 145 Credit: 10,847,070 RAC: 0 |
"pilotErrorDiag": "Transform not found:/bin/bash: Sim_tf.py: command not found\n", I have had the same problem with ATLAS native since yesterday. |
Send message Joined: 18 Dec 15 Posts: 1749 Credit: 115,410,210 RAC: 89,031 |
Ah, okay, thanks for the information: "pilotErrorDiag": "Transform not found:/bin/bash: Sim_tf.py: command not found\n". So there might be misconfigured tasks around, for VM as well as for native. I was wondering whether the Windows 10 update to version 2004, which was installed on this computer 2 days ago, had something to do with the problem (because before that, ATLAS tasks did NOT fail). So that is probably not the case. I have now switched to CMS, and those tasks run well. |
Send message Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0 |
"I now switched to CMS, and they run well." They follow the Law of Conservation of Working Projects at CERN. When one thing starts working, another fails. Ivan saved us. |
Send message Joined: 12 Jun 18 Posts: 126 Credit: 53,906,164 RAC: 0 |
"Searching for 'strftime' at the top of this page will deliver the answer." Hmm, was that answer retracted? The search returns: "Sorry, couldn't find anything matching your search query." |
Send message Joined: 15 Jun 08 Posts: 2500 Credit: 248,476,070 RAC: 126,714 |
Extend the default search period (30 days in the past) and you will get a couple of hits. Among them (posted 2020-04-12): https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5391&postid=42138 |
Send message Joined: 2 May 07 Posts: 2189 Credit: 173,308,789 RAC: 66,579 |
The server stats currently show about 5k running ATLAS tasks; in the past it was mostly 10k. There must be something wrong with the BOINC tasks from CERN. The number isn't growing: https://lhcathome.cern.ch/lhcathome/img/progresschart.png |
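
Two distinct failure signatures are quoted in this thread: the `awk: ... function strftime never defined` line in the BOINC stderr output, and `"pilotErrorCode": 1346` with "Transform not found" in the pilot log. A minimal Python sketch for telling them apart in a saved log might look like the following. The labels and the function name `classify_failure` are made up for illustration and are not part of BOINC or the ATLAS software; the patterns are taken only from the excerpts posted above. As background, `strftime` is not part of POSIX awk (gawk provides it as an extension), which is why the first signature points at the local awk installation rather than at the task itself.

```python
import re

# Signatures copied from the log excerpts quoted in this thread.
# The labels are hypothetical names chosen for this sketch.
SIGNATURES = [
    ("awk-without-strftime",
     re.compile(r"awk: .*function strftime never defined")),
    ("transform-not-found",
     re.compile(r'"pilotErrorCode":\s*1346\b')),
]


def classify_failure(log_text):
    """Return the labels of all known signatures found in log_text."""
    return [label for label, pattern in SIGNATURES if pattern.search(log_text)]


if __name__ == "__main__":
    sample = (
        "20:53:25 (20061): wrapper: running run_atlas (--nthreads 12)\n"
        "awk: line 2: function strftime never defined\n"
        "21:03:26 (20061): called boinc_finish(195)\n"
    )
    print(classify_failure(sample))
```

A log that matches neither pattern returns an empty list, which is what one would expect for the accidentally submitted bad batch described earlier, since no signature for it was posted.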