Theory Failure Ratio Explodes
Author | Message |
---|---|
Joined: 15 Jun 08 Posts: 2425 Credit: 227,491,123 RAC: 130,153 |
The overall Theory failure ratio rose to 100 % this morning: http://mcplots-dev.cern.ch/production.php?view=status&plots=hourly#plots |
Joined: 26 Nov 10 Posts: 11 Credit: 1,435,923 RAC: 0 |
Hello! There was an update to the new version of scientific software yesterday. The update has issues, and this is the reason for the failures. I am now looking for a solution and will post an update once it is fixed. Thank you for the notice! |
Joined: 12 Jun 18 Posts: 126 Credit: 53,906,164 RAC: 26,149 |
"There was an update to the new version of scientific software yesterday." Can't you remove the defective WUs from the work server queue, or must each of them fail several times? |
Joined: 18 Dec 15 Posts: 1691 Credit: 104,607,242 RAC: 100,490 |
Obviously, the faulty tasks were removed. "There was an update to the new version of scientific software yesterday." So this is the first time, as far back as I can remember, that no tasks from any LHC subproject are available :-( |
Joined: 18 Dec 15 Posts: 1691 Credit: 104,607,242 RAC: 100,490 |
New tasks were sent out, and again they were faulty. I wonder why such a batch is not tested before it is distributed??? |
Joined: 2 May 07 Posts: 2121 Credit: 159,926,969 RAC: 70,085 |
Erich56, this is the easy answer. Have you ever worked in IT? I have. |
Joined: 27 Sep 08 Posts: 810 Credit: 654,497,622 RAC: 259,773 |
It's still bad for me. |
Joined: 15 Jun 08 Posts: 2425 Credit: 227,491,123 RAC: 130,153 |
I recently started a bunch of fresh ones. So far only 2 failed within 20 s with status code 1. All others either succeeded very quickly (status code 0) or are running fine. <edit> Meanwhile the failure rate is increasing again. Theory revision 2390 -> runs fine; Theory revision 2638 -> fails </edit> |
Joined: 28 Nov 08 Posts: 30 Credit: 14,859,718 RAC: 1,434 |
Hi, my Theory jobs are still failing (hostID=10834815), with different errors. Either "runRivet Setting environment... grep: /etc/redhat-release: No such file or directory ./runRivet.sh: line 33: /cvmfs/sft.cern.ch/lcg/releases/LCG_102b_ATLAS_28/../gcc/11.3.0/x86_64-slc6/setup.sh: No such file or directory ERROR: fail to set environment (gcc)", or "make: *** [yoda2flat-split.exe] Error 1 make: Leaving directory `/shared/rivetvm' ERROR: fail to compile yoda2flat-split", or it just hangs at "Running job shoud appear here. [INFO] Container 'runc' finished with status code 1." When I shut down the VM, it reports the job as finished (which is strange)... Is there any special procedure in place to recover from this? Best regards. |
Joined: 15 Jun 08 Posts: 2425 Credit: 227,491,123 RAC: 130,153 |
"Is there any special procedure in place to recover from this?" You can't do anything. The errors are caused by deeper-level scientific scripts. The developers are already aware and are working on a solution. "When I shutdown the VM it reports job as finished (which is strange)..." Not really strange, since from BOINC's perspective (higher level) the tasks don't fail. Nonetheless, it might be a good idea to stop requesting fresh work until the problem is solved, since those very short runtimes will sooner or later confuse BOINC's work fetch algorithm (as well as its credit calculation). |
Joined: 2 May 07 Posts: 2121 Credit: 159,926,969 RAC: 70,085 |
Hi broz69, is it possible to make your PCs visible to us volunteers (in the LHCatHome prefs)? |
Joined: 28 Nov 08 Posts: 30 Credit: 14,859,718 RAC: 1,434 |
They're visible now. |
Joined: 28 Nov 08 Posts: 30 Credit: 14,859,718 RAC: 1,434 |
At the moment both my Windows computers are set to "No new tasks". Only Linux is running (native). Linux only has Theory_2390 jobs. |
Joined: 2 May 07 Posts: 2121 Credit: 159,926,969 RAC: 70,085 |
Thank you for making them visible. I am using VirtualBox 7.0.6 with BOINC 7.24.1 from boinc.berkeley.edu. I don't know whether 7.0.10 causes problems. You can test without Squid to see if there is a conflict. |
Joined: 27 Sep 08 Posts: 810 Credit: 654,497,622 RAC: 259,773 |
Linux seems to be working again on my computers; Windows is still rocky. As CM said, I doubt it's anything to do with BOINC or VirtualBox. |
Joined: 28 Nov 08 Posts: 30 Credit: 14,859,718 RAC: 1,434 |
Hi, it's as computezrmle wrote: 2638 jobs fail and 2390 jobs are OK. In the morning the jobs were not all 2390: Theory_2638 - 35 failed; Theory_2637 - 26 failed; Theory_2636 - 24 failed; Theory_2390 - 1 OK. In the evening I got some Theory jobs, all of them 2390. Theory_2390-1109174-576, Theory_2390-1100306-576, Theory_2390-1140982-576 and Theory_2390-1099685-576 finished OK without a proxy, but so did others with a proxy. So it doesn't seem to be a problem with the proxy, vbox or BOINC. I'll just wait until the people at LHC find a solution. Thanks. |
Joined: 18 Dec 15 Posts: 1691 Credit: 104,607,242 RAC: 100,490 |
"Theory revision 2390 -> runs fine" The 2638 tasks still come in once in a while. It would have been nice if they had been sorted out ... |
Joined: 15 Jun 08 Posts: 2425 Credit: 227,491,123 RAC: 130,153 |
They will be sorted out automatically, but it will take some time. Better this way than cancelling tasks in progress (see ATLAS). |
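
For anyone who wants to check on their own host which revisions are failing, the per-revision tally posted in the thread can be sketched roughly as below. This is only an illustration: it assumes the number after `Theory_` in a task name is the revision (as the posts suggest), and the `(task_name, succeeded)` input format and the helper names `revision_of`, `tally_by_revision` and `failure_ratio` are hypothetical, not something BOINC provides.

```python
import re
from collections import Counter

# Task names in the thread look like "Theory_2390-1109174-576".
# Assumption: the digits right after "Theory_" are the revision.
TASK_NAME = re.compile(r"^Theory_(\d+)-")


def revision_of(task_name):
    """Return the revision embedded in a task name, or None if it does not match."""
    match = TASK_NAME.match(task_name)
    return int(match.group(1)) if match else None


def tally_by_revision(results):
    """Count 'ok' and 'failed' outcomes per revision.

    `results` is an iterable of (task_name, succeeded) pairs.
    This input format is hypothetical; collect it however suits your host.
    """
    counts = {}
    for task_name, succeeded in results:
        revision = revision_of(task_name)
        if revision is None:
            continue  # not a Theory task name, skip it
        bucket = counts.setdefault(revision, Counter())
        bucket["ok" if succeeded else "failed"] += 1
    return counts


def failure_ratio(counter):
    """Fraction of failed tasks in one revision's Counter (0.0 if it is empty)."""
    total = counter["ok"] + counter["failed"]
    return counter["failed"] / total if total else 0.0


# The four task names reported as finished OK in the thread.
evening = [
    ("Theory_2390-1109174-576", True),
    ("Theory_2390-1100306-576", True),
    ("Theory_2390-1140982-576", True),
    ("Theory_2390-1099685-576", True),
]
counts = tally_by_revision(evening)
print(failure_ratio(counts[2390]))  # 0.0
```

Grouping by revision rather than by host or client version matches what was observed here: the failures followed the Theory revision, not the proxy, VirtualBox or BOINC setup.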
©2024 CERN