Thread 'Computation error on WU'

Author	Message
Harri Liljeroos Joined: 28 Sep 04 Posts: 804 Credit: 65,875,712 RAC: 27,749	Message 3986 - Posted: 19 Oct 2004, 7:38:12 UTC
Hi, here's an extract from my stderr.txt:
2004-10-18 14:47:22 [LHC@home] Unrecoverable error for result v64lhc1000pronine-41s12_14526.73_1_sixvf_2295_1 (There are no child processes to wait for. (0x80) - exit code 128 (0x80))
Does anybody have any idea what this means?
The result uploaded to the server OK, though. The WU calculation took 1 h 15 min, whereas they usually take 1 h 30 min. I'm using BOINC 4.13, set up for SETI 66.6% and LHC 33.3%.

Toby Joined: 1 Sep 04 Posts: 137 Credit: 1,848,397 RAC: 23	Message 3989 - Posted: 19 Oct 2004, 8:09:03 UTC
Your computers are hidden, so I can't see what OS you are running. Is it by any chance Linux? I stopped running LHC on my P3-500 running Knoppix because 90% of the work units errored out like that when they were just about to complete. I'm wondering if there is a problem with the Linux client. The same setup is doing SETI@home work units without incident. On the other hand, my other Linux box, which is running Gentoo, does not seem to be having the same problems, so I'm not sure what to think. Maybe something about Knoppix gives bad mojo?
--------------------------------------
A member of The Knights Who Say Ni! My BOINC stats site

Harri Liljeroos Joined: 28 Sep 04 Posts: 804 Credit: 65,875,712 RAC: 27,749	Message 3991 - Posted: 19 Oct 2004, 8:49:57 UTC - in response to Message 3989.
Now my computer should be visible. It is running Windows 2000 SP4. I don't have any Linux systems available.

Toby Joined: 1 Sep 04 Posts: 137 Credit: 1,848,397 RAC: 23	Message 3992 - Posted: 19 Oct 2004, 8:58:18 UTC Last modified: 19 Oct 2004, 8:59:08 UTC
Darn! I was hoping for a trend. Oh well :) I guess my exit codes are different from yours as well. I think I posted about this in another thread already, actually... Ah yes, here it is:
2004-10-15 01:50:45 [LHC@home] Unrecoverable error for result v64lhc1000profour17s12_14518.18_1_sixvf_8194_9 (process exited with code 12 (0xc))
Most of the ones I see right now are code 12. A few are code 240 (0xf0).
Maybe the admins could post a list of error codes and what they mean? That could help to track down the problem and show whether it is a configuration/library/hardware problem on the client machine or maybe a bug in the software. Anyone? :)
--------------------------------------
A member of The Knights Who Say Ni! My BOINC stats site

grumpy Joined: 1 Sep 04 Posts: 57 Credit: 2,835,005 RAC: 0	Message 4616 - Posted: 29 Oct 2004, 16:24:41 UTC
LHC@home - 2004-10-29 11:12:58 - Restarting result v64lhc1000proeleven-55s14_16518.45_1_sixvf_27181_1 using sixtrack version 4.47
LHC@home - 2004-10-29 12:11:50 - Unrecoverable error for result v64lhc1000proeleven-55s14_16518.45_1_sixvf_27181_1 ( - exit code -1 (0xffffffff))
LHC@home - 2004-10-29 12:11:50 - Computation for result v64lhc1000proeleven-55s14_16518.45_1_sixvf_27181 finished
That came with a Windows popup error stating invalid_page_fault for sixtrack. OS: Win 98.

grumpy Joined: 1 Sep 04 Posts: 57 Credit: 2,835,005 RAC: 0	Message 4657 - Posted: 30 Oct 2004, 16:18:41 UTC
SIXTRACK_4 caused a stack fault in module SIXTRACK_4.47_WINDOWS_INTELX86.EXE at 0177:00525f4f.
Registers:
EAX=00000008 CS=0177 EIP=00525f4f EFLGS=00010202
EBX=00000000 SS=017f ESP=04452000 EBP=0445201c
ECX=0065cba8 DS=017f ESI=0065d198 FS=38ff
EDX=01c4be98 ES=017f EDI=00000000 GS=0000
Bytes at CS:EIP:
56 57 8b 45 f8 89 65 e8 50 8b 45 fc c7 45 fc ff
Stack dump:
00000000 00000000 00000000 04452050 00525f88 00527f86 005f8808 0445202c
00528033 00000004 00000018 04452060 005256fd 00000004 00000000 0065d198

Richard Cox Joined: 23 Oct 04 Posts: 7 Credit: 58,953 RAC: 0	Message 4666 - Posted: 30 Oct 2004, 18:24:13 UTC - in response to Message 3986. Last modified: 30 Oct 2004, 18:28:19 UTC
> Hi,
> here's an extract from my stderr.txt:
>
> 2004-10-18 14:47:22 [LHC@home] Unrecoverable error for result
> v64lhc1000pronine-41s12_14526.73_1_sixvf_2295_1 (There are no child processes
> to wait for. (0x80) - exit code 128 (0x80))
>
> Does anybody have any idea what this means?
>
> The result uploaded to the server OK, though. The WU calculation took 1 h 15 min,
> whereas they usually take 1 h 30 min. I'm using BOINC 4.13, set up for SETI 66.6%
> and LHC 33.3%.
Harri, the error codes are reported by the OS, although they are generated by the application, for a variety of reasons ranging from hardware failure to software bugs. You might find lists on the Microsoft web site; they come in many flavors. Since it was near the end of the calculation, my guess is that it was some I/O error. You seem to have plenty of RAM; which motherboard are you using, and what is the chipset?

Richard Cox Joined: 23 Oct 04 Posts: 7 Credit: 58,953 RAC: 0	Message 4667 - Posted: 30 Oct 2004, 18:29:42 UTC - in response to Message 3992. Last modified: 30 Oct 2004, 18:30:37 UTC
duplicate message deleted.

Richard Cox Joined: 23 Oct 04 Posts: 7 Credit: 58,953 RAC: 0	Message 4668 - Posted: 30 Oct 2004, 18:29:43 UTC - in response to Message 3992. Last modified: 30 Oct 2004, 18:32:28 UTC
> Darn! I was hoping for a trend. Oh well :) I guess my exit codes are
> different from yours as well. I think I posted about this in another thread
> already, actually... Ah yes, here it is:
>
> 2004-10-15 01:50:45 [LHC@home] Unrecoverable error for result
> v64lhc1000profour17s12_14518.18_1_sixvf_8194_9 (process exited with code 12
> (0xc))
>
> Most of the ones I see right now are code 12. A few are code 240 (0xf0).
>
> Maybe the admins could post a list of error codes and what they mean? That could
> help to track down the problem and show whether it is a
> configuration/library/hardware problem on the client machine or maybe a bug in
> the software. Anyone? :)
Toby, which of your five computers got this error? There may be a trend if it was the Pentium running Linux; the error may have been generated in the chip's math coprocessor, or by some I/O glitch that it didn't cope with.

Toby Joined: 1 Sep 04 Posts: 137 Credit: 1,848,397 RAC: 23	Message 4669 - Posted: 30 Oct 2004, 18:54:55 UTC
It was the Pentium 3 Linux box running Knoppix. I had the BOINC directory mounted via an SMB share on my Windows box so as not to lose work if I had to reboot the Knoppix box (it has no hard drive). I believe I was seeing the same bugs that exist with NFS. See here (http://lhcathome.cern.ch/known_bugs.html) for details. I decided to just take LHC off that one for the time being and put it on SETI, which runs without any problems over a network mount.
--------------------------------------
A member of The Knights Who Say Ni! My BOINC stats site

FalconFly Joined: 2 Sep 04 Posts: 121 Credit: 592,214 RAC: 0	Message 4676 - Posted: 31 Oct 2004, 0:28:30 UTC - in response to Message 4669.
Ever since BOINC V4.13, I've been seeing an increased number of various "Computing Errors" on random Win32 and Linux boxes (SETI and LHC, but luckily not with CPDN). It seems there are some bugs that need to be fixed.
___________________________________________
Scientific Network : 36200 MHz «» 8204 MB «» 815.0 GB

Jason Joined: 18 Sep 04 Posts: 7 Credit: 13,292 RAC: 0	Message 4700 - Posted: 31 Oct 2004, 12:45:27 UTC - in response to Message 4669.
I'm running BOINC 4.13 on:
GenuineIntel 997MHz Pentium
Microsoft Windows XP Professional Edition, Service Pack 2, (05.01.2600.00)
4 of my last 10 WUs came back with a computation error similar to this:
Unrecoverable error for result v64lhc1000prosix-27s10_12551.21_1_sixvf_30583_3 (exit code -1073741819 (0xc0000005))
2 of my last 8 SETI work units generated similar errors. I'm also running the classic version of Climate Prediction, and I'm still attached to Predictor, even though they've been out of order for over a month now. Am I doing more harm than good by staying on this project? Should I detach?
Thanks, Jason
> It was the Pentium 3 Linux box running Knoppix. I had the BOINC directory
> mounted via an SMB share on my Windows box so as not to lose work if I had to
> reboot the Knoppix box (it has no hard drive). I believe I was seeing the
> same bugs that exist with NFS. See here (http://lhcathome.cern.ch/known_bugs.html)
> for details. I decided to just take LHC off that one for the time being and
> put it on SETI, which runs without any problems over a network mount.
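
The exit codes quoted in this thread appear in two forms, a decimal number and a hexadecimal one, and the negative decimals are the same 32-bit value read as a signed integer. A minimal sketch of that conversion follows, in Python. The helper names (to_unsigned32, to_signed32, describe) and the small lookup table are illustrative only and are not part of BOINC or sixtrack. The two symbolic names come from Microsoft's published Windows error-code lists; whether they explain the failures reported here is not established by the thread, and they say nothing about the Linux codes 12 and 240.

```python
# Illustrative helper names; nothing here comes from BOINC or sixtrack.

def to_unsigned32(code: int) -> int:
    """Reinterpret a possibly negative exit code as an unsigned 32-bit value."""
    return code & 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Inverse: reinterpret an unsigned 32-bit value as a signed integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def describe(code: int) -> str:
    """Format a code as decimal followed by hex, as in the log lines above."""
    return f"{code} (0x{to_unsigned32(code):x})"


# Names taken from Microsoft's published Windows error-code lists.
# Assumption: they are relevant only to the Windows reports in this thread.
KNOWN_WINDOWS_CODES = {
    0x80: "ERROR_WAIT_NO_CHILDREN",
    0xC0000005: "STATUS_ACCESS_VIOLATION",
}

if __name__ == "__main__":
    # The five exit codes quoted by posters in this thread.
    for code in (128, 12, 240, -1, -1073741819):
        name = KNOWN_WINDOWS_CODES.get(to_unsigned32(code), "not identified here")
        print(f"{describe(code)}: {name}")
```

Running the script prints one line per code, so a value such as -1073741819 can be matched against a published list by its hexadecimal form, which is how such lists are usually indexed.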

grumpy Joined: 1 Sep 04 Posts: 57 Credit: 2,835,005 RAC: 0	Message 4952 - Posted: 6 Nov 2004, 15:41:04 UTC
I am investigating this problem on my Win 98 machine. I think I may have found the problem: regional settings, clock, date formats, etc. (so far so good).