Computation errors
Author | Message |
---|---|
Joined: 26 Sep 11 Posts: 37 Credit: 7,704,381 RAC: 253 |
I am getting an unusually high number of computation errors with recent tasks, on all 3 machines that I use for SixTrack. The message on the task page is: Stderr output <core_client_version>7.0.65</core_client_version> <![CDATA[ <message> process got signal 11 </message> <stderr_txt> </stderr_txt> ]]> I have no clue what that means. Any suggestions? |
Joined: 21 Jun 10 Posts: 40 Credit: 10,587,045 RAC: 9,114 |
Hi jelle, Are you running Linux? Since you are using BOINC 7.0.65, I'm guessing that is the case. On my Ubuntu Linux machines, there are a couple of things I know of that can cause a signal 11 error. 1. There were network problems (lost connection to the internet or similar issues) and BOINC gets a signal 11 error. Not much you can do about that one. It happened to me a couple of days ago and I had 3 jobs error out. If this is the cause, you should be able to look at the "Event Log" and see the network error messages. 2. When Linux gets busy doing something else and BOINC doesn't get any CPU cycles for a while, BOINC can get a signal 11 and error out the task. The recommended solution for that one is to go into your "Computing preferences" under "Your account". In the "Processor Usage" section, find the parameter "Suspend work when NON-BOINC CPU usage is above" and set it to 35%. That is how my profile is set and I haven't had any of those errors in a while. Hope that helps, CaptainJack |
Joined: 26 Sep 11 Posts: 37 Credit: 7,704,381 RAC: 253 |
Thank you for your reply. You have deduced correctly that I am running Linux. I use Xubuntu 12.04 and 12.10. I am familiar with computation errors as a result of losing the network connection. That happens from time to time. I was not familiar with the second reason you mention. Because of the sudden LHC workload I increased the number of CPUs that BOINC can use. With hyper-threading on Intel CPUs, that means the number of logical CPUs assigned to BOINC exceeded the number of physical CPUs (which I usually avoid). It sounds plausible that this could increase the number of computation errors if BOINC is sensitive to that. I have now cranked down the CPU use percentage again, so let's see if the problem goes away. Thanks for the advice. |
Joined: 3 Oct 07 Posts: 8 Credit: 2,892,546 RAC: 112 |
Is there a bug (32/64 int length) in the Mac version? This machine is working fine on many other projects. <core_client_version>7.0.28</core_client_version> <![CDATA[ <message> process exited with code 193 (0xc1, -63) </message> <stderr_txt> SIGBUS: bus error Crashed executable name: sixtrack_apple_gen Machine type Intel 80486 (32-bit executable) System version: Macintosh OS 10.7.5 build 11G63 Thu Jun 13 11:50:51 2013 0 sixtrack_apple_gen 0x0020df9e PrintBacktrace (in sixtrack_apple_gen) + 1022 Thread 0 crashed with X86 Thread State (32-bit): eax: 0xffffffe1 ebx: 0x00000003 ecx: 0xbfffa61c edx: 0x9198ec22 edi: 0xbfffa678 esi: 0x00000003 ebp: 0xbfffa648 esp: 0xbfffa61c ss: 0x00000023 efl: 0x00000206 eip: 0x9198ec22 cs: 0x0000000b ds: 0x00000023 es: 0x00000023 fs: 0x00000000 gs: 0x0000000f |
Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0 |
Well, no code is perfect; however, the SixTrack MacOS executable has passed the 18 standard tests. It is rather new though. The executable is 32-bit, so 64-bit shouldn't really come into it. Maybe we should build with -g and traceback options, at least to have a better idea of where the crash occurred. If you could give me a pointer to the WU, I could rerun the case here at CERN or on my own Mac. Eric. |
Joined: 3 Oct 07 Posts: 8 Credit: 2,892,546 RAC: 112 |
Well no code is perfect; however SixTrack MacOS executable has passed the Here are a couple of the 96 ... http://lhcathomeclassic.cern.ch/sixtrack/workunit.php?wuid=7889834 http://lhcathomeclassic.cern.ch/sixtrack/workunit.php?wuid=7889768 |
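
For anyone trying to decode the two error reports in this thread: signal 11 is SIGSEGV (a segmentation fault, i.e. an invalid memory access) on Linux and macOS, and the three numbers in "code 193 (0xc1, -63)" are the same 8-bit exit status written in decimal, hex, and as a signed byte. Below is a minimal sketch in Python that checks both. The helper names `describe_signal` and `as_signed_byte` are made up for this illustration and are not part of BOINC or SixTrack; signal numbering is platform-specific, so the lookup reports whatever the machine running the script uses.

```python
import signal


def describe_signal(signum: int) -> str:
    """Return the symbolic name of a signal number on the current platform.

    Hypothetical helper for illustration; numbering differs between systems
    (for example, SIGBUS is not the same number on Linux and macOS).
    """
    try:
        return signal.Signals(signum).name
    except ValueError:
        return f"unknown signal {signum}"


def as_signed_byte(code: int) -> int:
    """Reinterpret an 8-bit exit status (0..255) as a signed value (-128..127)."""
    code &= 0xFF
    return code - 256 if code >= 128 else code


if __name__ == "__main__":
    # "process got signal 11" from the Linux report
    print("signal 11:", describe_signal(11))
    # "process exited with code 193 (0xc1, -63)" from the Mac report
    print("code 193:", hex(193), as_signed_byte(193))
```

This only translates the numbers; it does not say why the task crashed. For that, the stderr text and the Event Log mentioned above are still the place to look.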
©2024 CERN