Message boards : Number crunching : Computation errors
jelle

Joined: 26 Sep 11
Posts: 37
Credit: 7,704,381
RAC: 253
Message 25584 - Posted: 14 May 2013, 22:37:35 UTC

I am getting an unusually high number of computation errors with recent tasks, on all 3 machines that I use for SixTrack. The message on the task page is:

Stderr output

<core_client_version>7.0.65</core_client_version>
<![CDATA[
<message>
process got signal 11
</message>
<stderr_txt>

</stderr_txt>
]]>


I have no clue what that means. Any suggestions?
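
For reference, the number in "process got signal 11" can be mapped to its symbolic name. The sketch below is only an illustration using Python's standard `signal` module; the helper name `describe_signal` is made up for this example. On Linux, signal 11 is SIGSEGV (segmentation fault), which says the process made an invalid memory access but does not by itself identify the cause.

```python
import signal


def describe_signal(number: int) -> str:
    """Return the symbolic name of a signal number on this platform.

    Hypothetical helper for illustration; not part of BOINC or SixTrack.
    """
    try:
        return signal.Signals(number).name
    except ValueError:
        # The number is not a known signal on this platform.
        return f"unknown signal {number}"


if __name__ == "__main__":
    # The value reported in the task's <message> block.
    print(describe_signal(11))
```

Signal numbering is platform dependent, so the lookup should be run on the same kind of system that produced the error.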
captainjack

Joined: 21 Jun 10
Posts: 40
Credit: 10,587,045
RAC: 9,114
Message 25585 - Posted: 15 May 2013, 3:26:53 UTC

Hi jelle,

Are you running Linux? Since you are using BOINC 7.0.65, I'm guessing that is the case.

On my Ubuntu Linux machines, there are a couple of things I know of that can cause a signal 11 error.

1. Network problems (a lost internet connection or similar issues) can leave BOINC with a signal 11 error. There is not much you can do about that one. It happened to me a couple of days ago and 3 jobs errored out. If this is the cause, you should be able to see the network error messages in the "Event Log".

2. When Linux gets busy doing something else and BOINC doesn't get any CPU cycles for a while, BOINC can get a signal 11 and error out the task. The recommended solution for that one is to go into "Computing preferences" under "Your account". In the "Processor Usage" section, set the "Suspend work when non-BOINC CPU usage is above" parameter to 35%. That is how my profile is set, and I haven't had any of those errors in a while.

Hope that helps,
CaptainJack
jelle

Joined: 26 Sep 11
Posts: 37
Credit: 7,704,381
RAC: 253
Message 25586 - Posted: 16 May 2013, 1:06:29 UTC - in response to Message 25585.  

Thank you for your reply. You have deduced correctly that I am running Linux. I use Xubuntu 12.04 and 12.10.

I am familiar with computation errors as a result of losing the network connection. That happens from time to time.

I was not familiar with the second reason you mention. Because of the sudden LHC workload I increased the number of CPUs that BOINC can use. With hyper-threading on Intel CPUs, that meant the number of logical CPUs given to BOINC exceeded the number of physical cores (which I usually avoid). It sounds plausible that this could increase the number of computation errors if BOINC is sensitive to it. I have now cranked the CPU use percentage down again, so let's see if the problem goes away.
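
As a rough illustration of that arithmetic, the sketch below computes the largest CPU percentage that still keeps BOINC within the physical core count. The function name and the example core counts are assumptions made for this example, not values from any of the machines discussed here.

```python
def max_cpu_percent(physical_cores: int, logical_cpus: int) -> float:
    """Largest share of logical CPUs (in percent) that does not exceed
    the number of physical cores.

    Hypothetical helper for illustration only.
    """
    if physical_cores <= 0 or logical_cpus < physical_cores:
        raise ValueError("expected 0 < physical_cores <= logical_cpus")
    return 100.0 * physical_cores / logical_cpus


if __name__ == "__main__":
    # Assumed example: 4 physical cores with hyper-threading (8 logical CPUs).
    print(max_cpu_percent(4, 8))  # 50.0
```

With hyper-threading exposing two logical CPUs per core, any setting above 50% puts more BOINC tasks on the machine than it has physical cores.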

Thanks for the advice.
Gary Charpentier

Joined: 3 Oct 07
Posts: 8
Credit: 2,892,546
RAC: 112
Message 25646 - Posted: 15 Jun 2013, 18:38:01 UTC

Is there a bug (32/64 int length) in the Mac version?

This machine is working fine on many other projects.

<core_client_version>7.0.28</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
SIGBUS: bus error

Crashed executable name: sixtrack_apple_gen
Machine type Intel 80486 (32-bit executable)
System version: Macintosh OS 10.7.5 build 11G63
Thu Jun 13 11:50:51 2013

0   sixtrack_apple_gen                  0x0020df9e  PrintBacktrace (in sixtrack_apple_gen) + 1022

Thread 0 crashed with X86 Thread State (32-bit):
  eax: 0xffffffe1 ebx: 0x00000003 ecx: 0xbfffa61c edx: 0x9198ec22
  edi: 0xbfffa678 esi: 0x00000003 ebp: 0xbfffa648 esp: 0xbfffa61c
   ss: 0x00000023 efl: 0x00000206 eip: 0x9198ec22  cs: 0x0000000b
   ds: 0x00000023  es: 0x00000023  fs: 0x00000000  gs: 0x0000000f
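
The three numbers in "process exited with code 193 (0xc1, -63)" are the same value written three ways: decimal, hexadecimal, and as a signed 8-bit integer. The sketch below reproduces that conversion; the function name is made up for this example. As far as I know, BOINC uses 193 as its "task was killed by a signal" exit code, which would fit the SIGBUS line in the stderr text, but treat that reading as an assumption.

```python
def decode_exit_status(code: int) -> tuple[str, int]:
    """Return the hex form and the signed 8-bit form of an exit code.

    Hypothetical helper for illustration; not part of BOINC.
    """
    if not 0 <= code <= 255:
        raise ValueError("exit code must fit in one unsigned byte")
    # Two's complement: values from 128 upward wrap to negative numbers.
    signed = code - 256 if code >= 128 else code
    return hex(code), signed


if __name__ == "__main__":
    print(decode_exit_status(193))  # ('0xc1', -63)
```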
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 25649 - Posted: 16 Jun 2013, 17:00:17 UTC - in response to Message 25646.  

Well, no code is perfect; however, the SixTrack MacOS executable has passed the
18 standard tests. It is rather new, though. The executable is
32-bit, so 64-bit shouldn't really come into it. Maybe we should build with
-g and traceback options, at least to get a better idea of where the crash
occurred. If you could give me a pointer to the WU, I could rerun the case
here at CERN or on my own Mac. Eric.
Gary Charpentier

Joined: 3 Oct 07
Posts: 8
Credit: 2,892,546
RAC: 112
Message 25651 - Posted: 17 Jun 2013, 0:56:34 UTC - in response to Message 25649.  

Here are a couple of the 96 ...
http://lhcathomeclassic.cern.ch/sixtrack/workunit.php?wuid=7889834
http://lhcathomeclassic.cern.ch/sixtrack/workunit.php?wuid=7889768
