Message boards : Number crunching : Segmentation violation



tullio

Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 20492 - Posted: 22 Sep 2008, 19:15:57 UTC
Last modified: 22 Sep 2008, 19:17:22 UTC

At 19:04:51 UTC today, Monday 22 September, I had a SIGSEGV while running LHC@home that killed my BOINC client. I am running SuSE Linux 10.3 and BOINC 5.10.45 on an AMD Opteron 1210 at 1.8 GHz, not overclocked. My projects are SETI@home, Einstein@home, CPDN, CPDN beta, QMC@home and LHC@home.
Tullio
ID: 20492
tullio

Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 20546 - Posted: 26 Sep 2008, 19:15:44 UTC

Another case of segmentation violation while running LHC@home:
26-Sep-2008 21:06:02 [lhcathome] Scheduler request succeeded: got 0 new tasks
SIGSEGV: segmentation violation
Stack trace (8 frames):
./boinc[0x80938af]
[0xffffe420]
./boinc[0x80675e7]
./boinc[0x805d878]
./boinc[0x80831e2]
./boinc[0x80833d2]
/lib/libc.so.6(__libc_start_main+0xe0)[0xb7d49fe0]
./boinc(__gxx_personality_v0+0x165)[0x804bc21]

Exiting...
Cleaning up graphics data...
Detaching shared memory...
Cleaning up graphics data...
Detaching shared memory...

Times are UTC+2
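The frames above are in the format printed by glibc's backtrace_symbols(). As a minimal sketch, assuming Python 3 and GNU binutils' addr2line, the ./boinc frames can be turned into addr2line commands. This only yields function and line information if the boinc binary was built with symbols or debug info, and it assumes a non-PIE executable so the printed absolute addresses can be used directly. The helper names parse_frames and addr2line_commands are hypothetical, not part of BOINC. The [0xffffe420] frame falls inside linux-gate.so.1, which the ldd listing later in this thread maps at 0xffffe000, so it is most likely the kernel's signal-return trampoline rather than BOINC code.

```python
import re

# One frame as printed by glibc's backtrace_symbols(), e.g.
#   ./boinc[0x80938af]
#   /lib/libc.so.6(__libc_start_main+0xe0)[0xb7d49fe0]
FRAME_RE = re.compile(
    r"^(?P<module>[^\[(]*)"                     # module path (may be empty)
    r"(?:\((?P<symbol>[^)+]*)"                  # optional symbol name
    r"(?:\+(?P<offset>0x[0-9a-fA-F]+))?\))?"    # optional +offset
    r"\[(?P<addr>0x[0-9a-fA-F]+)\]$"            # absolute return address
)


def parse_frames(lines):
    """Return a dict per recognised frame; non-frame lines are skipped."""
    frames = []
    for line in lines:
        m = FRAME_RE.match(line.strip())
        if m:
            frames.append({
                "module": m.group("module") or None,
                "symbol": m.group("symbol") or None,
                "addr": int(m.group("addr"), 16),
            })
    return frames


def addr2line_commands(frames, exe="./boinc"):
    # Only frames in the main executable are resolved; assumes a non-PIE
    # binary, so the printed absolute addresses are valid for addr2line.
    return [f"addr2line -f -C -e {exe} {f['addr']:#x}"
            for f in frames if f["module"] == exe]


if __name__ == "__main__":
    trace = """SIGSEGV: segmentation violation
Stack trace (8 frames):
./boinc[0x80938af]
[0xffffe420]
./boinc[0x80675e7]
./boinc[0x805d878]
./boinc[0x80831e2]
./boinc[0x80833d2]
/lib/libc.so.6(__libc_start_main+0xe0)[0xb7d49fe0]
./boinc(__gxx_personality_v0+0x165)[0x804bc21]"""
    for cmd in addr2line_commands(parse_frames(trace.splitlines())):
        print(cmd)
```

Run from the directory holding the same boinc binary that crashed; a binary rebuilt or upgraded since the crash will not match these addresses.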
Tullio
ID: 20546
Alex

Joined: 2 Sep 04
Posts: 378
Credit: 10,765
RAC: 0
Message 20549 - Posted: 27 Sep 2008, 3:47:18 UTC

Would this be happening when the server cancels redundant work units? (Just speculating out loud)




I'm not the LHC Alex. Just a number cruncher like everyone else here.
ID: 20549
tullio

Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 20550 - Posted: 27 Sep 2008, 6:09:13 UTC - in response to Message 20549.  

Would this be happening when the server cancels redundant work units? (Just speculating out loud)




Yes, it happens after the client tries to connect to the server. I've suspended LHC@home until I get an answer. I have 5 other projects running with tight deadlines, and I cannot afford to have the BOINC client stop when I am not present.
Tullio
ID: 20550
tullio

Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 20553 - Posted: 27 Sep 2008, 14:53:56 UTC
Last modified: 27 Sep 2008, 14:57:31 UTC

I installed the client from BOINC, but I also run SETI, Einstein, QMC, CPDN and CPDN Beta and have never had a problem with them. On SETI I also run an optimized application and Astropulse and have never had a compute error. Only LHC@home gives me problems.
Tullio
ID: 20553
tullio

Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 20555 - Posted: 27 Sep 2008, 17:44:17 UTC
Last modified: 27 Sep 2008, 18:33:56 UTC

The BOINC client binary (boinc) is not a static executable; it and boinc_cmd use the following shared libraries:
boinc:
linux-gate.so.1 => (0xffffe000)
libnsl.so.1 => /lib/libnsl.so.1 (0xb7f24000)
libdl.so.2 => /lib/libdl.so.2 (0xb7f20000)
libz.so.1 => /lib/libz.so.1 (0xb7f0d000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0xb7e1f000)
libpthread.so.0 => /lib/libpthread.so.0 (0xb7e08000)
libm.so.6 => /lib/libm.so.6 (0xb7de3000)
libc.so.6 => /lib/libc.so.6 (0xb7caf000)
/lib/ld-linux.so.2 (0xb7f54000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0xb7ca3000)
boinc_cmd:
linux-gate.so.1 => (0xffffe000)
libnsl.so.1 => /lib/libnsl.so.1 (0xb7ec0000)
libdl.so.2 => /lib/libdl.so.2 (0xb7ebc000)
libz.so.1 => /lib/libz.so.1 (0xb7ea9000)
libpthread.so.0 => /lib/libpthread.so.0 (0xb7e92000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0xb7da4000)
libm.so.6 => /lib/libm.so.6 (0xb7d7f000)
libc.so.6 => /lib/libc.so.6 (0xb7c4b000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0xb7c3f000)
/lib/ld-linux.so.2 (0xb7ef0000)
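A minimal sketch, assuming Python 3 and the exact ldd line format shown above, of parsing the two listings and comparing them; parse_ldd and compare are hypothetical helper names. Applied to the output above it reports no differences: boinc and boinc_cmd load the same ten libraries from the same system paths, so both depend on the distribution's libstdc++, libgcc_s and libc rather than carrying their own copies.

```python
import re

# ldd output lines come in three shapes:
#   libnsl.so.1 => /lib/libnsl.so.1 (0xb7f24000)
#   linux-gate.so.1 => (0xffffe000)      kernel-provided, no file on disk
#   /lib/ld-linux.so.2 (0xb7f54000)      the dynamic loader itself
LDD_RE = re.compile(
    r"^(?P<name>\S+)"
    r"(?:\s+=>(?:\s+(?P<path>[^\s(]\S*))?)?"
    r"\s+\((?P<addr>0x[0-9a-fA-F]+)\)$"
)


def parse_ldd(text):
    """Map each dependency name to its resolved path (None if no file)."""
    deps = {}
    for line in text.splitlines():
        m = LDD_RE.match(line.strip())
        if m:
            deps[m.group("name")] = m.group("path")
    return deps


def compare(a, b):
    """Return (only in a, only in b, same name but different path)."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    differ = sorted(n for n in set(a) & set(b) if a[n] != b[n])
    return only_a, only_b, differ
```

Load addresses are ignored on purpose: they change from run to run and say nothing about which library file is used.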

I've searched the OpenSuse site for a BOINC package but could not find one. Cheers.
Tullio
ID: 20555
Toby

Joined: 1 Sep 04
Posts: 137
Credit: 1,691,526
RAC: 0
Message 20575 - Posted: 1 Oct 2008, 4:58:37 UTC
Last modified: 1 Oct 2008, 4:59:04 UTC

I have seen this as well. I didn't really suspect LHC, but now that you mention it, I think it has only happened while LHC has had work... but I won't swear to that. I've seen it on Gentoo with BOINC 5.10. BOINC is installed through Portage, so it was compiled on the box it is running on. I even tried reinstalling (which means recompiling) in case some underlying library had changed, but that didn't help. But it is pretty infrequent, which of course makes it hard to troubleshoot...
- A member of The Knights Who Say NI!
My BOINC stats site
ID: 20575
tullio

Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 20578 - Posted: 1 Oct 2008, 9:14:52 UTC - in response to Message 20575.  
Last modified: 1 Oct 2008, 9:16:00 UTC

I have seen this as well. I didn't really suspect LHC, but now that you mention it, I think it has only happened while LHC has had work... but I won't swear to that. I've seen it on Gentoo with BOINC 5.10. BOINC is installed through Portage, so it was compiled on the box it is running on. I even tried reinstalling (which means recompiling) in case some underlying library had changed, but that didn't help. But it is pretty infrequent, which of course makes it hard to troubleshoot...

It happened to me twice, always on LHC and never on the other 5 projects I am running. But I am sure it happens when the server wants to delete a redundant result.
Tullio
ID: 20578
AdeB
Joined: 9 Dec 06
Posts: 9
Credit: 2,413,908
RAC: 0
Message 20580 - Posted: 1 Oct 2008, 11:12:36 UTC - in response to Message 20578.  

It happened to me twice, always on LHC and never on the other 5 projects I am running. But I am sure it happens when the server wants to delete a redundant result.
Tullio

Looks like yesterday it happened to me as well.
AdeB
ID: 20580
Keith T.
Joined: 1 Mar 07
Posts: 47
Credit: 32,356
RAC: 0
Message 20581 - Posted: 1 Oct 2008, 12:01:01 UTC - in response to Message 20578.  

It happened to me twice, always on LHC and never on the other 5 projects I am running. But I am sure it happens when the server wants to delete a redundant result.
Tullio


I don't know Linux well, but it seems to me that this could be a BOINC issue that only occurs under certain circumstances, rather than an LHC@home issue.

Has this occurred on any other project that cancels redundant results?

I know that SETI Beta runs an IR=3, Q=2 policy (initial replication 3, quorum 2) on MultiBeam (not AstroPulse) WUs, so it may be worthwhile attaching to that project as a test, to see whether this occurs on other BOINC projects.
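To make the IR=3, Q=2 idea concrete, here is a toy Python model, a sketch under stated assumptions rather than BOINC's real scheduler or validator: three copies of a workunit go out, and once two returned results agree the workunit validates, leaving the third copy redundant. That leftover copy is the kind of result the server may cancel, which is the event suspected in this thread of coinciding with the crash. The class and method names are hypothetical, and real BOINC also handles disagreement (for example by sending out further copies), which this model ignores.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Workunit:
    """Toy model of one workunit; not BOINC's actual server code."""
    initial_replication: int = 3   # IR: copies sent out at the start
    min_quorum: int = 2            # Q: agreeing results needed to validate
    returned: list = field(default_factory=list)

    def report(self, value):
        # 'value' stands in for whatever the project's validator compares.
        self.returned.append(value)

    def validated(self):
        counts = Counter(self.returned)
        return any(c >= self.min_quorum for c in counts.values())

    def redundant_outstanding(self):
        # Copies still in the field once a quorum exists; these are the
        # results a server might decide to cancel as redundant.
        if not self.validated():
            return 0
        return self.initial_replication - len(self.returned)
```

With two matching reports the model validates and counts one redundant copy still out on some host.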

It may also be useful to post about this on the BOINC message boards, or to raise a Trac ticket.

Keith.
ID: 20581
tullio

Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 20582 - Posted: 1 Oct 2008, 13:19:16 UTC - in response to Message 20581.  
Last modified: 1 Oct 2008, 13:21:27 UTC

I don't know Linux well, but it seems to me that this could be a BOINC issue that only occurs under certain circumstances, rather than an LHC@home issue.

Has this occurred on any other project that cancels redundant results?

I know that SETI Beta runs an IR=3, Q=2 policy (initial replication 3, quorum 2) on MultiBeam (not AstroPulse) WUs, so it may be worthwhile attaching to that project as a test, to see whether this occurs on other BOINC projects.

It may also be useful to post about this on the BOINC message boards, or to raise a Trac ticket.

Keith.

It seems this also happens on other projects when the client asks for work and receives none. See this thread on the BOINC message boards:
BOINC client exits
Tullio
ID: 20582
tullio

Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 20588 - Posted: 1 Oct 2008, 17:57:09 UTC - in response to Message 20587.  
Last modified: 1 Oct 2008, 18:19:07 UTC

It seems this also happens on other projects when the client asks for work and receives none. See this thread on the BOINC message boards:
BOINC client exits
Tullio


From the trials Jean-David reports in that thread, it really looks like Hydrogen@home was causing the SEGV. It seems he got the SEGV only (mostly?) when he requested work from Hydrogen but didn't get any.

Now, one thing that is peculiar about Hydrogen is that when that server has no work, the client log shows...
 
[Hydrogen@Home] Message from server: No work sent


...in addition to the standard
24-Aug-2008 06:52:57 Scheduler request succeeded: got 0 new tasks


Is the "Message from server: no work sent" line logged only when one has debug/logging options set in cc_config.xml?

I don't get that message from any of the projects I am attached to when they have no work. I am wondering whether that message somehow triggers the SEGV. Does LHC send that message too?


Can't recall. Will try to connect.
Tullio
No, it is not sending that message, only this:
Scheduler request succeeded, got 0 new tasks
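Since several posts hinge on what the client was doing just before the crash, a small Python sketch (a hypothetical helper, not a BOINC tool) can pull the last scheduler event before each SIGSEGV out of a saved client log. The patterns are copied from the log lines quoted in this thread; other client versions may word these messages differently, and events_before_segv is a made-up name. It only shows what preceded each crash, which is a correlation, not proof of the cause.

```python
import re

# Scheduler-related lines as they appear in the logs quoted in this thread:
#   26-Sep-2008 21:06:02 [lhcathome] Scheduler request succeeded: got 0 new tasks
#   [Hydrogen@Home] Message from server: No work sent
SCHED_RE = re.compile(
    r"\[(?P<project>[^\]]+)\] "
    r"(?P<msg>Scheduler request succeeded: got \d+ new tasks"
    r"|Message from server: .*)"
)


def events_before_segv(lines):
    """For each SIGSEGV line, return the last scheduler event seen before it
    as (project, message), or None if there was none."""
    last = None
    hits = []
    for line in lines:
        m = SCHED_RE.search(line)
        if m:
            last = (m.group("project"), m.group("msg").strip())
        elif "SIGSEGV" in line:
            hits.append(last)
    return hits
```

Fed the 26-Sep-2008 excerpt from the second post in this thread, it returns a single entry for lhcathome with "Scheduler request succeeded: got 0 new tasks".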
ID: 20588
Brian Silvers

Joined: 3 Jan 07
Posts: 124
Credit: 7,065
RAC: 0
Message 20592 - Posted: 1 Oct 2008, 23:17:47 UTC - in response to Message 20591.  

Maybe some clues will turn up.


Col. Mustard in the Ballroom with the Candlestick... ;-)



In all seriousness, has anyone considered falling back to a 5.8.x version of BOINC and seeing if the problem is still there or not?
ID: 20592
tullio

Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 20593 - Posted: 1 Oct 2008, 23:25:23 UTC

I restarted LHC@home and will report back. I still think the problem arises when the server deletes a redundant result.
Tullio
ID: 20593



©2024 CERN