Segmentation violation
Author | Message |
---|---|
Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
At 19:04:51 UTC today, Monday the 22nd, I had a SIGSEGV while running LHC@home that killed my BOINC client. I am running SuSE Linux 10.3 and BOINC 5.10.45 on an AMD Opteron 1210 at 1.8 GHz, not overclocked. I am running SETI@home, Einstein@home, CPDN, CPDN beta, QMC@home and LHC@home. Tullio |
Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
Still another case of segmentation violation running LHC@home: 26-Sep-2008 21:06:02 [lhcathome] Scheduler request succeeded: got 0 new tasks SIGSEGV: segmentation violation Stack trace (8 frames): ./boinc[0x80938af] [0xffffe420] ./boinc[0x80675e7] ./boinc[0x805d878] ./boinc[0x80831e2] ./boinc[0x80833d2] /lib/libc.so.6(__libc_start_main+0xe0)[0xb7d49fe0] ./boinc(__gxx_personality_v0+0x165)[0x804bc21] Exiting... Cleaning up graphics data... Detaching shared memory... Cleaning up graphics data... Detaching shared memory... Times are UTC+2. Tullio |
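The raw addresses in the trace above only become useful once they are mapped back to function names. What follows is a minimal sketch, not something anyone in the thread ran: it pulls the `./boinc` frame addresses out of the posted trace so they can be passed to `addr2line` from GNU binutils. The helper names `parse_frames` and `boinc_addresses` are hypothetical, and the lookup only yields names if the `boinc` binary on the affected machine still carries symbols.

```python
import re

# Stack trace copied from the post above, one frame per line.
TRACE = """\
./boinc[0x80938af]
[0xffffe420]
./boinc[0x80675e7]
./boinc[0x805d878]
./boinc[0x80831e2]
./boinc[0x80833d2]
/lib/libc.so.6(__libc_start_main+0xe0)[0xb7d49fe0]
./boinc(__gxx_personality_v0+0x165)[0x804bc21]
"""

# module (may be empty), optional "(symbol+offset)", then "[0xADDRESS]"
FRAME_RE = re.compile(
    r"^(?P<module>[^\s(\[]*)(?:\((?P<symbol>[^)]*)\))?\[(?P<addr>0x[0-9a-fA-F]+)\]$"
)


def parse_frames(trace):
    """Return a list of (module, symbol_or_None, address) tuples."""
    frames = []
    for line in trace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append(
                (match.group("module"), match.group("symbol"), match.group("addr"))
            )
    return frames


def boinc_addresses(trace, module="./boinc"):
    """Return only the addresses that fall inside the given module."""
    return [addr for mod, _sym, addr in parse_frames(trace) if mod == module]


if __name__ == "__main__":
    # Build the command line to run by hand on the affected machine.
    print("addr2line -f -C -e ./boinc " + " ".join(boinc_addresses(TRACE)))
```

The frame with no module name (`[0xffffe420]`) lies in the range where `linux-gate.so.1` is mapped on this system, so it is skipped rather than passed to `addr2line`.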
Joined: 2 Sep 04 Posts: 378 Credit: 10,765 RAC: 0 |
Would this be happening when the server cancels redundant work units? (Just speculating out loud) I'm not the LHC Alex. Just a number cruncher like everyone else here. |
Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
Would this be happening when the server cancels redundant work units? (Just speculating out loud) Yes, it happens after the client tries to connect to the server. I've suspended LHC@home until I get an answer. I have 5 other projects running with tight deadlines and I cannot afford a BOINC client stopping when I am not present. Tullio |
Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
I installed from BOINC, but I am also running SETI, Einstein, QMC, CPDN and CPDN Beta and have never had any problem. On SETI I am also running an optimized application and Astropulse and have never had a compute error. Only LHC@home gives me problems. Tullio |
Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
The BOINC boinc binary is not a static executable and uses the following libraries. boinc: linux-gate.so.1 => (0xffffe000) libnsl.so.1 => /lib/libnsl.so.1 (0xb7f24000) libdl.so.2 => /lib/libdl.so.2 (0xb7f20000) libz.so.1 => /lib/libz.so.1 (0xb7f0d000) libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0xb7e1f000) libpthread.so.0 => /lib/libpthread.so.0 (0xb7e08000) libm.so.6 => /lib/libm.so.6 (0xb7de3000) libc.so.6 => /lib/libc.so.6 (0xb7caf000) /lib/ld-linux.so.2 (0xb7f54000) libgcc_s.so.1 => /lib/libgcc_s.so.1 (0xb7ca3000) boinc_cmd: linux-gate.so.1 => (0xffffe000) libnsl.so.1 => /lib/libnsl.so.1 (0xb7ec0000) libdl.so.2 => /lib/libdl.so.2 (0xb7ebc000) libz.so.1 => /lib/libz.so.1 (0xb7ea9000) libpthread.so.0 => /lib/libpthread.so.0 (0xb7e92000) libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0xb7da4000) libm.so.6 => /lib/libm.so.6 (0xb7d7f000) libc.so.6 => /lib/libc.so.6 (0xb7c4b000) libgcc_s.so.1 => /lib/libgcc_s.so.1 (0xb7c3f000) /lib/ld-linux.so.2 (0xb7ef0000) I've searched the openSUSE site for a BOINC installer but could not find one. Cheers. Tullio |
Joined: 1 Sep 04 Posts: 137 Credit: 1,733,409 RAC: 618 |
I have seen this as well. I didn't really suspect LHC but now that you mention it, I think it has only happened while LHC has had work... but I won't swear to that. I've seen it on Gentoo with BOINC 5.10. BOINC is installed through Portage, so it was compiled on the box it is running on. I even tried reinstalling (which means recompiling) in case some underlying library changed, but that didn't help. But it is pretty infrequent, which of course means it is hard to troubleshoot... - A member of The Knights Who Say NI! My BOINC stats site |
Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
I have seen this as well. I didn't really suspect LHC but now that you mention it, I think it has only happened while LHC has had work... but I won't swear to that. I've seen it on gentoo with BOINC 5.10. BOINC is installed through portage so it was compiled on the box it is running on. I even tried reinstalling (which means recompiling) in case some underlying library changed but that didn't help. But it is pretty infrequent which of course means hard to troubleshoot... It happened to me twice, always on LHC and never on the other 5 projects I am running. But I am sure it happens when the server wants to delete a redundant result. Tullio |
Joined: 9 Dec 06 Posts: 9 Credit: 2,413,908 RAC: 0 |
I have seen this as well. I didn't really suspect LHC but now that you mention it, I think it has only happened while LHC has had work... but I won't swear to that. I've seen it on gentoo with BOINC 5.10. BOINC is installed through portage so it was compiled on the box it is running on. I even tried reinstalling (which means recompiling) in case some underlying library changed but that didn't help. But it is pretty infrequent which of course means hard to troubleshoot... Looks like yesterday it happened to me as well. AdeB |
Joined: 1 Mar 07 Posts: 47 Credit: 32,356 RAC: 0 |
I have seen this as well. I didn't really suspect LHC but now that you mention it, I think it has only happened while LHC has had work... but I won't swear to that. I've seen it on gentoo with BOINC 5.10. BOINC is installed through portage so it was compiled on the box it is running on. I even tried reinstalling (which means recompiling) in case some underlying library changed but that didn't help. But it is pretty infrequent which of course means hard to troubleshoot... I don't know Linux well, but it seems to me that this could be a BOINC issue that only occurs under certain circumstances, rather than an LHC@home issue. Has this occurred on any other project that cancels redundant results? I know that SETI Beta runs an IR=3 Q=2 policy on MultiBeam (not AstroPulse) WUs, so it may be worthwhile attaching to that project as a test, to see if this occurs on other BOINC projects. It may also be useful to post about this on the BOINC message boards, or to raise a Trac ticket. Keith. |
Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
I have seen this as well. I didn't really suspect LHC but now that you mention it, I think it has only happened while LHC has had work... but I won't swear to that. I've seen it on gentoo with BOINC 5.10. BOINC is installed through portage so it was compiled on the box it is running on. I even tried reinstalling (which means recompiling) in case some underlying library changed but that didn't help. But it is pretty infrequent which of course means hard to troubleshoot... It seems this also happens on other projects when the client asks for work and receives none. See this on the BOINC message boards: BOINC client exits Tullio |
Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
It seems this also happens on other projects when the client asks for work and receives none. See this on the BOINC message boards: Can't recall. Will try to connect. Tullio No, it is not sending that message, only this: Scheduler request succeeded, got 0 new tasks |
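One way to test the "got 0 new tasks" hypothesis against the "cancelled redundant result" one is to check which client message immediately precedes each crash. The sketch below assumes the client output has been saved to a text file in the format shown in the 26-Sep-2008 post; the function name `messages_before_crash` is hypothetical, and the sample lines are the two from that post, not a fresh capture.

```python
import re

# Matches lines such as:
# 26-Sep-2008 21:06:02 [lhcathome] Scheduler request succeeded: got 0 new tasks
LOG_LINE_RE = re.compile(
    r"^(?P<stamp>\d{2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<project>[^\]]*)\] (?P<text>.*)$"
)


def messages_before_crash(lines):
    """Return the last timestamped message seen before each SIGSEGV line."""
    last = None
    hits = []
    for raw in lines:
        line = raw.strip()
        match = LOG_LINE_RE.match(line)
        if match:
            last = (
                match.group("stamp"),
                match.group("project"),
                match.group("text"),
            )
        elif line.startswith("SIGSEGV") and last is not None:
            hits.append(last)
            last = None  # do not count the same message for a later crash
    return hits


# Sample taken from the earlier post in this thread.
SAMPLE = [
    "26-Sep-2008 21:06:02 [lhcathome] Scheduler request succeeded: got 0 new tasks",
    "SIGSEGV: segmentation violation",
]

if __name__ == "__main__":
    for stamp, project, text in messages_before_crash(SAMPLE):
        print(stamp, project, text)
```

If every crash is preceded by the same scheduler reply from the same project, that points at how the client handles that reply; a mix of projects would fit the suggestion that this is a general BOINC client issue.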
Joined: 3 Jan 07 Posts: 124 Credit: 7,065 RAC: 0 |
Maybe some clues will turn up. Col. Mustard in the Ballroom with the Candlestick... ;-) In all seriousness, has anyone considered falling back to a 5.8.x version of BOINC and seeing if the problem is still there or not? |
Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
I restarted LHC@home and shall report back. I still think that the problem arises when the server deletes a redundant result. Tullio |
©2024 CERN