I think this cruncher has a problem
Author | Message |
---|---|
Joined: 1 May 06 Posts: 34 Credit: 64,492 RAC: 0 |
http://lhcathome.cern.ch/show_host_detail.php?hostid=3746202 For one, it seems to be claiming excessive credit, but that's not the real issue. It has a lot of WUs with under 3 seconds of crunch time that have gone into pending, when other computers are running for a couple of hours. Is there a way someone can warn the owner that they have some sort of issue? |
Joined: 14 Jul 05 Posts: 275 Credit: 49,291 RAC: 0 |
Main issue I see is he has more than a hundred In Progress results. That's where a cache limit would be nice... Next time somebody can't get any work, remember those people who grab 200 at a time. |
Joined: 2 Sep 04 Posts: 378 Credit: 10,765 RAC: 0 |
http://lhcathome.cern.ch/show_host_detail.php?hostid=3746202 Only 180 WUs pending. That host has an allowance of 500 WUs per CPU. The only problem I see is that about 350 WUs back, the BOINC client mixed up the projects, and you see some BURP (Big Ugly Rendering Project) units instead of LHC work units. I'm not the LHC Alex. Just a number cruncher like everyone else here. |
Joined: 18 Jul 05 Posts: 3 Credit: 336,255 RAC: 0 |
http://lhcathome.cern.ch/show_host_detail.php?hostid=2786215 One of my 'puters has the same problem with Chess960. Where does this problem come from? |
Joined: 7 Mar 06 Posts: 9 Credit: 9,816 RAC: 0 |
http://lhcathome.cern.ch/show_host_detail.php?hostid=2786215 Even stranger: BURP and Chess results in one WU, http://lhcathome.cern.ch/workunit.php?wuid=1562745 I am Homer of Borg. Prepare to be ...ooooh donuts! |
Joined: 7 Mar 06 Posts: 9 Credit: 9,816 RAC: 0 |
Another cruncher is having problems with short LHC units whose results are mixed with those of other projects (SETI) :-( http://lhcathome.cern.ch/forum_thread.php?id=2386. I am Homer of Borg. Prepare to be ...ooooh donuts! |
Joined: 2 Sep 04 Posts: 378 Credit: 10,765 RAC: 0 |
I think the fix is to reset all your projects. I'm not the LHC Alex. Just a number cruncher like everyone else here. |
Joined: 2 Sep 04 Posts: 378 Credit: 10,765 RAC: 0 |
http://lhcathome.cern.ch/show_host_detail.php?hostid=2786215 More interesting is that BOINC can get borked up and still validate. http://lhcathome.cern.ch/workunit.php?wuid=1562764 (result id 8082119) I'm not the LHC Alex. Just a number cruncher like everyone else here. |
Joined: 28 Sep 05 Posts: 21 Credit: 11,715 RAC: 0 |
"More interesting is that Boinc can get borked up and still validate." My guess is that it's just the stderr out that's messed up. It could be a Windows or hard disk file system problem. I've had my stderr out file (and other BOINC files) fill up with garbage that doesn't even belong to any BOINC project. I've never been able to determine why that happens; I blame it on Windows. The result is probably OK if it validates. |
Joined: 18 Jul 05 Posts: 3 Credit: 336,255 RAC: 0 |
The problem is that it's my computer at work. I'm on holiday at the moment, so I will wait until next year to solve the problem. |
Joined: 2 Sep 04 Posts: 378 Credit: 10,765 RAC: 0 |
"The problem is that it's my computer at work." You may find that the BOINC client ends up sorting itself out over time. I'm not the LHC Alex. Just a number cruncher like everyone else here. |
Joined: 13 Jul 05 Posts: 143 Credit: 263,300 RAC: 0 |
As I understand the way the system works, when the WUs do not show up on time from this machine, they will be redistributed and sent out again. This machine will have its number of WUs reduced drastically, and it will take a few weeks of "on-time" returns before that number is increased -- glad my machines aren't this greedy ... If I've lived this long, I've gotta be that old |
Joined: 16 Dec 06 Posts: 1 Credit: 213 RAC: 0 |
Please, also look at this host. It's clocking in, with validated results, at under 1 minute per WU. It's done QUITE a lot of WUs, just like this. It's getting 0 credits, though, which is a good thing. Here's one of its results, minus the HUGE stderr out text. Result ID 8209399 |
Joined: 14 Jul 05 Posts: 275 Credit: 49,291 RAC: 0 |
"Please, also look at this host. It's clocking in, with validated results, at under 1 minute per WU. It's done QUITE a lot of WUs, just like this. It's getting 0 credits, though, which is a good thing." His results aren't validating... The validate state was sometimes Invalid and sometimes Initial. Initial means there aren't enough other results to validate against yet (or it hasn't validated yet for some other reason, such as the validator being down). |
Joined: 13 Jul 05 Posts: 143 Credit: 263,300 RAC: 0 |
The bad thing is that all his efforts are going to waste. What a shame. He ought to stop all his processes, then stop BOINC, and do a complete reinstallation of the BOINC software. Then he needs to go back and reattach to the various projects to which he donates his time. If I've lived this long, I've gotta be that old |
Joined: 17 Jul 05 Posts: 102 Credit: 542,016 RAC: 0 |
This mixing up of two results from different projects might be caused by a BOINC bug in the XML handler. If there is no CR and/or LF after an end tag, the next tag is not recognized, and everything from the next workunit then belongs to the previous workunit.

    <workunit>
    <tag> contents1 </tag></workunit>   ! this workunit end tag gets lost
    <workunit>                          ! this tag is unexpected, as there's still a workunit tag open
    <tag> contents2 </tag>
    </workunit>                         ! this closes the first(!!!) workunit start tag

Spinhenge had exactly that problem with one of the first program versions. As long as the XML handler is unable to handle more than one tag per line, a solution would be to force a linefeed after all content that gets included in client_state.xml in order to send it with the result report (stdout output and so on), but that will not fix it for existing clients, of course. |
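The failure mode described in this post can be reproduced with a toy parser. The following is only a sketch in Python, assuming a hypothetical line-oriented parser that recognizes a tag only when it starts a line. `parse_workunits` is an invented name, and none of this is BOINC's actual code.

```python
# Hypothetical line-oriented parser: it only looks at how each line STARTS.
# This is an illustration of the suspected bug, not BOINC's real parser.

def parse_workunits(text):
    """Return a list of workunits, each a list of the content lines inside it."""
    workunits = []
    current = None  # content lines of the workunit currently open, or None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("</workunit>"):
            # An end tag is only noticed at the start of a line.
            if current is not None:
                workunits.append(current)
            current = None
        elif line.startswith("<workunit>"):
            if current is None:
                current = []
            # Otherwise a workunit is still open and this start tag is ignored.
        elif current is not None:
            current.append(line)
    return workunits


# One tag per line: the layout the parser expects.
GOOD = """<workunit>
<tag> contents1 </tag>
</workunit>
<workunit>
<tag> contents2 </tag>
</workunit>"""

# Same data, but the first end tag shares a line with the previous tag.
BAD = """<workunit>
<tag> contents1 </tag></workunit>
<workunit>
<tag> contents2 </tag>
</workunit>"""

good = parse_workunits(GOOD)
bad = parse_workunits(BAD)
print(len(good), "workunits from GOOD;", len(bad), "workunit from BAD")
```

With the well-formed input the sketch finds two separate workunits. With the second input, the first end tag is never seen, so the contents of the second workunit end up inside the first one, which matches the mixed-up results reported in this thread.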
Joined: 14 Jul 05 Posts: 275 Credit: 49,291 RAC: 0 |
The documentation says tags should be on separate lines. I have complained more than once about BOINC using its own XML parser instead of an existing one like libxml2. For example, they only recently fixed a bug where <tag/> worked but <tag /> didn't... (note the space) |
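To make the `<tag/>` versus `<tag />` point concrete, here is a small sketch in Python. It assumes a hypothetical exact-match check (`naive_has_flag`, an invented name, not taken from BOINC) and compares it with Python's standard `xml.etree.ElementTree`, used here only as a stand-in for a general-purpose parser such as libxml2.

```python
import xml.etree.ElementTree as ET


def naive_has_flag(line, name):
    # Hypothetical hand-rolled check: exact string match on the compact form.
    # Any whitespace before "/>" defeats it.
    return line.strip() == f"<{name}/>"


def parsed_has_flag(line, name):
    # A conforming XML parser treats <tag/> and <tag /> as the same empty element.
    return ET.fromstring(line).tag == name


compact = "<tag/>"
spaced = "<tag />"

naive_results = (naive_has_flag(compact, "tag"), naive_has_flag(spaced, "tag"))
parsed_results = (parsed_has_flag(compact, "tag"), parsed_has_flag(spaced, "tag"))
print("naive:", naive_results, "parsed:", parsed_results)
```

The exact-match check accepts only the compact form, while the library parser accepts both, which is the kind of difference the complaint above is about.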
Joined: 17 Jul 05 Posts: 102 Credit: 542,016 RAC: 0 |
Tags on separate lines within your XML, fine, but what about a missing CR at the end of the XML that BOINC imports from a finished WU into client_state.xml? In this case BOINC itself appends the first tag of the next block directly after the last tag of the imported XML. I think this is really something BOINC could take care of, especially as the project causing it is not necessarily the same as the project that is damaged by it. |
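The workaround suggested here, forcing a linefeed after imported content, could look roughly like the sketch below. It is Python for illustration only: `append_block` is a hypothetical helper, the tag names are just sample data, and nothing here reflects how the BOINC client actually writes client_state.xml.

```python
def append_block(state_text, imported):
    """Append imported text so that the next tag always starts on a fresh line."""
    if imported and not imported.endswith("\n"):
        imported += "\n"  # add the linefeed that the project's output left out
    return state_text + imported


state = "<result>\n"
# Imported output that is missing its trailing newline.
state = append_block(state, "<stderr_txt>done</stderr_txt>")
# The next tag written by the client itself.
state = append_block(state, "</result>\n")

lines = state.splitlines()
print(lines)
```

Because the helper normalizes the end of every imported block, the closing tag lands on its own line even when the imported text does not end with a newline, so a one-tag-per-line parser would still see it.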
Joined: 4 Sep 05 Posts: 13 Credit: 536,862 RAC: 0 |
Not sure if it's a related error, but after updating to BOINC Manager 5.8.11 I had PrimeGrid creating a new CPID and host computer entry daily. I also managed to connect the same machine to that project twice using the same email, so I had two different PG entries in my projects list. Detaching one of the PGs cleared up the problem, but it was a little strange while it was going on. |
©2025 CERN