Message boards : Number crunching : I think this cruncher has a problem

clownius
Joined: 1 May 06
Posts: 34
Credit: 64,492
RAC: 0
Message 15892 - Posted: 24 Dec 2006, 18:21:07 UTC

http://lhcathome.cern.ch/show_host_detail.php?hostid=3746202
For one, it seems to be claiming excessive credit, but that's not the real issue. It seems to have a lot of WUs with under 3 seconds of crunch time that have gone into pending, while other computers are running them for a couple of hours.
Is there a way someone can warn the owner they have some sorta issue?

PovAddict
Joined: 14 Jul 05
Posts: 275
Credit: 49,291
RAC: 0
Message 15893 - Posted: 24 Dec 2006, 18:25:50 UTC

Main issue I see is he has more than a hundred In Progress results. That's where a cache limit would be nice... Next time somebody can't get any work, remember those people who grab 200 at a time.

Alex
Joined: 2 Sep 04
Posts: 378
Credit: 10,765
RAC: 0
Message 15895 - Posted: 24 Dec 2006, 21:20:56 UTC - in response to Message 15892.  

http://lhcathome.cern.ch/show_host_detail.php?hostid=3746202
For one, it seems to be claiming excessive credit, but that's not the real issue. It seems to have a lot of WUs with under 3 seconds of crunch time that have gone into pending, while other computers are running them for a couple of hours.
Is there a way someone can warn the owner they have some sorta issue?



Only 180 WUs pending. That host has an allowance of 500 WUs per CPU.

The only problem I see is that, about 350 WUs back, the BOINC client mixed up the projects, and you see some BURP (Big Ugly Rendering Project) units instead of LHC work units.
I'm not the LHC Alex. Just a number cruncher like everyone else here.

Shann
Joined: 18 Jul 05
Posts: 3
Credit: 336,255
RAC: 0
Message 15899 - Posted: 25 Dec 2006, 9:15:51 UTC
Last modified: 25 Dec 2006, 9:16:41 UTC

http://lhcathome.cern.ch/show_host_detail.php?hostid=2786215

One of my 'puters has the same problem with Chess960.
Where does this problem come from?

Ledi
Joined: 7 Mar 06
Posts: 9
Credit: 9,816
RAC: 0
Message 15900 - Posted: 25 Dec 2006, 9:33:00 UTC - in response to Message 15899.  

http://lhcathome.cern.ch/show_host_detail.php?hostid=2786215
One of my 'puters has the same problem with Chess960.
Where does this problem come from?

Even stranger: BURP and Chess results in one WU: http://lhcathome.cern.ch/workunit.php?wuid=1562745


I am Homer of Borg. Prepare to be ...ooooh donuts!




Ledi
Joined: 7 Mar 06
Posts: 9
Credit: 9,816
RAC: 0
Message 15901 - Posted: 25 Dec 2006, 11:37:45 UTC

Another cruncher having problems with short LHC-units that have results mixed with other projects (seti) :-(
http://lhcathome.cern.ch/forum_thread.php?id=2386.

I am Homer of Borg. Prepare to be ...ooooh donuts!




clownius
Joined: 1 May 06
Posts: 34
Credit: 64,492
RAC: 0
Message 15904 - Posted: 25 Dec 2006, 17:19:55 UTC

I'm at a loss as to how this could happen. How can you get SETI crunching an LHC work unit? Is it people trying to optimize or something?
Mainly I wish people wouldn't hide their computers, so we could let 'em know something's wrong.

Alex
Joined: 2 Sep 04
Posts: 378
Credit: 10,765
RAC: 0
Message 15910 - Posted: 25 Dec 2006, 21:23:26 UTC

I think the fix is to reset all your projects.


I'm not the LHC Alex. Just a number cruncher like everyone else here.

Alex
Joined: 2 Sep 04
Posts: 378
Credit: 10,765
RAC: 0
Message 15911 - Posted: 25 Dec 2006, 21:30:15 UTC - in response to Message 15900.  

http://lhcathome.cern.ch/show_host_detail.php?hostid=2786215
One of my 'puters has the same problem with Chess960.
Where does this problem come from?

Even stranger: BURP and Chess results in one WU: http://lhcathome.cern.ch/workunit.php?wuid=1562745



More interesting is that BOINC can get borked up and still validate.
http://lhcathome.cern.ch/workunit.php?wuid=1562764
(result id 8082119)

I'm not the LHC Alex. Just a number cruncher like everyone else here.

Bob Guy
Joined: 28 Sep 05
Posts: 21
Credit: 11,715
RAC: 0
Message 15913 - Posted: 25 Dec 2006, 21:57:40 UTC - in response to Message 15911.  

More interesting is that BOINC can get borked up and still validate.
http://lhcathome.cern.ch/workunit.php?wuid=1562764
(result id 8082119)

My guess is that it's just the stderr out that's messed up; it could be a Windows or hard disk file system problem. I've had my stderr out file (and other BOINC files) fill up with garbage that doesn't even belong to any BOINC project. I've never been able to determine why that happens; I blame it on Windows. The result is probably OK if it validates.

Shann
Joined: 18 Jul 05
Posts: 3
Credit: 336,255
RAC: 0
Message 15916 - Posted: 26 Dec 2006, 15:44:20 UTC

The problem is that it's my computer at work.
I'm on holiday at the moment, so I will wait until next year to solve the problem.



Alex
Joined: 2 Sep 04
Posts: 378
Credit: 10,765
RAC: 0
Message 15917 - Posted: 26 Dec 2006, 22:23:02 UTC - in response to Message 15916.  

The problem is that it's my computer at work.
I'm on holiday at the moment, so I will wait until next year to solve the problem.



You may find that the BOINC client ends up sorting itself out over time.
I'm not the LHC Alex. Just a number cruncher like everyone else here.

Ocean Archer
Joined: 13 Jul 05
Posts: 143
Credit: 263,300
RAC: 0
Message 15918 - Posted: 26 Dec 2006, 22:47:21 UTC
Last modified: 26 Dec 2006, 22:48:21 UTC

As I understand the way the system works, when the WUs do not show up on time from this machine, they will be redistributed and sent out again. This machine will have its number of WUs reduced drastically, and it will take a few weeks of "on-time" returns before that number is increased -- glad my machines aren't this greedy ...


If I've lived this long, I've gotta be that old

miketoth1001
Joined: 16 Dec 06
Posts: 1
Credit: 213
RAC: 0
Message 15941 - Posted: 30 Dec 2006, 19:44:58 UTC

Please, also look at this host. It's clocking in, with validated results, at under 1 minute per WU. It's done QUITE a lot of WUs, just like this. It's getting 0 credits, though, which is a good thing.

Here's one of its results, minus the HUGE stderr out text.

Result ID 8209399
Name ws2m2._0.08_s2m2._0.08__1__64.3125_59.3225__4_6__6__70_1_sixvf_boinc322623_4
Workunit 1587742
Created 29 Dec 2006 1:19:59 UTC
Sent 29 Dec 2006 21:32:47 UTC
Received 29 Dec 2006 22:57:25 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x0)
Computer ID 88058
Report deadline 5 Jan 2007 13:05:01 UTC
CPU time 53.4219
stderr out

<core_client_version>5.4.11</core_client_version>

Validate state Initial
Claimed credit 0.0945038770069055
Granted credit 0
application version 4.67




PovAddict
Joined: 14 Jul 05
Posts: 275
Credit: 49,291
RAC: 0
Message 15942 - Posted: 30 Dec 2006, 20:05:20 UTC - in response to Message 15941.  

Please, also look at this host. It's clocking in, with validated results, at under 1 minute per WU. It's done QUITE a lot of WUs, just like this. It's getting 0 credits, though, which is a good thing.

His results aren't validating... Validate state was sometimes Invalid and sometimes Initial. Initial means there aren't enough other results to validate yet (or for some other reason it hasn't validated yet, like validator down).

Ocean Archer
Joined: 13 Jul 05
Posts: 143
Credit: 263,300
RAC: 0
Message 15943 - Posted: 30 Dec 2006, 23:10:33 UTC

The bad thing is that all his efforts are going to waste - what a shame. He ought to stop all his processes, then stop BOINC, and do a complete reinstallation of the BOINC software. Then he needs to go back and reattach to the various projects he donates his time to.


If I've lived this long, I've gotta be that old

Ananas
Joined: 17 Jul 05
Posts: 102
Credit: 542,016
RAC: 0
Message 15945 - Posted: 31 Dec 2006, 6:40:58 UTC
Last modified: 31 Dec 2006, 6:43:36 UTC

This messing up two results from different projects might be caused by a BOINC bug in the XML handler.

If there is no CR and/or LF behind an end tag, the next tag is unrecognized and all the stuff from the next workunit belongs to the previous workunit now.

<workunit>
<tag>
contents1
</tag></workunit> ! this workunit end tag gets lost

<workunit> ! this tag is unexpected, as there's still a workunit tag open
<tag>
contents2
</tag>
</workunit> ! this closes the first(!!!) workunit start tag

Spinhenge had exactly that problem with one of the first program versions.

As long as the XML handler is unable to handle more than one tag per line, a solution would be to force a linefeed after all contents that get included in client_state.xml to be sent with the result report (stdout output and the like). That will not fix it for existing clients, of course.
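
To make the failure mode above concrete, here is a minimal Python sketch. It is a toy and not BOINC's actual parser: `parse_workunits` and `force_linefeeds` are hypothetical names, and the only assumption is the behavior described above, namely that a tag is recognized only when it sits alone on its line.

```python
import re

# Hypothetical toy parser, NOT BOINC's real code: it only recognizes a tag
# when that tag is the only thing on the line.
def parse_workunits(text):
    workunits, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "<workunit>":
            if current is None:          # a second start tag while one is open is ignored
                current = []
        elif line == "</workunit>":
            if current is not None:
                workunits.append(current)
                current = None
        elif current is not None and line and not line.startswith("<"):
            current.append(line)         # plain content line
    return workunits

def force_linefeeds(text):
    # Workaround sketch: break the line between two adjacent tags.
    return re.sub(r">[ \t]*<", ">\n<", text)

broken = """<workunit>
<tag>
contents1
</tag></workunit>

<workunit>
<tag>
contents2
</tag>
</workunit>
"""

merged = parse_workunits(broken)                      # both contents end up in one workunit
separate = parse_workunits(force_linefeeds(broken))   # two workunits again
print(merged)
print(separate)
```

On the broken input the toy parser never sees the first end tag, so `contents2` is filed under the first workunit. Once a linefeed is forced between adjacent tags, the two workunits come out separately.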

PovAddict
Joined: 14 Jul 05
Posts: 275
Credit: 49,291
RAC: 0
Message 15950 - Posted: 31 Dec 2006, 16:26:35 UTC

The documentation says tags should be on separate lines. I have complained more than once about BOINC using its own XML parser instead of something that already exists, like libxml2. For example, they only recently fixed a bug where <tag/> worked but <tag /> didn't... (note the space)
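
For comparison, a rough sketch of how an off-the-shelf parser treats the same input. It uses Python's standard `xml.etree.ElementTree` purely as a stand-in for a library such as libxml2; the `<state>` wrapper and the `<flag />` element are made up for the example and are not taken from client_state.xml.

```python
import xml.etree.ElementTree as ET

# Everything on one line, plus an empty element written with a space before "/>".
doc = (
    "<state>"
    "<workunit><tag>contents1</tag></workunit>"
    "<workunit><tag>contents2</tag><flag /></workunit>"
    "</state>"
)

root = ET.fromstring(doc)
workunits = root.findall("workunit")

contents = [wu.findtext("tag") for wu in workunits]
has_flag = [wu.find("flag") is not None for wu in workunits]

print(contents)
print(has_flag)
```

A conforming parser does not care about line breaks between tags or about the space in `<flag />`, so both workunits stay separate here.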

Ananas
Joined: 17 Jul 05
Posts: 102
Credit: 542,016
RAC: 0
Message 16290 - Posted: 11 Feb 2007, 17:09:52 UTC
Last modified: 11 Feb 2007, 17:12:04 UTC

Tags on different lines within your XML - but what about a missing CR at the end of the XML code that BOINC imports from a finished WU into client_state.xml? In this case BOINC itself appends the first tag of the next block directly behind the last tag of the imported XML.

I think this is really something BOINC could take care of, especially as the project causing it is not necessarily the same as the project that is damaged by it.
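
A small Python sketch of what "taking care of it" on the client side could look like. This is an assumption about one possible fix, not how BOINC does it: `append_block` is a hypothetical helper, and the `<imported>` and `<next_block>` tags are placeholders.

```python
def append_block(state, block):
    """Append an imported XML block, adding a trailing newline if it is missing."""
    if block and not block.endswith("\n"):
        block += "\n"
    return state + block

# Placeholder blocks: the imported one lacks a final newline.
imported = "<imported>\noutput from the finished WU\n</imported>"
next_block = "<next_block>\n</next_block>\n"

naive = imported + next_block                                # two tags share a line
safe = append_block(append_block("", imported), next_block)  # every tag on its own line

print(naive)
print(safe)
```

With plain concatenation the end tag of the imported block and the start tag of the next block land on the same line, which is exactly the case a one-tag-per-line parser cannot handle. Checking for the trailing newline at import time avoids that regardless of which project produced the block.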

Fritz
Joined: 4 Sep 05
Posts: 13
Credit: 536,862
RAC: 0
Message 16463 - Posted: 7 Mar 2007, 0:43:51 UTC

Not sure if it's a related error, but after updating to BOINC Manager 5.8.11 I had PrimeGrid creating a new CPID and host computer entry daily. I also managed to connect the same machine, using the same email, to that project twice, so I had two different PG entries in my projects list.

Detaching one of the PGs cleared up the problem, but it was a little strange while it was going on.


©2024 CERN