Message boards : Number crunching : Validator Problem

The Gas Giant
Joined: 2 Sep 04
Posts: 309
Credit: 715,258
RAC: 0
Message 5629 - Posted: 14 Feb 2005, 11:29:22 UTC

Please see this wu -> http://lxfsrk4101.cern.ch/workunit.php?wuid=135

With the way the validator worked, it took a successful return of 0 seconds as a valid result, whereas the other 2 WUs took 4,400 seconds and were also successful. The highest claimed credit was dropped, and the middle one was added to 0 and divided by 2, giving a granted credit of only half the claimed amount, or 3.39.
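To make the arithmetic concrete, here is a minimal sketch of the rule as described in this post: drop the highest claim, then average what is left. It is not the actual validator code. The function name `granted_credit` is made up for illustration, the 6.78 middle claim is inferred from the granted 3.39 being half of it, and the 7.50 highest claim is a placeholder value.

```python
def granted_credit(claimed):
    """Drop the single highest claim and average the remaining ones.

    Assumed rule, inferred from the description in this thread;
    not taken from the real validator source.
    """
    if len(claimed) < 2:
        raise ValueError("need at least two claimed credits")
    kept = sorted(claimed)[:-1]  # discard the highest claim
    return sum(kept) / len(kept)


# 0.0  : claim from the result that reported 0 CPU seconds
# 6.78 : middle claim, inferred from granted credit 3.39 = 6.78 / 2
# 7.50 : placeholder for the dropped highest claim (hypothetical value)
claims = [0.0, 6.78, 7.50]
print(round(granted_credit(claims), 2))
```

Under this rule, a single zero claim halves the credit granted to everyone on the work unit, however much CPU time the other hosts reported.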

This is not the only wu with this problem:

http://lxfsrk4101.cern.ch/workunit.php?wuid=134
http://lxfsrk4101.cern.ch/workunit.php?wuid=132

Live long and crunch!

Paul
(S@H1 8888)
BOINC/SAH BETA
ID: 5629
Markku Degerholm
Joined: 3 Sep 04
Posts: 212
Credit: 4,545
RAC: 0
Message 5630 - Posted: 14 Feb 2005, 14:14:25 UTC

I checked these results and they were, in fact, identical. I think the problem is in the reporting of the used CPU time rather than in the actual computation or validation. After all, it takes at least a fraction of a second for sixtrack to write even a single result output file.

The 0-second result was computed by an Opteron/Linux machine running a beta-version core client (4.66). Maybe that combination explains the problem.

Markku Degerholm
LHC@home admin
ID: 5630
The Gas Giant
Joined: 2 Sep 04
Posts: 309
Credit: 715,258
RAC: 0
Message 5636 - Posted: 14 Feb 2005, 20:33:00 UTC

Thanks... but ouch! There will be some unhappy crunchers out there if this continues.
Paul
(S@H1 8888)
BOINC/SAH BETA
ID: 5636
Markku Degerholm
Joined: 3 Sep 04
Posts: 212
Credit: 4,545
RAC: 0
Message 5637 - Posted: 14 Feb 2005, 20:42:58 UTC
Last modified: 14 Feb 2005, 20:43:07 UTC

There seem to be more of these 0-second results than I thought... Even my own computer produces them now and then. Will investigate more...

Markku Degerholm
LHC@home admin
ID: 5637
The Gas Giant
Joined: 2 Sep 04
Posts: 309
Credit: 715,258
RAC: 0
Message 5644 - Posted: 16 Feb 2005, 11:28:17 UTC

Oh, this sucks:

http://lxfsrk4101.cern.ch/workunit.php?wuid=245
http://lxfsrk4101.cern.ch/workunit.php?wuid=244

Paul.
The Final Fornt Ear.
ID: 5644
Ageless
Joined: 18 Sep 04
Posts: 143
Credit: 27,645
RAC: 0
Message 5645 - Posted: 16 Feb 2005, 14:35:51 UTC
Last modified: 16 Feb 2005, 15:15:58 UTC

Add http://lxfsrk4101.cern.ch/workunit.php?wuid=263 to the list of Opteron results with 0 CPU time, but note that a Pentium 3 does the same there.

And http://lxfsrk4101.cern.ch/workunit.php?wuid=256
Jord

BOINC FAQ Service
ID: 5645
Markku Degerholm
Joined: 3 Sep 04
Posts: 212
Credit: 4,545
RAC: 0
Message 5646 - Posted: 16 Feb 2005, 19:54:38 UTC

The problem is quite weird. There is no recognisable pattern in which platforms or clients are affected. The problem seems to be client-related, but you never know... Investigations continue.

Markku Degerholm
LHC@home admin
ID: 5646
ttocs
Joined: 12 Feb 05
Posts: 4
Credit: 94,837
RAC: 0
Message 5647 - Posted: 16 Feb 2005, 20:52:16 UTC

Note that http://lxfsrk4101.cern.ch/results.php?hostid=20798 has a lot of 0 credit WUs (as do several of mine). This is a 2.8 GHz Pentium 4 running Linux. Another odd thing is that for the nonzero WUs, the CPU times are still pretty low. For example, on http://lxfsrk4101.cern.ch/workunit.php?wuid=245 this machine reported 2,023.23 CPU seconds, but a faster machine running Windows XP ( http://lxfsrk4101.cern.ch/show_host_detail.php?hostid=20790 ) claims more than 20 times that amount of CPU time. Something is way off.
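As a rough illustration of the comparison above, the sketch below lists results that report zero CPU time and measures how far apart the nonzero CPU times on one work unit are. The function name `check_cpu_times` and the `max_ratio` threshold are invented for this example, and only the 2,023.23 figure comes from the work unit above. The 41,000 value is a stand-in for "more than 20 times", and the 0.0 entry is added purely for illustration.

```python
def check_cpu_times(cpu_seconds, max_ratio=5.0):
    """Return (indices of zero-time results, spread of nonzero times, flag).

    The spread is max/min over the nonzero CPU times. The flag is True
    when that spread exceeds max_ratio (an arbitrary, hypothetical threshold).
    """
    zero = [i for i, t in enumerate(cpu_seconds) if t <= 0.0]
    nonzero = [t for t in cpu_seconds if t > 0.0]
    spread = max(nonzero) / min(nonzero) if len(nonzero) >= 2 else 1.0
    return zero, spread, spread > max_ratio


# 2023.23 : reported by the Linux Pentium 4 host
# 41000.0 : hypothetical stand-in for the Windows XP host's claim
# 0.0     : illustrative zero-time result
zero, spread, suspicious = check_cpu_times([2023.23, 41000.0, 0.0])
print(zero, round(spread, 1), suspicious)
```

A check like this only shows that the hosts disagree. It cannot say which of the reported times is the wrong one.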

-ttocs
ID: 5647
Keck_Komputers
Joined: 1 Sep 04
Posts: 275
Credit: 2,652,452
RAC: 0
Message 5648 - Posted: 16 Feb 2005, 23:17:47 UTC

This problem has happened on a few projects. It has something to do with the app not checkpointing properly in certain situations, if I remember correctly. Hopefully someone with more accurate/complete info will respond.
BOINC WIKI

BOINCing since 2002/12/8
ID: 5648
Ageless
Joined: 18 Sep 04
Posts: 143
Credit: 27,645
RAC: 0
Message 5649 - Posted: 17 Feb 2005, 3:30:11 UTC - in response to Message 5648.  
Last modified: 17 Feb 2005, 3:30:59 UTC

> This problem has happened on a few projects. It has something to do with the
> app not checkpointing properly in certain situations if I remember correctly.
>

Isn't that the self-compiled Linux client that you are referring to?
I mean, doesn't that at least produce lower benchmarks than the original?
Jord

BOINC FAQ Service
ID: 5649
ttocs
Joined: 12 Feb 05
Posts: 4
Credit: 94,837
RAC: 0
Message 5650 - Posted: 17 Feb 2005, 4:03:48 UTC

I've had this issue with the regular alpha core client, not just self-compiled ones. I don't know if it occurs with the released client, though. I have seen the CPU time and credit misreported on Predictor WUs too, though the issue seems more pronounced with LHC.

Since the alpha core client (4.19, 4.61, 4.62, 4.20, 4.21) switches among projects more, the conjecture of a checkpointing issue makes some amount of sense. If the CPU info is forgotten between switches, this could be an explanation.
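To illustrate the conjecture (and only the conjecture), here is a toy model of how CPU time could go missing if it is not saved before a project switch. This is not BOINC or sixtrack code. The class `ToyTask`, its methods, and the `state_survives_switch` flag are all invented for this sketch, and the two 2,200-second runs are arbitrary numbers chosen to add up to 4,400.

```python
class ToyTask:
    """Toy work unit that tracks CPU time across project switches."""

    def __init__(self):
        self.saved_cpu = 0.0    # CPU seconds persisted at the last checkpoint
        self.session_cpu = 0.0  # CPU seconds used since the last (re)start

    def run(self, seconds):
        self.session_cpu += seconds

    def checkpoint(self):
        # Fold the current session into the persisted total.
        self.saved_cpu += self.session_cpu
        self.session_cpu = 0.0

    def switch_away(self, state_survives_switch):
        # If the in-memory state is discarded, unsaved CPU time is lost.
        if not state_survives_switch:
            self.session_cpu = 0.0

    def reported_cpu(self):
        return self.saved_cpu + self.session_cpu


def simulate(state_survives_switch):
    task = ToyTask()
    task.run(2200.0)                         # first slice, no checkpoint taken
    task.switch_away(state_survives_switch)  # client switches to another project
    task.run(2200.0)                         # second slice after resuming
    return task.reported_cpu()


print(simulate(True))   # state kept across the switch
print(simulate(False))  # state forgotten at the switch
```

In this model the result itself would still be correct. Only the reported CPU time, and therefore the claimed credit, comes out too low.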


ID: 5650
Keck_Komputers
Joined: 1 Sep 04
Posts: 275
Credit: 2,652,452
RAC: 0
Message 5652 - Posted: 17 Feb 2005, 9:51:51 UTC

I think it was Einstein that had the big problem with this, at least most recently. The temporary workaround until they got the app fixed was to leave work in memory. Is there any observed pattern related to that here?

My only returned workunit claimed proper credit, and I do remove work from memory.
BOINC WIKI

BOINCing since 2002/12/8
ID: 5652
The Gas Giant
Joined: 2 Sep 04
Posts: 309
Credit: 715,258
RAC: 0
Message 5656 - Posted: 18 Feb 2005, 13:23:06 UTC

Only a Linux problem? Two out of three returnees were Linux and both had 0 CPU time. Damn, that hurts.

http://lxfsrk4101.cern.ch/workunit.php?wuid=249


Paul
(S@H1 8888)
BOINC/SAH BETA
ID: 5656
ralic
Joined: 2 Sep 04
Posts: 28
Credit: 44,344
RAC: 0
Message 5661 - Posted: 19 Feb 2005, 8:02:49 UTC - in response to Message 5656.  

> Only a linux problem? Two out of three returnees were linux and both had 0
> cpu time. Damn that hurts.

I don't think it's only a Linux problem.
http://lxfsrk4101.cern.ch/workunit.php?wuid=322

Two Linux returnees: one reporting 0 CPU time, the other (mine) 149,745.07. Same kernel revisions (Linux 2.6.10).

I notice that the project is scheduled to switch on Monday. Perhaps this problem needs to be found and fixed first?

In any event, I've detached my boxes in anticipation, primarily to prevent any possible problems with project URLs once the servers switch.

See you all on the other side... ;-)
ID: 5661
The Gas Giant
Joined: 2 Sep 04
Posts: 309
Credit: 715,258
RAC: 0
Message 5663 - Posted: 19 Feb 2005, 21:14:20 UTC

I agree with ralic. If this problem makes it out to the public there will be a lot of disgruntled crunchers giving admin a lot of hassle.

Here is another one:

http://lxfsrk4101.cern.ch/workunit.php?wuid=250

That is a lot of credit down the tubes.

Live long and crunch.




Paul
(S@H1 8888)
BOINC/SAH BETA
ID: 5663
Ageless
Joined: 18 Sep 04
Posts: 143
Credit: 27,645
RAC: 0
Message 5670 - Posted: 20 Feb 2005, 17:24:58 UTC

I agree as well.

Here is another one, another Opteron doing it: http://lxfsrk4101.cern.ch/workunit.php?wuid=355
Jord

BOINC FAQ Service
ID: 5670
