Validator Problem
Author | Message |
---|---|
Joined: 2 Sep 04 Posts: 309 Credit: 715,258 RAC: 0 |
Please see this wu -> http://lxfsrk4101.cern.ch/workunit.php?wuid=135 With the way the validator worked, it accepted a successful return of 0 seconds as a valid result, whereas the other 2 results took 4,400 seconds and were also successful. The higher claimed credit was dropped, and the middle one was added to 0 and divided by 2, giving a granted credit of only half the claimed amount, or 3.39. This is not the only wu with this problem: http://lxfsrk4101.cern.ch/workunit.php?wuid=134 http://lxfsrk4101.cern.ch/workunit.php?wuid=132 Live long and crunch! Paul (S@H1 8888) |
Joined: 3 Sep 04 Posts: 212 Credit: 4,545 RAC: 0 |
I checked these results and they were, in fact, identical. I think the problem is in the reporting of the used CPU time rather than in the actual computation or validation. After all, it takes at least a fraction of a second for sixtrack to write even a single result output file. The 0-second result was computed by an Opteron/Linux machine running a beta-version core client (4.66). Maybe that combination explains the problem. Markku Degerholm LHC@home admin |
Joined: 2 Sep 04 Posts: 309 Credit: 715,258 RAC: 0 |
Thanks...but ouch! There will not be happy crunchers out there if this continues. Paul (S@H1 8888) |
Joined: 3 Sep 04 Posts: 212 Credit: 4,545 RAC: 0 |
There seem to be more of these 0-second results than I thought... Even my own computer produces them now and then. Will investigate more... Markku Degerholm LHC@home admin |
Joined: 2 Sep 04 Posts: 309 Credit: 715,258 RAC: 0 |
Oh this sucks; http://lxfsrk4101.cern.ch/workunit.php?wuid=245 http://lxfsrk4101.cern.ch/workunit.php?wuid=244 Paul. The Final Fornt Ear. |
Joined: 18 Sep 04 Posts: 143 Credit: 27,645 RAC: 0 |
Add http://lxfsrk4101.cern.ch/workunit.php?wuid=263 as one of the Opteron = 0 ones, but see that a Pentium 3 does the same. And http://lxfsrk4101.cern.ch/workunit.php?wuid=256 Jord BOINC FAQ Service |
Joined: 3 Sep 04 Posts: 212 Credit: 4,545 RAC: 0 |
The problem is quite weird. There is no recognisable pattern of which platforms or clients have this problem. Anyway, the problem seems to be client-related, but you never know... Investigations continue. Markku Degerholm LHC@home admin |
Joined: 12 Feb 05 Posts: 4 Credit: 94,837 RAC: 0 |
Note that http://lxfsrk4101.cern.ch/results.php?hostid=20798 has a lot of 0 credit WUs (as do several of mine). This is a 2.8 GHz Pentium 4 running Linux. Another odd thing is that for the nonzero WUs, the CPU times are still pretty low. For example, on http://lxfsrk4101.cern.ch/workunit.php?wuid=245 this machine reported 2,023.23 CPU seconds, but a faster machine running Windows XP ( http://lxfsrk4101.cern.ch/show_host_detail.php?hostid=20790 ) claims more than 20 times that amount of CPU time. Something is way off. -ttocs |
Joined: 1 Sep 04 Posts: 275 Credit: 2,652,452 RAC: 0 |
This problem has happened on a few projects. It has something to do with the app not checkpointing properly in certain situations if I remember correctly. Hopefully someone with more accurate/complete info will respond. BOINC WIKI BOINCing since 2002/12/8 |
Joined: 18 Sep 04 Posts: 143 Credit: 27,645 RAC: 0 |
> This problem has happened on a few projects. It has something to do with the > app not checkpointing properly in certain situations if I remember correctly. > Isn't that the self-compiled Linux client that you are referring to? I mean, doesn't that at least run lower benchmarks than the original? Jord BOINC FAQ Service |
Joined: 12 Feb 05 Posts: 4 Credit: 94,837 RAC: 0 |
I've had this issue with the regular alpha core client, not just self-compiled ones. I don't know if it occurs with the released client, though. I have seen the CPU time and credit misreported on Predictor WUs, though the issue seems more pronounced with LHC. Since the alpha core client (4.19, 4.61, 4.62, 4.20, 4.21) switches among projects more, the conjecture of a checkpointing issue makes some amount of sense. If the CPU info is forgotten between switches, this could be an explanation. |
Joined: 1 Sep 04 Posts: 275 Credit: 2,652,452 RAC: 0 |
I think it was einstein that had the big problem with this, at least most recently. The temporary workaround until they got the app fixed was to leave work in memory. Is there any observed pattern related to that here? My only returned workunit claimed proper credit and I remove work from memory. BOINC WIKI BOINCing since 2002/12/8 |
Joined: 2 Sep 04 Posts: 309 Credit: 715,258 RAC: 0 |
Only a Linux problem? Two out of three returnees were Linux and both had 0 CPU time. Damn that hurts. http://lxfsrk4101.cern.ch/workunit.php?wuid=249 Paul (S@H1 8888) |
Joined: 2 Sep 04 Posts: 28 Credit: 44,344 RAC: 0 |
> Only a linux problem? Two out of three returnees were linux and both had 0 > cpu time. Damn that hurts. Don't think it's only a Linux problem. http://lxfsrk4101.cern.ch/workunit.php?wuid=322 Two Linux returnees, one with 0 CPU, the other (mine) with 149,745.07 CPU. Same kernel revisions (Linux 2.6.10). I notice that the project is scheduled to switch on Monday. Perhaps this problem needs to be found and fixed first? In any event, I've detached my boxes in anticipation, primarily to prevent any possible problems with project URLs once the servers switch. See you all on the other side... ;-) |
Joined: 2 Sep 04 Posts: 309 Credit: 715,258 RAC: 0 |
I agree with ralic. If this problem makes it out to the public there will be a lot of disgruntled crunchers giving admin a lot of hassle. Here is another one http://lxfsrk4101.cern.ch/workunit.php?wuid=250 That is a lot of credit down the tubes. Live long and crunch. Paul (S@H1 8888) |
Joined: 18 Sep 04 Posts: 143 Credit: 27,645 RAC: 0 |
I agree as well. Here is another one, another Opteron doing it: http://lxfsrk4101.cern.ch/workunit.php?wuid=355 Jord BOINC FAQ Service |
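
For readers following the arithmetic in the first post, here is a minimal sketch of the credit rule as Paul describes it: with three returned results, the highest claim is dropped and the remaining two are averaged. This is only an illustration of that description, not the project's actual validator code. The function name `granted_credit` is hypothetical, the middle claim of 6.78 is inferred from the granted 3.39 being "half the claimed", and the high claim of 7.50 is a made-up placeholder.

```python
def granted_credit(claimed):
    """Drop the single highest claim and average the rest.

    Assumption: this mirrors the rule described in the thread,
    not the real validator implementation.
    """
    if len(claimed) < 2:
        raise ValueError("need at least two claimed credits")
    kept = sorted(claimed)[:-1]  # discard the highest claim
    return sum(kept) / len(kept)


# One result reported 0 CPU seconds and so claimed 0 credit.
# 6.78 is inferred from the thread; 7.50 is a hypothetical high claim.
claims = [0.0, 6.78, 7.50]
print(round(granted_credit(claims), 2))
```

Under this rule a single 0-second result halves the credit for everyone on the workunit, which matches what the posts above report.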
©2025 CERN