Discrepancy in Credits?
Author | Message |
---|---|
Joined: 20 Jan 06 Posts: 2 Credit: 1,137 RAC: 0 |
There appears to be an inherent bug in the way BOINC calculates credit! I posted in the Seti@Home forums about a discrepancy between the credits granted to my account and the sum of the credits of my hosts: http://setiathome.berkeley.edu/forum_thread.php?id=31939 This discrepancy also appears to affect LHC: I have a shortfall of approx. 1205 credits! Is anyone else suffering from this error in calculation? |
Joined: 26 Sep 05 Posts: 85 Credit: 421,130 RAC: 0 |
My host tends to have more. The last time this happened was also during the last round of server problems, and it was evened up when Churlle fixed things. Best I can figure, something like this is going on. There's evidence of something seriously wrong in the database (for instance, click on certain sections like pending credit or teams and you get an error message about the operation taking place). So the host reports a WU back, and without a database association to say which account owns the host, the credit gets applied to the host but not to the owning account. Ditto for anything applied to team accounts, as they aren't even coming up anymore. If the database can't find out which computer belongs where, all these orphaned records/computers can end up gaining credit without it being applied elsewhere. I'm not 100% positive how they have it set up, but there might be a host_computer table in the database which is separate from the user table? If that's the case, the database would have to be able to link these tables together to apply the credit, and performing lookups on some tables (pending credit and teams) seems to be what invokes the error, with 0.00 given for a result. Or perhaps I should modify that. The computers aren't completely orphaned. They still show up under our account as being our computers. It just seems that a credit operation done to them isn't reciprocally being done to the account that owns them during such a database crash... |
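One way to look for the mismatch described above is to compare each account's credit with the sum over its hosts. This is only a rough sketch: the `users` and `hosts` tables, their columns, and the sample numbers are made up for illustration and are not taken from the project's actual database, and SQLite stands in for the real server.

```python
import sqlite3

def find_credit_mismatches(conn):
    """Return (user_id, user_credit, host_sum) where the account total differs from its hosts' sum."""
    query = """
        SELECT u.id, u.total_credit, COALESCE(SUM(h.total_credit), 0) AS host_sum
        FROM users AS u
        LEFT JOIN hosts AS h ON h.userid = u.id
        GROUP BY u.id, u.total_credit
        HAVING ABS(u.total_credit - COALESCE(SUM(h.total_credit), 0)) > 0.005
        ORDER BY u.id
    """
    return conn.execute(query).fetchall()

# Hypothetical sample data: user 1's hosts hold more credit than the account does.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, total_credit REAL);
    CREATE TABLE hosts (id INTEGER PRIMARY KEY, userid INTEGER, total_credit REAL);
    INSERT INTO users VALUES (1, 100.0), (2, 50.0);
    INSERT INTO hosts VALUES (10, 1, 60.0), (11, 1, 70.0), (12, 2, 50.0);
""")

print(find_credit_mismatches(conn))  # [(1, 100.0, 130.0)]
```

A non-empty result would only show that the two totals have drifted apart; it would not say which of the explanations in this thread is the cause.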
Joined: 1 Sep 04 Posts: 137 Credit: 1,691,526 RAC: 383 |
If the hosts are showing up under your account, then the validator knows who the host belongs to. It uses the same foreign key for both lookups. It is more likely that this is the same bug that was just discovered over at SETI, which appears to be a bug in the basic BOINC validator code. It has to do with timing. Say results 1 and 2, worth 1 and 2 credits respectively, are reported at the same time by user "bob". There are 2 validator processes running. Each one of them grabs one of the results and validates it. Validator 1 does a lookup on the user table and sees that bob has x credits. At the same time, validator 2 looks up the same information and also determines that bob has x credits. Then validator 1 adds the credit for result 1 to x and saves x+1 back to the user table. Then validator 2 adds the credit for result 2 to its own copy of x and saves x+2 to the user table. This overwrites the x+1 that validator 1 wrote right before, so in the end the user only has x+2 instead of x+2+1. The solution is to make the operations atomic, so that validator 2 can't read how much credit bob has until validator 1 has finished adding and saving the data. Of course none of this matters right now since the results table seems to be totally screwed up :) - A member of The Knights Who Say NI! My BOINC stats site |
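The timing problem described above can be replayed in a few lines. This is a minimal sketch, not the BOINC validator code: `UserRow`, `lost_update`, and `atomic_update` are made-up names, and a thread lock stands in for whatever locking or single-statement update the database would actually use.

```python
import threading

class UserRow:
    """Stand-in for one row of the user table (hypothetical, not BOINC's schema)."""
    def __init__(self, credit):
        self.credit = credit
        self.lock = threading.Lock()

def lost_update(start, credit_1, credit_2):
    """Replay the interleaving from the post: both validators read before either writes."""
    bob = UserRow(start)
    seen_by_v1 = bob.credit              # validator 1 reads x
    seen_by_v2 = bob.credit              # validator 2 reads the same x
    bob.credit = seen_by_v1 + credit_1   # validator 1 saves x + 1
    bob.credit = seen_by_v2 + credit_2   # validator 2 overwrites it with x + 2
    return bob.credit

def atomic_update(start, credits):
    """Each validator holds the lock across its whole read-modify-write."""
    bob = UserRow(start)

    def validate(amount):
        with bob.lock:
            bob.credit = bob.credit + amount

    workers = [threading.Thread(target=validate, args=(c,)) for c in credits]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return bob.credit

print(lost_update(100, 1, 2))      # 102: the credit for result 1 is lost
print(atomic_update(100, [1, 2]))  # 103: both results are counted
```

With x = 100, the first function ends at x+2 and the second at x+2+1, which is the difference the post describes.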
Joined: 26 Sep 05 Posts: 85 Credit: 421,130 RAC: 0 |
Keep in mind, however, that since the database crash, my computer's copy of the credit has been increasing, but the value in the user table has not. In other words, one has been going up and the other has stayed the same. Also, things can be orphaned in one direction, i.e. if the database is able to link the tables in one direction but, for whatever reason, not the other. Yes, it is reasonable that the same foreign key is used, and logically it should be able to go both ways. This of course assumes that everything is working correctly. What could be going on is that a result is reported by a given host, so the host's record is updated through the connection with the database itself. However, if in the process of trying to update the user the database gets the same sort of error we are getting, such as Warning: mysql_fetch_object(): supplied argument is not a valid MySQL result resource in /shift/lxfsrk4101/data01/projects/lhcathome/html/user/results.php on line 41 then it might be unable to proceed any further, and so it stops. One other thing to keep in mind: the results table needs to be updated, potentially along with some other tables. The user table doesn't need an update on some fields unless an account is created, the cross-project ID is changed (i.e. after the account is created, BOINC contacts the project and supplies a different, earlier one), etc. It's the credits that need a reciprocal update. Computers are another one of those things. Which of course raises the question of whether one of the validators is running into the same "not a valid argument" error we are and, if so, is simply unable to complete an operation. I point this out because there might not even be an attempt to update the user table at this point, in which case we could be looking at old data rather than current data (i.e. some of what is being presented to us has simply been cached). 
I'm not entirely positive, as I haven't looked at LHC's database setup itself. As another sign of this whole database crash, though, LHC hasn't exported XML files for stats sites like BOINCstats to pick up in a while either. http://www.boincstats.com/ LHC@Home 2006-06-21 18:45:06 GMT All in all, it doesn't seem implausible that some parts of the database are still functioning (else we wouldn't even be getting to the forums, or be able to look at anything) while other parts have crashed. Nor does it seem implausible that, if we are getting errors such as the above, any internal operations that require those same tables or pass similar arguments might be getting the same error we get on our screen. And from there...? Could some of these figures, such as user credit, be cached somewhere anyhow, or does it have to be a recent copy? At the very least, I have noticed in the past a lag between when user credits are incremented and when the increment shows up on the team page (which of course couldn't complete successfully now that the teams have disappeared). |
Joined: 18 Sep 04 Posts: 23 Credit: 3,304 RAC: 0 |
I told Willy at BOINCstats about the LHC problems, and he probably disabled importing stats from the project to avoid corrupted data. See here. But indeed, the files in http://lhcathome.cern.ch/stats/ are dated Jul 21st 2006. |
Joined: 29 Sep 04 Posts: 196 Credit: 207,040 RAC: 0 |
To throw in my two cents: Could the problem LHC is experiencing with multiple host-ids be contributing to this symptom? It seems it might make sense if the servers are no longer able to track which hosts a user has due to the hosts issue. |
©2024 CERN