21) Message boards : Number crunching : Discepency in Credits? (Message 14155)
Posted 23 Jun 2006 by Nuadormrac
Post:
Keep in mind, however, that since the database crash, my computer's copy of the credit has been increasing, but the value in the user table has not. In other words, one has been going up, and the other has stayed the same.

Also, things can be orphaned in 1 direction, i.e. if it is able to link the tables in 1 direction, but for whatever reason not the other. Yes, it is reasonable that the same foreign key is used, and logically it should be able to go both ways. This of course assumes that everything is working correctly.

What could be going on is that a result is being reported by a given host, and so the record for the host is being made through the connection with the dBase itself. However, if in the process of trying to update the user, the dBase is getting the same sort of error we are getting, such as

Warning: mysql_fetch_object(): supplied argument is not a valid MySQL result resource in /shift/lxfsrk4101/data01/projects/lhcathome/html/user/results.php on line 41

Warning: mysql_free_result(): supplied argument is not a valid MySQL result resource in /shift/lxfsrk4101/data01/projects/lhcathome/html/user/results.php on line 45


then it might be unable to proceed any further, and so it stops.

One other thing to keep in mind: the results table needs to be updated, along with potentially some other tables. Some fields in the user table don't need an update unless an account is created, the cross-project ID is changed (i.e. after the account is created, BOINC contacts the project and supplies a different, earlier one), etc. It's the credits that need a reciprocal update. Computers are another one of those things. Which of course raises the question of whether one of the validators is running into the same "not valid argument" error we are and, if so, is simply unable to complete an operation.

I point this out because there might or might not even be an attempt to update the user table at this point, in which case we could be looking at old data rather than current data (i.e. some of the things being presented to us have simply been cached). Not entirely positive, as I haven't looked at LHC's dBase setup itself; though as another sign of this whole dBase crash, LHC hasn't exported XML files for stat sites (like BOINCstats) to pick up in a while either.

http://www.boincstats.com/

LHC@Home 2006-06-21 18:45:06 GMT
2 days 03:59:00 old


All in all, it doesn't seem implausible that some parts of the dBase are still functioning (else we wouldn't even be getting to the forums, or be able to look at anything), and other parts have crashed. Nor does it seem implausible that, if we are getting errors such as the above, any internal operations that require those same tables or present similar arguments might be getting the same error we get on our screen. And from there...?

Could some of these figures, such as user credit, be cached somewhere anyhow, or does it have to be a recent copy? At the very least, I have noticed in the past a lag time between when user credits are incremented and when this increment shows up on their team page (which of course couldn't complete successfully after the teams have disappeared).
22) Message boards : Number crunching : Discepency in Credits? (Message 14142)
Posted 23 Jun 2006 by Nuadormrac
Post:
My host tends to have more. The last time this happened was also during server probs, and it was evened up when Chrulle fixed it.

Best I can figure is that something like this is going on. There's evidence of something seriously wrong in the dBase (for instance, click on certain sections like pending credit, teams, whatever, and you get some error message wrt the operation taking place).

So, the host reports a WU back. Without a database association to say which account owns the host, the credit gets applied to the host, but not to the owning account. Ditto on anything getting applied to team accounts, as they aren't even coming up anymore. If the dBase can't find out what computer belongs where, all these orphaned records/computers can end up gaining credit, without it applying elsewhere.

I'm not 100% positive how they have it set up, but there might be a host_computer table in the dBase which is separate from the user table? If that's the case, the dBase would have to be able to link these tables together to make the necessary applications of credit; and performing lookups on some tables (e.g. pending credit and teams) seems to be what invokes the error, with a 0.00 given for a result.

Or perhaps I should modify that. The computers aren't completely orphaned. They still do show up under our account as being our computers. It just seems that an operation done to them credit-wise isn't reciprocally being done to the account that owns them during such a dBase crash...
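To make the idea above concrete, here is a minimal sketch of how one could check whether host credit and user credit have drifted apart. It is not LHC@home's actual schema: the table names `users` and `host_computer`, the columns `userid` and `total_credit`, and the sample numbers are all assumptions made up for illustration, using an in-memory SQLite database.

```python
import sqlite3


def find_credit_mismatches(conn):
    """Return (user_id, user_credit, summed_host_credit) for users whose
    own total disagrees with the sum over the hosts they own."""
    query = """
        SELECT u.id,
               u.total_credit,
               COALESCE(SUM(h.total_credit), 0)
        FROM users AS u
        LEFT JOIN host_computer AS h ON h.userid = u.id
        GROUP BY u.id, u.total_credit
        HAVING ABS(u.total_credit - COALESCE(SUM(h.total_credit), 0)) > 0.005
        ORDER BY u.id
    """
    return conn.execute(query).fetchall()


# Hypothetical data: user 1's hosts kept gaining credit after the user row froze.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, total_credit REAL);
    CREATE TABLE host_computer (
        id INTEGER PRIMARY KEY, userid INTEGER, total_credit REAL);
    INSERT INTO users VALUES (1, 150.0), (2, 80.0);
    INSERT INTO host_computer VALUES (10, 1, 100.0), (11, 1, 75.5), (12, 2, 80.0);
""")

mismatches = find_credit_mismatches(conn)
for user_id, user_credit, host_credit in mismatches:
    print(f"user {user_id}: account has {user_credit}, hosts sum to {host_credit}")
```

The join only works in the host-to-user direction if the link (here `userid`) can still be resolved, which is the point of the post: credit written to the host row alone leaves the two totals out of step.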
23) Message boards : Number crunching : Where all the Teams Go? :( (Message 14134)
Posted 23 Jun 2006 by Nuadormrac
Post:
thx for the update
24) Message boards : Number crunching : Where all the Teams Go? :( (Message 14124)
Posted 23 Jun 2006 by Nuadormrac
Post:
They're having database problems. As of right now, my computer has more credit than my user account, as my user account hasn't gotten any granted credit added to it. From what I can gather, my computer was granted it... And yes, I noticed the same thing with teams that you mention...

Perhaps if Chrulle notices this, he'll straighten this all out for us. He has been quite good at keeping things running in the past; the thing is, though, it isn't really his job anymore, as he officially resigned some time back.
25) Message boards : Number crunching : Server problems (Message 13676)
Posted 21 May 2006 by Nuadormrac
Post:
Most of my pending had cleared prior to this recent batch of WUs coming out, with only a few exceptions...

OK, the server probably got over-loaded as people were making a run to the server to mass download WUs :D Looking good though...
26) Message boards : Number crunching : Server problems (Message 13663)
Posted 19 May 2006 by Nuadormrac
Post:
thx for the fix... My granted credit has gone up, though I still have a lot of pending. Some WUs were issued in multiple instances even though a quorum of 3 had already been attained (probably because the WUs weren't showing), and the number of WUs still being crunched seems to have gone up. But it does look like things are back in business and returning to a state of normalcy :)

Edit: Did notice a few glitches, however. On these WUs, all other crunchers were granted credit, but mine is still showing "status pending"

http://lhcathome.cern.ch/result.php?resultid=6912464
http://lhcathome.cern.ch/workunit.php?wuid=1340533

http://lhcathome.cern.ch/result.php?resultid=6917176
http://lhcathome.cern.ch/workunit.php?wuid=1341476

http://lhcathome.cern.ch/result.php?resultid=6887056
http://lhcathome.cern.ch/workunit.php?wuid=1335498

http://lhcathome.cern.ch/result.php?resultid=6874966
http://lhcathome.cern.ch/workunit.php?wuid=1333094

http://lhcathome.cern.ch/result.php?resultid=6891698
http://lhcathome.cern.ch/workunit.php?wuid=1336415

http://lhcathome.cern.ch/result.php?resultid=6893817
http://lhcathome.cern.ch/workunit.php?wuid=1336812

http://lhcathome.cern.ch/result.php?resultid=6918033
http://lhcathome.cern.ch/workunit.php?wuid=1341647

http://lhcathome.cern.ch/result.php?resultid=6864012
http://lhcathome.cern.ch/workunit.php?wuid=1330903

http://lhcathome.cern.ch/result.php?resultid=6900101
http://lhcathome.cern.ch/workunit.php?wuid=1338070

http://lhcathome.cern.ch/result.php?resultid=6879542
http://lhcathome.cern.ch/workunit.php?wuid=1333993

http://lhcathome.cern.ch/result.php?resultid=6898881
http://lhcathome.cern.ch/workunit.php?wuid=1337824

The other WUs in pending are pending for everyone, but these have granted credit to all other users and still have it pending for me. Not sure whether the validator will go back and grant it to me now that the dBase is repaired, or not. Those are both the WU and result pairs...
27) Message boards : Number crunching : Server problems (Message 13660)
Posted 18 May 2006 by Nuadormrac
Post:
Chrulle certainly did put a lot into keeping this project up when he was here, and even a bit after he had left. Now the project people need to look into either getting someone, or doing something to keep this up themselves. And yes, it would be a shame if all the WUs we had crunched from this batch got sent the way of the bit bucket or something...
28) Message boards : Number crunching : FUBAR! (Message 13650)
Posted 17 May 2006 by Nuadormrac
Post:
We don't have a permanent admin, though Chrulle has stopped by on occasion. However, he's not working for this project anymore, and is no doubt quite busy with Africa@home (last I heard, he was looking into helping out with that project), and whatever else...

If it's down, it might be a good sign (as taking it down might be a first step to actually addressing some of the underlying issues), but I don't know. Just hoping this will sort itself out without all the work having been lost...

Edit: Just got this back:

5/17/2006 1:16:50 PM|LHC@home|Started upload of file wfeb1A_v6s4vvnom_mqx__6__64.252_59.262__6_8__6__80_1_sixvf_boinc184259_2_0
5/17/2006 1:16:54 PM|LHC@home|Finished upload of file wfeb1A_v6s4vvnom_mqx__6__64.252_59.262__6_8__6__80_1_sixvf_boinc184259_2_0
5/17/2006 1:16:54 PM|LHC@home|Throughput 17536 bytes/sec
5/17/2006 1:16:58 PM|LHC@home|Sending scheduler request to http://lhcathome-sched1.cern.ch/scheduler/cgi
5/17/2006 1:16:58 PM|LHC@home|Reason: To report completed tasks
5/17/2006 1:16:58 PM|LHC@home|Reporting 1 tasks
5/17/2006 1:17:03 PM|LHC@home|Scheduler request to http://lhcathome-sched1.cern.ch/scheduler/cgi succeeded
29) Message boards : Number crunching : can't find workunit (Message 13640)
Posted 17 May 2006 by Nuadormrac
Post:
Yeah, I've got 16 WUs showing the same error message about "unable to find WU"... Hopefully (and given that the result headers are still in there and all), it's not really gone and this will all get sorted out...

Edit: Just checked, and this little glitch with the "unable to find WU" also shows up when clicking on WUs that are on my results page as having been granted credit already... Hopefully (and not just from a credit, but also from a science standpoint), all our results are still intact
30) Message boards : Number crunching : host 61123 looks suspiscious (Message 13628)
Posted 15 May 2006 by Nuadormrac
Post:
What I was thinking, is a possible editing of .xml files/cheating... Don't know for certain however what the actual cause is; but it's a bad host regardless
31) Message boards : Number crunching : host 61123 looks suspiscious (Message 13582)
Posted 12 May 2006 by Nuadormrac
Post:
Noticed this when checking my own results... Anyhow, thus far he has returned a credit claim that conflicts wildly with that of the other cruncher who has returned it so far

http://lhcathome.cern.ch/workunit.php?wuid=1348242

He claimed .02 credits, the other person claimed 30.75... So I clicked on his client, and saw this:

http://lhcathome.cern.ch/show_host_detail.php?hostid=61123

Average turnaround time 124227.13 days
Results 1096


How many years would that represent? Definitely more than 10 years... Umm, the project hasn't been in place for 10+ years, so a definite hack job...
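For anyone who wants the actual figure, the conversion is simple arithmetic. The sketch below assumes an average year of 365.25 days (my choice, not anything the host page states) and uses the turnaround value quoted above; it works out to roughly 340 years.

```python
DAYS_PER_YEAR = 365.25  # assumed average year length, including leap years


def days_to_years(days):
    """Convert a duration in days to years."""
    return days / DAYS_PER_YEAR


turnaround_days = 124227.13  # "Average turnaround time" quoted from the host page
turnaround_years = days_to_years(turnaround_days)
print(f"{turnaround_days} days is about {turnaround_years:.1f} years")
```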

Now for the juicy part in his results page:

http://lhcathome.cern.ch/results.php?hostid=61123

All of his credit claims are low, as in < 1 claimed credit as far back as I cared to look...

Edit: Looking further at this user's computer list

http://lhcathome.cern.ch/hosts_user.php?userid=751

he's got 24 computers, and a hell of a lot with a 0/day WU quota...
32) Message boards : Number crunching : Overclocking Failed (Message 13572)
Posted 11 May 2006 by Nuadormrac
Post:
Actually, by timing margin I didn't mean the number of memory lookups, in the sense of how many; I meant a timing problem on the hardware side of things. That is, a lack of timing margin in one's computer becomes a problem when a faster clock pulse comes along (let's say one's running their memory bus at 210 MHz, and for 1 clock only it "runs at" 211 MHz, and then drops back down to 210 MHz). Or to put it more directly: if the memory needs 15 ns before it is ready to be accessed, because one's running (whether due to overclocking or not) right at the edge, but a particular clock cycle leaves it being accessed in 14 ns, then a problem can result. However, if the memory timing settings in BIOS effectively result in a 15 ns memory access time (typically, or on average), and the memory is capable of being accessed in 12 ns (as well as 14 ns, 15 ns, etc.), then the 14 ns "faster clock pulse" won't affect stability any, because there is still enough timing margin to accommodate that occasionally faster clock pulse... Hopefully, I didn't make things as clear as mud in the above :)
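The same numbers can be run through a tiny sketch. This is only an illustration of the reasoning in the paragraph above, not a model of any real memory controller; the function names are made up, and the 15 ns / 14 ns windows and the 15 ns / 12 ns memory requirements are the example figures from the post.

```python
def timing_margin_ns(window_ns, required_ns):
    """Slack between the time a clock cycle allows and the time the memory needs."""
    return window_ns - required_ns


def is_stable(window_ns, required_ns):
    """An access is safe only if the margin is not negative."""
    return timing_margin_ns(window_ns, required_ns) >= 0


TYPICAL_WINDOW_NS = 15.0  # what the BIOS timings give on average
SHORT_WINDOW_NS = 14.0    # the occasional "faster clock pulse"

for label, required_ns in (("edge part", 15.0), ("part with margin", 12.0)):
    for window_ns in (TYPICAL_WINDOW_NS, SHORT_WINDOW_NS):
        margin = timing_margin_ns(window_ns, required_ns)
        verdict = "ok" if is_stable(window_ns, required_ns) else "timing error possible"
        print(f"{label}: window {window_ns} ns, margin {margin:+.1f} ns, {verdict}")
```

Only the combination of the part that needs the full 15 ns and the 14 ns window comes out negative, which is the one case where a problem can result.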

I do s'pose those extra memory ops in CPDN could be a reason why, as far as projects go, it's almost known as one of the biggest "stress test projects" for one's computer stability. It does almost seem that if something's going to be stable on CPDN, one probably won't have a problem elsewhere...
33) Message boards : Number crunching : LHC@home Alpha? (Message 13571)
Posted 11 May 2006 by Nuadormrac
Post:
I guess if, after you get a new app out there, you end up needing any more testers, I'm willing to help.

For now though, and as some have pointed out, there isn't really anything to test, and once there is another app, there's perhaps no knowing how many of the current members will "stick with it" with testing, or have since gone on to other projects...
34) Message boards : Number crunching : Overclocking Failed (Message 13527)
Posted 8 May 2006 by Nuadormrac
Post:
When the BIOS downclocks the FSB below the rated setting after someone has pushed an OC beyond what the hardware is capable of, it doesn't necessarily mean the memory should be run at its defaults. Many times, the BIOS sets it to some fail-safe value following a crash from excessive OC, not an optimum. The fail-safe can represent a "guaranteed" setting: it will boot... From there, the user can end up having to go back in, or actually they find themselves in there with the error message, and an opportunity to re-set the CMOS...

Travis is correct, however, that if enough wait states get introduced, the actual performance might not be improved...

Also, an OC doesn't guarantee that there will be errors, but the possibility can be greater. It isn't an all-the-time thing. What's more, at least in the past, Intel for one would sometimes underclock the proc upon shipping. By this, I mean that if Intel had too many Pentium 133s, and they didn't want to lower the price (i.e. supply and demand), they might have re-packaged some of the Pentium 133s as Pentium 100s (even though through internal testing they were rated for the higher clock). Doing this, they could in effect reduce the supply of Pentium 133s, to help keep prices higher...

Regardless, however, and hopefully to de-mystify this somewhat, what can really go on is this. The timing crystal does not give a perfect/steady clock rate, but from one clock pulse to the next can vary slightly. In fact, with an A64 (which already has CnQ to allow the clock to vary under usage), this can be seen even when CnQ is disabled. Looking at my own A64, it was not uncommon to see an occasional clock that was either higher or lower than the rated clock, by exactly the multiplier (i.e. 1 MHz faster, or 1 MHz slower on the HTT).

Given this fact, that in the real world (vs. on paper) things can vary slightly, manufacturers will routinely put a little timing margin into their products. This is to ensure that even when one runs into the occasional "faster than typical clock pulse", all will still run as it should.

Because of this, there can also be room to over-clock, though one's results can vary (i.e. the luck of the draw, as some have gotten a better OCing CPU, memory, etc., and some have been less lucky). Of course there are no guarantees here... As long as one maintains a degree of timing margin while running under the most stressful conditions (which is why many test with Prime95, for instance), one should be OK. And BTW, from many people's experience CPDN can be much more stressful on a CPU than LHC, or many other projects. In fact, CPDN is where many users have been most prone to run into problems...

If, however, one's run out of timing margin or doesn't have any, and they get that occasional faster clock pulse, then a timing problem can result, and yes, an error. It's not that OCing guarantees an error, but having a timing problem (i.e. running faster than the hardware is capable of, even for that random instant) can result in one. That's the reason it can also be good to back off when one finds their hardware's limit, to make sure some margin is left in the system...

As to temps, this is true, but there are ways to deal with it also. People who are serious about OCing aren't necessarily going to use one of those thermal pads, but might prefer something like Arctic Silver 5. The heat sinks, and the attention one gives to their system's cooling, can also be greater...
35) Message boards : Number crunching : How long to wait before work is available? (Message 13465)
Posted 27 Apr 2006 by Nuadormrac
Post:
LOL, wakes someone up to give them a sleeping pill... It'd be funny if someone was either very grouchy upon waking up, or rolls over and while still unconscious (so they don't know they're doing it), bops the person in the nose :rofl

Oh, and it'd be debatable whether that would always work. I swear, when I was last in the hospital they gave me Vicodin thinking it would put me to sleep. It reduced the pain, yes, but had no effect on my wakefulness whatsoever. I was up all night, and getting a bit tired of this 1 person down the hall.

Let's just say I was sent to the dialysis wing for a blood clot in the leg. What I didn't know is that the wing doubled up as overflow for the psychiatric patients. Whenever someone walked by, the lady would scream and yell, while banging on (or perhaps throwing herself up against) the walls. Don't know about the latter, as I only heard it, didn't see it.

When I left the next day (getting out a bit early, but only if I gave myself injections till the blood thinner took effect), I overheard a nurse tell an intern "be careful of that lady in the corner. She's psychotic, and there's blood all over her room" EEK

Anyhow, Vicodin, and then Percocet (when I had it for an extracted wisdom tooth the other week), didn't affect my wakefulness at all :D
36) Message boards : Number crunching : "In progress" means ?? (Message 13464)
Posted 27 Apr 2006 by Nuadormrac
Post:
From the standpoint of the project, a few days doesn't matter. As River and some others have pointed out, they aren't waiting for results... Case in point, no work is out there yet, and CERN is still not ready for the next batch of work.

They have other things they also need to do (as they're doing right now), and so will get to the WUs when they're ready. Wasn't necessarily a matter of crunchers holding them up...

As to the CPUs, it is true that Intel and AMD, for instance, have slightly different implementations, but this isn't based simply on the clock rate (what counts is actual performance, given that the efficiency of 2 CPUs need not be the same). Both Intel and AMD have newer-gen processors, as well as older ones. And someone with a latest-gen dual proc P4 3.4+ GHz processor using the latest core, and someone using an X2 or FX-60 on the other hand, aren't exactly going to be hurting for crunch time...

The slower comps would be more a PII vs. P4, or a K6-2 vs. A64 type situation, than an Intel vs. AMD one. Though there might be slight differences performance-wise, the high-end processors from each company aren't entirely non-competitive against each other, or anything of the sort... And true, perhaps the PII might get a bit of a different result than a P4 (different architecture), but as long as one can compare Intel to AMD, slight differences will show up (if this is what's wanted, unlike with Predictor, which ties a WU to a given processor type to allow less variation).
37) Message boards : Number crunching : How long to wait before work is available? (Message 13436)
Posted 25 Apr 2006 by Nuadormrac
Post:
I don't even get the point of this thread resurrection here. If it wasn't resurrected, then the thread would have remained "closed" by proxy. The proxy being that most people probably wouldn't even remember this thread was here, or care to look back.

Even though it was resurrected, I still don't feel inclined to look back at something from 2004. The here and now is what matters to both our work queues and the respective projects themselves, and it's what people would post/respond to, one way or the other...
38) Message boards : Number crunching : I think we should restrict work units (Message 13419)
Posted 20 Apr 2006 by Nuadormrac
Post:
Now, on a semi-related note: I see several work units timing out after a quorum has been met. Will those units be re-sent or is the quorum already good enough?


They wouldn't have to be, so I wouldn't expect them to be. As is, the 4th unit is a workaround for people who trash units, so a quorum could still be reached, and the unit could still be validated. In this way, and to my understanding, if the 4th unit comes in it gets credit, but if it doesn't, that's it.

Unless, among the 3, a quorum can't be reached, because the results are too different and there is no ability for the validator to determine which return was "correct". In that case, another would be sent until a quorum of results could be reached.
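As a rough sketch of that quorum logic (my own illustration, not BOINC's actual validator code): results are reduced to a single number here, and the function name, the tolerance, and the sample values are all assumptions. A group of at least 3 agreeing results validates the WU; otherwise another copy has to go out.

```python
def check_quorum(results, quorum=3, tolerance=1e-6):
    """Return an agreed value if at least `quorum` results match within
    `tolerance`; return None if no such group exists yet."""
    for candidate in results:
        agreeing = [r for r in results if abs(r - candidate) <= tolerance]
        if len(agreeing) >= quorum:
            return candidate
    return None


# Three matching returns: quorum reached, a late 4th result isn't needed.
three_agree = check_quorum([0.52, 0.52, 0.52])

# One of the 3 disagrees: no quorum, so another copy would be sent out.
one_bad = check_quorum([0.52, 0.52, 0.91])

# The extra result comes back and matches: quorum reached after all.
after_resend = check_quorum([0.52, 0.52, 0.91, 0.52])

print(three_agree, one_bad, after_resend)
```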

As to the queue size, if there was a problem for the project scientists, I'm rather certain that either they would adjust deadlines, as has been done by some accounts here, or they or a moderator would come in, mention the prob, and point-blank ask people. This might even be announced on the front page. RALPH asks people not to have big queues, which is their need. They also state why: "it's a testing environment and we want to test over a variety of different computer configs".

It's also like when akosv's optimized app started showing up on Einstein@Home... There was no official word, and some assumed that perhaps Bruce Allen couldn't speak, because perhaps parts of the app were copyrighted or something, for which the code couldn't be open sourced...

Anyhow, in the end, not only was a beta started based on akosv's optimizations, but when akosv made recommendations on how they could improve the beta, Bruce Allen didn't ignore him... But in any case, it's sorta like over there. One person ended up saying, back in the earlier days of this and before the announced beta, "well it's not officially endorsed, but the project hasn't spoken against it either, and it is validating. Now, don't you think that if Bruce Allen came in here and asked people to stop using it, that each person here wouldn't delete it from their computers, post haste?"

If the project has a problem, be it with deadlines or the like, they wouldn't necessarily sit there gritting their teeth in abject silence. They'd more than likely express their concerns and give their request to the volunteers.
39) Message boards : Number crunching : "In progress" means ?? (Message 13418)
Posted 20 Apr 2006 by Nuadormrac
Post:
Yeah, 27 waiting to be returned...

However, because 4 WUs are sent out, but only 3 are needed for a quorum, that could include those which had validated already, I would gather...
40) Message boards : Number crunching : Not HAPPY people. (Message 13363)
Posted 14 Apr 2006 by Nuadormrac
Post:
BTW, for anyone curious: if a project is set to no new work but has WUs, and the WUs themselves, rather than the project, are suspended, they will accumulate negative debt. Noticed this with seasonal attribution, as I had suspended the WUs (not the projects) for seasonal attribution and CPDN while going through LHC WUs.

There's an obvious reason for this also. CPDN WUs are extremely long, and if one needs to suspend one, one doesn't want it downloading new work and then possibly getting over-committed. CPDN has issues if it's allowed to run while certain CPU or memory intensive apps are also running, so...

All said however, I'd rather not have it continue going out again and again when there is no work, but here is a way if anyone is curious on how the books could be balanced. Don't be late on the last WU if you try to make use of this though.




©2024 CERN