Message boards : Number crunching : I think we should restrict work units
Joined: 4 Sep 05 Posts: 112 Credit: 2,068,660 RAC: 54
Seriously though, I'm guessing saying nasty things to others makes you feel good, so fire away; I'm very thick-skinned and I like making others happy. You're not a very happy camper today, are you Mike? Three steamed-up posts in one day. :-) SMILE, life's short. Click here to join the #1 Aussie Alliance on LHC.
Joined: 1 May 06 Posts: 34 Credit: 64,492 RAC: 0
Oh, we definitely have a delay in completion, thanks to big-cache junkies who mainly crunch for their personal benefit (however silly that may be) and the inadequacies of the BOINC system to counteract WU-grabbers. I agree the number of WUs still being crunched is rather silly. I can understand some get lost. For example, after the last time I got WUs before this run, BOINC died on me and I lost all my pending WUs. But many have been snatched up by those that don't crunch them fast, and now they have to be reissued because the cache was just too big. This ain't cool... Unfortunately LHC and Einstein are the only "hard science" (physics) projects in non-alpha/beta stage. And LHC needs the completion of the current work to issue the next batch. That is the reason I turn my computer 100% to it when it has work and crunch as much and as fast as I can. Many of those last WUs get hoarded by those caching up to get the last few units... at least I'm crunching them nice and quick. No offence intended. This was a gentle dig aimed at the 'credit-racers', of whom you claim you're not one. But judging by your sensitivity to it I'd guess that you are 'racing' more than you let on. OK, in this case I withdraw my comment; it was uncalled for. But yours was not exactly required either. Just out of common courtesy, next time you reply to another's post, try not to flame them if they haven't flamed you. It's really not required, and it's juvenile. I'll act my age and apologise to you. Hope you do the same. If you take the time to look at my sig you will notice I'm not really crunching big numbers in any project, and if that's racing I'm scared, as it's more a slow stroll, lol.
Joined: 1 Sep 04 Posts: 506 Credit: 118,619 RAC: 0
My word - we are all sensitive today. No offence intended here, and none taken. However, I will say this: if the cap fits, wear it! Happy crunching! Gaspode the UnDressed http://www.littlevale.co.uk
Joined: 2 Sep 04 Posts: 309 Credit: 715,258 RAC: 0
If it were such a big problem for the project, then surely they would reduce the deadline. Reduce it down to 3, 4 or 5 days and caches will be smaller by default. The great WU snap-up might then only occur on the 4th or 5th day, resulting in WUs not coming back until 10 to 11 days after the first ones were issued. So this would be no benefit for the project. Releasing all the WUs on day 1 with the same 7-day deadline means that they should be received within 8 to 9 days of first being released - noting that the project cache takes 48 hours to run down. If it is such a big problem for the project (and it really doesn't appear to be), then the best way to release the WUs is in smaller lots of, say, 20,000 results (that's 4,000 WUs) per day. Oh, hang on... the project would be worse off. It would take 6 days to release all the WUs, and if they had a 5-day deadline that might mean the last ones aren't returned until 11 days after the first ones are released. Also remember that the project does not "care" about us volunteers; we are purely a resource. If the WUs are getting completed to the requirements of the project, then why should they care if some people do not get the number of WUs they want? This is only a problem for the volunteers: people's imaginations have been piqued by participating in a big engineering/physics project, so they want to crunch this project as much as possible, and WUs are limited. Also don't forget that Chrulle worked on the "best" way to ensure the WUs are returned the quickest overall. He appears to have hit on it. Live long and crunch (if you've got 'em). Paul (S@H1 8888) BOINC/SAH BETA
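The release-schedule arithmetic above can be sketched as a back-of-the-envelope calculation. The day counts and the 48-hour cache rundown come from the post; the one-line formula is my own simplification and ignores results returned early:

```python
def worst_case_return_day(release_window_days, deadline_days):
    """Roughly: the last result comes back no later than the day the
    final WU goes out plus the deadline attached to it."""
    return release_window_days + deadline_days

# All WUs released at once, ~2 days for the project cache to drain,
# 7-day deadline -> the "8 to 9 days" quoted above:
print(worst_case_return_day(2, 7))   # 9

# Staged release over 6 days with a shortened 5-day deadline:
print(worst_case_return_day(6, 5))   # 11
```

Which is the poster's point: slicing the release into smaller daily lots with shorter deadlines can actually push the overall completion date later, not earlier.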
Joined: 4 Sep 05 Posts: 112 Credit: 2,068,660 RAC: 54
My word - we are all sensitive today. No offence intended here, and none taken. However, I will say this: I don't mind wearing it ;-) But I don't see the problem with people who crunch projects for the credits or any other reason, so long as we have fun. I enjoy the whole thing: the team, the credits, the science, making new friends and sharing ideas across so many boundaries, and it helps someone. Perhaps many someones. And it keeps my computers off the street ;-)
Joined: 4 Sep 05 Posts: 112 Credit: 2,068,660 RAC: 54
I think you've got the right idea there, Gas Giant. If there was a problem, the project would have changed something by now. I don't know that the staff don't care about us; I think Chrulle said they are between grad students at the moment. It may take a while for things to show up, but one of the main reasons I liked LHC so much was that project staff were involved with the message boards. So that's how you spell "piqued" - I'll have to keep a copy of that ;-).
Joined: 21 May 06 Posts: 73 Credit: 8,710 RAC: 0
Does anyone think that the reason the initial replication is 5 while the quorum is only 3 is to generate extra work for all the work-hungry volunteers?
Joined: 2 Sep 04 Posts: 309 Credit: 715,258 RAC: 0
Does anyone think that the reason the initial replication is 5 while the quorum is only 3 is to generate extra work for all the work-hungry volunteers? I would have thought a replication of 4 would have been sufficient.
Joined: 1 Sep 04 Posts: 506 Credit: 118,619 RAC: 0
Does anyone think that the reason the initial replication is 5 while the quorum is only 3 is to generate extra work for all the work-hungry volunteers? The five/three ratio is to improve the chances of getting a quorum at the first attempt. It's down to SixTrack's extreme sensitivity to numerical accuracy. In even the most solid computer there can be the occasional single-bit error that will throw the result off. Sending five results should improve the chance of reaching a quorum, and so reduce the completion time for the study. From what I see on the results pages, most results reach quorum at three, so a replication of five is redundant. I'd like to know if the fourth and fifth results are still issued if a quorum has already been reached.
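To see why sending five copies helps, here is a rough binomial sketch. The 90% per-result success rate below is a made-up figure for illustration, not a project statistic; the point is only the shape of the numbers:

```python
from math import comb

def p_quorum(n, q, p):
    """Probability that at least q of n independent results validate,
    assuming each result is valid with probability p (binomial model)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(q, n + 1))

# Hypothetical 90% per-result success rate:
print(p_quorum(3, 3, 0.9))   # ~0.729 - quorum of 3 from only 3 copies
print(p_quorum(5, 3, 0.9))   # ~0.991 - quorum of 3 from 5 copies
```

Even a modest per-result failure rate makes 3-of-3 noticeably unreliable on the first pass, while 3-of-5 almost always succeeds without a reissue round.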
Joined: 2 Sep 04 Posts: 378 Credit: 10,765 RAC: 0
It cuts down significantly on the number of times they have to send the work unit back out to be crunched. Unlike SETI or Climate Prediction, this project has a higher number of results that don't verify against other results, for various reasons. I'm not the LHC Alex. Just a number cruncher like everyone else here.
Joined: 13 Jul 05 Posts: 133 Credit: 162,641 RAC: 0
.............. so long as we have fun. I enjoy the whole thing: the team, the credits, the science, making new friends and sharing ideas across so many boundaries, and it helps someone. Perhaps many someones. Sums it up in a nutshell for me........ Well said, Mike!
Joined: 21 May 06 Posts: 73 Credit: 8,710 RAC: 0
Does anyone think that the reason the initial replication is 5 while the quorum is only 3 is to generate extra work for all the work-hungry volunteers? What do you think the probability is of a single-bit (or any other) error causing the same incorrect answer in even TWO of the three members of the quorum?
Joined: 1 Sep 04 Posts: 506 Credit: 118,619 RAC: 0
Extremely small, I'd guess. SixTrack suffers from the single-bit sensitivity because of the way it handles its numbers, and the fact that it does the operations repeatedly. A single-bit error in the first iteration of an algorithm will generate a different erroneous result than the same error occurring at, say, iteration 500,000. Given that a single-bit problem can creep in potentially anywhere (and anywhen), the chances of two different computers generating the same incorrect result are vanishingly small. The same can't be said of the same computer running the same unit twice, however. It is possible that some sort of systematic failure could generate consistent errors at consistent points in the algorithm. Such a computer would probably never generate a valid LHC result, although it might work perfectly well in every other regard.
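The point that the same one-bit error at different iterations yields different wrong answers is easy to demonstrate with a toy iterated map. This is an illustration of error amplification in a repeated calculation, not SixTrack's actual algorithm:

```python
import math

def iterate(x, n, flip_at=None):
    """Iterate the chaotic logistic map x -> 3.9*x*(1-x) n times,
    optionally injecting a one-ulp 'bit error' before step flip_at."""
    for i in range(n):
        if i == flip_at:
            x = math.nextafter(x, 1.0)   # smallest possible perturbation
        x = 3.9 * x * (1 - x)
    return x

clean = iterate(0.25, 200)
early = iterate(0.25, 200, flip_at=0)
late  = iterate(0.25, 200, flip_at=100)
# All three final values differ: the identical one-bit error injected at
# different iterations produces different erroneous results, so two
# independently corrupted machines almost never agree by accident.
```

This is also why a quorum works here: agreement between machines is strong evidence of a correct run, because matching *wrong* answers are so unlikely.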
Joined: 4 Sep 05 Posts: 112 Credit: 2,068,660 RAC: 54
.............. so long as we have fun. I enjoy the whole thing: the team, the credits, the science, making new friends and sharing ideas across so many boundaries, and it helps someone. Perhaps many someones. I thought so too. Thank you 8-)
Joined: 21 May 06 Posts: 73 Credit: 8,710 RAC: 0
For what it is worth, I have error-detecting and -correcting memory on my machine. I wonder how typical that is anymore... One of the LHC discussions mentioned the development of libraries that were able to return consistent results on different machines. If those libraries are used, then it seems a quorum of 2 with replication of 3 would suffice. But since the computer resource is "free" and folk often clamor for "more work", it probably leads to higher quorums and higher initial replications. Has there been any discussion of giving "bonus points" for work units that are finished quickly? It would seem this would be useful when errors from the initial replication group necessitated the resending of work units closer to the deadline...
Joined: 4 Sep 05 Posts: 112 Credit: 2,068,660 RAC: 54
.... [snip] ..... Common on all servers of all sizes.
Joined: 21 May 06 Posts: 73 Credit: 8,710 RAC: 0
.... [snip] ..... Sure, but I meant how common is it among the BOINC or LHC crunchers?
Joined: 21 May 06 Posts: 73 Credit: 8,710 RAC: 0
I love LHC, and I realize it's different from the other BOINC projects in that it doesn't have continuous work to send out. It sends out work, and analyzes those results before sending out the next batch. Matt - I want to thank you for taking the time to post this and start this thread. Prior to your having done so, I was having difficulty getting work units to run for LHC. Thanks to your clear explanation, I raised my cache from 0.01 to 10 days. And yup, as soon as there was work to do, I was able to get a bunch of it to work on. Again, thanks for your help in showing us how to get the maximum number of work units to process. Phil
Joined: 13 Jul 05 Posts: 133 Credit: 162,641 RAC: 0
I set my cache to 1 day right back when I started BOINCing....... and I received half-a-dozen WUs on the last distribution of work....
Joined: 28 Sep 05 Posts: 21 Credit: 11,715 RAC: 0
For what it is worth, I have error-detecting and -correcting memory on my machine. I think the one-bit errors do not originate in the memory; they originate in the FPU/SSE. It is a known issue that the AMD FPU processes numbers differently (possibly less accurately) than the Intel FPU (this is usually overcome by proper program code). It is also a fact that overclocking can cause the FPU to be less accurate (the one-bit errors) for both AMD and Intel. One interesting and not well-known feature of the FPU is that inside it, numbers are not represented as decimals as you might think; they are binary floating point in the IEEE 754 format, for hardware design reasons. There are decimal numbers that cannot be represented exactly in IEEE format. The numbers are 'close enough' for most purposes, and special code is usually implemented to minimize error - the usual approach is extending precision and using careful rounding. At any rate, the errors introduced by IEEE format can be exaggerated by one-bit errors at or near the limits of precision.
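The representability point is easy to demonstrate in a few lines; 0.1 is the classic example of a decimal fraction with no exact IEEE 754 binary representation:

```python
import math
from decimal import Decimal

# Ten additions of 0.1 do not sum to exactly 1.0, because each stored
# 0.1 is already slightly off and the rounding errors accumulate.
naive = sum(0.1 for _ in range(10))
print(naive == 1.0)           # False

# Decimal(0.1) reveals the value the float 0.1 actually stores.
print(Decimal(0.1))

# Compensated summation (math.fsum) tracks the rounding error and
# returns the correctly rounded sum, which here is exactly 1.0.
print(math.fsum(0.1 for _ in range(10)) == 1.0)   # True
```

This extended-precision/careful-rounding trick is the same general idea the poster mentions: the individual values are only 'close enough', so careful code keeps the accumulated error from growing.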
©2025 CERN