Message boards : Number crunching : I think we should restrict work units
Joined: 1 Sep 04 Posts: 506 Credit: 118,619 RAC: 0
My word - we are all sensitive today. No offence intended here, and none taken. However, I will say this: If the cap fits, wear it! Happy crunching!
Gaspode the UnDressed http://www.littlevale.co.uk
Joined: 2 Sep 04 Posts: 309 Credit: 715,258 RAC: 0
If it was such a big problem for the project, then surely they would reduce the deadline. Reduce it to 3, 4 or 5 days and caches will be smaller by default. The great wu snap-up might then only occur on the 4th or 5th day, resulting in wu's not coming back until 10 to 11 days after the first ones were issued. So this would be no benefit for the project. Releasing all the wu's on day 1 with the same 7-day deadline means they should be received within 8 to 9 days of first being released - noting that the project cache takes 48 hours to run down.

If it is such a big problem for the project (and it really doesn't appear to be), then the best way to release the wu's is in smaller lots of, say, 20,000 results (that's 4,000 wu's) per day. Oh, hang on... the project would be worse off. It would take 6 days to release all the wu's, and with a 5-day deadline that might mean the last ones aren't returned until 11 days after the first ones are released.

Also remember that the project does not "care" about us volunteers; we are purely a resource. If the wu's are getting completed to the requirements of the project, then why should they care if some people do not get the number of wu's they want? This is only a problem for the volunteers: people's imaginations have been piqued by participating in a big engineering/physics project, so they want to crunch this project as much as possible, and wu's are limited. Also don't forget that Chrulle worked on the "best" way to ensure the wu's are returned the quickest overall. He appears to have hit on it.

Live long and crunch (if you've got 'em). Paul (S@H1 8888)
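The turnaround arithmetic in the post above can be sketched in a few lines. This is only a rough illustration: the ~2-day cache run-down, the 7-day and 5-day deadlines, and the 6-day drip-feed are the figures quoted in the post, and the helper function and its `last_issue_day` parameter are names introduced here purely for the sketch.

```python
# Rough worst-case turnaround for the two release strategies discussed above.
# Assumption (from the post): results come back by the deadline, counted from
# the day the work unit actually leaves the server.

def worst_case_return_day(last_issue_day, deadline_days):
    """Day, counted from first release, by which the last result should be back."""
    return last_issue_day + deadline_days

# Release everything on day 0 with a 7-day deadline: the last work units leave
# the server once the ~2-day project cache has run down, so they are due ~day 9.
print(worst_case_return_day(last_issue_day=2, deadline_days=7))   # 9

# Drip-feed daily lots over 6 days with a 5-day deadline: the last lot leaves
# on day 6 and is due ~day 11 - later than releasing everything at once.
print(worst_case_return_day(last_issue_day=6, deadline_days=5))   # 11
```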
Joined: 4 Sep 05 Posts: 112 Credit: 2,138,133 RAC: 441
My word - we are all sensitive today. No offence intended here, and none taken. However, I will say this:

I don't mind wearing it ;-) But I don't see the problem with people who crunch projects for the credits or any other reason, so long as we have fun. I enjoy the whole thing: the team, the credits, the science, making new friends and sharing ideas across so many boundaries - and it helps someone. Perhaps many someones. And it keeps my computers off the street ;-)
Click here to join the #1 Aussie Alliance on LHC.
Joined: 4 Sep 05 Posts: 112 Credit: 2,138,133 RAC: 441
I think you've got the right idea there, Gas Giant. If there was a problem, the project would have changed something by now. I don't know that the staff don't care about us; I think Chrulle said they are between grad students at the moment. It may take a while for things to show up, but one of the main reasons I liked LHC so much was that project staff were involved with the message boards. So that's how you spell "piqued" - I'll have to keep a copy of that ;-).
Click here to join the #1 Aussie Alliance on LHC.
Joined: 21 May 06 Posts: 73 Credit: 8,710 RAC: 0
Does anyone think that the reason the initial replication is 5 while the quorum is only 3 is to generate extra work for all the work hungry volunteers?
Joined: 2 Sep 04 Posts: 309 Credit: 715,258 RAC: 0
Does anyone think that the reason the initial replication is 5 while the quorum is only 3 is to generate extra work for all the work hungry volunteers?

I would have thought a replication of 4 would have been sufficient.
Joined: 1 Sep 04 Posts: 506 Credit: 118,619 RAC: 0
Does anyone think that the reason the initial replication is 5 while the quorum is only 3 is to generate extra work for all the work hungry volunteers?

The five/three ratio is to improve the chances of getting a quorum at the first attempt. It's down to SixTrack's extreme sensitivity to numerical accuracy. In even the most solid computer there can be the occasional single-bit error that will throw the result off. Sending five results should improve the chance of reaching a quorum, and so reduce the completion time for the study. From what I see on the results pages, most results reach quorum at three, so a replication of five is redundant. I'd like to know if the fourth and fifth results are still issued if a quorum has already been reached.
Gaspode the UnDressed http://www.littlevale.co.uk
Joined: 2 Sep 04 Posts: 378 Credit: 10,765 RAC: 0
It cuts down significantly on the number of times they have to send the work unit back out to be crunched. Unlike Seti or Climate prediction, this project has a higher number of results that don't verify against other results for various reasons.
I'm not the LHC Alex. Just a number cruncher like everyone else here.
Joined: 13 Jul 05 Posts: 133 Credit: 162,641 RAC: 0
.............. so long as we have fun. I enjoy the whole thing. The team the credits, the science, making new friends and sharing ideas across so many boundaries and it helps someone. Perhaps many someones.

Sums it up in a nutshell for me........ Well said, Mike!
Joined: 21 May 06 Posts: 73 Credit: 8,710 RAC: 0
Does anyone think that the reason the initial replication is 5 while the quorum is only 3 is to generate extra work for all the work hungry volunteers?

What do you think the probability is of a single-bit (or any other) error causing the same incorrect answer in even TWO of the three members of the quorum?
Joined: 1 Sep 04 Posts: 506 Credit: 118,619 RAC: 0
Extremely small, I'd guess. Sixtrack suffers from the single-bit sensitivity because of the way it handles its numbers, and the fact that it does the operations repeatedly. A single-bit error in the first iteration of an algorithm will generate a different erroneous result than the same error occurring at, say, iteration 500,000. Given that a single-bit problem can creep in potentially anywhere (and anywhen), the chances of two different computers generating the same incorrect result are vanishingly small.

The same can't be said of the same computer running the same unit twice, however. It is possible that some sort of systematic failure could generate consistent errors at consistent points in the algorithm. Such a computer would probably never generate a valid LHC result, although it might work perfectly well in every other regard.
Gaspode the UnDressed http://www.littlevale.co.uk
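The point that the same tiny error gives different answers depending on when it occurs is easy to demonstrate with any numerically sensitive iterated calculation. The sketch below uses a chaotic logistic map purely as a stand-in for a long tracking loop - it is not SixTrack's algorithm, just the shortest thing that shows the effect - and injects a one-ULP perturbation at different iterations.

```python
# The same one-ULP ("single bit") perturbation, injected at different steps of a
# sensitive iterated calculation, lands on different final answers.
import math

def iterate(x, steps, flip_at=None):
    for i in range(steps):
        if i == flip_at:
            x = math.nextafter(x, 1.0)   # nudge x by one ULP: a "single-bit" error
        x = 3.9 * x * (1.0 - x)          # chaotic logistic map step (stand-in only)
    return x

clean = iterate(0.25, 1000)              # no error
early = iterate(0.25, 1000, flip_at=0)   # error at the first iteration
late  = iterate(0.25, 1000, flip_at=500) # same error, injected halfway through
print(clean, early, late)                # three different final values
```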
Joined: 4 Sep 05 Posts: 112 Credit: 2,138,133 RAC: 441
.............. so long as we have fun. I enjoy the whole thing. The team the credits, the science, making new friends and sharing ideas across so many boundaries and it helps someone. Perhaps many someones.

I thought so too. Thank you 8-)
Click here to join the #1 Aussie Alliance on LHC.
Joined: 21 May 06 Posts: 73 Credit: 8,710 RAC: 0
For what it is worth, I have error-detecting and correcting memory on my machine. I wonder how typical that is anymore... One of the LHC discussions mentioned the development of libraries that are able to return consistent results on different machines. If those libraries are used, then it seems a quorum of 2 with a replication of 3 would suffice. But since the computer resource is "free" and folk often clamor for "more work", it probably leads to higher quorums and higher initial replications.

Has there been any discussion of giving "bonus points" for work units that are finished quickly? It would seem this would be useful when errors from the initial replication group necessitated the resending of workunits closer to the deadline...
Joined: 4 Sep 05 Posts: 112 Credit: 2,138,133 RAC: 441
.... [snip] .....

Common on all servers of all sizes.
Click here to join the #1 Aussie Alliance on LHC.
Joined: 21 May 06 Posts: 73 Credit: 8,710 RAC: 0
.... [snip] .....

Sure, but I meant: how common is it among the BOINC or LHC crunchers?
Joined: 21 May 06 Posts: 73 Credit: 8,710 RAC: 0
I love LHC, and I realize it's different from the other BOINC projects in that it doesn't have continuous work to send out. It sends out work, and analyzes those results before sending out the next batch.

Matt - I want to thank you for taking the time to post this and start this thread. Prior to your having done so, I was having difficulty getting work units to run for LHC. Thanks to your clear explanation, I raised my cache from 0.01 to 10 days. And yup, as soon as there was work to do, I was able to get a bunch of it to work on. Again, thanks for your help in showing us how to get the maximum number of work units to process.
Phil
Joined: 13 Jul 05 Posts: 133 Credit: 162,641 RAC: 0
I set my cache to 1 day right back from when I started BOINCing....... and I received half-a-dozen WUs on the last distribution of work....
Joined: 28 Sep 05 Posts: 21 Credit: 11,715 RAC: 0
for what it is worth, I have error detecting and correcting memory on my machine.

I think the one-bit errors do not originate in the memory; they originate in the FPU/SSE. It is a known quirk of AMD CPUs that the AMD FPU processes numbers differently (possibly less accurately) than the Intel FPU (this is usually overcome by proper program code). It is also a fact that overclocking can cause the FPU to be less accurate (the one-bit errors) for both AMD and Intel.

One interesting and not well known feature of the FPU is that inside the FPU numbers are not represented as decimals, as you might think. So, of course, you say: they're binary! But they're not plain binary values either - the numbers are stored in IEEE floating-point format for hardware design reasons. There are decimal numbers that cannot be represented exactly in IEEE format. The numbers are 'close enough' for most purposes, and special code is usually implemented to minimize error - the usual process is by extending precision and using careful rounding. At any rate, the errors introduced by IEEE format can be exaggerated by one-bit errors at or near the limits of precision.
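A quick way to see the "decimal numbers that cannot be represented exactly" point is to look at what an IEEE 754 double actually stores for 0.1. This is a small, generic Python illustration; nothing in it is specific to SixTrack or to any particular CPU.

```python
from decimal import Decimal

# The closest IEEE 754 double to the decimal 0.1 is not exactly 0.1:
print(Decimal(0.1))      # 0.1000000000000000055511151231257827021181583404541015625

# Each literal carries its own rounding error, so "obvious" equalities can fail:
print(0.1 + 0.2 == 0.3)  # False

# The bits actually stored, shown as a hexadecimal floating-point value:
print((0.1).hex())       # 0x1.999999999999ap-4
```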
Joined: 19 May 06 Posts: 20 Credit: 297,111 RAC: 0
I notice that this is slowed down by a minority of users who set their caches to maximum. When the number of work units available hits zero, we still have to wait a week or more while the people who grab a maximum number of units empty their cache before the scientists can even begin the analyzing process.

I'm sure MattDavis or someone will correct me if I'm wrong, but I thought the original post, quoted in part here, was saying that you *shouldn't* max out your cache. Doing that means you get a lot of work, true. But it also means that the work gets done more slowly, because you're sitting on work that other people with a lower cache (getting work as they complete it) could be doing. Leaving some computers dry is not the best way to get work done promptly; it slows down the process and makes everyone wait longer to get more work. Wasn't that the whole point behind the original post and the subject of limiting work units - to make sure that everyone gets a fair share, not to have some people hogging work for themselves while others' computers get left dry?
Joined: 21 May 06 Posts: 73 Credit: 8,710 RAC: 0
I notice that this is slowed down by a minority of users who set their caches to maximum. When the number of work units available hits zero, we still have to wait a week or more while the people who grab a maximum number of units empty their cache before the scientists can even begin the analyzing process.

Hmm - you mean that there may have been unintended consequences from starting this thread? Even so, I'm thankful for the idea.