Message boards : Number crunching : I think we should restrict work units

m.mitch

Joined: 4 Sep 05
Posts: 112
Credit: 1,864,470
RAC: 0
Message 13781 - Posted: 29 May 2006, 14:56:41 UTC - in response to Message 13777.  

Seriously though, I'm guessing saying nasty things to others makes you feel good, so fire away. I'm very thick skinned and I like making others happy.


No offence intended. This was a gentle dig aimed at the 'credit-racers', of whom you claim you're not one. But judging by your sensitivity to it I'd guess that you are 'racing' more than you let on.



You're not a very happy camper today, are you, Mike? Three steamed-up posts in one day. :-) SMILE, life's short.




Click here to join the #1 Aussie Alliance on LHC.
ID: 13781
clownius

Joined: 1 May 06
Posts: 34
Credit: 64,492
RAC: 0
Message 13782 - Posted: 29 May 2006, 14:57:40 UTC - in response to Message 13780.  

Oh, we definitely have a delay in completion, thanks to big-cache junkies who mainly crunch for their personal benefit (however silly that may be) and the inadequacies of the BOINC system to counteract WU-grabbers.


I agree the number of WUs still being crunched is rather silly. I can understand some get lost. For example, after the last time I got WUs before this run, BOINC died on me and I lost all my pending WUs. But many have been snatched up by those that don't crunch them fast and now have to be reissued because the cache was just too big. This ain't cool....

Unfortunately LHC and Einstein are the only "hard science" (physics) projects in non-alpha/beta stage. And LHC needs the completion of the current work to issue the next one.


And this is the reason I turn my computer 100% to it when it has work and crunch as much and as fast as I can. Many of those last WUs get hoarded by those caching up to get the last few units... at least I'm crunching them nice and quick.

No offence intended. This was a gentle dig aimed at the 'credit-racers', of whom you claim you're not one. But judging by your sensitivity to it I'd guess that you are 'racing' more than you let on.


OK, in this case I withdraw my comment; it was uncalled for. But yours was not exactly required either. Just out of common courtesy, next time you reply to another's post, try not to flame them if they haven't flamed you. It's really not required, and it's juvenile. I'll act my age and apologise to you. Hope you do the same.
If you take the time to look at my sig, you will notice I'm not really crunching big numbers in any project, and if that's racing I'm scared, as it's more a slow stroll, lol.
ID: 13782
Gaspode the UnDressed

Joined: 1 Sep 04
Posts: 506
Credit: 118,619
RAC: 0
Message 13785 - Posted: 29 May 2006, 18:05:10 UTC

My word - we are all sensitive today. No offence intended here, and none taken. However, I will say this:

If the cap fits, wear it!

Happy crunching!


Gaspode the UnDressed
http://www.littlevale.co.uk
ID: 13785
The Gas Giant

Joined: 2 Sep 04
Posts: 309
Credit: 715,258
RAC: 0
Message 13786 - Posted: 29 May 2006, 23:28:56 UTC
Last modified: 29 May 2006, 23:29:55 UTC

If it was such a big problem for the project, then surely they would reduce the deadline. Reduce it to 3, 4 or 5 days and caches will be smaller by default. The great WU snap-up might then only occur on the 4th or 5th day, resulting in WUs not coming back until 10 to 11 days after the first ones were issued. So this would be of no benefit to the project. Releasing all the WUs on day 1 with the same 7-day deadline means that they should be received within 8 to 9 days of them first being released - noting that the project cache takes 48 hours to run down.

If it is such a big problem for the project (and it really doesn't appear to be), then the best way to release the WUs is in smaller lots of, say, 20,000 results (that's 4,000 WUs) per day. Oh, hang on... the project would be worse off. It would take 6 days to release all the WUs, and if they had a 5-day deadline that might mean the last ones aren't returned until 11 days after the first ones are released.
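
(A quick back-of-the-envelope of those two release schedules, sketched in Python. The batch sizes, deadlines and the 48-hour cache rundown are just the illustrative figures from this post, not actual project settings:)

def worst_case_return_day(last_download_day, deadline_days):
    # Day, counted from the start of the run (day 1), by which the last result is due back.
    return last_download_day + deadline_days

# Everything released on day 1 with a 7-day deadline; the server-side cache takes
# roughly 48 hours to empty, so the last WUs are downloaded around day 2.
print("all at once, 7-day deadline   :", worst_case_return_day(2, 7))   # ~day 9

# Six daily lots of ~20,000 results (4,000 WUs) with a 5-day deadline;
# the last lot goes out around day 6.
print("six daily lots, 5-day deadline:", worst_case_return_day(6, 5))   # ~day 11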

Also remember that the project does not "care" about us volunteers; we are purely a resource. If the WUs are getting completed to the requirements of the project, then why should they care if some people do not get the number of WUs they want?

This is only a problem for the volunteers: people's imaginations have been piqued by participating in a big engineering/physics project, so they want to crunch this project as much as possible, and WUs are limited.

Also, don't forget that Chrulle worked on the "best" way to ensure the WUs are returned the quickest overall. He appears to have hit on it.

Live long and crunch (if you've got 'em).

Paul
(S@H1 8888)
BOINC/SAH BETA
ID: 13786
m.mitch

Joined: 4 Sep 05
Posts: 112
Credit: 1,864,470
RAC: 0
Message 13790 - Posted: 30 May 2006, 16:06:12 UTC - in response to Message 13785.  

My word - we are all sensitive today. No offence intended here, and none taken. However, I will say this:

If the cap fits, wear it!

Happy crunching!



I don't mind wearing it ;-)
But I don't see the problem with people who crunch projects for the credits or any other reason, so long as we have fun. I enjoy the whole thing: the team, the credits, the science, making new friends and sharing ideas across so many boundaries. And it helps someone. Perhaps many someones.

And it keeps my computers off the street ;-)




Click here to join the #1 Aussie Alliance on LHC.
ID: 13790
m.mitch

Joined: 4 Sep 05
Posts: 112
Credit: 1,864,470
RAC: 0
Message 13792 - Posted: 30 May 2006, 16:28:05 UTC


I think you've got the right idea there, Gas Giant. If there was a problem, the project would have changed something by now.

I don't know that the staff don't care about us; I think Chrulle said they are between grad students at the moment. It may take a while for things to show up, but one of the main reasons I liked LHC so much was that project staff were involved with the message boards.

So that's how you spell "piqued" - I'll have to keep a copy of that ;-).




Click here to join the #1 Aussie Alliance on LHC.
ID: 13792
Philip Martin Kryder

Joined: 21 May 06
Posts: 73
Credit: 8,710
RAC: 0
Message 13804 - Posted: 1 Jun 2006, 3:41:51 UTC

Does anyone think that the reason the initial replication is 5 while the quorum is only 3 is to generate extra work for all the work-hungry volunteers?
ID: 13804
The Gas Giant

Joined: 2 Sep 04
Posts: 309
Credit: 715,258
RAC: 0
Message 13805 - Posted: 1 Jun 2006, 4:21:07 UTC - in response to Message 13804.  

Does anyone think that the reason the initial replication is 5 while the quorum is only 3 is to generate extra work for all the work-hungry volunteers?

I would have thought a replication of 4 would have been sufficient.
ID: 13805
Gaspode the UnDressed

Joined: 1 Sep 04
Posts: 506
Credit: 118,619
RAC: 0
Message 13806 - Posted: 1 Jun 2006, 5:21:43 UTC - in response to Message 13804.  

Does anyone think that the reason the initial replication is 5 while the quorum is only 3 is to generate extra work for all the work-hungry volunteers?


The five/three ratio is to improve the chances of getting a quorum at the first attempt. It's down to SixTrack's extreme sensitivity to numerical accuracy. In even the most solid computer there can be the occasional single-bit error that will throw the result off. Sending five results should improve the chance of reaching a quorum, and so reduce the completion time for the study.

From what I see on the results pages, most results reach quorum at three, so a replication of five is redundant. I'd like to know if the fourth and fifth results are still issued if a quorum has already been reached.
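
(For what it's worth, a toy binomial calculation sketches the effect. This is just an illustration in Python; the 85% per-result success rate is an invented number, not a project statistic:)

from math import comb

def p_quorum(n_sent, quorum, p_valid):
    # Probability that at least `quorum` of `n_sent` results come back valid,
    # assuming each result is independently valid with probability p_valid.
    return sum(comb(n_sent, k) * p_valid**k * (1 - p_valid)**(n_sent - k)
               for k in range(quorum, n_sent + 1))

p = 0.85  # invented per-result success rate, for illustration only
for n in (3, 4, 5):
    print(f"initial replication {n}: P(quorum of 3 on the first pass) = {p_quorum(n, 3, p):.3f}")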
Gaspode the UnDressed
http://www.littlevale.co.uk
ID: 13806
Alex

Joined: 2 Sep 04
Posts: 378
Credit: 10,765
RAC: 0
Message 13807 - Posted: 1 Jun 2006, 5:41:47 UTC - in response to Message 13805.  


I would have thought a replication of 4 would have been sufficient.


It cuts down significantly on the number of times they have to send the work unit back out to be crunched.
Unlike SETI or Climate Prediction, this project has a higher number of results that don't verify against other results for various reasons.



I'm not the LHC Alex. Just a number cruncher like everyone else here.
ID: 13807
John Hunt

Joined: 13 Jul 05
Posts: 133
Credit: 162,641
RAC: 0
Message 13808 - Posted: 1 Jun 2006, 6:29:01 UTC - in response to Message 13790.  

.............. so long as we have fun. I enjoy the whole thing: the team, the credits, the science, making new friends and sharing ideas across so many boundaries. And it helps someone. Perhaps many someones.

And it keeps my computers off the street ;-)




Sums it up in a nutshell for me........




Well said, Mike!
ID: 13808
Philip Martin Kryder

Joined: 21 May 06
Posts: 73
Credit: 8,710
RAC: 0
Message 13809 - Posted: 1 Jun 2006, 7:22:02 UTC - in response to Message 13806.  

Does anyone think that the reason the initial replication is 5 while the quorum is only 3 is to generate extra work for all the work-hungry volunteers?


The five/three ratio is to improve the chances of getting a quorum at the first attempt. It's down to SixTrack's extreme sensitivity to numerical accuracy. In even the most solid computer there can be the occasional single-bit error that will throw the result off. Sending five results should improve the chance of reaching a quorum, and so reduce the completion time for the study.

From what I see on the results pages, most results reach quorum at three, so a replication of five is redundant. I'd like to know if the fourth and fifth results are still issued if a quorum has already been reached.


What do you think the probability is of a single-bit (or any other) error causing the same incorrect answer in even TWO of the three members of the quorum?
ID: 13809
Gaspode the UnDressed

Joined: 1 Sep 04
Posts: 506
Credit: 118,619
RAC: 0
Message 13810 - Posted: 1 Jun 2006, 8:17:37 UTC - in response to Message 13809.  


What do you think the probability is of a single-bit (or any other) error causing the same incorrect answer in even TWO of the three members of the quorum?


Extremely small, I'd guess.

SixTrack suffers from the single-bit sensitivity because of the way it handles its numbers, and the fact that it does the operations repeatedly. A single-bit error in the first iteration of an algorithm will generate a different erroneous result than the same error occurring at, say, iteration 500,000. Given that a single-bit problem can creep in potentially anywhere (and anywhen), the chances of two different computers generating the same incorrect result are vanishingly small.

The same can't be said of the same computer running the same unit twice, however. It is possible that some sort of systematic failure could generate consistent errors at consistent points in the algorithm. Such a computer would probably never generate a valid LHC result, although it might work perfectly well in every other regard.
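
(A toy illustration of that point in Python. It iterates a chaotic map rather than SixTrack itself, but it shows how a one-bit nudge injected at different iterations leads to completely different final answers:)

import math

def run(n_iter, flip_at=None):
    # Iterate a chaotic toy map; optionally nudge the value by one ULP
    # (the smallest representable step) at iteration `flip_at`,
    # standing in for a single-bit error.
    x = 0.123456789
    for i in range(n_iter):
        if i == flip_at:
            x = math.nextafter(x, math.inf)
        x = 3.9 * x * (1.0 - x)   # logistic map in its chaotic regime
    return x

N = 500_000
print("clean run                     :", run(N))
print("bit error at iteration 1      :", run(N, flip_at=1))
print("bit error at iteration 250,000:", run(N, flip_at=250_000))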




Gaspode the UnDressed
http://www.littlevale.co.uk
ID: 13810
m.mitch

Joined: 4 Sep 05
Posts: 112
Credit: 1,864,470
RAC: 0
Message 13813 - Posted: 1 Jun 2006, 15:08:03 UTC - in response to Message 13808.  

.............. so long as we have fun. I enjoy the whole thing: the team, the credits, the science, making new friends and sharing ideas across so many boundaries. And it helps someone. Perhaps many someones.

And it keeps my computers off the street ;-)




Sums it up in a nutshell for me........




Well said, Mike!


I thought so too. Thank you 8-)




Click here to join the #1 Aussie Alliance on LHC.
ID: 13813
Philip Martin Kryder

Joined: 21 May 06
Posts: 73
Credit: 8,710
RAC: 0
Message 13821 - Posted: 2 Jun 2006, 6:59:29 UTC - in response to Message 13810.  


What do you think the probability is of a single-bit (or any other) error causing the same incorrect answer in even TWO of the three members of the quorum?


Extremely small, I'd guess.

SixTrack suffers from the single-bit sensitivity because of the way it handles its numbers, and the fact that it does the operations repeatedly. A single-bit error in the first iteration of an algorithm will generate a different erroneous result than the same error occurring at, say, iteration 500,000. Given that a single-bit problem can creep in potentially anywhere (and anywhen), the chances of two different computers generating the same incorrect result are vanishingly small.

The same can't be said of the same computer running the same unit twice, however. It is possible that some sort of systematic failure could generate consistent errors at consistent points in the algorithm. Such a computer would probably never generate a valid LHC result, although it might work perfectly well in every other regard.





For what it is worth, I have error-detecting and correcting memory on my machine.

I wonder how typical that is anymore...


One of the LHC discussions mentioned the development of libraries that were able to return consistent results on different machines.

If those libraries are used, then it seems a quorum of 2 with replication of 3 would suffice.
But, since the computer resource is "free" and folk often clamor for "more work," it probably leads to higher quorums and higher initial replications.

Has there been any discussion of giving "bonus points" for work units that are finished "quickly"?
It would seem this would be useful when errors from the initial replication group necessitate the resending of work units closer to the deadline...








ID: 13821
m.mitch

Joined: 4 Sep 05
Posts: 112
Credit: 1,864,470
RAC: 0
Message 13828 - Posted: 2 Jun 2006, 10:34:58 UTC - in response to Message 13821.  

.... [snip] .....
For what it is worth, I have error-detecting and correcting memory on my machine.

I wonder how typical that is anymore...


Common on all servers of all sizes.





Click here to join the #1 Aussie Alliance on LHC.
ID: 13828
Philip Martin Kryder

Joined: 21 May 06
Posts: 73
Credit: 8,710
RAC: 0
Message 13832 - Posted: 2 Jun 2006, 13:26:27 UTC - in response to Message 13828.  

.... [snip] .....
For what it is worth, I have error-detecting and correcting memory on my machine.

I wonder how typical that is anymore...


Common on all servers of all sizes.




Sure, but I meant how common it is among the BOINC or LHC crunchers.



ID: 13832
Philip Martin Kryder

Joined: 21 May 06
Posts: 73
Credit: 8,710
RAC: 0
Message 13862 - Posted: 3 Jun 2006, 18:28:48 UTC - in response to Message 13376.  

I love LHC, and I realize it's different from the other BOINC projects in that it doesn't have continuous work to send out. It sends out work, and analyzes those results before sending out the next batch.

I notice that this is slowed down by a minority of users who set their caches to maximum. When the number of work units available hits zero, we still have to wait a week or more while the people who grab a maximum number of units empty their cache before the scientists can even begin the analyzing process.

That doesn't help the project - that's greed by people who want the most LHC units.

When the number of available units hits zero, the scientists shouldn't have to wait more than a day or two. I suggest that the project limit the number of work units per computer to 2-3 at any given time. That way, as soon as all the work is sent out LHC will get them all back very soon after. Once a work unit is sent back, that computer can have another.

This will speed up work-unit generation for all of us (my cache is set very low and every work unit I get is sent back within 12 hours, since I have other projects running too) since LHC scientists will get their work back faster and thus be able to create the next batch sooner.



Matt - I want to thank you for taking the time to post this and start this thread.

Prior to your having done so, I was having difficulty getting work units to run for LHC.

Thanks to your clear explanation, I raised my cache from 0.01 to 10 days.
And yup,
As soon as there was work to do, I was able to get a bunch of it to work on.

Again, thanks for your help in showing us how to get the maximum number of work units to process.

Phil


ID: 13862
John Hunt

Joined: 13 Jul 05
Posts: 133
Credit: 162,641
RAC: 0
Message 13864 - Posted: 3 Jun 2006, 18:55:42 UTC - in response to Message 13862.  


I notice that this is slowed down by a minority of users who set their caches to maximum........

Thanks to your clear explanation, I raised my cache from 0.01 to 10 days.
And yup,
As soon as there was work to do, I was able to get a bunch of it to work on.

Again, thanks for your help in showing us how to get the maximum number of work units to process.

Phil



I set my cache to 1 day right back from when I started BOINCing.......
and I received half-a-dozen WUs on the last distribution of work....



ID: 13864
Bob Guy

Joined: 28 Sep 05
Posts: 21
Credit: 11,715
RAC: 0
Message 13865 - Posted: 3 Jun 2006, 21:07:38 UTC - in response to Message 13821.  

For what it is worth, I have error-detecting and correcting memory on my machine.

I think the one-bit errors do not originate in the memory; the errors originate in the FPU/SSE. It is a known fault of AMD CPUs that the AMD FPU processes numbers differently (possibly less accurately) than the Intel FPU (this is usually overcome by proper program code). It is also a fact that overclocking can cause the FPU to be less accurate (the one-bit errors) for both AMD and Intel.

One interesting and not well-known feature of the FPU is that inside the FPU numbers are not represented as decimals, as you might think. So, of course, you say: they're binary! Not exactly - the numbers are in IEEE floating-point format for hardware design reasons. There are decimal numbers that cannot be represented exactly in IEEE format. The numbers are 'close enough' for most purposes, and special code is usually implemented to minimize error - the usual approach is extending precision and using careful rounding. At any rate, the errors introduced by IEEE format can be exaggerated by one-bit errors at or near the limits of precision.
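
(A small illustration of that 'close enough' point - nothing LHC-specific, just ordinary IEEE 754 doubles as exposed by Python:)

from decimal import Decimal
import math

# 0.1 has no exact binary (IEEE 754) representation; the value actually stored is only close.
print(Decimal(0.1))              # 0.1000000000000000055511151231257827021181583404541015625

# Those last-bit errors accumulate differently depending on how the arithmetic is done.
values = [0.1] * 10
print(sum(values) == 1.0)        # False: naive left-to-right summation drifts in the last bits
print(math.fsum(values) == 1.0)  # True: correctly rounded summation recovers the exact answer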
ID: 13865