Thread 'Initial Replication'

[AF>Futura Sciences>Linux] Thr... Joined: 6 Mar 07 Posts: 8 Credit: 31,454 RAC: 0	Message 17645 - Posted: 31 Jul 2007, 14:49:04 UTC - in response to Message 17639. Last modified: 31 Jul 2007, 14:50:25 UTC
As an example, I give you a WU from Einstein (Einstein unit): the WU was initially issued on 8th June but did not reach quorum until 27th July!! Enough said?
The project admins/scientists are the only people who have the whole picture; we who contribute our resources can only trust that they are choosing the correct parameters. That's not to say we cannot suggest what we think is better.
Einstein can afford this kind of thing occasionally, LHC can't.
That's wrong. The quorum was reached on the 16th, when the third host sent its result. So it took 9 days to complete the quorum.
For LHC, IR=5 doesn't only create wasted calculations. If 20-25% of results are compute errors, there is one question to ask: are those errors spread across all the WUs, or localised on some "hard to compute" WUs? If they fall only on some hard-to-compute WUs, Alex's calculation is the right point of view, because you can quickly accumulate a lot of compute errors on one WU. If IR were 3 and you got 2 errors, you would need to resend 2; you would end up with 5 replications anyway, plus a lot of delay. Can LHC afford that?
If not, it is a question of turnaround speed. As the project administrator said, we get jobs when the scientists need them. Implicitly, he is saying that WUs have to be sent back within a short delay. That's also why they can't / don't want to send work 24/7.
Sure, some computing time is wasted. But it seems that is the price to pay for this project.
------ Thrr-Gilag Kee'rr L'Alliance Francophone
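
The resend argument in this post can be put in rough numbers. The following is only a sketch: it assumes every result fails independently with the same probability, which the post itself questions (errors may cluster on hard WUs), and the function name and the 20% error rate are illustrative choices, not project values.

```python
from math import comb


def p_quorum_first_round(ir: int, quorum: int, error_rate: float) -> float:
    """Probability that at least `quorum` of `ir` initial results are valid.

    Assumption: each result fails independently with probability
    `error_rate`. Real errors may cluster on particular WUs.
    """
    ok = 1.0 - error_rate
    return sum(
        comb(ir, k) * ok**k * error_rate ** (ir - k)
        for k in range(quorum, ir + 1)
    )


if __name__ == "__main__":
    # Quorum of 3, hypothetical 20% error rate, three candidate IR values.
    for ir in (3, 4, 5):
        p = p_quorum_first_round(ir, quorum=3, error_rate=0.20)
        print(f"IR={ir}: P(quorum without resends) = {p:.3f}")
```

Under these assumptions an IR of 3 reaches quorum without any resend only about half the time, while an IR of 5 does so roughly 94% of the time. The cost is the extra results that turn out not to be needed.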

John McLeod VII Joined: 2 Sep 04 Posts: 165 Credit: 146,925 RAC: 0	Message 17646 - Posted: 31 Jul 2007, 14:58:19 UTC - in response to Message 17640.
Einstein can afford this kind of thing occasionally, LHC can't.
Thanks for the opinion but I'll wait for LHC to confirm that statement before I believe it.
LHC has people waiting on the results to get physical work done (aligning the magnets). Einstein does not. Since LHC has real-world deadlines, LHC has a problem with maximum turnaround time for the WUs; therefore a higher initial replication is required. Einstein and S@H, on the other hand, are trying to get through a huge pile of work, and that leads to an initial replication that matches the minimum quorum. This is documented in earlier threads if you care to go looking (try a couple of years ago).
BOINC WIKI

DarkWaterSong Joined: 5 Aug 05 Posts: 9 Credit: 4,207,037 RAC: 0	Message 17649 - Posted: 31 Jul 2007, 17:07:55 UTC
If you feel this project is such a waste of your time, there are other projects that you apparently don't have issues with. Why not just stick with those? That is normally what people do if they don't like something: they move to something else.
I'm looking at the tasks this way: there have been multiple migration issues and the configs need to be tested. So if I am just running results to help with testing the hardware, I am fine with it. The high number of replicated tasks could easily be there to check for patterns of errors.

larry1186 Joined: 4 Oct 06 Posts: 38 Credit: 24,908 RAC: 0	Message 17652 - Posted: 31 Jul 2007, 20:41:37 UTC - in response to Message 17647.
...why make other work wait while you waste time crunching results that don't need to be crunched?
Even with the initial replication of 5, I don't see ANY work waiting on the servers here since it all gets distributed so fast, or are you referring to the work on your host? Sure, somebody's host will be taking time away from some other project.
Where are you trying to optimize useful results : work done?
Considering only LHC, and all of its odd work availability, it's OK to have a higher replication to reach quorum faster, since it is unknown how fast certain hosts will be and there are FAR more hosts available than there is work.
Taking a step back and looking at all projects, LHC could be seen as (not in my opinion) a "resource hog", but that is what the LHC admins/scientists have decided they need in order to complete their task, which is to build a damn sweet machine.
Taking a step in and looking at a single host, a result that is not needed for quorum would be wasted effort and redundant. Which is exactly why the new feature that Scarecrow mentioned came to fruition.
Draw your line and take your stance.

Bob Guy Joined: 28 Sep 05 Posts: 21 Credit: 11,715 RAC: 0	Message 17655 - Posted: 1 Aug 2007, 1:25:33 UTC Last modified: 1 Aug 2007, 1:32:08 UTC
No one yet has mentioned an interesting coincidental phenomenon (or I've missed it). The WUs will meet quorum (of 3) faster on average if the initial replication is 5. There is a much higher probability that 3 fast machines will get a particular WU (and complete it) when 5 rIDs are sent out. Had the initial replication been 3, the probability is greater that one or two or all three of the computers would be slow ones, thereby delaying the completion of quorum.
My conclusion is that the quorum is completed faster with 5 WUs than with 3 WUs. It would be an improvement (in efficiency) if the server software could abort WUs that had become redundant and hadn't yet started on some of those slow computers. I'm also sure the owners of those computers might be distressed that some of their WUs were aborted and they lost the credit - this only being a problem with so little work available.
The 5.10.x BOINC client can do this (aborting WUs) but it also requires a server upgrade, and I'm not sure all the bugs are out of the 5.10.xx client.
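
The claim that 5 copies reach a quorum of 3 faster than 3 copies can be checked with a small simulation. This is a sketch under stated assumptions: turnaround times are drawn from an exponential distribution with a made-up mean of 3 days, every host returns a valid result, and the function names are hypothetical.

```python
import random


def time_to_quorum(turnarounds, quorum=3):
    """Time at which the `quorum`-th result arrives (k-th smallest turnaround)."""
    return sorted(turnarounds)[quorum - 1]


def mean_time_to_quorum(ir, quorum=3, mean_days=3.0, trials=20_000, seed=1):
    """Monte Carlo estimate of the average time to quorum.

    Assumption: host turnaround times are independent and exponentially
    distributed with mean `mean_days`; errors and no-replies are ignored.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        times = [rng.expovariate(1.0 / mean_days) for _ in range(ir)]
        total += time_to_quorum(times, quorum)
    return total / trials


if __name__ == "__main__":
    for ir in (3, 5):
        print(f"IR={ir}: mean time to quorum ~ {mean_time_to_quorum(ir):.2f} days")
```

With 3 copies the quorum waits for the slowest of the three hosts. With 5 copies it only waits for the third fastest, so the two slowest hosts never hold it up. For the exponential model used here the averages come out at roughly 5.5 days versus 2.35 days, but the exact gap depends entirely on the assumed distribution.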

Alex Joined: 2 Sep 04 Posts: 378 Credit: 10,765 RAC: 0	Message 17656 - Posted: 1 Aug 2007, 1:57:32 UTC - in response to Message 17655.
No one yet has mentioned an interesting coincidental phenomenon (or I've missed it). The WUs will meet quorum (of 3) faster on average if the initial replication is 5. There is a much higher probability that 3 fast machines will get a particular WU (and complete it) when 5 rIDs are sent out. Had the initial replication been 3, the probability is greater that one or two or all three of the computers would be slow ones, thereby delaying the completion of quorum.
My conclusion is that the quorum is completed faster with 5 WUs than with 3 WUs. It would be an improvement (in efficiency) if the server software could abort WUs that had become redundant and hadn't yet started on some of those slow computers. I'm also sure the owners of those computers might be distressed that some of their WUs were aborted and they lost the credit - this only being a problem with so little work available.
The 5.10.x BOINC client can do this (aborting WUs) but it also requires a server upgrade, and I'm not sure all the bugs are out of the 5.10.xx client.
Another feature of the IR of 5 is that, for the cases where the Intel results don't match the AMD results, there's a higher probability that the initial group of 5 will have 3 of one PC type and 2 of the other. If you had an IR of 4, and two Intel PCs giving different results from two AMD PCs, you get a condition where you cannot validate the WU until you send out more units.
I'm not the LHC Alex. Just a number cruncher like everyone else here.
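
The Intel/AMD point can also be made concrete. The sketch below uses a deliberately simple, hypothetical model: only two platforms exist, each host is Intel with a fixed probability independently of the others, every host returns a result, and results only agree within a platform.

```python
from math import comb


def p_majority_platform(ir: int, quorum: int, p_intel: float) -> float:
    """Probability that at least `quorum` of the `ir` hosts share one platform.

    Hypothetical model: two platforms only, each host is Intel with
    probability `p_intel` independently, and every host returns a result.
    """

    def pmf(k: int) -> float:
        # P(exactly k Intel hosts among the ir hosts)
        return comb(ir, k) * p_intel**k * (1 - p_intel) ** (ir - k)

    return sum(pmf(k) for k in range(ir + 1) if k >= quorum or ir - k >= quorum)


if __name__ == "__main__":
    for ir in (4, 5):
        p = p_majority_platform(ir, quorum=3, p_intel=0.5)
        print(f"IR={ir}: P(some platform has >= 3 hosts) = {p:.3f}")
```

With 5 hosts and two platforms, one platform always has at least 3 hosts, so a quorum of 3 matching results is always possible in this model. With 4 hosts a 2-2 split can leave the WU unvalidated until more copies are sent, which is the deadlock the post describes.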

zombie67 [MM] Joined: 24 Nov 06 Posts: 76 Credit: 10,211,769 RAC: 0	Message 17659 - Posted: 1 Aug 2007, 3:08:05 UTC - in response to Message 17593. Last modified: 1 Aug 2007, 3:11:20 UTC
There's a mathematical reason that there's 5 sent out instead of 4. 20-25% of the crunching instances have errors. (I looked at the most recent work units that MY PC crunched, and there were 5 client errors out of 27 results reported by all the PCs crunching them.)
I'm going to need to challenge those numbers. I looked at the 225 results I returned in July. There are still a handful pending, but all the rest validated. 0 errors. Also, I looked at my last 20 WUs, and there were only 2 errors out of the 100 results initially issued.
Now, I do not doubt that the numbers you quoted are 100% accurate. But I do not think that is a normal error rate. If the error rate were that bad, then there is a problem with the application, the WUs, or both.
Dublin, California Team: SETI.USA

Bob Guy Joined: 28 Sep 05 Posts: 21 Credit: 11,715 RAC: 0	Message 17661 - Posted: 1 Aug 2007, 3:35:09 UTC
I think there is some concern that the AMD results don't match or might not match the Intel results. That's still a valid reason to get as many WUs as possible run with both AMD and Intel results represented for a single WU.

Alex Joined: 2 Sep 04 Posts: 378 Credit: 10,765 RAC: 0	Message 17662 - Posted: 1 Aug 2007, 4:05:43 UTC - in response to Message 17659.
I'm going to need to challenge those numbers. I looked at the 225 results I returned in July. There are still a handful pending, but all the rest validated. 0 errors. Also, I looked at my last 20 WUs, and there were only 2 errors out of the 100 results initially issued.
Looking at a single CPU isn't representative of the whole population. My last 20 WUs got credit as well.
Now, I do not doubt that the numbers you quoted are 100% accurate. But I do not think that is a normal error rate. If the error rate were that bad, then there is a problem with the application, the WUs, or both.
When I looked at the most recent WUs my PC crunched, I saw errors.
http://lhcathome.cern.ch/lhcathome/workunit.php?wuid=1692507 - no errors. 5 crunchers.
http://lhcathome.cern.ch/lhcathome/workunit.php?wuid=1584631 - 3 no replies. 8 crunchers.
http://lhcathome.cern.ch/lhcathome/workunit.php?wuid=1601064 - 1 no reply. 5 crunchers.
http://lhcathome.cern.ch/lhcathome/workunit.php?wuid=1594030 - 1 client error. 6 crunchers.
http://lhcathome.cern.ch/lhcathome/workunit.php?wuid=1592979 - 1 client error/compute error. 6 crunchers.
6 errors, 30 crunching instances. So, 20% with that sample.
I'm not the LHC Alex. Just a number cruncher like everyone else here.
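
The tally in this post is easy to reproduce, and a rough uncertainty range shows how much a 30-instance sample can say. This is a sketch: the counts are copied from the five work units listed in the post (no-replies counted as errors, as the post does), while the Wilson score interval and the helper names are additions for illustration only.

```python
from math import sqrt

# (work unit id, crunchers, errors or no-replies), as listed in the post
SAMPLE = [
    (1692507, 5, 0),
    (1584631, 8, 3),
    (1601064, 5, 1),
    (1594030, 6, 1),
    (1592979, 6, 1),
]


def error_rate(rows):
    """Return (errors, instances, errors / instances) for the sample."""
    instances = sum(n for _, n, _ in rows)
    errors = sum(e for _, _, e in rows)
    return errors, instances, errors / instances


def wilson_interval(errors, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = errors / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    errors, instances, rate = error_rate(SAMPLE)
    low, high = wilson_interval(errors, instances)
    print(f"{errors} errors / {instances} instances = {rate:.0%}")
    print(f"approximate 95% interval: {low:.1%} to {high:.1%}")
```

The point estimate matches the post's 20%, but with only 30 instances the plausible range is wide. That is consistent with both posters' caution about reading a population-wide error rate from a small personal sample.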

zombie67 [MM] Joined: 24 Nov 06 Posts: 76 Credit: 10,211,769 RAC: 0	Message 17665 - Posted: 1 Aug 2007, 5:59:07 UTC - in response to Message 17662. Last modified: 1 Aug 2007, 6:06:05 UTC
I'm going to need to challenge those numbers. I looked at the 225 results I returned in July. There are still a handful pending, but all the rest validated. 0 errors. Also, I looked at my last 20 WUs, and there were only 2 errors out of the 100 results initially issued.
Looking at a single CPU isn't representative of the whole population. My last 20 WUs got credit as well.
I don't get your point. If I was not clear, those 225 results were across 12 different machines, two of which are AMD, and the rest various flavors of Pentium D, C2D, and Xeon (both NetBurst and C2D based).
When I looked at the most recent WUs my PC crunched, I saw errors.
http://lhcathome.cern.ch/lhcathome/workunit.php?wuid=1692507 - no errors. 5 crunchers.
http://lhcathome.cern.ch/lhcathome/workunit.php?wuid=1584631 - 3 no replies. 8 crunchers.
http://lhcathome.cern.ch/lhcathome/workunit.php?wuid=1601064 - 1 no reply. 5 crunchers.
http://lhcathome.cern.ch/lhcathome/workunit.php?wuid=1594030 - 1 client error. 6 crunchers.
http://lhcathome.cern.ch/lhcathome/workunit.php?wuid=1592979 - 1 client error/compute error. 6 crunchers.
6 errors, 30 crunching instances. So, 20% with that sample.
Like I said, I don't doubt your numbers. I am saying that I doubt that is normal.
Dublin, California Team: SETI.USA

Bob Guy Joined: 28 Sep 05 Posts: 21 Credit: 11,715 RAC: 0	Message 17667 - Posted: 1 Aug 2007, 7:19:03 UTC - in response to Message 17666. Last modified: 1 Aug 2007, 7:23:17 UTC
Your observation is absolutely true. Not efficient? Probably true. Why do we do it the way we're doing it? Probably easier to implement.
Are you saying that it's normal for you to doubt? (A deliberate misconstruction by me. I'm just teasing you!)

[AF>Futura Sciences>Linux] Thr... Joined: 6 Mar 07 Posts: 8 Credit: 31,454 RAC: 0	Message 17668 - Posted: 1 Aug 2007, 7:59:05 UTC
LHC has real work to do and deadlines to meet. Sure, you can find more efficient ways to do this work. But how much time would they lose developing and testing such a new application?
Right now, CPU time is being wasted. But I think it is cheaper to lose CPU time than to spend the thinking time of scientists who have to meet strict deadlines. Don't you think so?
------ Thrr-Gilag Kee'rr L'Alliance Francophone

Mike Dunn Joined: 15 Jul 05 Posts: 9 Credit: 770,253 RAC: 0	Message 17672 - Posted: 1 Aug 2007, 12:41:20 UTC - in response to Message 17657. Last modified: 1 Aug 2007, 12:51:24 UTC
I am assuming that one day in the not too distant future there will be a steady supply of LHC work or that the runs will become so big (or hosts so few) they won't be gobbled up in a few hours. I assume the short runs we see now are just test runs.
Really? You mean, we're NOT helping them work out alignments for all the hardware they are building? That this is just practice for something else? What is this 'something else', oh Great One, that you imply?
Get real. Read about the aims of this project - or don't you consider looking up historical questions & answers important?
They need the data NOW because they are modifying the collider systems NOW. Not in a few months or years - NOW. They make available bunches of WUs to deal with questions/scenarios that they need data on to make adjustments BEFORE they fire the collider up - I'd much rather they did this than fire it up & it either fails totally, or blows a massive hole in the ground!
If this means they send out 5 WUs instead of 3 or 4 - fine by me. The scientists get their data back quicker, which is what they need.
Remember - this project is NOT for us; we just help out here. Will we get WUs once the project is finished? I doubt it - why would they need it, once the system is up & running? Will we get WUs for a different project? The hints are that we may. Will it be a lot of WUs? Again, I doubt it.
So - if you don't like the way things are run, why not create your own project? No-one is forcing you to stay here, after all.
{Edit} To save you the hassle of actually looking for information, have a read of this:
In March 2005, huge superconducting dipole magnets began to be installed in the LHC tunnel. Every time a new magnet is installed, measurements are made of its properties. If it deviates significantly from the specified values, SixTrack will be required to study what impact, if any, this difference might have on the operations of the machine. Getting the results as soon as possible makes a big difference for the engineers installing the thousands of magnets (1232 dipole magnets alone). A lot of number-crunching will be needed over the coming years to do this critical analysis. So your participation in LHC@home really does help to build the LHC!
Source is within this website, in the About section

Betting Slip Joined: 17 Sep 04 Posts: 41 Credit: 27,497 RAC: 0	Message 17673 - Posted: 1 Aug 2007, 13:23:44 UTC - in response to Message 17672. Last modified: 1 Aug 2007, 13:32:12 UTC
I am assuming that one day in the not too distant future there will be a steady supply of LHC work or that the runs will become so big (or hosts so few) they won't be gobbled up in a few hours. I assume the short runs we see now are just test runs.
Really? You mean, we're NOT helping them work out alignments for all the hardware they are building? That this is just practice for something else? What is this 'something else', oh Great One, that you imply?
Get real. Read about the aims of this project - or don't you consider looking up historical questions & answers important?
They need the data NOW because they are modifying the collider systems NOW. Not in a few months or years - NOW. They make available bunches of WUs to deal with questions/scenarios that they need data on to make adjustments BEFORE they fire the collider up - I'd much rather they did this than fire it up & it either fails totally, or blows a massive hole in the ground!
If this means they send out 5 WUs instead of 3 or 4 - fine by me. The scientists get their data back quicker, which is what they need.
Remember - this project is NOT for us; we just help out here. Will we get WUs once the project is finished? I doubt it - why would they need it, once the system is up & running? Will we get WUs for a different project? The hints are that we may. Will it be a lot of WUs? Again, I doubt it.
So - if you don't like the way things are run, why not create your own project? No-one is forcing you to stay here, after all.
{Edit} To save you the hassle of actually looking for information, have a read of this:
In March 2005, huge superconducting dipole magnets began to be installed in the LHC tunnel. Every time a new magnet is installed, measurements are made of its properties. If it deviates significantly from the specified values, SixTrack will be required to study what impact, if any, this difference might have on the operations of the machine. Getting the results as soon as possible makes a big difference for the engineers installing the thousands of magnets (1232 dipole magnets alone). A lot of number-crunching will be needed over the coming years to do this critical analysis. So your participation in LHC@home really does help to build the LHC!
Source is within this website, in the About section
I can't understand why you bothered to put fingers to keyboard to post that completely unhelpful diatribe. This thread is not about whether this work needs doing or not; it's about the waste of FINITE donated resources! Dagorath has posted some very compelling arguments in this thread and you respond with this.

AstralWalker Joined: 30 Nov 05 Posts: 14 Credit: 1,912,890 RAC: 180	Message 17676 - Posted: 1 Aug 2007, 18:40:07 UTC - in response to Message 17657.
I am assuming that one day in the not too distant future there will be a steady supply of LHC work or that the runs will become so big (or hosts so few) they won't be gobbled up in a few hours.
While I haven't been with the project that long, I don't remember this being true in the past and I wouldn't hold my breath that it will ever be true.