initial replication
Author | Message |
---|---|
Send message Joined: 26 Jul 05 Posts: 10 Credit: 79,466 RAC: 0 |
The search engine didn't find anything, so I assume this hasn't been discussed lately. LHC uses an initial result replication of 5. IMO that's way too much, i.e. it wastes a lot of CPU time. Einstein recently switched from 4 to 3, and SETI is doing fine with 4. What I suggest: use replication 3 at the beginning of a study, when there's still lots of work to do and it doesn't matter if some WUs take longer than 1 or 2 weeks. When a study is almost finished, say 50,000 jobs left, you can go to replication 4 or 5 to finish the jobs quickly so you can start the final analysis. I know LHC is doing fine with the CPU power they get, but all the time saved could benefit other projects. Regards, MrS Scanning for our furry friends since Jan 2002 |
Send message Joined: 18 Sep 04 Posts: 18 Credit: 26,904 RAC: 0 |
I agree with you; we could get more done with a quorum of 3. |
Send message Joined: 16 Jul 05 Posts: 84 Credit: 1,875,851 RAC: 0 |
Right, 3 should be enough. Would it be possible to send the fourth result only if 3 valid results have not come back within 3 or 4 days? Linux Users Everywhere @ BOINC [url=http://lhcathome.cern.ch/team_display.php?teamid=717] |
Send message Joined: 2 Sep 04 Posts: 545 Credit: 148,912 RAC: 0 |
If you look, they have a min quorum of 3, and that means they may not issue the last one or two results ... |
Send message Joined: 16 Jul 05 Posts: 84 Credit: 1,875,851 RAC: 0 |
"If you look, they have min quorum of 3, and, that means that they may not issue the last one or two results ..." Take a look at the send times: all 5 were sent within 2 minutes. http://lhcathome.cern.ch/workunit.php?wuid=1034762 Linux Users Everywhere @ BOINC [url=http://lhcathome.cern.ch/team_display.php?teamid=717] |
Send message Joined: 18 Sep 04 Posts: 163 Credit: 1,682,370 RAC: 0 |
AFAIK issuing 5 results here at LHC had two reasons, although I can't point you to an official post, sorry. 1) It increases throughput. The quorum is reached sooner. And most of the time at least the 4th result will not be sent. It is marked "did not need". LHC needs throughput because at the end of a large study some smaller ones will follow. They depend on the results from the earlier studies. So the sooner all results are returned and quorums formed, the sooner new WUs can be released. 2) During the LHC beta in Sept. 2004 and later there were large numerical differences between platforms, so three results were almost never enough. So sending more was the way to go. (This is no longer valid though.) Michael Team Linux Users Everywhere |
Send message Joined: 26 Jul 05 Posts: 10 Credit: 79,466 RAC: 0 |
@Michael: That certainly makes sense. The reason I'm asking is that at the beginning of a >500,000 WU job the quick response is not needed, and a lower replication would increase the throughput (considerably). At the end of that job, and for smaller ones, replication 5 is fine to get the latency of the results down. I'd guess that this is just one server parameter which can be adjusted easily, maybe even automatically when they generate the jobs. @Paul: The way I understand it, initial replication means that they aim to issue 5 results. If there are not enough hosts requesting work, this may not happen, but normally it looks like this random one of mine: http://lhcathome.cern.ch/workunit.php?wuid=988541 As Desti already said, all 5 are sent out within 2 minutes. The quorum of 3 means that if they get 3 identical results and 2 differing ones, they will accept the result the 3 agree on and won't send the WU out to other hosts. MrS Scanning for our furry friends since Jan 2002 |
Send message Joined: 18 Sep 04 Posts: 163 Credit: 1,682,370 RAC: 0 |
@Michael: This might be a WU attribute. Someone with BOINC server-side knowledge might know for sure; I don't. As for automatically adjusted parameters: towards the end of a study, deadlines are shortened significantly, to less than a week, to increase throughput. I forgot to mention that in my earlier post, sorry. Michael Team Linux Users Everywhere |
Send message Joined: 26 Jul 05 Posts: 10 Credit: 79,466 RAC: 0 |
If they are already adjusting parameters depending on the status of a study, then this (normal replication 3 or 4, later 5) should not be too hard to implement. MrS Edit & OT: finally added my avatar ... oh I love this picture :D Scanning for our furry friends since Jan 2002 |
Send message Joined: 13 Jul 05 Posts: 456 Credit: 75,142 RAC: 0 |
"Edit & OT: finally added my avatar ... oh I love this picture :D" yep - first time in 2006 a new avatar has got me smiling :) |
Send message Joined: 18 Sep 04 Posts: 163 Credit: 1,682,370 RAC: 0 |
"If you look, they have min quorum of 3, and, that means that they may not issue the last one or two results ..." Here is an example. Michael Team Linux Users Everywhere |
Send message Joined: 26 Jul 05 Posts: 10 Credit: 79,466 RAC: 0 |
That's the way I like to see it. Now the question is how often that happens compared to our examples where all 5 were sent out immediately. MrS Scanning for our furry friends since Jan 2002 |
Send message Joined: 28 Sep 04 Posts: 675 Credit: 43,534,200 RAC: 15,570 |
|
Send message Joined: 18 Sep 04 Posts: 163 Credit: 1,682,370 RAC: 0 |
How about this one? Don't worry, they are sent eventually. Michael PS: As of now one result pending, four unsent. Team Linux Users Everywhere |
Send message Joined: 2 Sep 04 Posts: 545 Credit: 148,912 RAC: 0 |
LHC@Home issues work in a slightly different manner than the other projects. This leads to some of the "odd" effects seen. The intent is to send out the first result of every work unit in the unsent set, then the second, and so on, in "stripes" across the set of work to be issued. Having the 4th and 5th results pre-created means that they can be issued very quickly, without waiting for the deadlines to expire before a new result is created. |
Send message Joined: 16 Jul 05 Posts: 84 Credit: 1,875,851 RAC: 0 |
"How about this one?" Now it has been sent out FIVE times, like a lot of others. Linux Users Everywhere @ BOINC [url=http://lhcathome.cern.ch/team_display.php?teamid=717] |
Send message Joined: 29 Sep 05 Posts: 10 Credit: 11,373 RAC: 0 |
"AFAIK issuing 5 results here at LHC had two reasons," Imagine you run a dog-sled postal service in the Arctic and your aim is high throughput, so for each dog team you choose 2 fast dogs and, to speed them up, you add 3 slow ones ;) as LHC probably does now. Artur |
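
The trade-off argued over in this thread can be put into rough numbers. The Python sketch below is only a back-of-the-envelope model, not anything taken from the LHC@home or BOINC server code: the function names are hypothetical, it assumes every issued result is actually computed and validates, and the 50,000-job threshold is simply the figure suggested in the first post.

```python
"""Rough model of initial replication vs. minimum quorum.

Assumptions (illustrative only): every issued result is computed,
all results validate, and the switch-over threshold is the
50,000 jobs suggested in the thread. All names are hypothetical.
"""


def redundant_fraction(initial_replication: int, min_quorum: int) -> float:
    """Share of issued results beyond what the quorum strictly needs."""
    if min_quorum < 1 or initial_replication < min_quorum:
        raise ValueError("need 1 <= min_quorum <= initial_replication")
    return (initial_replication - min_quorum) / initial_replication


def throughput_gain(old_replication: int, new_replication: int) -> float:
    """Factor by which WUs completed per unit of CPU time would change."""
    if old_replication < 1 or new_replication < 1:
        raise ValueError("replication must be at least 1")
    return old_replication / new_replication


def pick_replication(jobs_left: int, threshold: int = 50_000,
                     early: int = 3, late: int = 5) -> int:
    """Low replication while plenty of work remains, high near the end."""
    return early if jobs_left > threshold else late


if __name__ == "__main__":
    print("redundant share at 5/3:", redundant_fraction(5, 3))
    print("throughput gain 5 -> 3:", round(throughput_gain(5, 3), 2))
    print("replication with 500,000 jobs left:", pick_replication(500_000))
    print("replication with 20,000 jobs left:", pick_replication(20_000))
```

Under these assumptions, replication 5 with a quorum of 3 spends two results in five beyond the quorum. The real overhead is lower whenever the later results are never sent and end up marked "did not need", which is the effect Michael describes.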
©2024 CERN