Message boards : Number crunching : initial replication
ExtraTerrestrial Apes
Joined: 26 Jul 05
Posts: 10
Credit: 79,466
RAC: 0
Message 12021 - Posted: 15 Jan 2006, 12:52:50 UTC
Last modified: 15 Jan 2006, 12:53:54 UTC

The search engine didn't find anything, so I assume this hasn't been discussed lately.

LHC uses an initial result replication of 5. IMO that's way too much, i.e. it wastes a lot of CPU time. Einstein recently switched from 4 to 3, and SETI is doing fine with 4.
What I suggest: use replication 3 at the beginning of a study, when there's still lots of work to do and it doesn't matter if some WUs take longer than 1 or 2 weeks. When a study is almost finished, say 50,000 jobs left, you can go to replication 4 or 5 to finish the jobs quickly so you can start the final analysis.

I know LHC is doing fine with the CPU power they get, but all the time saved can possibly benefit other projects.
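
A rough sketch of that suggestion in Python, just to make the idea concrete. The function names, the 50,000 threshold default and the "every issued copy gets crunched" assumption are mine for illustration; this is not how the LHC server is actually configured.

```python
def choose_replication(jobs_left, endgame_threshold=50_000,
                       early_replication=3, late_replication=5):
    """Hypothetical policy: few copies early in a study, more near the end."""
    if jobs_left <= endgame_threshold:
        return late_replication
    return early_replication


def surplus_fraction(replication, quorum=3):
    """Fraction of issued copies beyond the quorum.

    Assumes every issued copy is actually computed and returned,
    which is the worst case for wasted CPU time.
    """
    return (replication - quorum) / replication


if __name__ == "__main__":
    for jobs_left in (500_000, 50_000, 10_000):
        r = choose_replication(jobs_left)
        print(jobs_left, r, f"{surplus_fraction(r):.0%}")
```

Under that worst-case assumption, replication 5 with a quorum of 3 means 2 out of every 5 results are surplus, while replication 3 has no surplus at all (but also no spare copy if a host fails).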

Regards, MrS
Scanning for our furry friends since Jan 2002
ID: 12021 · Report as offensive     Reply Quote
Profile Rebirther
Joined: 18 Sep 04
Posts: 18
Credit: 26,904
RAC: 0
Message 12022 - Posted: 15 Jan 2006, 13:05:19 UTC

I agree with you; we can get more done with a quorum of 3.
ID: 12022 · Report as offensive     Reply Quote
Desti

Joined: 16 Jul 05
Posts: 84
Credit: 1,875,851
RAC: 0
Message 12029 - Posted: 15 Jan 2006, 13:44:47 UTC

Right, 3 should be enough.
Is it possible to send the fourth copy of a WU only if 3 valid results have not come back within 3 or 4 days?
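
The rule proposed above could look something like this. It is only a sketch: `should_send_extra` and its parameters are hypothetical names, and the 4-day wait is taken from the "3 or 4 days" in the post.

```python
from datetime import datetime, timedelta


def should_send_extra(first_sent_at, valid_results, now,
                      quorum=3, wait=timedelta(days=4)):
    """Issue an extra copy only if the quorum is still missing after `wait`."""
    if valid_results >= quorum:
        return False  # quorum already reached, no extra copy needed
    return now - first_sent_at >= wait


if __name__ == "__main__":
    sent = datetime(2006, 1, 15, 12, 0)
    print(should_send_extra(sent, 2, datetime(2006, 1, 17, 12, 0)))
    print(should_send_extra(sent, 2, datetime(2006, 1, 20, 12, 0)))
```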
Linux Users Everywhere @ BOINC
http://lhcathome.cern.ch/team_display.php?teamid=717
ID: 12029 · Report as offensive     Reply Quote
Profile Paul D. Buck

Joined: 2 Sep 04
Posts: 545
Credit: 148,912
RAC: 0
Message 12034 - Posted: 15 Jan 2006, 14:06:24 UTC

If you look, they have min quorum of 3, and, that means that they may not issue the last one or two results ...
ID: 12034 · Report as offensive     Reply Quote
Desti

Joined: 16 Jul 05
Posts: 84
Credit: 1,875,851
RAC: 0
Message 12037 - Posted: 15 Jan 2006, 14:13:07 UTC - in response to Message 12034.  

If you look, they have min quorum of 3, and, that means that they may not issue the last one or two results ...


Take a look at the send times: all 5 were sent within 2 minutes.

http://lhcathome.cern.ch/workunit.php?wuid=1034762
Linux Users Everywhere @ BOINC
http://lhcathome.cern.ch/team_display.php?teamid=717
ID: 12037 · Report as offensive     Reply Quote
Michael Karlinsky
Joined: 18 Sep 04
Posts: 163
Credit: 1,682,370
RAC: 0
Message 12040 - Posted: 15 Jan 2006, 14:56:38 UTC

AFAIK issuing 5 results here at LHC had two reasons,
although I can't point you to an official post, sorry.

1) It increases throughput. The quorum is reached sooner, and most of
the time at least the 4th result will not be sent; it is marked "did not need".
LHC needs throughput because, at the end of a large study, some
smaller ones will follow. They depend on the results from the earlier studies.
So the sooner all results are returned and quorums formed, the sooner new WUs can be released.

2) During LHC beta in Sept. 2004 and later there were large numerical
differences between platforms, so three results were almost never
enough. So sending more was the way to go. (This is no longer valid though.)
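
To illustrate point 1, here is a small simulation sketch. It assumes that host return times are independent and exponentially distributed with a mean of 3 days, which is purely my assumption for illustration and not measured LHC data. With a quorum of 3, the quorum is complete when the 3rd-fastest result arrives, so having 5 copies in flight can only bring that moment forward.

```python
import random


def time_to_quorum(return_times, quorum=3):
    """Time at which the quorum-th result has arrived."""
    return sorted(return_times)[quorum - 1]


def simulate(trials=10_000, mean_days=3.0, seed=1):
    """Compare mean time to quorum for 3 copies versus 5 copies.

    The same random draws are reused: the 3-copy case simply
    ignores the last two hosts of each trial.
    """
    rng = random.Random(seed)
    total3 = total5 = 0.0
    for _ in range(trials):
        times = [rng.expovariate(1.0 / mean_days) for _ in range(5)]
        total3 += time_to_quorum(times[:3])
        total5 += time_to_quorum(times)
    return total3 / trials, total5 / trials


if __name__ == "__main__":
    mean3, mean5 = simulate()
    print(f"replication 3: {mean3:.2f} days, replication 5: {mean5:.2f} days")
```

The latency gain comes at the price of the extra copies, which is exactly the trade-off being discussed in this thread.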

Michael
Team Linux Users Everywhere
ID: 12040 · Report as offensive     Reply Quote
ExtraTerrestrial Apes
Joined: 26 Jul 05
Posts: 10
Credit: 79,466
RAC: 0
Message 12043 - Posted: 15 Jan 2006, 15:18:19 UTC

@Michael:

That certainly makes sense. The reason I'm asking is that at the beginning of a >500,000 WU job, a quick response is not needed and a lower quorum would increase the throughput (considerably). At the end of that job, and for smaller ones, replication 5 is fine to get the latency of the results down.
I'd guess that this is just one parameter for the server which can be adjusted easily, maybe even automatically when they generate the jobs.

@Paul:

The way I understand it: initial replication means that they aim to issue 5 results. If there are not enough hosts demanding work, this may not happen, but normally it looks like this random one of mine:
http://lhcathome.cern.ch/workunit.php?wuid=988541

Like Desti already said, all 5 are sent out within 2 minutes. The quorum of 3 means that if they get 3 identical results and 2 differing ones, they will accept the result the 3 agree on and won't send the WU out to further hosts.
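
In code, the quorum idea as I understand it would be roughly the following. This sketch compares results for exact equality, which is a simplification on my part; a real validator presumably compares numerical output in some project-specific way. `canonical_result` is a made-up name.

```python
from collections import Counter


def canonical_result(results, quorum=3):
    """Return the value that at least `quorum` results agree on, else None."""
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count >= quorum else None


if __name__ == "__main__":
    # 3 identical results and 2 differing ones: quorum reached.
    print(canonical_result(["a", "a", "a", "b", "c"]))
    # No value has 3 supporters: more results would be needed.
    print(canonical_result(["a", "a", "b", "b", "c"]))
```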

MrS
Scanning for our furry friends since Jan 2002
ID: 12043 · Report as offensive     Reply Quote
Michael Karlinsky
Joined: 18 Sep 04
Posts: 163
Credit: 1,682,370
RAC: 0
Message 12045 - Posted: 15 Jan 2006, 15:28:51 UTC - in response to Message 12043.  
Last modified: 15 Jan 2006, 15:32:29 UTC

@Michael:

I'd guess that this is just one parameter for the server which can be adjusted easily, maybe even automatically when they generate the jobs.


This might be a WU attribute. Someone with BOINC server-side knowledge
might know for sure; I don't. As for automatically adjusted parameters:
towards the end of a study, deadlines are shortened significantly, to less
than a week, to increase throughput. I forgot to mention it in my earlier
post, sorry.

Michael

Team Linux Users Everywhere
ID: 12045 · Report as offensive     Reply Quote
ExtraTerrestrial Apes
Joined: 26 Jul 05
Posts: 10
Credit: 79,466
RAC: 0
Message 12046 - Posted: 15 Jan 2006, 15:43:56 UTC
Last modified: 15 Jan 2006, 15:44:47 UTC

If they are already adjusting parameters depending on the status of a study, then this (normal replication 3 or 4, later 5) should not be too hard to implement.

MrS

Edit & OT: finally added my avatar ... oh I love this picture :D
Scanning for our furry friends since Jan 2002
ID: 12046 · Report as offensive     Reply Quote
River~~

Joined: 13 Jul 05
Posts: 456
Credit: 75,142
RAC: 0
Message 12051 - Posted: 15 Jan 2006, 17:06:31 UTC - in response to Message 12046.  

Edit & OT: finally added my avatar ... oh I love this picture :D


yep - first time in 2006 a new avatar has got me smiling :)
ID: 12051 · Report as offensive     Reply Quote
Michael Karlinsky
Joined: 18 Sep 04
Posts: 163
Credit: 1,682,370
RAC: 0
Message 12052 - Posted: 15 Jan 2006, 17:12:02 UTC - in response to Message 12034.  

If you look, they have min quorum of 3, and, that means that they may not issue the last one or two results ...


Here is an example.

Michael
Team Linux Users Everywhere
ID: 12052 · Report as offensive     Reply Quote
ExtraTerrestrial Apes
Joined: 26 Jul 05
Posts: 10
Credit: 79,466
RAC: 0
Message 12053 - Posted: 15 Jan 2006, 17:34:57 UTC

That's the way I like to see it. Now the question is how often that happens compared to our examples where all 5 were sent out immediately.

MrS
Scanning for our furry friends since Jan 2002
ID: 12053 · Report as offensive     Reply Quote
Harri Liljeroos
Joined: 28 Sep 04
Posts: 674
Credit: 43,150,245
RAC: 15,991
Message 12079 - Posted: 16 Jan 2006, 9:29:28 UTC
Last modified: 16 Jan 2006, 9:49:46 UTC

How about this one?

http://lhcathome.cern.ch/workunit.php?wuid=1016850

it has been sent only once.
ID: 12079 · Report as offensive     Reply Quote
Michael Karlinsky
Joined: 18 Sep 04
Posts: 163
Credit: 1,682,370
RAC: 0
Message 12087 - Posted: 16 Jan 2006, 13:51:48 UTC - in response to Message 12079.  

How about this one?

http://lhcathome.cern.ch/workunit.php?wuid=1016850

it has been sent only once.


Don't worry, they are sent eventually.

Michael

PS: As of now one result pending, four unsent.
Team Linux Users Everywhere
ID: 12087 · Report as offensive     Reply Quote
Profile Paul D. Buck

Joined: 2 Sep 04
Posts: 545
Credit: 148,912
RAC: 0
Message 12088 - Posted: 16 Jan 2006, 15:23:32 UTC

LHC@Home issues work in a slightly different manner than the other projects. This leads to some of the "odd" effects seen. The intent is to send out the first result of every work unit in the unsent set, then the second, and so on, in "stripes" across the set of work to be issued.

Having the 4th and 5th results pre-created means that they can be issued very quickly, without waiting for the deadlines to expire before a new one is created.
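
One way to picture the "stripes" is the sketch below. It is only my reading of the description above, not the actual scheduler code, and `striped_order` is a made-up name.

```python
def striped_order(workunits, replication=5):
    """Yield (workunit, copy_number) pairs in striped order.

    Copy 1 of every work unit comes first, then copy 2 of every
    work unit, and so on up to `replication`.
    """
    for copy_number in range(1, replication + 1):
        for wu in workunits:
            yield wu, copy_number


if __name__ == "__main__":
    for wu, copy_number in striped_order(["wu_A", "wu_B", "wu_C"], replication=2):
        print(wu, copy_number)
```

With an ordering like this, a work unit can sit for a while with one result sent and the rest unsent, and if a quorum forms before the later stripes are reached, the remaining copies never need to go out.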
ID: 12088 · Report as offensive     Reply Quote
Desti

Joined: 16 Jul 05
Posts: 84
Credit: 1,875,851
RAC: 0
Message 12218 - Posted: 21 Jan 2006, 14:34:07 UTC - in response to Message 12079.  

How about this one?

http://lhcathome.cern.ch/workunit.php?wuid=1016850

it has been sent only once.



Now it has been sent out FIVE times, like a lot of others.
Linux Users Everywhere @ BOINC
http://lhcathome.cern.ch/team_display.php?teamid=717
ID: 12218 · Report as offensive     Reply Quote
arturg

Joined: 29 Sep 05
Posts: 10
Credit: 11,373
RAC: 0
Message 12579 - Posted: 30 Jan 2006, 11:23:13 UTC - in response to Message 12040.  

AFAIK issuing 5 results here at LHC had two reasons,
although I can't point you to an official post, sorry.

1) It increases throughput....
Michael


Imagine you manage a dog-sled post in the Arctic and your aim is high throughput, so for each dog team you choose 2 fast dogs and, to speed them up, you add 3 slow ones ;) as LHC probably does now.

Artur
ID: 12579 · Report as offensive     Reply Quote
