Message boards : Number crunching : Initial Replication

Brian Silvers

Joined: 3 Jan 07
Posts: 124
Credit: 7,065
RAC: 0
Message 18223 - Posted: 17 Oct 2007, 0:18:29 UTC

IR can be set to 5, like it is now, then when the server software version upgrade is done, LHC can implement the server-side "redundant result" cancellation. SETI did this and it worked out just fine. It's not so much needed now as they've gone down to IR=2, MQ=2...

Dagorath mentioned this some time ago in this thread, but I think the method of delivery of the idea was...less than ideal due to being intertwined with some bickering amongst several different people...

How it works is that if a quorum has been reached, when a client connects to the scheduler, any results that the host has that have already made quorum and validated can be cancelled from the server. This can be done one of three different ways:

1. If the client (host) has not started the result at all, delete the result from the host.
2. If the client has started the result, let it go to completion.
3. Delete the result regardless of whether or not it has been started.
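To make the three choices concrete, here is a minimal Python sketch of the decision the scheduler would face for each leftover result. It only illustrates the logic described above and is not BOINC's actual server code; `Policy`, `should_cancel`, and the parameter names are made up for this example. Options 1 and 2 are treated as a single policy, since together they amount to "cancel only what has not been started".

```python
from enum import Enum


class Policy(Enum):
    """Hypothetical names for the cancellation choices listed above."""
    SPARE_STARTED = "options 1 and 2"  # cancel unstarted results, let started ones finish
    CANCEL_ALL = "option 3"            # cancel regardless of progress


def should_cancel(workunit_validated: bool, started: bool, policy: Policy) -> bool:
    """Decide whether a result still sitting on a host should be cancelled."""
    if not workunit_validated:
        # Quorum not yet reached and validated: the result is still needed.
        return False
    if policy is Policy.CANCEL_ALL:
        return True
    # SPARE_STARTED: only results that have not begun are cancelled.
    return not started


print(should_cancel(True, started=False, policy=Policy.SPARE_STARTED))  # True
print(should_cancel(True, started=True, policy=Policy.SPARE_STARTED))   # False
print(should_cancel(True, started=True, policy=Policy.CANCEL_ALL))      # True
```

Under the first policy no CPU time already spent is thrown away; under the second, the host frees up sooner at the cost of discarding partial work.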

I want to say the support for that was included in BOINC 5.8.17, but I'm not sure... With the relatively short running results here, I wouldn't be opposed to option 3, although options 1 & 2 are the most "user-friendly"...

This allows the project to keep IR=5, but satisfies the concerns of people who are mentioning the waste of electricity...

FWIW, YMMV, etc, etc, etc...

Brian
ID: 18223
EclipseHA

Joined: 18 Sep 04
Posts: 47
Credit: 1,886,234
RAC: 0
Message 18225 - Posted: 17 Oct 2007, 3:58:21 UTC
Last modified: 17 Oct 2007, 4:02:43 UTC

Other than the fact that some WUs may get crunched 2 more times than needed (with credit granted), I'm not sure where this is causing harm. Sure, you're using electricity, but it's up to the project.

People have been complaining about "lack of work" here for years, and to cut IR from 5 to 3 means that there's 40% less work right off the bat.
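The 40% figure is simple arithmetic on the number of copies sent out per workunit. A quick check in Python (the function name is just illustrative):

```python
def work_reduction(old_ir: int, new_ir: int) -> float:
    """Fraction of initially issued results saved by lowering the replication."""
    return (old_ir - new_ir) / old_ir


print(f"{work_reduction(5, 3):.0%}")  # 40%
```

This counts only the initial copies of each workunit; it says nothing about reissues after errors or missed deadlines.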

Right now, today, LHC has taken some measures to keep work in the pipeline longer (the 2/day/CPU quota, the 1h delay, etc.), what with the press release and all.

I think we should all just step back and be happy that there has been a flow of work (be it 2/day) for the longest time I've seen in years.

If you don't like the way the project is being managed, speak with your feet and crunch for another project.
ID: 18225
Brian Silvers

Joined: 3 Jan 07
Posts: 124
Credit: 7,065
RAC: 0
Message 18242 - Posted: 17 Oct 2007, 14:38:23 UTC - in response to Message 18235.  


Which pretty much proves they don't need the results back ultra-fast (the argument some people were using to justify IR=5 in spite of the fact IR=5 only slows things down).

<snip>

Well, if your objective is to do unnecessary work then you can be happy. The fact is the project could be getting the same results (a quorum of 3) with 40% less work. Nobody should be happy about that.



What does indeed get slowed down is archiving the completed workunits. The workunit as a whole must remain so long as there is one resultID that hasn't been turned back in and that has not passed the deadline for the result.

From what I've been able to read (and experience), this project is much more sensitive to floating point math differences than others. I had a couple of results that were declared invalid just over this past week. In both cases I was either first or second to report a completed result. If the replication had been at 3 and quorum at 3, then there would've been at least one more replication made. That replication would have the same amount of time to be returned as the initial replication, but it causes the workunit as a whole to be waiting longer to be stored in the Master Science Database than perhaps a replication of 5, all with the same deadline, would have.

To make the determination you're making that 5 is "wasteful", you really need to know the exact error rates on the first replication. I don't think someone outside of the project team can know that for a fact...

I think the best thing to do is to implement the server-side aborts, like what SETI did, but leave the replication at 5.

Brian
ID: 18242
Brian Silvers

Joined: 3 Jan 07
Posts: 124
Credit: 7,065
RAC: 0
Message 18262 - Posted: 18 Oct 2007, 4:55:21 UTC - in response to Message 18259.  


Good point but having 5 crunchers working on WU A when quorum = 3 means WU B gets delayed (because 2 of those 5 crunchers could be working on WU B rather than A).


You could argue that replicating anything more than the quorum is causing a "delay" in work being done, but you have to keep in mind that the insertion into the science database is the ultimate goal, and it may or may not be delayed by more replication. The fact that LHC units are so short at this point in time and that nobody is holding a large cache makes it difficult to give you a good example of what can happen if you get a reissue.

To give you a better idea of what can happen, take a look at this example from Einstein: numerous reissues.

As you read down that list, the results were issued in order from top to bottom. The first two were generated. The 2nd host bombed out of it, so it got reissued. That was the same day, so not a big negative impact. So, the 3rd host reports, but the 1st host runs out the deadline. This causes another result to get issued, but it would've had 3 weeks to make it back in. This new host fails out the next day. Due to the way Einstein's data packs are handled, the next available host doesn't come along for a week. They too burn up the entire 3 weeks, and so another result has been issued.

Had the initial replication been higher, perhaps set at 3, another host would've picked up the result and run it successfully, thus making the longest time that result might've been in the work queue approximately 3 weeks. Instead, it is now 8+ weeks. Sure, there's no "guarantee" that the extra replication would've helped a bit, but it can help, depending on the circumstances.
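As a rough tally of the Einstein example, the delays can be added up in a few lines of Python. The day counts are a reading of the description above (a 3-week deadline taken as 21 days, "the next day" as 1 day, "a week" as 7 days); they are assumptions for illustration, not figures taken from Einstein's database.

```python
DEADLINE_DAYS = 21  # assumed: "3 weeks"

# (what happened, days it added before the next reissue could go out)
DELAYS = [
    ("host 1 runs out its deadline", DEADLINE_DAYS),
    ("replacement host fails the next day", 1),
    ("no eligible host turns up for a week", 7),
    ("that host also runs out its deadline", DEADLINE_DAYS),
]


def days_before_latest_reissue(delays):
    """Sum the delays that passed before the most recent reissue."""
    return sum(days for _, days in delays)


total = days_before_latest_reissue(DELAYS)
print(f"{total} days (~{total / 7:.1f} weeks) before the latest reissue")
```

That comes to 50 days, so once the latest copy has been out for a week or so the workunit is past the 8-week mark. A spare initial copy that succeeded would have capped the wait at roughly one deadline, about 21 days.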

Since the LHC units are so short running and since people are not able to maintain large queues right now, the consequence of this has been minimized. Additionally, Einstein isn't a time-sensitive project. They can wait a few extra weeks for the results if need be. This is why they can do the lower replication.

As for task A and task B waiting, the bigger cause of any "wait" right now is the forced low quota...

To make the determination you're making that 5 is "wasteful", you really need to know the exact error rates on the first replication. I don't think someone outside of the project team can know that for a fact...



Earlier in this thread someone mentioned the error rate is 25%. Since that's never been disputed, I've assumed it's true.


That may or may not be a safe assumption. I'd seek clarification (politely) from Alex or Neasan.
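For what it's worth, the effect of an assumed error rate can be worked out directly. The sketch below takes the unconfirmed 25% figure at face value and additionally assumes that each result fails independently and that all non-failed results agree with one another; neither assumption comes from the project team. It computes the chance that the first batch alone yields a quorum of 3.

```python
from math import comb


def p_quorum_first_round(ir: int, quorum: int, error_rate: float) -> float:
    """Probability that at least `quorum` of `ir` initial results are good."""
    p_ok = 1.0 - error_rate
    return sum(
        comb(ir, k) * p_ok**k * error_rate ** (ir - k)
        for k in range(quorum, ir + 1)
    )


for ir in (3, 5):
    print(f"IR={ir}: {p_quorum_first_round(ir, 3, 0.25):.3f}")
# IR=3: 0.422
# IR=5: 0.896
```

Under these assumptions, IR=3 would need at least one reissue more than half the time, while IR=5 would usually finish in one round. With a much lower real error rate the gap shrinks, which is why the actual rate matters.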


I think the best thing to do is to implement the server-side aborts, like what SETI did, but leave the replication at 5.


If Neasan or Alex would agree to that then I would shut up.

Bear in mind that it may need a server upgrade, and folks like me who use BOINC 5.8.16 would not process the server-side requests, because support for them was added in BOINC 5.8.17 (I believe). I know 5.8.16 doesn't support it...

Brian
ID: 18262
CoM

Joined: 29 Sep 04
Posts: 42
Credit: 11,505,632
RAC: 0
Message 18265 - Posted: 18 Oct 2007, 6:52:38 UTC

Dagorath, could you please shut up? You spam in every thread. If you are not happy with the IR, then please disconnect and go away.
Having some redundant results has some advantages; you can read the statements in this thread.
There are projects with an IR of 1, like QMC, but that's a Monte Carlo method.
Your reason of saving energy is feigned. If you want to crunch more efficiently, buy yourself a new, more efficient CPU.
By the way, I like my CPUs to crunch LHC because it saves energy: doing QMC (which I mostly do) needs much more energy. (Measure it yourself; there are differences between the projects' work.)
ID: 18265
Neasan
Volunteer moderator
Volunteer tester

Joined: 30 Nov 06
Posts: 234
Credit: 11,078
RAC: 0
Message 18269 - Posted: 18 Oct 2007, 9:55:31 UTC

If Dagorath (or anyone for that matter) is unhappy with our IR or any other way that we have decided to do the project, you are free to leave and crunch for another project.

We take all criticism and opinions on board, listen to them all, weigh up their merits and discuss things with the scientists. However, if you continue to stay and "lobby" us to change IR (or another aspect of the project) by spamming in threads or just plain making a nuisance of yourself, I am more than happy to detach you, ban you and wipe your credit from the stats.

When we upgrade the service we will re-discuss the IR with the scientists and it may change but it may not.
ID: 18269
J Langley

Joined: 31 Dec 05
Posts: 68
Credit: 8,691
RAC: 0
Message 18273 - Posted: 18 Oct 2007, 11:48:55 UTC - in response to Message 18269.  

I agree with you about spamming multiple threads, but presumably it's okay for Dagorath (or anyone else) to lobby about IR in *this* thread?

If the IR is not changed after the server upgrade, it would be nice if you could at least let us know the scientists' reasons for choosing IR=5.
ID: 18273
Betting Slip

Joined: 17 Sep 04
Posts: 41
Credit: 27,497
RAC: 0
Message 18275 - Posted: 18 Oct 2007, 12:23:05 UTC - in response to Message 18269.  

If Dagorath (or anyone for that matter) is unhappy with our IR or any other way that we have decided to do the project, you are free to leave and crunch for another project.



I have taken your good advice Neasan and detached my 2 computers.

I hope you will not take the Predictor route and delete my account and the credit I have already earned, or resort to censorship of free speech.
ID: 18275
Betting Slip

Joined: 17 Sep 04
Posts: 41
Credit: 27,497
RAC: 0
Message 18278 - Posted: 18 Oct 2007, 12:38:59 UTC - in response to Message 18276.  

Anybody who is unhappy with my posts is welcome to either ignore me or kiss my ass. The issue, C0M you insufferable twit, is primarily to save CPU cycles and secondarily to save electricity, if you would care to read the thread (though I doubt reading is one of your basic skills).

As for spam, the definition of spam is invariably linked to unsolicited messages. You numbskulls fail to realise ALL the messages here are unsolicited therefore your own drivel is spam too, by your definition. The reason you asses trot out the spam word is that you simply don't like my posts and you don't like them because it means less WUs.




I totally agree with you but it appears we're in a club of 2.

You have put some very good arguments in this thread; sadly, Neasan can only respond with < if you don't like it, leave >

Obviously we are not needed here.
ID: 18278
Keith T.

Joined: 1 Mar 07
Posts: 47
Credit: 32,356
RAC: 0
Message 18279 - Posted: 18 Oct 2007, 12:39:27 UTC

Just filtered my first troll on LHC.

If you want to do the same just click on http://lhcathome.cern.ch/lhcathome/edit_forum_preferences_form.php and add the appropriate user id.
ID: 18279
larry1186

Joined: 4 Oct 06
Posts: 38
Credit: 24,908
RAC: 0
Message 18281 - Posted: 18 Oct 2007, 14:18:25 UTC - in response to Message 18278.  

You have put some very good arguments in this thread; sadly, Neasan can only respond with < if you don't like it, leave >

Obviously, you missed a very important statement from Neasan:
When we upgrade the service we will re-discuss the IR with the scientists and it may change but it may not.

Meaning they aren't worrying about it right now, but they are aware that *some* users are concerned with the efficiency. Open your eyes and read the *entire* post, not just what you want to hear to continue to feel that the world is against you.

(p.s. Nice try to bait with the Predictor thing, but nobody's biting.)
ID: 18281
Neasan
Volunteer moderator
Volunteer tester

Joined: 30 Nov 06
Posts: 234
Credit: 11,078
RAC: 0
Message 18284 - Posted: 18 Oct 2007, 15:58:49 UTC - in response to Message 18276.  

Anybody who is unhappy with my posts is welcome to either ignore me or kiss my ass. The issue, C0M you insufferable twit,


Rules:
No messages whose only intention is to annoy or antagonize other people.
No messages that are deliberately hostile or insulting.

Dagorath and Fat Loss 4 Idiots (and anyone else) you are volunteers here and as such are free to take your computers elsewhere. If you wish to detach that is fine but by staying attached you're implicitly agreeing to do things our way. You have both made your points and we have not just ignored them but we have set IR to 5 and are leaving it as such.

If you do detach I will not delete your credit. If you read the post you will see that I was pointing out that if you stayed attached and willing to do the work but continued to bitch, moan and whine, I would consider taking the drastic step of banning you and deleting credit.

Also the predictor@home thing was a bit much don't you think? I've let you have your say and only as a last resort have I threatened to do anything drastic.
ID: 18284
CoM

Joined: 29 Sep 04
Posts: 42
Credit: 11,505,632
RAC: 0
Message 18291 - Posted: 18 Oct 2007, 19:03:25 UTC - in response to Message 18276.  

Anybody who is unhappy with my posts is welcome to either ignore me or kiss my ass. The issue, C0M you insufferable twit, is primarily to save CPU cycles and secondarily to save electricity, if you would care to read the thread (though I doubt reading is one of your basic skills).

As for spam, the definition of spam is invariably linked to unsolicited messages. You numbskulls fail to realise ALL the messages here are unsolicited therefore your own drivel is spam too, by your definition. The reason you asses trot out the spam word is that you simply don't like my posts and you don't like them because it means less WUs.


Wow, sounds like a schoolyard. I didn't expect to hear something like that here; it's even worse than one of our Collaboration Meetings. (Yes, I am a particle physicist. And I doubt you really know what this project wants to accomplish.)
By the way, I had to look up some of these nice/nasty words, because that's not the language I am used to.
ID: 18291
Brian Silvers

Joined: 3 Jan 07
Posts: 124
Credit: 7,065
RAC: 0
Message 18292 - Posted: 18 Oct 2007, 19:30:45 UTC - in response to Message 18276.  

you simply don't like my posts and you don't like them because it means less WUs.


I cut out all the noise and boiled it down to this. The "you" mentioned above is to be taken in general, not specific, as your words were aimed at those who disagree with you...

I think you can tell that I don't totally disagree with you. At least I hope you can tell that... Having said that, in my opinion, you are spamming multiple threads and what you're doing does border on hijacking the thread.

I've laid out some reasons why the additional replication can help speed the process up. Neasan has said that they will revisit the issue with the project scientists. Alex and Neasan are only the administrators of the servers. They have to abide by what the project scientists want.

That said, Neasan, I think the idea of doing the server-side aborts is worthwhile. It still allows the IR to be set to 5, but if a workunit has met quorum and has been validated, it allows you to attempt to cancel the remaining replications and possibly get that result into the science database a little bit faster. I say "attempt to" because if the version of BOINC that the host is using doesn't support the aborts, it won't work. I said "possibly" because if the host doesn't support the abort and/or one of the hosts doesn't contact the scheduler before the deadline, then it will still take the full duration of the longest deadline to be able to send the workunit through the assimilation process...
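The "attempt to" and "possibly" caveats can be sketched as a small model of how long a validated workunit still has to wait before assimilation. This is a simplification under stated assumptions, not how the BOINC server actually tracks results: the `Outstanding` type, its field names, and all the hour values are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence


class Outstanding(NamedTuple):
    """A leftover result on some host after the workunit has validated."""
    deadline_h: float                # hours until this result's deadline
    next_contact_h: Optional[float]  # hours until the host next contacts the scheduler (None = never)
    supports_abort: bool             # does the host's client honour server-side aborts?


def hours_until_clear(r: Outstanding) -> float:
    """Hours until this result stops holding up the workunit."""
    if r.supports_abort and r.next_contact_h is not None and r.next_contact_h < r.deadline_h:
        return r.next_contact_h  # aborted at the next scheduler contact
    return r.deadline_h          # otherwise the deadline has to run out


def hours_until_assimilation(results: Sequence[Outstanding]) -> float:
    """The workunit waits for the slowest leftover result."""
    return max((hours_until_clear(r) for r in results), default=0.0)


new_clients = [Outstanding(168, 4, True), Outstanding(168, 12, True)]
one_old_client = new_clients + [Outstanding(168, 6, False)]
print(hours_until_assimilation(new_clients))     # 12
print(hours_until_assimilation(one_old_client))  # 168
```

A single host that cannot process the abort, or that never phones home, is enough to push the wait back out to the full deadline.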

Brian

ID: 18292
hmusseau

Joined: 14 Oct 07
Posts: 1
Credit: 158
RAC: 0
Message 18294 - Posted: 18 Oct 2007, 21:59:50 UTC

Well, I'm a newcomer to this project, which I had wanted to join for quite some time but didn't, seeing as there was no work. With all the publicity and discovering that the bucket was full, I signed up.
So on the second day I took a look at my account to see how my first WUs had fared, and discovered that the IR is a whopping 5! What a waste. I already frown at projects that have an IR of 3, so imagine 5. I immediately halved my participation in the project; I will probably soon set it to no new tasks pending a better policy.
If you need the results fast, just shorten the deadline, that's what it's for, and set IR at a decent level. I understand the project is sensitive to calculation errors, so leave quorum and IR at 3 if 2 really is insufficient, but please don't go further.
BTW if you do need the results fast, I really don't get the quota that doles out the WUs. It seems like it's against LHC's best interests in the short run, and unless there is a large increase in workunits looming I don't see the point in getting so many new crunchers when the ones you had before were more than sufficient to crunch everything the project was sending their way.

Information Wants To Be Free
ID: 18294

©2022 CERN