Initial Replication
Author | Message |
---|---|
Joined: 3 Jan 07 Posts: 124 Credit: 7,065 RAC: 0 |
IR can be set to 5, like it is now, then when the server software version upgrade is done, LHC can implement the server-side "redundant result" cancellation. SETI did this and it worked out just fine. It's not so much needed there now as they've gone down to IR=2, MQ=2... Dagorath mentioned this some time ago in this thread, but I think the method of delivery of the idea was...less than ideal due to being intertwined with some bickering amongst several different people... How it works is that once a quorum has been reached, when a client connects to the scheduler, any results the host holds for workunits that have already made quorum and validated can be cancelled by the server. This can be done in one of three different ways: 1. If the client (host) has not started the result at all, delete the result from the host. 2. If the client has started the result, let it go to completion. 3. Delete the result regardless of whether or not it has been started. I want to say the support for that was included in BOINC 5.8.17, but I'm not sure... With the relatively short running results here, I wouldn't be opposed to option 3, although options 1 & 2 are the most "user-friendly"... This allows the project to keep IR=5, but satisfies the concerns of people who are mentioning the waste of electricity... FWIW, YMMV, etc, etc, etc... Brian |
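The three options in the post above can be sketched as a small decision function. This is only an illustration of the logic as described, not BOINC's actual scheduler code: `HostResult`, `results_to_cancel` and the `abort_started` flag are hypothetical names invented for this sketch.

```python
from dataclasses import dataclass


@dataclass
class HostResult:
    """One result sitting on a host (hypothetical model, not a BOINC type)."""
    name: str
    started: bool            # has the client begun crunching it?
    quorum_validated: bool   # has its workunit already met quorum and validated?


def results_to_cancel(results, abort_started=False):
    """Return the names of results the server would ask the host to drop.

    abort_started=False combines options 1 and 2: unstarted redundant
    results are cancelled, started ones run to completion.
    abort_started=True is option 3: redundant results are cancelled
    whether or not they have been started.
    """
    cancel = []
    for r in results:
        if not r.quorum_validated:
            continue  # the workunit still needs this result
        if r.started and not abort_started:
            continue  # option 2: let a started result finish
        cancel.append(r.name)
    return cancel


on_host = [
    HostResult("wu1_r4", started=False, quorum_validated=True),
    HostResult("wu2_r3", started=True, quorum_validated=True),
    HostResult("wu3_r0", started=False, quorum_validated=False),
]
print(results_to_cancel(on_host))                      # friendly mode
print(results_to_cancel(on_host, abort_started=True))  # option 3
```

In the friendly mode only the unstarted redundant result is dropped; with `abort_started=True` the started one goes too, and the result whose workunit has not yet validated is kept in both cases.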
Joined: 18 Sep 04 Posts: 47 Credit: 1,886,234 RAC: 0 |
Other than the fact that some WUs may get crunched 2 more times than needed (with credit granted), I'm not sure where this is causing harm. Sure, you're using electricity, but it's up to the project. People have been complaining about "lack of work" here for years, and to cut IR from 5 to 3 means that there's 40% less work right off the bat. Right now, with the press release and all, LHC has taken some measures to keep work in the pipeline longer: the 2/day/cpu quota, the 1h delay, etc. I think we should all just step back and be happy that there has been a flow of work (be it 2/day) for the longest time I've seen in years. If you don't like the way the project is being managed, vote with your feet and crunch for another project. |
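The arithmetic behind the "40% less work" and "2 more times than needed" figures can be checked with a few lines. This is a minimal sketch: the function names are made up for illustration, and the quorum of 3 is an assumption inferred from the post's "2 more times than needed", not a confirmed project setting.

```python
def results_sent(workunits, initial_replication):
    """Number of results the server hands out up front."""
    return workunits * initial_replication


def fraction_less_work(old_ir, new_ir):
    """Relative drop in results issued when IR is lowered."""
    return (old_ir - new_ir) / old_ir


def surplus_per_workunit(initial_replication, quorum):
    """Results issued beyond what the quorum needs, if none of them fail."""
    return max(initial_replication - quorum, 0)


ASSUMED_QUORUM = 3  # assumption, see the note above

print(results_sent(1000, 5), results_sent(1000, 3))  # 5000 vs 3000 results
print(fraction_less_work(5, 3))                      # 0.4, i.e. 40% less
print(surplus_per_workunit(5, ASSUMED_QUORUM))       # 2 extra results
```

The surplus figure is an upper bound: every result that errors out or misses its deadline reduces the number of genuinely redundant copies.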
Joined: 3 Jan 07 Posts: 124 Credit: 7,065 RAC: 0 |
What does indeed get slowed down is archiving the completed workunits. The workunit as a whole must remain so long as there is one resultID that hasn't been turned back in and that has not passed the deadline for the result. From what I've been able to read (and experience), this project is much more sensitive to floating point math differences than others. I had a couple of results that were declared invalid just over this past week. In both cases I was either first or second to report a completed result. If the replication had been at 3 and quorum at 3, then there would've been at least one more replication made. That replication would have the same amount of time to be returned as the initial replication, but it causes the workunit as a whole to be waiting longer to be stored in the Master Science Database than perhaps a replication of 5, all with the same deadline, would have. To make the determination you're making that 5 is "wasteful", you really need to know the exact error rates on the first replication. I don't think someone outside of the project team can know that for a fact... I think the best thing to do is to implement the server-side aborts, like what SETI did, but leave the replication at 5. Brian |
Joined: 3 Jan 07 Posts: 124 Credit: 7,065 RAC: 0 |
You could argue that replicating anything more than the quorum is causing a "delay" in work being done, but you have to keep in mind that the insertion into the science database is the ultimate goal, and it may or may not be delayed by more replication. The fact that LHC units are so short at this point in time and that nobody is holding a large cache makes it difficult to give you a good example of what can happen if you get a reissue. To give you a better idea of what can happen, take a look at this example from Einstein: numerous reissues. As you read down that list, the results were issued in order from top to bottom. The first two were generated. The 2nd host bombed out of it, so it got reissued. That was the same day, so not a big negative impact. So, the 3rd host reports, but the 1st host runs out the deadline. This causes another result to get issued, but it would've had 3 weeks to make it back in. This new host fails out the next day. Due to the way Einstein's data packs are handled, the next available host doesn't come along for a week. They too burn up the entire 3 weeks, and so another result has been issued. Had the initial replication been higher, perhaps set at 3, another host would've picked up the result and run it successfully, thus making the longest time that result might've been in the work queue approximately 3 weeks. Instead, it is now 8+ weeks. Sure, there's no "guarantee" that the extra replication would've helped a bit, but it can help, depending on the circumstances. Since the LHC units are so short running and since people are not able to maintain large queues right now, the consequence of this has been minimized. Additionally, Einstein isn't a time-sensitive project. They can wait a few extra weeks for the results if need be. This is why they can do the lower replication. As for task A and task B waiting, the bigger cause of any "wait" right now is the forced low quota...
To make the determination you're making that 5 is "wasteful", you really need to know the exact error rates on the first replication. I don't think someone outside of the project team can know that for a fact...
That may or may not be a safe assumption. I'd seek clarification (politely) from Alex or Neasan. I think the best thing to do is to implement the server-side aborts, like what SETI did, but leave the replication at 5. If Neasan or Alex would agree to that then I would shut up. Bear in mind that it may need a server upgrade, and folks like me who use BOINC 5.8.16 would not process the server-side requests, because support for them was added in BOINC 5.8.17 (I believe). I know 5.8.16 doesn't support it... Brian |
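The Einstein reissue chain described in the post above can be replayed as a toy timeline to show why an extra initial replica can shorten the wait. This is a rough sketch, not project data: the day numbers are illustrative guesses built from the post's "same day", "next day", "a week" and "3 weeks", the quorum of 2 is assumed, and `day_quorum_reached` is a hypothetical helper.

```python
def day_quorum_reached(attempts, quorum):
    """attempts: (finish_day, succeeded) pairs, one per issued result.

    Returns the day on which the quorum-th valid result arrives, or None
    if the workunit is still waiting for enough valid results.
    """
    success_days = sorted(day for day, ok in attempts if ok)
    if len(success_days) < quorum:
        return None
    return success_days[quorum - 1]


# Illustrative timeline in days; every number here is an assumption.
as_described = [
    (21, False),  # host 1: runs out the 3-week deadline
    (0, False),   # host 2: errors out on the day of issue
    (2, True),    # host 3: same-day reissue, reports successfully
    (22, False),  # host 4: issued on day 21, fails the next day
    (50, False),  # host 5: issued a week later (day 29), misses its deadline
]

# Same chain, plus a third initial replica that comes back by its deadline.
with_extra_replica = as_described + [(21, True)]

print(day_quorum_reached(as_described, quorum=2))        # None: still waiting
print(day_quorum_reached(with_extra_replica, quorum=2))  # 21
```

As the post says, nothing guarantees that the extra replica succeeds; the sketch only shows the best case the argument relies on, where the workunit completes within one deadline instead of waiting on a chain of reissues.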
Joined: 29 Sep 04 Posts: 42 Credit: 11,505,632 RAC: 0 |
Dagorath, could you please shut up? You spam in every thread. If you are not happy with the IR, then please disconnect and go away. Having some redundant results has some advantages; you can read the statements in this thread. There are projects with an IR of 1, like QMC, but that's a Monte Carlo method. Your reason, saving energy, is feigned. If you want to crunch more efficiently, buy yourself a new, more efficient CPU. By the way, I like my CPUs to crunch LHC because it saves energy: doing QMC (which I mostly do) needs much more energy. (Measure it yourself; there are differences between projects' work.) |
Joined: 30 Nov 06 Posts: 234 Credit: 11,078 RAC: 0 |
If Dagorath (or anyone for that matter) is unhappy with our IR or any other way that we have decided to do the project, you are free to leave and crunch for another project. We take all criticism and opinions on board, listen to them all, weigh up their merits and discuss things with the scientists. However, if you continue to stay and "lobby" us to change IR (or another aspect of the project) by spamming in threads or just plain making a nuisance of yourself, I am more than happy to detach you, ban you and wipe your credit from the stats. When we upgrade the service we will re-discuss the IR with the scientists and it may change but it may not. |
Joined: 31 Dec 05 Posts: 68 Credit: 8,691 RAC: 0 |
I agree with you about spamming multiple threads, but presumably it's okay for Dagorath (or anyone else) to lobby about IR in *this* thread? If the IR is not changed after the server upgrade, it would be nice if you could at least let us know the scientists' reasons for choosing IR=5. |
Joined: 17 Sep 04 Posts: 41 Credit: 27,497 RAC: 0 |
If Dagorath (or anyone for that matter) is unhappy with our IR or any other way that we have decided to do the project, you are free to leave and crunch for another project. I have taken your good advice, Neasan, and detached my 2 computers. I hope you will not take the Predictor route and delete my account and the credits I have already earned, or resort to censorship of free speech. |
Joined: 17 Sep 04 Posts: 41 Credit: 27,497 RAC: 0 |
Anybody who is unhappy with my posts is welcome to either ignore me or kiss my ass. The issue, C0M you insufferable twit, is primarily to save CPU cycles and secondarily to save electricity, if you would care to read the thread (though I doubt reading is one of your basic skills). I totally agree with you but it appears we're in a club of 2. You have put some very good arguments in this thread; sadly Neasan can only respond with < if you don't like it, leave >. Obviously we are not needed here. |
Joined: 1 Mar 07 Posts: 47 Credit: 32,356 RAC: 0 |
Just filtered my first troll on LHC. If you want to do the same just click on http://lhcathome.cern.ch/lhcathome/edit_forum_preferences_form.php and add the appropriate user id. |
Joined: 4 Oct 06 Posts: 38 Credit: 24,908 RAC: 0 |
You have put some very good arguments in this thread, sadly Neasan can only respond with < if you don't like it, leave > Obviously, you missed a very important statement from Neasan: When we upgrade the service we will re-discuss the IR with the scientists and it may change but it may not. Meaning they aren't worrying about it right now, but they are aware that *some* users are concerned with the efficiency. Open your eyes and read the *entire* post, not just what you want to hear to continue to feel that the world is against you. (p.s. Nice try to bait with the Predictor thing, but nobody's biting.) |
Joined: 30 Nov 06 Posts: 234 Credit: 11,078 RAC: 0 |
Anybody who is unhappy with my posts is welcome to either ignore me or kiss my ass. The issue, C0M you insufferable twit, Rules: No messages whose only intention is to annoy or antagonize other people. No messages that are deliberately hostile or insulting. Dagorath and Fat Loss 4 Idiots (and anyone else) you are volunteers here and as such are free to take your computers elsewhere. If you wish to detach that is fine but by staying attached you're implicitly agreeing to do things our way. You have both made your points and we have not just ignored them but we have set IR to 5 and are leaving it as such. If you do detach I will not delete your credit, if you read the post you will see that I was pointing out that if you stayed attached and willing to do the work but continued to bitch, moan and whine I would consider taking the drastic step of banning you and deleting credit. Also the predictor@home thing was a bit much don't you think? I've let you have your say and only as a last resort have I threatened to do anything drastic. |
Joined: 29 Sep 04 Posts: 42 Credit: 11,505,632 RAC: 0 |
Anybody who is unhappy with my posts is welcome to either ignore me or kiss my ass. The issue, C0M you insufferable twit, is primarily to save CPU cycles and secondarily to save electricity, if you would care to read the thread (though I doubt reading is one of your basic skills). Wow, sounds like a schoolyard. I didn't expect to hear something like that here; it's even worse than one of our Collaboration Meetings. (Yes, I am a particle physicist. And I doubt you really know what this project wants to accomplish.) By the way, I had to look up some of these nice/nasty words, because that's not the language I am used to. |
Joined: 3 Jan 07 Posts: 124 Credit: 7,065 RAC: 0 |
you simply don't like my posts and you don't like them because it means less WUs. I cut out all the noise and boiled it down to this. The "you" mentioned above is to be taken in general, not specific, as your words were aimed at those who disagree with you... I think you can tell that I don't totally disagree with you. At least I hope you can tell that... Having said that, in my opinion, you are spamming multiple threads and what you're doing does border on hijacking the thread. I've laid out some reasons why the additional replication can help speed the process up. Neasan has said that they will revisit the issue with the project scientists. Alex and Neasan are only the administrators of the servers. They have to abide by what the project scientists want. That said, Neasan, I think the idea of doing the server-side aborts is worthwhile. It still allows the IR to be set to 5, but if a workunit has met quorum and has been validated, it allows you to attempt to cancel the remaining replications and possibly get that result into the science database a little bit faster. I say "attempt to" because if the version of BOINC that the host is using doesn't support the aborts, it won't work. I said "possibly" because if the host doesn't support the abort and/or one of the hosts doesn't contact the scheduler before the deadline, then it will still take the full duration of the longest deadline to be able to send the workunit through the assimilation process... Brian |
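The "attempt to" caveat in the post above comes down to a client version check. A minimal sketch follows; the 5.8.17 threshold is only the poster's unconfirmed recollection, and `parse_version` and `client_honours_aborts` are hypothetical helpers, not BOINC functions.

```python
def parse_version(text):
    """Turn '5.8.16' into (5, 8, 16) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


# Assumption: first client release that understands server-side aborts.
# The thread itself is unsure about this number.
ASSUMED_MIN_ABORT_VERSION = (5, 8, 17)


def client_honours_aborts(client_version, minimum=ASSUMED_MIN_ABORT_VERSION):
    """True if a client of this version would act on an abort request."""
    return parse_version(client_version) >= minimum


for version in ("5.8.16", "5.8.17", "5.10.2"):
    print(version, client_honours_aborts(version))
```

Comparing tuples of integers rather than raw strings matters here: as strings, "5.10.2" would sort before "5.8.17" and a newer client would be treated as too old.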
Joined: 14 Oct 07 Posts: 1 Credit: 158 RAC: 0 |
Well, I'm a newcomer to this project, which I had wanted to join for quite some time but didn't, seeing as there was no work. With all the publicity and discovering that the bucket was full, I signed up. So on the second day I took a look at my account to see how my first WUs had fared, and discovered that the IR is a whopping 5! What a waste. I already frown at projects that have an IR of 3, so imagine 5. I immediately halved my participation in the project; I will probably soon set it to no new tasks pending a better policy. If you need the results fast, just shorten the deadline, that's what it's for, and set IR at a decent level. I understand the project is sensitive to calculation errors, so leave quorum and IR at 3 if 2 really is insufficient, but please don't go farther. BTW if you do need the results fast, I really don't get the quota that doles out the WUs. It seems like it's against LHC's best interests in the short run, and unless there is a large increase in workunits looming I don't see the point in getting so many new crunchers when the ones you had before were more than sufficient to crunch everything the project was sending their way. Information Wants To Be Free |