Past Due Date
Author | Message |
---|---|
Joined: 12 Dec 05 Posts: 2 Credit: 90,559 RAC: 0 |
My question: if WUs are completed past the short due date, will they still count? I am not worried about credits, but if I continue with the WUs when the messages say to consider aborting them, will they still help the project? Since this project only gives out WUs once in a while, I make sure that I grab plenty. But this group of WUs seems to take longer than usual, and about 4 will go past the due date/time in the morning. I don't mind continuing with them if these WUs will help the project. Probably will be a day or two late. Should I abort or continue? siifred |
Joined: 22 Jul 05 Posts: 72 Credit: 3,962,626 RAC: 0 |
"... Probably will be a day or two late. Should I abort or continue?" Your best plan is to check each work unit on the website to see if a quorum has already been formed. If it has, there is absolutely no benefit to the project if you continue to process and return your result. If you managed to squeeze it in before the deadline then you would get credit but the result would not be used by the project anyway. If it's after the deadline - no credit as well as no use to the project. The other thing is to look at crunch times of other people to see if those results are going to run the full expected time or if they might finish early. You never know, you might have a batch of all short crunch times left :). EDIT: I just had a quick look at your results on the website. Your P4 machine has 8 outstanding results, all of which have already had the quorum formed and the results validated. In each case, the awarded credits are close to 30 so there are no short run times in that lot. The project already has the information it needs from those workunits so, if you are not interested in credit, you should abort them all forthwith. However, don't take my word for it - go check for yourself and learn how the system works. Cheers, Gary. |
Joined: 14 Jul 05 Posts: 275 Credit: 49,291 RAC: 0 |
"... Probably will be a day or two late. Should I abort or continue?" There is a plan to have the BOINC server notify the client when a quorum is reached so it can abort the workunit. |
Joined: 24 Dec 06 Posts: 5 Credit: 35,268 RAC: 0 |
Won't that mean that if your computer is too slow, you won't get any credit if a result isn't returned before a quorum is reached? |
Joined: 18 Sep 04 Posts: 38 Credit: 173,867 RAC: 0 |
Only if you grab too many WUs :p |
Joined: 24 Dec 06 Posts: 5 Credit: 35,268 RAC: 0 |
Let's assume these 5 computers (say an Intel Celeron 700 MHz, Athlon 1500+, Intel Core 2 Duo, Intel Xeon and AMD 64 X2 5200+) are all running the same version of the BOINC client and the same OS, are all crunching for only one project, and all have "Connect to network about every" set to 0.1 days, etc. In other words, all the computers are the same except for CPU processing power (yes, I know there are many more differences, e.g. cache size, RAM speed, IDE vs SATA, etc., but work with me here please). So in the above example, the Intel Celeron 700 MHz and Athlon 1500+ would never be able to return a work unit before a quorum of 3 has been reached?? Is this not correct, or is my logic broken?? |
Joined: 7 Oct 06 Posts: 114 Credit: 23,192 RAC: 0 |
:-) I would also like to know, as I am getting credit on my P2 400 MHz??? :-) I have a P1 which I was thinking of bringing online :-) Regards, Masud. |
Joined: 1 Sep 04 Posts: 36 Credit: 78,199 RAC: 0 |
No, <result_abort_if_unstarted> will only abort results that haven't started yet; already-started results will continue to the end if the user doesn't manually abort them. This is planned for use on any WU that has already got a canonical result, meaning the result won't be used for science, but already-started results can still be credited if returned before the deadline. For cancelled WUs, errored-out WUs and so on, <result_abort> will be used, and this will immediately abort the result regardless of whether it has started or not. Since the result won't be used for science or for credit, it's a waste of time to continue running it. The needed client-side support is included in v5.8.x, but the server side isn't implemented yet... "I make so many mistakes. But then just think of all the mistakes I don't make, although I might." |
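The two requests described above differ only in how they treat a result that has already started. A minimal Python sketch of that decision follows. It is an illustration, not BOINC client code: the `ServerRequest` enum and the `should_abort` function are hypothetical names, and only the two tag names come from the post.

```python
from enum import Enum


class ServerRequest(Enum):
    """Hypothetical model of what the server asks for one result."""
    NONE = "none"
    ABORT_IF_UNSTARTED = "result_abort_if_unstarted"
    ABORT = "result_abort"


def should_abort(request: ServerRequest, started: bool) -> bool:
    """Decide whether the client drops a result (assumed logic, per the post)."""
    if request is ServerRequest.ABORT:
        # Cancelled or errored-out WU: no science and no credit, so always abort.
        return True
    if request is ServerRequest.ABORT_IF_UNSTARTED:
        # Canonical result already exists: abort only if no CPU time was spent.
        return not started
    return False
```

Under these assumptions, a result with even a little CPU time survives `ABORT_IF_UNSTARTED` and can still earn credit if it is returned before the deadline.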
Joined: 30 Dec 06 Posts: 2 Credit: 84 RAC: 0 |
Does this mean then that 2 people could be wasting their computer time for each work unit? |
Joined: 24 Dec 06 Posts: 5 Credit: 35,268 RAC: 0 |
"Does this mean then that 2 people could be wasting their computer time for each work unit?" That is what I am thinking. |
Joined: 14 Jul 05 Posts: 275 Credit: 49,291 RAC: 0 |
"Does this mean then that 2 people could be wasting their computer time for each work unit?" Well, that's what happens on all projects. But it's not "wasting", since you don't know which of the results will come first (and thus which are being wasted). Also, if only one is sent, they won't know if it's correct. If two are sent, and aren't equal results, one of them is bad (but which?). If three are sent, two match and one doesn't, there is the answer. |
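The one, two, or three results argument above can be sketched in a few lines of Python. This is a toy illustration, not the project's validator: `find_agreed_result` is a hypothetical name, results are compared by exact equality (a real validator would presumably compare within tolerances), and the default of two matching results mirrors the example in the post rather than any project's actual quorum.

```python
from collections import Counter


def find_agreed_result(results, min_agree=2):
    """Return the value that at least `min_agree` results share, else None."""
    if not results:
        return None
    # most_common(1) gives the (value, count) pair with the highest count.
    value, count = Counter(results).most_common(1)[0]
    return value if count >= min_agree else None
```

With a single result, or with two results that disagree, the function returns `None` because nothing can be confirmed; with three results where two match, the matching value is returned.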
Joined: 2 Sep 04 Posts: 378 Credit: 10,765 RAC: 0 |
"Does this mean then that 2 people could be wasting their computer time for each work unit?" On an individual basis, crunching a work unit which already has a quorum is a waste of computing power. On a group basis though, some redundancy results in the GROUP crunching less overall: the majority of work units are able to reach a quorum on the first pass, so fewer work units need to be sent out all over again. The amount of redundancy that you need depends on the odds of a computer returning a valid result. In the old days, it seemed there was a low chance of success for a host, which meant that sending each work unit to 5 computers to get a quorum of 3 resulted in much less work than attempting to compute with groups of 4 crunchers. I'm not the LHC Alex. Just a number cruncher like everyone else here. |
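How the odds of a valid result drive the required redundancy can be sketched with a binomial calculation. This is a simplified model under stated assumptions, not a description of how the project chose its numbers: each host is assumed to return a valid result independently with the same probability `p`, the value of `p` is hypothetical, and `p_quorum` is a made-up helper name. It only gives the chance that a quorum is reached from the first batch, not the total work including resends.

```python
from math import comb


def p_quorum(n_sent: int, quorum: int, p: float) -> float:
    """Probability that at least `quorum` of `n_sent` results are valid."""
    return sum(
        comb(n_sent, i) * p**i * (1 - p) ** (n_sent - i)
        for i in range(quorum, n_sent + 1)
    )


if __name__ == "__main__":
    p = 0.7  # hypothetical per-host success rate, not a measured value
    for n in (3, 4, 5):
        print(f"sent {n}, quorum 3: {p_quorum(n, 3, p):.3f}")
```

In this model, sending more copies raises the chance that no resend is needed, at the cost of extra results once the quorum is already met.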
Joined: 27 Jul 05 Posts: 6 Credit: 14,693,664 RAC: 15 |
"Does this mean then that 2 people could be wasting their computer time for each work unit?" I don't think that will be the case for all the units, because the 5 hosts that get the same WU don't get the same set of other WUs. Every host gets a different set of WUs that will be crunched in a different order. So for at least some of them, the quorum hasn't yet been reached when they finish. |
Joined: 13 Jul 05 Posts: 143 Credit: 263,300 RAC: 0 |
Here's an interesting thought about the Work Units and the way they are granted credit -- first, let me say that I'm no behavioral science expert, but this might just be LHC's passive-aggressive answer to those who attempt to hog the WUs. A given Work Unit is sent out to five sites for processing. Those sites that have few units in their queue process quickly and respond quickly. As the sites with much larger quantities of work (more WUs) get around to processing and responding, time is passing, and the possibility of returning a Work Unit that has already established a quorum becomes greater and greater. Ultimately, those who return work just before the deadline appear to face the greatest chance of getting only partial credit, or even no credit for their work. Am I missing something here ??? If I've lived this long, I've gotta be that old |
Joined: 22 Jul 05 Posts: 72 Credit: 3,962,626 RAC: 0 |
".... Am I missing something here ???" Yes, unfortunately I believe you are :). The granting of full, partial or zero credit has nothing to do with returning results late in the cycle. Even if you are within the deadline by just 1 second, valid results will still get full credit and that's the way it should be. If your results do not validate, you get zero credit. If your results are close but not within the quite strict tolerances required by LHC, you are likely to get half credit. I think this is a very fair system because "close" results may be caused by hardware or software differences in the computer/OS/system libraries being used and are therefore outside the direct control of the user. In these cases the users should get some reward for their efforts. On the more general question of the return of the 4th and 5th results when the quorum of three has already been formed, this should not be treated any differently and there should not be any stigma or penalty for being last to return. In quite a few cases, 4th and 5th results actually get used and do help shorten the overall time taken compared with what it would have been if only three had been issued initially. I'm sure that this is something that the project staff would be well aware of. A useful option would be for the core client to ask the server if a quorum exists before starting a new result. That way, an unneeded result could be aborted before computation started and a fresh replacement downloaded (if available). However, I imagine this would put a lot of extra strain on the servers and therefore might not be feasible. It would be very useful however if a machine has had to be switched off for a day or two and the cached work units were therefore a bit stale. Aborting stale unneeded results and moving on to something more useful would seem to be a good idea. Cheers, Gary. |
Joined: 27 Jul 05 Posts: 6 Credit: 14,693,664 RAC: 15 |
"In quite a few cases, 4th and 5th results actually get used and do help shorten the overall time taken compared with what it would have been if only three had been issued initially. I'm sure that this is something that the project staff would be well aware of." In my opinion it depends on the project. Where previous results are needed for the issuing of new WUs (LHC, Chess960...), speed is of utmost importance and the initial replication number should be higher. For other projects where there is just an enormous bunch of WUs to process, which takes many months (Einstein, Sztaki...), increasing the replication number is pure waste and actually lengthens the overall time. |
Joined: 14 Jul 05 Posts: 275 Credit: 49,291 RAC: 0 |
"In quite a few cases, 4th and 5th results actually get used and do help shorten the overall time taken compared with what it would have been if only three had been issued initially. I'm sure that this is something that the project staff would be well aware of." Yep, that's the difference between low latency and high throughput :) Low latency means as little time as possible between sending workunits and having them done; high throughput means having as many finished workunits per minute as possible. Not all projects need the same thing. |
Joined: 4 Oct 06 Posts: 38 Credit: 24,908 RAC: 0 |
"A useful option would be for the core client to ask the server if a quorum exists before starting a new result. That way, an unneeded result could be aborted before computation started and a fresh replacement downloaded (if available). However, I imagine this would put a lot of extra strain on the servers and therefore might not be feasible. It would be very useful however if a machine has had to be switched off for a day or two and the cached work units were therefore a bit stale. Aborting stale unneeded results and moving on to something more useful would seem to be a good idea." (I remember reading something about this, my apologies for not remembering where) I believe a feature such as this is in the works: the client-side support is included in 5.8.x but the server side still needs to be worked on(?). So it isn't implemented yet. Only those WUs on a host that are in the "ready to run" state will be eligible for the quorum-met-so-abort-remaining-WUs action. If a WU has even just 1 second of CPU time it will NOT be aborted by BOINC, since that would involuntarily waste a user's CPU time. The only things wasted are the bandwidth for downloading the unneeded WUs and some CPU time for communicating. In my opinion, if the project needs results quickly, they should implement a small "in progress" cache per CPU (so people that download a lot of WUs don't hold on to them while others who are finished with theirs have empty caches) and use the abort-if-quorum-met feature (when available). The servers should be pushed to what they can handle, and that is where the bottleneck will occur instead of on the client side. Don't get distracted by shiny objects. |
Joined: 1 Sep 04 Posts: 36 Credit: 78,199 RAC: 0 |
"(I remember reading something about this, my apologies for not remembering where)" ... like earlier in the thread... :) "I make so many mistakes. But then just think of all the mistakes I don't make, although I might." |