Message boards : Number crunching : Past Due Date
Profile siifred
Avatar

Send message
Joined: 12 Dec 05
Posts: 2
Credit: 90,559
RAC: 0
Message 16018 - Posted: 4 Jan 2007, 3:23:37 UTC

My question: if WUs are completed past the short due date, will they still count? I am not worried about credits, but if I continue with the WUs when the messages say to consider aborting them, will they still help the project?
Since this project only gives out WUs once in a while, I make sure that I grab plenty. But this group of WUs seems to take longer than usual, and about 4 of them will go past the due date/time in the morning. I don't mind continuing with them if these WUs will help the project. Probably will be a day or two late. Should I abort or continue?
siifred
ID: 16018 · Report as offensive     Reply Quote
Profile Gary Roberts

Send message
Joined: 22 Jul 05
Posts: 72
Credit: 3,962,626
RAC: 0
Message 16019 - Posted: 4 Jan 2007, 4:42:06 UTC - in response to Message 16018.  
Last modified: 4 Jan 2007, 4:54:39 UTC

... Probably will be a day or two late. Should I abort or continue?


Your best plan is to check each work unit on the website to see if a quorum has already been formed. If it has, there is absolutely no benefit to the project if you continue to process and return your result. If you managed to squeeze it in before the deadline then you would get credit but the result would not be used by the project anyway. If it's after the deadline - no credit as well as no use to the project.

The other thing is to look at crunch times of other people to see if those results are going to run the full expected time or if they might finish early. You never know, you might have a batch of all short crunch times left :).

EDIT: I just had a quick look at your results on the website. Your P4 machine has 8 outstanding results, all of which have already had the quorum formed and the results validated. In each case, the awarded credits are close to 30 so there are no short run times in that lot. The project already has the information it needs from those workunits so, if you are not interested in credit, you should abort them all forthwith. However, don't take my word for it - go check for yourself and learn how the system works.


Cheers,
Gary.
ID: 16019 · Report as offensive     Reply Quote
PovAddict
Avatar

Send message
Joined: 14 Jul 05
Posts: 275
Credit: 49,291
RAC: 0
Message 16020 - Posted: 4 Jan 2007, 5:05:31 UTC - in response to Message 16019.  

There is a plan to have BOINC server notify the client when a quorum is reached so it can abort the workunit.
ID: 16020 · Report as offensive     Reply Quote
Danish Dynamite

Send message
Joined: 24 Dec 06
Posts: 5
Credit: 35,268
RAC: 0
Message 16024 - Posted: 4 Jan 2007, 6:36:06 UTC - in response to Message 16020.  


There is a plan to have BOINC server notify the client when a quorum is reached so it can abort the workunit.


Won't that mean that if your computer is too slow, you won't get any credit if your result isn't returned before a quorum is reached?
ID: 16024 · Report as offensive     Reply Quote
Profile Morgan the Gold
Avatar

Send message
Joined: 18 Sep 04
Posts: 38
Credit: 173,867
RAC: 0
Message 16026 - Posted: 4 Jan 2007, 7:57:13 UTC - in response to Message 16024.  


Only if you grab too many WUs :p
ID: 16026 · Report as offensive     Reply Quote
Danish Dynamite

Send message
Joined: 24 Dec 06
Posts: 5
Credit: 35,268
RAC: 0
Message 16028 - Posted: 4 Jan 2007, 10:28:09 UTC - in response to Message 16026.  


Let's assume five computers, say an Intel Celeron 700 MHz, an Athlon 1500+, an Intel Core 2 Duo, an Intel Xeon and an AMD 64 X2 5200+, all running the same version of the BOINC client and the same OS, all crunching for only one project, and all set to connect to the network about every 0.1 days, etc. In other words, all the computers are the same except for CPU processing power (yes, I know there are many more differences, e.g. cache size, RAM speed, IDE vs. SATA, etc., but work with me here please).

So in the above example, the Intel Celeron 700 MHz and the Athlon 1500+ would never be able to return a work unit before a quorum of 3 has been reached? Is this not correct, or is my logic broken?
ID: 16028 · Report as offensive     Reply Quote
KAMasud

Send message
Joined: 7 Oct 06
Posts: 108
Credit: 22,919
RAC: 0
Message 16029 - Posted: 4 Jan 2007, 12:00:39 UTC - in response to Message 16028.  


:-) I would also like to know, as I am getting credit on my P2 400 MHz. :-) I have a P1 which I was thinking of bringing online. :-)
Regards
Masud.


ID: 16029 · Report as offensive     Reply Quote
Ingleside

Send message
Joined: 1 Sep 04
Posts: 36
Credit: 78,199
RAC: 0
Message 16031 - Posted: 4 Jan 2007, 17:07:47 UTC - in response to Message 16024.  


There is a plan to have BOINC server notify the client when a quorum is reached so it can abort the workunit.


Won't that mean that if your computer is too slow, you won't get any credit if your result isn't returned before a quorum is reached?

No, <result_abort_if_unstarted> will only abort results that haven't started yet. Results that have already started will continue to the end, unless the user manually aborts them. The plan is to use this for any WU that already has a Canonical Result, meaning the result won't be used for science, but already-started results can still be credited if returned before the deadline.

For cancelled WUs, errored-out WUs and so on, <result_abort> will be used, and this will immediately abort the result regardless of whether it has started or not. Since the result won't be used for science or for crediting, it's a waste of time to continue running it.


The needed client-side support is included in v5.8.x, but the server side isn't implemented yet...
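
As a rough illustration of the rule described in this post (this is not the actual BOINC client code), the decision could be sketched in Python as follows. The `Task` class, the `handle_server_request` function and the plain-string request names are hypothetical and only mirror the two tags mentioned above.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical stand-in for a result held by the client."""
    name: str
    cpu_time: float = 0.0   # seconds of CPU time already spent
    aborted: bool = False

    @property
    def started(self) -> bool:
        # Any CPU time at all counts as "started".
        return self.cpu_time > 0.0


def handle_server_request(task: Task, request: str) -> Task:
    """Apply an abort request to a task.

    `request` is one of the two tag names, without angle brackets.
    """
    if request == "result_abort":
        # Cancelled or errored-out workunit: abort regardless of progress.
        task.aborted = True
    elif request == "result_abort_if_unstarted":
        # A canonical result already exists: only drop work not yet begun.
        if not task.started:
            task.aborted = True
    else:
        raise ValueError(f"unknown request: {request}")
    return task
```

Under these assumptions, a task with even one second of CPU time survives `result_abort_if_unstarted` and can still earn credit if it is returned before the deadline, while `result_abort` stops it either way.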

"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."
ID: 16031 · Report as offensive     Reply Quote
Evan Lynn

Send message
Joined: 30 Dec 06
Posts: 2
Credit: 84
RAC: 0
Message 16033 - Posted: 4 Jan 2007, 18:34:08 UTC

Does this mean then that 2 people could be wasting their computer time for each work unit?
ID: 16033 · Report as offensive     Reply Quote
Danish Dynamite

Send message
Joined: 24 Dec 06
Posts: 5
Credit: 35,268
RAC: 0
Message 16035 - Posted: 4 Jan 2007, 20:35:23 UTC - in response to Message 16033.  

Does this mean then that 2 people could be wasting their computer time for each work unit?

that is what i am thinking
ID: 16035 · Report as offensive     Reply Quote
PovAddict
Avatar

Send message
Joined: 14 Jul 05
Posts: 275
Credit: 49,291
RAC: 0
Message 16036 - Posted: 4 Jan 2007, 20:47:28 UTC - in response to Message 16035.  

Does this mean then that 2 people could be wasting their computer time for each work unit?

that is what i am thinking

Well, that's what happens on all projects. But it's not "wasting", since you don't know which of the results will come first (and thus which are being wasted). Also, if only one is sent, they won't know if it's correct. If two are sent, and aren't equal results, one of them is bad (but which?). If three are sent, two match and one doesn't, there is the answer.
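
The one/two/three argument above can be sketched as a simple agreement check. This is only an illustration: the function name `canonical_result` is made up, results are compared for exact equality (real projects typically compare within some tolerance), and the default agreement of 2 follows the "two match and one doesn't" example rather than any particular project's settings.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def canonical_result(results: Sequence[Hashable], agree: int = 2) -> Optional[Hashable]:
    """Return the value reported by at least `agree` hosts, or None."""
    if not results:
        return None
    # most_common(1) gives the single most frequent value and its count.
    value, count = Counter(results).most_common(1)[0]
    return value if count >= agree else None


print(canonical_result([42]))          # one result: nothing to compare against
print(canonical_result([42, 41]))      # two results that differ: which one is bad?
print(canonical_result([42, 41, 42]))  # two of three match: there is the answer
```

With one result or two differing results the function returns None, and with three results where two match it returns the matching value.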
ID: 16036 · Report as offensive     Reply Quote
Profile Alex

Send message
Joined: 2 Sep 04
Posts: 378
Credit: 10,765
RAC: 0
Message 16039 - Posted: 5 Jan 2007, 7:03:14 UTC - in response to Message 16033.  

Does this mean then that 2 people could be wasting their computer time for each work unit?


On an individual basis, crunching a work unit which has quorum is a waste of computing power.

On a group basis, though, some redundancy results in the GROUP crunching less overall: the majority of work units reach a quorum on the first pass, so fewer work units need to be sent out all over again.


The amount of redundancy that you need depends on the odds of a computer returning a valid result. In the old days, it seemed there was a low chance of success for a host, which meant that sending out 5 computing units for each work unit to get a quorum of 3 resulted in much less work than attempting to compute with groups of 4 crunchers.
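
To put rough numbers on this, here is a small sketch that assumes each host returns a valid result independently with the same probability. The function name `p_quorum` and the 70% success rate are illustrative assumptions, not figures from the project.

```python
from math import comb


def p_quorum(n_sent: int, quorum: int, p_valid: float) -> float:
    """Probability that at least `quorum` of `n_sent` results are valid,
    assuming each host succeeds independently with probability `p_valid`."""
    return sum(
        comb(n_sent, k) * p_valid**k * (1 - p_valid) ** (n_sent - k)
        for k in range(quorum, n_sent + 1)
    )


# Assumed 70% chance that any single host returns a valid result.
print(round(p_quorum(4, 3, 0.7), 3))  # chance of a quorum of 3 from 4 copies
print(round(p_quorum(5, 3, 0.7), 3))  # chance of a quorum of 3 from 5 copies
```

Under that assumed success rate, the chance of reaching a quorum of 3 on the first pass is roughly 0.65 with 4 copies and roughly 0.84 with 5, so fewer work units have to be resent when 5 are issued.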

I'm not the LHC Alex. Just a number cruncher like everyone else here.
ID: 16039 · Report as offensive     Reply Quote
robert.mouris

Send message
Joined: 27 Jul 05
Posts: 6
Credit: 14,085,592
RAC: 770
Message 16088 - Posted: 9 Jan 2007, 21:50:36 UTC - in response to Message 16035.  

Does this mean then that 2 people could be wasting their computer time for each work unit?

that is what i am thinking

I don't think that will be the case for all the units, because the 5 hosts that get the same WU don't all share the same other WUs. Every host gets a different set of WUs, which will be crunched in a different order. So for at least some of them, the quorum won't yet have been reached when they are finished.
ID: 16088 · Report as offensive     Reply Quote
Profile Ocean Archer
Avatar

Send message
Joined: 13 Jul 05
Posts: 143
Credit: 263,300
RAC: 0
Message 16089 - Posted: 9 Jan 2007, 23:53:10 UTC

Here's an interesting thought about the Work Units and the way they are granted credit -- first, let me say that I'm no behavioral science expert, but this might just be LHC's passive-aggressive answer to those who attempt to hog the WUs.

A given Work Unit is processed (sent out) to five sites for processing. Those sites that have few units in queue process quickly and respond quickly. As the sites with much larger quantities of work (more WUs) get around to processing and responding, time is passing, and the possibility of returning a Work Unit that has already established a quorum becomes greater and greater.
Ultimately, those who return work just before the deadline appear to face the greatest chance of getting only partial credit, or even no credit for their work. Am I missing something here ???


If I've lived this long, I've gotta be that old
ID: 16089 · Report as offensive     Reply Quote
Profile Gary Roberts

Send message
Joined: 22 Jul 05
Posts: 72
Credit: 3,962,626
RAC: 0
Message 16091 - Posted: 10 Jan 2007, 3:41:59 UTC - in response to Message 16089.  

.... Am I missing something here ???


Yes, unfortunately I believe you are :).

The granting of full, partial or zero credit has nothing to do with returning results late in the cycle. Even if you are within the deadline by just 1 second, valid results will still get full credit and that's the way it should be.

If your results do not validate, you get zero credit. If your results are close but not within the quite strict tolerances required by LHC, you are likely to get half credit. I think this is a very fair system because "close" results may be caused by hardware or software differences in the computer/OS/system libraries being used and are therefore outside the direct control of the user. In these cases the users should get some reward for their efforts.
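
As I understand it, the rule amounts to something like the sketch below. The function name, the three outcome labels and the exact 50% figure for "close" results are my own assumptions for illustration, and the deadline check follows what was said earlier in the thread about results returned after the deadline.

```python
def granted_credit(full_credit: float, outcome: str, before_deadline: bool) -> float:
    """Hypothetical credit rule.

    `outcome` is "valid", "close" (just outside the tolerances) or "invalid".
    """
    if not before_deadline:
        # Results returned after the deadline earn nothing.
        return 0.0
    if outcome == "valid":
        # Full credit, however close to the deadline the result arrived.
        return full_credit
    if outcome == "close":
        # Assumption: "half credit" taken literally as 50%.
        return full_credit / 2
    if outcome == "invalid":
        return 0.0
    raise ValueError(f"unknown outcome: {outcome}")
```

Note that nothing in this rule depends on whether the result was the 1st or the 5th to come back, only on validity and the deadline.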

On the more general question of the return of the 4th and 5th results when the quorum of three has already been formed, this should not be treated any differently and there should not be any stigma or penalty for being last to return. In quite a few cases, 4th and 5th results actually get used and do help shorten the overall time taken compared with what it would have been if only three had been issued initially. I'm sure that this is something that the project staff would be well aware of.

A useful option would be for the core client to ask the server if a quorum exists before starting a new result. That way, an unneeded result could be aborted before computation started and a fresh replacement downloaded (if available). However, I imagine this would put a lot of extra strain on the servers and therefore might not be feasible. It would be very useful however if a machine has had to be switched off for a day or two and the cached work units were therefore a bit stale. Aborting stale unneeded results and moving on to something more useful would seem to be a good idea.

Cheers,
Gary.
ID: 16091 · Report as offensive     Reply Quote
robert.mouris

Send message
Joined: 27 Jul 05
Posts: 6
Credit: 14,085,592
RAC: 770
Message 16092 - Posted: 10 Jan 2007, 6:11:16 UTC - in response to Message 16091.  
Last modified: 10 Jan 2007, 6:13:05 UTC

In quite a few cases, 4th and 5th results actually get used and do help shorten the overall time taken compared with what it would have been if only three had been issued initially. I'm sure that this is something that the project staff would be well aware of.

In my opinion it depends on the project. Where previous results are needed for the issuing of new WUs (LHC, Chess960...), speed is of utmost importance and the initial replication number should be higher. For other projects where there is just an enormous bunch of WUs to process, which takes many months (Einstein, Sztaki...), increasing the replication number is pure waste and actually lengthens the overall time.
ID: 16092 · Report as offensive     Reply Quote
PovAddict
Avatar

Send message
Joined: 14 Jul 05
Posts: 275
Credit: 49,291
RAC: 0
Message 16096 - Posted: 10 Jan 2007, 15:05:05 UTC - in response to Message 16092.  

Yep, that's the difference between low-latency and high-throughput :) Low latency means as little time as possible between sending workunits and having them done, high throughput is having as many finished workunits per minute as possible. Not all projects need the same.
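
A toy calculation shows the trade-off. Everything here is made up for illustration: the function names, the return times in hours and the pool of 300 hosts.

```python
from typing import Sequence


def time_to_quorum(return_times: Sequence[float], quorum: int) -> float:
    """Latency: the moment the quorum-th result comes back."""
    if len(return_times) < quorum:
        raise ValueError("not enough results to form a quorum")
    return sorted(return_times)[quorum - 1]


def workunits_per_round(n_hosts: int, replication: int) -> int:
    """Throughput proxy: distinct workunits a host pool can cover at once."""
    return n_hosts // replication


# Hypothetical return times (hours) for five hosts given the same workunit.
times = [5, 30, 8, 12, 6]

print(time_to_quorum(times[:3], 3))  # only 3 copies issued: wait for the slow host
print(time_to_quorum(times, 3))      # 5 copies issued: the 3 fastest are enough

print(workunits_per_round(300, 3))   # distinct workunits in flight with 3 copies each
print(workunits_per_round(300, 5))   # distinct workunits in flight with 5 copies each
```

With these numbers, issuing 5 copies cuts the wait for a quorum from 30 hours to 8, but the same 300 hosts cover 60 distinct workunits at a time instead of 100.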
ID: 16096 · Report as offensive     Reply Quote
larry1186

Send message
Joined: 4 Oct 06
Posts: 38
Credit: 24,908
RAC: 0
Message 16097 - Posted: 10 Jan 2007, 15:17:01 UTC - in response to Message 16091.  

A useful option would be for the core client to ask the server if a quorum exists before starting a new result. That way, an unneeded result could be aborted before computation started and a fresh replacement downloaded (if available). However, I imagine this would put a lot of extra strain on the servers and therefore might not be feasible. It would be very useful however if a machine has had to be switched off for a day or two and the cached work units were therefore a bit stale. Aborting stale unneeded results and moving on to something more useful would seem to be a good idea.


(I remember reading something about this, my apologies for not remembering where) I believe a feature such as this is in the works: the client-side support is included in 5.8.x, but the server side still needs to be worked on(?). So it isn't implemented yet. Only those WUs on a host that are in the "ready to run" state will be eligible for the quorum-met-so-abort-remaining-WUs action. If a WU has even just 1 second of CPU time, it will NOT be aborted by BOINC, since that would involuntarily waste a user's CPU time. The only things wasted are the bandwidth used to download the unneeded WUs and some CPU time for communicating.

In my opinion, if the project needs results quickly, they should implement a small "in progress" cache per CPU (so people who download a lot of WUs don't hold on to them while others who have finished theirs have empty caches) and use the abort-if-quorum-met feature (when available). The servers should be pushed to what they can handle, and that is where the bottleneck will occur instead of on the client side.
Don't get distracted by shiny objects.
ID: 16097 · Report as offensive     Reply Quote
Ingleside

Send message
Joined: 1 Sep 04
Posts: 36
Credit: 78,199
RAC: 0
Message 16104 - Posted: 10 Jan 2007, 21:08:24 UTC - in response to Message 16097.  

(I remember reading something about this, my apologies for not remembering where)

... like earlier in the thread... :)

"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."
ID: 16104 · Report as offensive     Reply Quote
larry1186

Send message
Joined: 4 Oct 06
Posts: 38
Credit: 24,908
RAC: 0
Message 16105 - Posted: 10 Jan 2007, 23:43:42 UTC - in response to Message 16104.  

(I remember reading something about this, my apologies for not remembering where)

... like earlier in the thread... :)


Just making sure you were paying attention :) (to self: wow do I feel stupid) Thanks for jogging my memory!
Don't get distracted by shiny objects.
ID: 16105 · Report as offensive     Reply Quote


©2022 CERN