Message boards : Number crunching : Initial replication and missing workunit

duanra
Joined: 12 Oct 07
Posts: 5
Credit: 3,113
RAC: 0
Message 19168 - Posted: 5 Mar 2008, 15:51:27 UTC

Hello!

2 questions for the same thread:

1) Is an initial replication of 5 really useful when you need a quorum of 3? Aren't the last 2 results a waste of CPU time?

2) This workunit is supposed to be crunching on my computer now: http://lhcathome.cern.ch/lhcathome/workunit.php?wuid=2310088

The only thing is, I have only 1 workunit on my computer, and it's not that one.
Why is this so?

Thanks

Duanra

J Langley
Joined: 31 Dec 05
Posts: 68
Credit: 8,691
RAC: 0
Message 19169 - Posted: 5 Mar 2008, 16:39:37 UTC - in response to Message 19168.  

Hello!

2 questions for the same thread:

1) Is an initial replication of 5 really useful when you need a quorum of 3?

Rather than set short deadlines, this project tries to ensure fast turnarounds by using IR > Q. This decision has caused some controversy amongst crunchers, but the admins have said they will look at it again when they upgrade the server code.
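To put rough numbers on that tradeoff: IR is the initial replication (copies sent out) and Q is the quorum (matching valid results needed). The sketch below is a simplified model, not the project's server logic. It assumes each copy fails independently with a fixed, illustrative probability `p_fail` (error or missed deadline), and that, as with this project's old server code, every surplus copy gets crunched anyway. `quorum_stats` is a hypothetical helper name.

```python
from math import comb


def quorum_stats(ir: int, q: int, p_fail: float) -> tuple[float, float]:
    """Return (P[first batch reaches quorum], expected surplus results).

    Simplified model (an assumption): each of the `ir` copies fails
    independently with probability `p_fail`, and every valid result
    beyond the first `q` is crunched anyway (no server-side abort).
    """
    p_ok = 1.0 - p_fail
    p_quorum = 0.0
    surplus = 0.0
    for s in range(q, ir + 1):  # s = number of valid results, at least q
        prob = comb(ir, s) * p_ok**s * p_fail**(ir - s)  # binomial P(S = s)
        p_quorum += prob
        surplus += (s - q) * prob
    return p_quorum, surplus


if __name__ == "__main__":
    for ir in (3, 5):
        p, extra = quorum_stats(ir, q=3, p_fail=0.1)
        print(f"IR={ir}, Q=3: P(quorum)={p:.3f}, surplus={extra:.2f}")
```

With `p_fail = 0.1`, IR = Q = 3 reaches quorum from the first batch only about 73% of the time (0.9³ = 0.729), while IR = 5 does so about 99% of the time, at the cost of roughly 1.5 surplus results per workunit. With `p_fail = 0`, IR = 5 always yields exactly the 2 surplus results the original question asks about.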

Aren't the last 2 results a waste of CPU time?

Yes. And because the server code is such an old version, the server can't ask clients to abort the unnecessary WUs. :-(

glaesum
Joined: 16 Oct 06
Posts: 15
Credit: 144,247
RAC: 0
Message 19182 - Posted: 7 Mar 2008, 13:05:40 UTC - in response to Message 19168.  
Last modified: 7 Mar 2008, 13:07:03 UTC

2) This workunit is supposed to be crunching now in my computer.

Only thing is I have only 1 workunit on my computer, and it's not that one.
Why is this so?

Stuff happens:

It might be a similar symptom to the orphaned WU I got at the beginning of the week. My BOINC Manager sent a request for work at 04:32 UTC on Mar 4, a WU was issued at 04:34 UTC, but my log reported that communication failed at 04:37 and timed out without receiving the WU. I can't abort or do anything about it; it will just have to 'fail to report' and possibly be re-issued if needed.

/pg

PJ
Joined: 18 Sep 04
Posts: 8
Credit: 1,181,841
RAC: 0
Message 19183 - Posted: 7 Mar 2008, 16:44:30 UTC - in response to Message 19169.  

Hello!

2 questions for the same thread:

1) Is an initial replication of 5 really useful when you need a quorum of 3?

Rather than set short deadlines, this project tries to ensure fast turnarounds by using IR > Q. This decision has caused some controversy amongst crunchers, but the admins have said they will look at it again when they upgrade the server code.

Aren't the last 2 results a waste of CPU time?

Yes. And because the server code is such an old version, the server can't ask clients to abort the unnecessary WUs. :-(


OTOH, if a workunit is already in progress, aborting it would be a waste, too. If the "faster" units did not error out, the additional result is not really helpful; that I agree.

But if the "fast" units error out, a "slow" workunit on a different architecture might still finish and give needed data.

So unless IR >> Q (say, more than two surplus workunits), a small IR surplus makes sense, imv.

PovAddict
Joined: 14 Jul 05
Posts: 275
Credit: 49,291
RAC: 0
Message 19189 - Posted: 8 Mar 2008, 20:57:04 UTC - in response to Message 19183.  

Yes. And because the server code is such an old version, the server can't ask clients to abort the unnecessary WUs. :-(


OTOH, if a workunit is already in progress, aborting it would be a waste, too.

The abort mechanism only aborts ready-to-run workunits. Running workunits only get aborted if there is no way for them to get credit (i.e. if they are WAY too late).

J Langley
Joined: 31 Dec 05
Posts: 68
Credit: 8,691
RAC: 0
Message 19195 - Posted: 10 Mar 2008, 12:28:04 UTC - in response to Message 19183.  

OTOH, if a workunit is already in progress, aborting it would be a waste, too. If the "faster" units did not error out, the additional result is not really helpful; that I agree.

But if the "fast" units error out, a "slow" workunit on a different architecture might still finish and give needed data.


Indeed. Those who dislike IR > Q would rather have shorter deadlines and have the project re-issue a WU if a client encounters an error or fails to report in time. However, under certain circumstances IR > Q reaches quorum more quickly than IR = Q with re-issues does.
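A toy model of that claim, with assumptions that are mine and not the project's: the server only notices a failed copy at its deadline and then re-issues exactly the number of results still missing, every valid result comes back within the deadline, and each host is a hypothetical `(ok, turnaround_days)` pair consumed in order. `time_to_quorum` is an illustrative name, not a BOINC function.

```python
from collections.abc import Iterator

# Hypothetical host record: (returns a valid result?, turnaround in days)
Host = tuple[bool, float]


def time_to_quorum(hosts: Iterator[Host], ir: int, q: int, deadline: float) -> float:
    """Days until q valid results are in, under a simplified model.

    `ir` copies go out at t = 0. A valid result arrives after its host's
    turnaround (assumed <= deadline). Failed copies are only noticed at the
    deadline, when just enough replacement copies are re-issued.
    """
    t = 0.0
    arrivals: list[float] = []
    to_issue = ir
    while True:
        for _ in range(to_issue):
            ok, turnaround = next(hosts)
            if ok:
                arrivals.append(t + turnaround)
        if len(arrivals) >= q:
            # Quorum is reached when the q-th valid result arrives.
            return sorted(arrivals)[q - 1]
        # Still short: wait out the deadline, then re-issue the missing copies.
        to_issue = q - len(arrivals)
        t += deadline


if __name__ == "__main__":
    hosts = [(True, 2.0), (False, 0.0), (True, 3.0), (True, 1.0), (True, 4.0)]
    print(time_to_quorum(iter(hosts), ir=3, q=3, deadline=7.0))
    print(time_to_quorum(iter(hosts), ir=5, q=3, deadline=7.0))
```

With these five hosts and a 7-day deadline, IR = 3 has to wait for the failed copy's deadline plus a re-issue (quorum on day 8), while IR = 5 has its third valid result on day 3. When failures are rare the gap shrinks and IR > Q mostly adds surplus results, which is why the choice depends on whether a project is short of time or of CPU.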

This project is deadline-constrained rather than crunch-time-constrained, which is why the admins have it set up like this: they can afford to waste donated CPU cycles, but they can't afford to waste time.

If the flow of WUs ever outstrips the power of the attached crunchers (or they have WUs that are not time-critical), I'm sure they will look at this again.

J Langley
Joined: 31 Dec 05
Posts: 68
Credit: 8,691
RAC: 0
Message 19196 - Posted: 10 Mar 2008, 12:29:54 UTC - in response to Message 19189.  

The abort mechanism only aborts ready-to-run workunits. Running workunits only get aborted if there is no way for them to get credit (i.e. if they are WAY too late).

Yes, it is a pity there isn't a client-side option to allow the abort of running WUs if Q has been achieved (or for any other reason the server decides). This would allow crunchers who don't care about credits not to waste CPU time. Perhaps BOINC 6 will bring this...

Brian Silvers
Joined: 3 Jan 07
Posts: 124
Credit: 7,065
RAC: 0
Message 19197 - Posted: 10 Mar 2008, 19:52:02 UTC - in response to Message 19196.  

The abort mechanism only aborts ready-to-run workunits. Running workunits only get aborted if there is no way for them to get credit (i.e. if they are WAY too late).

Yes, it is a pity there isn't a client-side option to allow the abort of running WUs if Q has been achieved (or for any other reason the server decides). This would allow crunchers who don't care about credits not to waste CPU time. Perhaps BOINC 6 will bring this...


I was not certain enough yesterday to contradict what povaddict has said, and I'm still not certain enough today due to not tracking down what is tickling my brain into thinking what I'm thinking, but I thought that there was a capability within the server-side aborts to do the abort unconditionally. The conditional "abort the task if it is not running" was the "user-friendly" option... I could be mistaken about this though. IIRC, it was said over on the SETI message boards and I think it was either by John Mcleod VII, Josef Segur, or Ingleside...

PovAddict
Joined: 14 Jul 05
Posts: 275
Credit: 49,291
RAC: 0
Message 19200 - Posted: 10 Mar 2008, 21:50:19 UTC - in response to Message 19197.  

I was not certain enough yesterday to contradict what povaddict has said, and I'm still not certain enough today due to not tracking down what is tickling my brain into thinking what I'm thinking, but I thought that there was a capability within the server-side aborts to do the abort unconditionally. The conditional "abort the task if it is not running" was the "user-friendly" option... I could be mistaken about this though. IIRC, it was said over on the SETI message boards and I think it was either by John Mcleod VII, Josef Segur, or Ingleside...

Yes. If the server notices the client has a workunit that was completely aborted by the admin, or a workunit that has already expired (user didn't return it on time) and even got validated from other results, it will send an "abort now, no matter if it's running or not".
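The two kinds of server-side abort described here can be sketched as a decision on the server plus a check on the client. This is a hypothetical illustration of the behavior PovAddict and Brian Silvers describe, not BOINC's actual scheduler code: the enum and function names are made up, and this project's old server code could not send aborts at all. The 5.8.16 cutoff is taken from Brian Silvers's post in this thread.

```python
from enum import Enum


class AbortAction(Enum):
    NONE = "none"
    ABORT_IF_NOT_STARTED = "abort_if_not_started"  # the "polite" abort
    ABORT_NOW = "abort_now"                        # unconditional abort


def server_decision(wu_cancelled: bool, result_expired: bool,
                    wu_validated: bool) -> AbortAction:
    """Hypothetical server rule for one result still held by a client."""
    if wu_cancelled:
        # The admin cancelled the whole workunit.
        return AbortAction.ABORT_NOW
    if result_expired and wu_validated:
        # Too late for credit and no longer needed.
        return AbortAction.ABORT_NOW
    if wu_validated:
        # Quorum reached normally: only spare copies that haven't started.
        return AbortAction.ABORT_IF_NOT_STARTED
    return AbortAction.NONE


def client_supports_aborts(version: tuple[int, int, int]) -> bool:
    # Per the thread, BOINC clients 5.8.16 and older ignore server-side aborts.
    return version > (5, 8, 16)


def client_aborts(action: AbortAction, task_started: bool,
                  version: tuple[int, int, int]) -> bool:
    """Whether the client actually drops the task."""
    if not client_supports_aborts(version):
        return False
    if action is AbortAction.ABORT_NOW:
        return True
    if action is AbortAction.ABORT_IF_NOT_STARTED:
        return not task_started
    return False


if __name__ == "__main__":
    action = server_decision(wu_cancelled=False, result_expired=False, wu_validated=True)
    print(action.name)
    # (6, 0, 0) stands in for any client version newer than 5.8.16.
    print(client_aborts(action, task_started=True, version=(6, 0, 0)))
```

The last case is the gap J Langley would like a client-side option for: a task already running on a workunit that reached quorum normally keeps crunching, because only the "polite" abort applies to it.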

However, *users* can't choose if they want an "abort even if running" instead of "abort if not started" in the common case that a workunit reaches quorum normally.

Brian Silvers
Joined: 3 Jan 07
Posts: 124
Credit: 7,065
RAC: 0
Message 19201 - Posted: 11 Mar 2008, 4:06:32 UTC - in response to Message 19200.  

I was not certain enough yesterday to contradict what povaddict has said, and I'm still not certain enough today due to not tracking down what is tickling my brain into thinking what I'm thinking, but I thought that there was a capability within the server-side aborts to do the abort unconditionally. The conditional "abort the task if it is not running" was the "user-friendly" option... I could be mistaken about this though. IIRC, it was said over on the SETI message boards and I think it was either by John Mcleod VII, Josef Segur, or Ingleside...

Yes. If the server notices the client has a workunit that was completely aborted by the admin, or a workunit that has already expired (user didn't return it on time) and even got validated from other results, it will send an "abort now, no matter if it's running or not".

However, *users* can't choose if they want an "abort even if running" instead of "abort if not started" in the common case that a workunit reaches quorum normally.


Understood that... It's just that a project can choose to do unconditional aborts as well as the "polite" version... ;-)

Also, BOINC versions 5.8.16 and older do not support the server-side aborts...

Betting Slip
Joined: 17 Sep 04
Posts: 41
Credit: 27,497
RAC: 0
Message 19217 - Posted: 13 Mar 2008, 1:26:19 UTC - in response to Message 19168.  

For more reading on the initial replication discussion, go here: http://lhcathome.cern.ch/lhcathome/forum_thread.php?id=2537

but don't expect an admin to answer, and don't expect a logical answer to the question.

©2024 CERN