Message boards : Number crunching : Quicker finishes?

meckano
Joined: 17 Sep 04
Posts: 150
Credit: 20,315
RAC: 0
Message 10679 - Posted: 10 Oct 2005, 13:47:27 UTC
Last modified: 10 Oct 2005, 13:49:07 UTC

Maybe at stages like this you could issue more replicas of the last few workunits?
Everyone gets credit, and there's a better chance of a fast computer working on them.

-----------------------
SNAFU'ed? Turn the Page! :D
ID: 10679
alpina
Joined: 3 Aug 05
Posts: 49
Credit: 143,072
RAC: 0
Message 10683 - Posted: 10 Oct 2005, 22:30:38 UTC - in response to Message 10679.  

<blockquote>Maybe at stages like this you could issue more replicas of the last few workunits?
Everyone gets credit, and there's a better chance of a fast computer working on them.
</blockquote>

And more importantly, more chances that a host with a smaller cache size is working on them.


BOINC.BE: The team for Belgians and their friends who love the smell of glowing red cpu's in the morning
ID: 10683
alpina
Joined: 3 Aug 05
Posts: 49
Credit: 143,072
RAC: 0
Message 10698 - Posted: 11 Oct 2005, 20:12:05 UTC
Last modified: 11 Oct 2005, 20:12:28 UTC

The problem is that a minority of hosts use such a large cache that they sit on a fairly small number of workunits for weeks while all the other hosts can't get any work. This slows the project down.

Since the work comes in batches and the scientists have to examine each batch before they can send out new work, it is important to get all the results of a batch back as fast as possible. I think you could do this with a different type of deadline. Instead of a fixed period of time in which the results have to be reported, LHC could use a fixed deadline (a fixed date) before which the results should be reported. They could write a script that predicts how long it will take us to compute all the results of one batch based on the recent flow of results, and that script would then determine the deadline (which can of course shift a little over time depending on throughput). In practice this would mean that hosts with a large cache can't get new work if they have a lot of work left and the current batch of workunits is almost finished.
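A sketch of what such a prediction script might look like, in Python (all names and numbers here are hypothetical; a real script would query the project database for the outstanding workunit count and the recent return rate):

```python
from datetime import datetime, timedelta

def predict_batch_deadline(results_remaining, results_per_day, margin=1.25):
    """Estimate a calendar deadline for the current batch.

    results_remaining: workunits still outstanding in the batch.
    results_per_day:   average return rate over, say, the last week.
    margin:            safety factor so the deadline isn't too tight.
    """
    if results_per_day <= 0:
        raise ValueError("need a positive recent return rate")
    days_needed = results_remaining / results_per_day * margin
    return datetime.utcnow() + timedelta(days=days_needed)
```

The scheduler would then recompute this date as the batch progresses and refuse new work to any host whose queue already reaches past it.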


BOINC.BE: The team for Belgians and their friends who love the smell of glowing red cpu's in the morning
ID: 10698
Gaspode the UnDressed
Joined: 1 Sep 04
Posts: 506
Credit: 118,619
RAC: 0
Message 10700 - Posted: 11 Oct 2005, 21:52:33 UTC - in response to Message 10698.  

<blockquote>The problem is that a minority of hosts use such a large cache that they sit on a fairly small number of workunits for weeks while all the other hosts can't get any work. This slows the project down.

Since the work comes in batches and the scientists have to examine each batch before they can send out new work, it is important to get all the results of a batch back as fast as possible. I think you could do this with a different type of deadline. Instead of a fixed period of time in which the results have to be reported, LHC could use a fixed deadline (a fixed date) before which the results should be reported. They could write a script that predicts how long it will take us to compute all the results of one batch based on the recent flow of results, and that script would then determine the deadline (which can of course shift a little over time depending on throughput). In practice this would mean that hosts with a large cache can't get new work if they have a lot of work left and the current batch of workunits is almost finished.</blockquote>

Great idea!

That's precisely how the deadlines are set at LHC!


Gaspode the UnDressed
http://www.littlevale.co.uk
ID: 10700
alpina
Joined: 3 Aug 05
Posts: 49
Credit: 143,072
RAC: 0
Message 10708 - Posted: 12 Oct 2005, 14:36:49 UTC - in response to Message 10700.  

<blockquote><blockquote>The problem is that a minority of hosts use such a large cache that they sit on a fairly small number of workunits for weeks while all the other hosts can't get any work. This slows the project down.

Since the work comes in batches and the scientists have to examine each batch before they can send out new work, it is important to get all the results of a batch back as fast as possible. I think you could do this with a different type of deadline. Instead of a fixed period of time in which the results have to be reported, LHC could use a fixed deadline (a fixed date) before which the results should be reported. They could write a script that predicts how long it will take us to compute all the results of one batch based on the recent flow of results, and that script would then determine the deadline (which can of course shift a little over time depending on throughput). In practice this would mean that hosts with a large cache can't get new work if they have a lot of work left and the current batch of workunits is almost finished.</blockquote>

Great idea!

That's precisely how the deadlines are set at LHC!

</blockquote>

But it doesn't seem to prevent some hosts from getting a massive number of workunits while the hosts with a small cache run out of work. It seems we have to live with this and blame it on the BOINC system.

I guess you can't force the scheduler to give out work only to hosts with a small cache size at the end of a batch?
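If the scheduler could be patched, the rule might look something like this rough sketch (hypothetical names and thresholds, not the real BOINC scheduler code):

```python
def should_send_work(host_cache_days, batch_fraction_done,
                     small_cache_limit=1.0, endgame_threshold=0.9):
    """Hypothetical end-of-batch rule: once a batch is nearly complete,
    hand the remaining workunits only to hosts with a small cache."""
    if batch_fraction_done < endgame_threshold:
        return True  # normal phase: any host may get work
    # endgame: only hosts whose cache setting is small enough
    return host_cache_days <= small_cache_limit
```

A host with a 5-day cache would be refused work once 90% of the batch is done, while a 0.1-day host would still be served.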


BOINC.BE: The team for Belgians and their friends who love the smell of glowing red cpu's in the morning
ID: 10708
Keck_Komputers
Joined: 1 Sep 04
Posts: 275
Credit: 2,652,452
RAC: 0
Message 10712 - Posted: 12 Oct 2005, 20:18:38 UTC

The project can do some things to reduce hoarding of workunits, I think, though they may cause other problems.

Lower the number of workunits per RPC - I believe this is settable in the project config, with a default of 10 or 20.

Increase the delay between RPCs - I believe these are actually two settings: how long the client will wait before an automatic RPC, and how long the server will refuse RPCs. They default to 1 minute and 10 minutes respectively. Those defaults don't actually make sense; the client setting should be set 5 seconds longer than the server setting so a client doesn't get into a refusal loop.

The main problem with using these settings is finding a combination that addresses the problem without limiting hosts that would ordinarily process and return more work in a given timeframe.
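The refusal loop described above comes down to the client's retry delay being shorter than the server's refusal window; a toy illustration (function and defaults made up for clarity; the real settings live in the BOINC client and scheduling server):

```python
def retry_lands_after_refusal(client_delay_s, server_refusal_s):
    """True if the client's automatic retry happens after the server's
    refusal window has expired, i.e. the RPC isn't wasted."""
    return client_delay_s > server_refusal_s

# With the defaults mentioned above (client 60 s, server 600 s), every
# automatic retry falls inside the refusal window and is refused again;
# setting the client delay 5 s past the server window breaks the loop.
```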
BOINC WIKI

BOINCing since 2002/12/8
ID: 10712
meckano
Joined: 17 Sep 04
Posts: 150
Credit: 20,315
RAC: 0
Message 10713 - Posted: 12 Oct 2005, 20:32:59 UTC - in response to Message 10712.  

I wasn't suggesting any change to the standard settings.
It's just hard to sit and watch the numbers come down so slowly when I could be working on them too.

-----------------------
SNAFU'ed? Turn the Page! :D
ID: 10713
meckano
Joined: 17 Sep 04
Posts: 150
Credit: 20,315
RAC: 0
Message 10721 - Posted: 13 Oct 2005, 13:14:04 UTC - in response to Message 10713.  

To add...
Large caches are nice when there are server problems: they keep a number of computers crunching while 'connect every 0.1 days' computers like mine work on other projects.

-----------------------
SNAFU'ed? Turn the Page! :D
ID: 10721
Ingleside
Joined: 1 Sep 04
Posts: 36
Credit: 78,199
RAC: 0
Message 10723 - Posted: 13 Oct 2005, 22:28:38 UTC - in response to Message 10712.  

<blockquote>Increase the delay between RPCs - I believe these are actually two settings: how long the client will wait before an automatic RPC, and how long the server will refuse RPCs. They default to 1 minute and 10 minutes respectively. Those defaults don't actually make sense; the client setting should be set 5 seconds longer than the server setting so a client doesn't get into a refusal loop.</blockquote>


The deferral time now included in every RPC does add a little extra time to make sure a host doesn't get into a refusal loop... but of course, for these things to work, projects must also upgrade their scheduling server occasionally...
ID: 10723


