Message boards : Number crunching : Quicker finishes?

meckano
Joined: 17 Sep 04
Posts: 150
Credit: 20,315
RAC: 0
Message 10679 - Posted: 10 Oct 2005, 13:47:27 UTC
Last modified: 10 Oct 2005, 13:49:07 UTC

Maybe at stages like this you could issue more replicas of the last few workunits?
Everyone gets credit, and there's a better chance of a fast computer working on them.

-----------------------
SNAFU'ed? Turn the Page! :D
ID: 10679
alpina
Joined: 3 Aug 05
Posts: 49
Credit: 143,072
RAC: 0
Message 10683 - Posted: 10 Oct 2005, 22:30:38 UTC - in response to Message 10679.  

<blockquote>Maybe at stages like this you could issue more replicas of the last few workunits?
Everyone gets credit, and there's a better chance of a fast computer working on them.
</blockquote>

And more importantly, more chances that a host with a smaller cache size is working on them.


BOINC.BE: The team for Belgians and their friends who love the smell of glowing red cpu's in the morning
ID: 10683
alpina
Joined: 3 Aug 05
Posts: 49
Credit: 143,072
RAC: 0
Message 10698 - Posted: 11 Oct 2005, 20:12:05 UTC
Last modified: 11 Oct 2005, 20:12:28 UTC

The problem is that a minority of hosts use such a large cache that they sit on a fairly small number of workunits for weeks while all the other hosts can't get any work. This slows the project down.

Since the work comes in batches and the scientists have to examine each batch before they can send out new work, it is important to get all the results of a batch back as fast as possible. I think you could do this with a different type of deadline. Instead of a fixed period of time in which the results have to be reported, LHC could use a fixed deadline (a fixed date) before which the results should be reported. They could write a script that predicts how long it will take us to compute all the results of one batch based on the recent flow of results, and that script would then determine the deadline (which can of course shift a little over time depending on throughput). In practice this would mean that hosts with a large cache can't get new work if they have a lot of work left and the current batch of workunits is almost finished.
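A sketch of what such a prediction script might look like, in Python (all names and numbers here are hypothetical; a real script would query the project database for the outstanding workunit count and the recent return rate):

```python
from datetime import datetime, timedelta

def predict_batch_deadline(results_remaining, results_per_day, margin=1.25):
    """Estimate a calendar deadline for the current batch.

    results_remaining: workunits still outstanding in the batch.
    results_per_day:   average return rate over, say, the last week.
    margin:            safety factor so the deadline isn't too tight.
    """
    if results_per_day <= 0:
        raise ValueError("need a positive recent return rate")
    days_needed = results_remaining / results_per_day * margin
    return datetime.utcnow() + timedelta(days=days_needed)
```

The scheduler would then recompute this date as the batch progresses and refuse new work to any host whose queue already reaches past it.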


BOINC.BE: The team for Belgians and their friends who love the smell of glowing red cpu's in the morning
ID: 10698
Gaspode the UnDressed
Joined: 1 Sep 04
Posts: 506
Credit: 118,619
RAC: 0
Message 10700 - Posted: 11 Oct 2005, 21:52:33 UTC - in response to Message 10698.  

<blockquote>The problem is that a minority of hosts use such a large cache that they sit on a fairly small number of workunits for weeks while all the other hosts can't get any work. This slows the project down.

Since the work comes in batches and the scientists have to examine each batch before they can send out new work, it is important to get all the results of a batch back as fast as possible. I think you could do this with a different type of deadline. Instead of a fixed period of time in which the results have to be reported, LHC could use a fixed deadline (a fixed date) before which the results should be reported. They could write a script that predicts how long it will take us to compute all the results of one batch based on the recent flow of results, and that script would then determine the deadline (which can of course shift a little over time depending on throughput). In practice this would mean that hosts with a large cache can't get new work if they have a lot of work left and the current batch of workunits is almost finished.</blockquote>

Great idea!

That's precisely how the deadlines are set at LHC!


Gaspode the UnDressed
http://www.littlevale.co.uk
ID: 10700
alpina
Joined: 3 Aug 05
Posts: 49
Credit: 143,072
RAC: 0
Message 10708 - Posted: 12 Oct 2005, 14:36:49 UTC - in response to Message 10700.  

<blockquote><blockquote>The problem is that a minority of hosts use such a large cache that they sit on a fairly small number of workunits for weeks while all the other hosts can't get any work. This slows the project down.

Since the work comes in batches and the scientists have to examine each batch before they can send out new work, it is important to get all the results of a batch back as fast as possible. I think you could do this with a different type of deadline. Instead of a fixed period of time in which the results have to be reported, LHC could use a fixed deadline (a fixed date) before which the results should be reported. They could write a script that predicts how long it will take us to compute all the results of one batch based on the recent flow of results, and that script would then determine the deadline (which can of course shift a little over time depending on throughput). In practice this would mean that hosts with a large cache can't get new work if they have a lot of work left and the current batch of workunits is almost finished.</blockquote>

Great idea!

That's precisely how the deadlines are set at LHC!

</blockquote>

But it doesn't seem to prevent some hosts from getting a massive number of workunits while the hosts with a small cache run out of work. It seems we have to live with this and blame it on the BOINC system.

I guess you can't force the scheduler to give out work only to hosts with a small cache size at the end of a batch?
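If the scheduler could be patched, the rule might look something like this rough sketch (hypothetical names and thresholds, not the real BOINC scheduler code):

```python
def should_send_work(host_cache_days, batch_fraction_done,
                     small_cache_limit=1.0, endgame_threshold=0.9):
    """Hypothetical end-of-batch rule: once a batch is nearly complete,
    hand the remaining workunits only to hosts with a small cache."""
    if batch_fraction_done < endgame_threshold:
        return True  # normal phase: any host may get work
    # endgame: only hosts whose cache setting is small enough
    return host_cache_days <= small_cache_limit
```

A host with a 5-day cache would be refused work once 90% of the batch is done, while a 0.1-day host would still be served.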


BOINC.BE: The team for Belgians and their friends who love the smell of glowing red cpu's in the morning
ID: 10708
Keck_Komputers
Joined: 1 Sep 04
Posts: 275
Credit: 2,652,452
RAC: 0
Message 10712 - Posted: 12 Oct 2005, 20:18:38 UTC

The project can do some things to reduce hoarding of workunits, I think, though they may cause other problems.

Lower the number of workunits per RPC - I believe this is settable in the project config, with a default of 10 or 20.

Increase the delay between RPCs - I believe these are actually two settings: how long the client will wait before an automatic RPC, and how long the server will refuse RPCs. They default to 1 minute and 10 minutes respectively. Those defaults don't actually make sense; the client setting should be set 5 seconds longer than the server setting so a client doesn't get into a refusal loop.

The main problem with using these settings is finding a combination that addresses the problem without limiting hosts that would ordinarily process and return more work in a given timeframe.
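The refusal loop described above comes down to the client's retry delay being shorter than the server's refusal window; a toy illustration (function and defaults made up for clarity; the real settings live in the BOINC client and scheduling server):

```python
def retry_lands_after_refusal(client_delay_s, server_refusal_s):
    """True if the client's automatic retry happens after the server's
    refusal window has expired, i.e. the RPC isn't wasted."""
    return client_delay_s > server_refusal_s

# With the defaults mentioned above (client 60 s, server 600 s), every
# automatic retry falls inside the refusal window and is refused again;
# setting the client delay 5 s past the server window breaks the loop.
```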
BOINC WIKI

BOINCing since 2002/12/8
ID: 10712
meckano
Joined: 17 Sep 04
Posts: 150
Credit: 20,315
RAC: 0
Message 10713 - Posted: 12 Oct 2005, 20:32:59 UTC - in response to Message 10712.  

I wasn't suggesting any change to the standard settings.
It's just hard to sit and watch the numbers come down so slowly when I could be working on them too.

-----------------------
SNAFU'ed? Turn the Page! :D
ID: 10713
meckano
Joined: 17 Sep 04
Posts: 150
Credit: 20,315
RAC: 0
Message 10721 - Posted: 13 Oct 2005, 13:14:04 UTC - in response to Message 10713.  

To add...
Large caches are nice when there are server problems: they keep a number of computers crunching while 'connect every 0.1 days' computers like mine work on other projects.

-----------------------
SNAFU'ed? Turn the Page! :D
ID: 10721
Ingleside
Joined: 1 Sep 04
Posts: 36
Credit: 78,199
RAC: 0
Message 10723 - Posted: 13 Oct 2005, 22:28:38 UTC - in response to Message 10712.  

<blockquote>Increase the delay between RPCs - I believe these are actually two settings: how long the client will wait before an automatic RPC, and how long the server will refuse RPCs. They default to 1 minute and 10 minutes respectively. Those defaults don't actually make sense; the client setting should be set 5 seconds longer than the server setting so a client doesn't get into a refusal loop.</blockquote>


The deferral time now included in every RPC does add a little extra time to make sure a host doesn't get into a refusal loop... but of course, for these things to work, projects must also upgrade their scheduling server occasionally...
ID: 10723


