Message boards : Number crunching : Fairer distribution of work (Flame Fest 2007)
Joined: 1 Oct 04 Posts: 4 Credit: 391,126 RAC: 0
@Tomas: Keck Komputers suggests "to send no more than 5 results per scheduler RPC" and to make "the host wait 10 minutes before getting more work", slowing downloads so that others get a chance to fetch work. A PC could still grab a lot of work under that scheme - more slowly than at the moment, but it would still be possible. I would prefer that only a fixed number (say 5) of workunits of this project be on a PC at a time: before the PC downloads another workunit, it must upload one. This is more restrictive, but it will not slow down the activities of fast PCs. I hope you agree anyway :-)
Joined: 7 Nov 05 Posts: 19 Credit: 248,179 RAC: 0
I would prefer that only a fixed number (say 5) of workunits of this project be on a PC at a time: before the PC downloads another workunit, it must upload one. This is more restrictive, but it will not slow down the activities of fast PCs.

Hmmm, I think I might agree if I could see a simple implementation, but one of the objections to River's first proposal was that it might require software re-writes, as I think yours would. John Keck's proposal merely requires 2 lines of the server-side config.xml to be changed, as I understand it. Given that this is a temporary situation (pending Garfield) and that John's solution would adequately (perhaps not perfectly, but certainly adequately) fix the problem, improving morale whilst not harming the science, I think that his has to be the way forward. :)
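For reference, a minimal sketch of the two config.xml lines John Keck's proposal would change, assuming the standard BOINC scheduler options max_wus_to_send and min_sendwork_interval (option names can differ between server versions, so check yours before editing):

    <!-- send at most 5 results per scheduler RPC -->
    <max_wus_to_send>5</max_wus_to_send>
    <!-- make each host wait 10 minutes (600 s) before it can get more work -->
    <min_sendwork_interval>600</min_sendwork_interval>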
Joined: 13 Jul 05 Posts: 456 Credit: 75,142 RAC: 0
...but one of the objections to River's first proposal was that it might require software re-writes...

Not quite. It was so unpopular that I never got round to explaining how it would be done. To some extent it would provide the "more work when you give some back" behaviour that people are now thinking of, so maybe the time has come to explain my original idea a little more.

The idea was not to adjust the max quota at all, but to adjust the actual quota of every host during the lull between work releases. Doing this would entail writing a single line of SQL and executing it in MySQL, something along the lines of:

UPDATE HOST_TABLE SET CURRENT_QUOTA = 5 WHERE CURRENT_QUOTA > 5

(I have made up the table and column names, but you get the idea.) The update would take a few minutes to run, as it needs write access to the details of every host currently in the db; it might therefore be sensible to take the db offline while it runs. The update could be run by an admin from a MySQL session, or built into an admin-only page to be clicked as appropriate.

How it works: next time work is available, each host only gets 5 tasks initially. When it sends 1 result back, its quota doubles to 10. If that happens the same day as it got the work, it gets another 5 tasks to make up its new quota for today, available next time it contacts the server; if it has crunched over midnight, it gets the whole 10. The effect is that everyone gets 5 WU each until the first boxes start to return work, and nobody gets more than 5 until they have crunched the first one or waited till tomorrow.

There will be a small dead period for very fast boxes: when they crunch five and return five, they will immediately be told "no quota" (as the doubling does not happen instantly). They wait one defer time (what is the defer time for "no work sent - quota exceeded"?), but the next time in they get 5 or 10 WU. This dead time is, of course, taken up with useful crunching on other projects and is, in my opinion, an acceptable trade-off for spreading the work across more boxes; it would delay the crunching of the entire set of WU by far less than the current arrangement. It allows fast boxes to get a bigger share (as they may well come back into the pool while other boxes are still getting their first 5), but not excessively so, as they have to have crunched a whole task before they can come back for more.

How much code needs to be written? To try the proposal out, only that one line of SQL is needed; nothing needs to be compiled, as it can be run against the database in an interactive MySQL session without using C++ or PHP. To have a fixed set of code that runs on admin demand is just to embed the above line of SQL into a dedicated PHP page. To automate it with a fixed number 5 built in is to embed the update in a small program that runs every 24 hrs, tests whether there are more than (say) 1000 results in progress, and if not performs the update. To automate the process so that it adapts intelligently to the number of incoming work units would be more work and would need some way of asking the admin people how much work was coming -- on reflection I accept that, and withdraw that part of the original suggestion.
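For concreteness, a minimal sketch of that automated version. It assumes the standard BOINC schema names (host.max_results_day for the per-host daily quota, result.server_state = 4 for tasks in progress); verify them against your server's schema before running anything:

    -- run every 24 hrs, e.g. from cron via the mysql client:
    -- reset quotas only during a lull, i.e. when fewer than
    -- 1000 results are currently in progress
    UPDATE host
    SET    max_results_day = 5
    WHERE  max_results_day > 5
      AND  (SELECT COUNT(*) FROM result WHERE server_state = 4) < 1000;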
John's proposal has the advantage that it only has to be done once, and can then be left in place and manually removed if we go back to a continuous work regime (with Garfield, maybe). Mine has the advantage that, because it has to be done manually each time (even if only by clicking on an admin page), it is even easier to remove when Garfield arrives. John's is also more likely to be acceptable to the admins, who generally don't like updates that re-write every row of a table (I don't when I am db admin!). So I'd still put John's proposal as the front runner so far, but I think mine deserves more credit than it got before; that is partly my fault, in all fairness, as I did not explain it in enough detail. River~~
Joined: 1 Oct 04 Posts: 4 Credit: 391,126 RAC: 0
If a PC can download a WU only once it has uploaded one, then there will be no "dead time"; and with the restriction of a maximum number of workunits at a time, the work will still be spread across more boxes, because there is the automatic delay of one WU's running time ;-)
Joined: 13 Jul 05 Posts: 456 Credit: 75,142 RAC: 0
If a PC can download a WU only once it has uploaded one, then there will be no "dead time"; and with the restriction of a maximum number of workunits at a time, the work will still be spread across more boxes, because there is the automatic delay of one WU's running time ;-)

Yes, but the point being made is that download-only-after-upload involves a re-write of the scheduler code, and probably quite a major one at that. There would also be future work to keep this mod in future releases of BOINC, unless you could persuade BOINC to adopt the proposal (almost certainly not politically possible). Exploiting the existing quota system does not need a re-write if done as I suggest. The trade-off of human effort saved vs dead time is, I suggest, strongly balanced against the kind of re-write your suggestion needs -- especially as dead time ought to be nearly irrelevant on machines that ought to have work from other projects anyway.

John's suggestion only involves a one-off change to a config file; it would involve dead time if the interval were extended, or slow hosts coming back and getting more than one chunk of five if the interval were shortened. I agree that your idea gets closer to what is wanted than either John's or mine, but I would suggest that the amount of code needed makes it an unlikely contender. Thanks for your input and your prompt engagement with my new posting. River~~
Joined: 10 May 06 Posts: 8 Credit: 2,927 RAC: 0
I think LHC@home must find a way to spread the WUs a little more fairly, or there may come a time when they won't get their work done. It's not much fun to see that there are WUs and you don't get any for months. Maybe the crunchers who grab all they can should become a little less egoistic. Caliban
Joined: 1 Sep 04 Posts: 275 Credit: 2,652,452 RAC: 0
Strangely enough, I actually like River's proposal better than mine now that I have seen it spelled out, at least from a participant's point of view. It has the advantage that each host gets 5 tasks the first day there is new work; if there is still work on the second day, each host can get ~100 that day. So if there is a small batch of work, more hosts will be able to get some; if there is a large batch, most of it will still be sent out by the end of the second day. (Calling about 10,000 the break point.) I am also assuming it would be an automatic script that resets the quota to 5 whenever the amount of work on the server is low. However, the objections River raised from an administrator's point of view are definitely valid and may rule this proposal out. There is also a hole, in that a participant could detach and reattach and get the full quota right away; a freshly attached host would still be limited by the changes to the project's config.xml file that I mentioned. BOINC WIKI BOINCing since 2002/12/8
Joined: 27 Mar 06 Posts: 21 Credit: 1,731 RAC: 0
It may be a good idea to check what the people at Nano-hive@Home have implemented recently. The behaviour I saw is that a host gets a few WUs initially and can only get more as completed WUs are uploaded, just as suggested earlier in this thread. Just my 2 (Euro)cents.
Joined: 17 Sep 04 Posts: 19 Credit: 308,023 RAC: 0
I agree with WimTea... The NanoHive distribution of work was very good: every time a WU was finished I got another one, but no more than 3 at a time... Implementing that shouldn't be so complicated, and it would be better for us all... ;-) Life is Science, and Science rules. To the universe and beyond. Proud member of BOINC@Heidelberg. My BOINC-Stats
Joined: 29 Sep 04 Posts: 7 Credit: 2,316,497 RAC: 0
I agree with WimTea... but the situation at LHC@home is different: the time required to finish a WU varies a lot. Quite a lot of them finish in a minute (or even a few seconds), while some took more than 5 hours on my machine. If the system you suggest were implemented, the scheduler might receive too many requests while such short WUs are being sent out.
Joined: 19 May 06 Posts: 20 Credit: 297,111 RAC: 0
I think LHC@home must find a way to spread the WUs a little more fairly, or there may come a time when they won't get their work done. It's not much fun to see that there are WUs and you don't get any for months. Maybe the crunchers who grab all they can should become a little less egoistic.

Agreed. I'm here mainly to help with the science, but I also want a fair share of the work. It's been a little better recently than over the past few months, but I only got 2 work units today while 30,000+ are still in progress. That hardly seems fair to me. I'd really like to see the project implement a fairer system of distributing work units, but I doubt they're willing to do anything about it, at least for now. Perhaps an upcoming upgrade will take care of things; I'd really look forward to that.
Joined: 1 Sep 04 Posts: 137 Credit: 1,733,409 RAC: 416
That is about fair. There are probably about 10,000 computers doing LHC work units right now (going off data from previous runs, because no current stats are available). If there were 30,000 work units to do, that works out to 3 per computer, so you got shorted one. Better luck next time. - A member of The Knights Who Say NI! My BOINC stats site
Joined: 13 Jul 05 Posts: 456 Credit: 75,142 RAC: 0
That is about fair. There are probably about 10,000 computers doing LHC work units right now (going off data from previous runs, because no current stats are available). If there were 30,000 work units to do, that works out to 3 per computer, so you got shorted one. Better luck next time.

I agree totally with your main point, but would quibble over the figures. In July, the last time we had both BOINC stats and work that stayed on the server long enough for most people to get some, there were between 6,000 and 6,999 active hosts - I clearly remember the number began with a 6. This time there was a total of more than 50,000 TASKS (only 10k WU, but each WU is sent out five times). Scarecrow's graphs show a peak of 45,000 tasks in progress, and well over 5,000 will already have been returned by the time that peak was reached; a more exact figure could be deduced from careful analysis of the data table on his site.

Taking 6k hosts and 50k tasks, the "fair" figure is around 8 or 9, if fair means each box gets the same. Some have asserted that a fair distribution must take account of machine speed, in which case an allocation of from 1 to 10 tasks each would be about right. My suggested quota alteration would have given everyone 5 tasks and left the remainder up for grabs by the fast machines and large caches. It does not match either version of "fairness", but from either viewpoint it comes closer than the current approach. R~~
Joined: 14 Jul 05 Posts: 275 Credit: 49,291 RAC: 0
I would prefer that only a fixed number (say 5) of workunits of this project be on a PC at a time: before the PC downloads another workunit, it must upload one. This is more restrictive, but it will not slow down the activities of fast PCs.

There is a discussion on the boinc_dev mailing list about adding this to the official server software. I implemented it on my project and it really helped me get all workunits done faster. Don't theorize about whether it will be better or not; implement it and send some work. (The fact that there are currently no admins makes this a bit hard...)
Joined: 13 Jul 05 Posts: 456 Credit: 75,142 RAC: 0
...I implemented it on my project and it really helped me get all workunits done faster...

Your project, PovAddict? I looked back at all your posts here and can't see a mention of which project is yours. Can you post a link, please? Or maybe better still, add a profile here that mentions your project and whether you are sysadmin/owner/etc., plus anything else you'd like us to know about you. The first thing I want to do when someone mentions their own experience is to read more about it in their profile. It is really good to have someone in this discussion who runs the server code hands-on, whatever the project may be, so don't be shy about it! R~~
Joined: 14 Jul 05 Posts: 275 Credit: 49,291 RAC: 0
...I implemented it on my project and it really helped me get all workunits done faster...

Renderfarm@Home. There was a discussion on boinc_dev about the way to implement the cache limit; you should look there as well.
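For anyone who wants to experiment, a sketch of how such a per-host cap might be expressed in config.xml; the tag name below is purely illustrative, since the official option was still under discussion on boinc_dev at the time:

    <!-- illustrative only: cap the number of tasks a host may hold at once,
         so the host gets a new task only after returning one -->
    <max_wus_in_progress>5</max_wus_in_progress>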