Daily quota
Joined: 25 Jan 11 Posts: 179 Credit: 83,858 RAC: 0

Eric, having some kind of a quota is nice because it spreads the work around and prevents badly configured systems from chewing up a lot of iterations.

Why not increase the daily quota to 120, per core if possible, but limit the number of tasks a machine can cache to 3 or 4 tasks per core. Also, you should decrease a machine's daily limit by 10 (down to a minimum of 1) every time it turns in a crashed task and increase it by 10 for every completed task, up to a limit of 120. That will spread the work around without restricting those who can do lots of tasks per day. It will also put the brakes on machines that turn in nothing but crashed tasks.

I'm pretty sure everything I've mentioned can be configured with existing server options, though you might have to RTFM to find out what they're called and how to use them. Those options were not there before the recent server code upgrade.
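For reference, here is a minimal config.xml sketch of how a proposal like this could be expressed with the standard BOINC server scheduling options; the values are illustrative only, not the project's actual settings, and each limit is scaled by the host's core count (plus a GPU term, ignored here):

    <!-- Illustrative sketch only; not LHC@home's actual configuration. -->
    <daily_result_quota>120</daily_result_quota>   <!-- per-core daily ceiling; the per-host value drops when a host returns errors and recovers as it returns good results -->
    <max_wus_in_progress>3</max_wus_in_progress>   <!-- no more than 3 unfinished tasks per core cached on a host -->
    <max_wus_to_send>3</max_wus_to_send>           <!-- at most 3 tasks per core handed out in one scheduler request -->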
Joined: 4 May 07 Posts: 250 Credit: 826,541 RAC: 0

Frankly, I think the current Quota is probably a good number to keep working with until all the "undocumented features" get worked out. It pretty much guarantees that Tasks will get run on a variety of machines whereas, if you set a high quota, Tasks get sucked up by the Crunching Farms and may not see a variety of systems.

It appears that Tasks are getting processed fairly quickly. Earlier today the Server indicated it had about 6,500 Tasks and now it is down under 4,000.

The quota, as it is now, also lets us poor guys who don't have Crunching Farms get a "taste" of the work.
Joined: 25 Jan 11 Posts: 179 Credit: 83,858 RAC: 0

> Frankly, I think the current Quota is probably a good number to keep working with until all the "undocumented features" get worked out. It pretty much guarantees that Tasks will get run on a variety of machines whereas, if you set a high quota, Tasks get sucked up by the Crunching Farms and may not see a variety of systems.

The way it is now, a computer can request work and receive 60 tasks if it has a large cache set. That's what I call sucking up the tasks. If you also impose a limit of no more than 3 tasks per core in the cache (the server can impose that limit), then nobody can download 60 at a time. Please read my other post: I'm suggesting a higher daily limit AND a 3-tasks-per-core in-cache limit. If you don't understand that, then you don't understand my proposal.

See, the problem with a small limit per day is that if they release a huge batch of tasks and everybody is limited to 60 per day, it takes longer to finish off the batch, because machines that want more and can do more are sitting there with no tasks. That's not good. With a high daily limit, the machines that can do 120 per day are allowed to do so, while everybody else is guaranteed work too, because the 3-per-core in-cache rule ensures that the big dogs don't get in first and download all the tasks into their caches.

> It appears that Tasks are getting processed fairly quickly. Earlier today the Server indicated it had about 6,500 Tasks and now it is down under 4,000.

That number indicates how fast the tasks are getting downloaded. It says nothing about how fast they're getting processed. The rise and fall of the Tasks in Progress number tells you how fast tasks are being processed.

> The quota, as it is now, also lets us poor guys who don't have Crunching Farms get a "taste" of the work.

With a bigger quota AND a new 3-tasks-per-core in-cache rule, even the little guys will get their share, and completion of the batch won't be slowed down. Think about it. With my proposal everybody wins.
Joined: 4 May 07 Posts: 250 Credit: 826,541 RAC: 0

> Frankly, I think the current Quota is probably a good number to keep working with until all the "undocumented features" get worked out. It pretty much guarantees that Tasks will get run on a variety of machines whereas, if you set a high quota, Tasks get sucked up by the Crunching Farms and may not see a variety of systems.

I think you are making an assumption that everybody sits there with their systems running 24x7. From previous experience with LHC, most of the time Tasks get pushed to the Server during CERN working hours. Crunching Farms tend to suck these all up, so by the time systems wake up here in the States there is never anything left.

Yes, you can modulate the number of Tasks you receive by tweaking the cache size, but I would be willing to bet that most have their cache size set to a point where they download enough Tasks so that they can keep running through the times when the Server has run dry (which could be days). I'm seriously hoping that LHC 1.0 will now get a lot more work to do so this isn't an issue.

I've thought about it, and with my proposal it may take a little longer to get work crunched, but everybody gets a taste of the work. If Tasks start to build up a backlog and it is taking more time than the CERN scientists can tolerate, then I'm sure they will make the problem known to LHC staff and they can tweak the Quota again.
Joined: 25 Jan 11 Posts: 179 Credit: 83,858 RAC: 0

> Yes, you can modulate the number of Tasks you receive by tweaking the cache size, but I would be willing to bet that most have their cache size set to a point where they download enough Tasks so that they can keep running through the times when the Server has run dry (which could be days).

No, you don't understand what I am proposing. Read carefully. The server can impose a limit on the number of tasks that a host can have in its cache. That limit applies even if the host's cache is set to the max. If the limit set by the server is 3, then that's the most a host can have in its cache, even if the host requests 100 tasks. T4T uses a similar rule to limit hosts to 1 T4T task at a time even though hosts ask for more. I propose Sixtrack do the same, except limit it to 3 tasks per core. Like I said earlier, such a rule would prevent hosts from grabbing 60 tasks all at once. You don't seem to understand how that works.

> I'm seriously hoping that LHC 1.0 will now get a lot more work to do so this isn't an issue.

If tasks continue to take several hours to crunch, then the quota won't be much of an issue.

> I've thought about it

Well, so far you haven't even understood what I propose, so whatever you've thought doesn't apply. But that's OK, because it's the admins that decide, not you, and I know they'll understand even if you can't.
Joined: 4 May 07 Posts: 250 Credit: 826,541 RAC: 0

Frankly, I don't understand how you can get 60 WUs when the following change was made to the Server:

---

Or am I missing something here? This change was made after I suggested that the number of Tasks be increased (from what appeared to be 3) because the Server was developing a significant backlog.
Joined: 2 Sep 04 Posts: 209 Credit: 1,482,496 RAC: 0

Most limits are per CPU. If you have 6 cores, multiply all the numbers by 6.

<max_wus_to_send> N </max_wus_to_send>
Maximum jobs returned per scheduler RPC is N*(NCPUS + GM*NGPUS).

<max_wus_in_progress> N </max_wus_in_progress>
Limit the number of jobs in progress on a given host (and thus limit average turnaround time). Starting with 6.8, the BOINC client reports the resources used by in-progress jobs; in this case, the max CPU jobs in progress is N*NCPUS.

There is also a limit on how frequent RPCs can be, which could be used to prevent 6 requests in a row by forcing them to be spaced farther apart.

<daily_result_quota> N </daily_result_quota>
Each host has a field MRD in the interval [1 .. daily_result_quota]; it's initially daily_result_quota, and is adjusted as the host sends good or bad results. The maximum number of jobs sent to a given host in a 24-hour period is MRD*(NCPUS + GM*NGPUS). You can use this to limit the impact of faulty hosts.

The point of the limits is to prevent faulty hosts from grabbing all the work and erroring it out, which would push work units over their "too many errors" limit so no more tasks get sent for them. Limits are also used so that everyone attached can get their fair share of work.

As for the quick runs, maybe there is a way they could grant another task every time one is validated with credit between 0.01 and 0.99. This would not be counted against the quota, but I'm not sure exactly how this could be done. Just a thought.

See this page: http://boinc.berkeley.edu/trac/wiki/ProjectOptions#Schedulingoptionsandparameters. There are also some more advanced methods they can use.
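To make those formulas concrete, here is a hypothetical worked example for a CPU-only host with NCPUS = 4 and NGPUS = 0; the values are arbitrary and chosen only for illustration:

    <max_wus_to_send>2</max_wus_to_send>          <!-- per scheduler RPC: 2 * (4 + 0) = 8 tasks at most -->
    <max_wus_in_progress>1</max_wus_in_progress>  <!-- in progress: 1 * 4 = 4 unfinished tasks at most on this host -->
    <daily_result_quota>80</daily_result_quota>   <!-- per 24 hours: MRD * 4 tasks, i.e. 320 while MRD is still 80, fewer once errors have pushed MRD down -->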
Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0

Well, that seems like a solution to my problem (if I had read this I would have put it in my latest blog). We all agree hoarding is BAD. Who needs a quota if we have enough work? If you get a long job, good. If you get a very short job, well, you just get more until finally you get a long one. We could maybe even up the credit for these very short jobs, as they are actually useful to us. I'll discuss this with Igor.

Thanks and Regards,
Eric
Joined: 25 Jan 11 Posts: 179 Credit: 83,858 RAC: 0

Yes, you were, but I think Keith might have cleared it up. If not, just ask.
Joined: 16 May 11 Posts: 79 Credit: 111,419 RAC: 0

Right now the limits are as follows:

Please monitor and see how this works.

skype id: igor-zacharov
Joined: 3 Oct 06 Posts: 101 Credit: 8,994,586 RAC: 0

> Right now the limits are as follows:

A limit of 1 task in progress per core will slow things down for every one of us (there is no doubt about that), and maybe for the whole project too. Why?

1. After a task is completed, BOINC needs some time to upload the result; at that moment the freed core is captured by another project.
2. A completed task also remains in the "Ready to report" status for some time; depending on the BOINC version, that status may last up to 24 hours (not sure here).
3. When a new task is finally received, processing does not start immediately (the core is already working for another project).

Please increase the limit on the maximum number of tasks in progress to 2, or at least explain your motivation for choosing such an extremely hard limit.

-Edit- To keep the lost time to a minimum one could, of course, stop all other projects while LHC@home has work. But in that case we have to babysit our machines, because with "empty" (let's call them that) tasks a machine burns through its daily quota in a few minutes and then sits idle until the user intervenes again, which is absolutely unacceptable.
Joined: 25 Jan 11 Posts: 179 Credit: 83,858 RAC: 0

If you crunch other projects, then they will get the core eventually anyway. What difference does it make whether Sixtrack gets the core now or later? Your resource shares, not which project gets the core first, determine how much time each project gets. Whether the other project gets the core now or later, it all works out the same.

> 2. A completed task also remains in the "Ready to report" status for some time; depending on the BOINC version, that status may last up to 24 hours (not sure here).

If BOINC thinks your computer should have another Sixtrack task immediately after uploading a Sixtrack task, then it will report the uploaded task immediately and ask for another one.

> 3. When a new task is finally received, processing does not start immediately (the core is already working for another project).

True, but even if processing started immediately, the Sixtrack task would eventually be suspended to give time to one of your other projects. Get the CPU now or get the CPU later, it doesn't matter; it all works out in the end, and your projects get CPU time according to your resource shares. Having an additional Sixtrack task in your cache does not guarantee Sixtrack will immediately get more CPU time. That depends on the project debts.

> Please...

This is not a hard limitation. It will not slow down this project. It will not affect how many Sixtrack tasks you run over the long term. That is determined by your resource shares.
Joined: 3 Oct 06 Posts: 101 Credit: 8,994,586 RAC: 0

> This is not a hard limitation.

Which limitation is hard then? Zero tasks in progress? :-)

> It will not slow down this project. It will not affect how many Sixtrack tasks you run over the long term. That is determined by your resource shares.

This time your conclusion is too quick and without enough arguments, sorry. Also, resource share is not a panacea. It never worked properly and, especially, it works absolutely crazy in the newest BOINC versions (later than 6.2.19). I will try to explain this later (with examples).
Joined: 25 Jul 05 Posts: 19 Credit: 670,692 RAC: 0

On a dual-core CPU running only LHC@home 1.0, the machine will receive 2 WUs; after one is completed, that core will sit unoccupied until another WU has been downloaded. This slows down crunching and does not keep both cores fully occupied. This is especially the case with the work units that only take a few seconds to complete.

I would like to see <max_wus_to_send> 2 </max_wus_to_send> changed to reflect 2 times the available cores of the machine (4 WUs for dual-core machines, 8 for a quad-core, etc.); this way you have every core occupied and 1 WU per core available for starting.
Joined: 2 Sep 04 Posts: 209 Credit: 1,482,496 RAC: 0

> On a dual-core CPU running only LHC@home 1.0, the machine will receive 2 WUs; after one is completed, that core will sit unoccupied until another WU has been downloaded. This slows down crunching and does not keep both cores fully occupied. This is especially the case with the work units that only take a few seconds to complete.

You are forgetting that these numbers are per core, so 2 means 2 * N cores; it is already set the way you just asked. So it sends 2, 4, ... 16, 24, not just 2. This is also per RPC request; it will get more on the next request. From the BOINC documentation:

<max_wus_to_send> N </max_wus_to_send>
Maximum jobs returned per scheduler RPC is N*(NCPUS + GM*NGPUS)
Joined: 25 Jul 05 Posts: 19 Credit: 670,692 RAC: 0

I'm running 2 dual-core machines at the moment and it's not possible to download WUs once there are 2 WUs on a machine; instead of WUs you will receive this notice from the server: "23-9-2011 12:56:24 LHC@home 1.0 (reached limit of 2 tasks in progress)". Or am I understanding you incorrectly?

EDIT: Sorry, my mistake, I think it's the <max_wus_in_progress> 1 </max_wus_in_progress> that's my problem. Once you have 1 WU in progress per core, you will not be able to download any WUs until one is finished and uploaded; then the next scheduler request will give you a new WU. The time to upload, make a scheduler request, and download a WU is around 30 seconds for me, and with some WUs only lasting a few seconds this will not keep my machines fully occupied.
Joined: 2 Sep 04 Posts: 209 Credit: 1,482,496 RAC: 0

> I'm running 2 dual-core machines at the moment and it's not possible to download WUs once there are 2 WUs on a machine; instead of WUs you will receive this notice from the server: "23-9-2011 12:56:24 LHC@home 1.0 (reached limit of 2 tasks in progress)".

Yes, there is a misunderstanding here, as that is based on this setting:

<max_wus_in_progress> 1 </max_wus_in_progress>

Yes, this may be a problem and was set too low. Igor needs to re-examine this choice. It will be hard to get 80*2=160 in a day with a limit of 2 sent at a time, which must be run and returned before 2 more can be received; of course, if you get long-running tasks it may not be such a big deal. The key is finding a balance here. I think at least the previous setting of 3 should be tried. That would give 1 running, 1 complete, and 1 waiting per core, and at least a chance for BOINC to refresh before the third is run, but maybe 5 is better so there is more waiting. ??

Remember also, Igor, not all participants have instant-on connections; some may still be on dial-up, and this will cause excessive connections for that type of system.
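To illustrate the difference being discussed, assuming a dual-core host (NCPUS = 2) and the per-core semantics quoted above, the two settings compare roughly like this (hypothetical values, not the project's actual configuration):

    <!-- Current-style value: 1 * 2 = 2 unfinished tasks on the host, so every finished
         task leaves a core idle until the next scheduler request brings a replacement. -->
    <max_wus_in_progress>1</max_wus_in_progress>

    <!-- Suggested value: 3 * 2 = 6 unfinished tasks on the host, so a queued task can
         start immediately while finished results upload and report in the background. -->
    <max_wus_in_progress>3</max_wus_in_progress>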
Joined: 3 Oct 06 Posts: 101 Credit: 8,994,586 RAC: 0

> I think at least the previous setting of 3 should be tried.

+100! :-) If not 3, then at least 2.
Joined: 25 Jul 05 Posts: 19 Credit: 670,692 RAC: 0

Thanks, we seem to be on the same level ;) Hoarding is also a bad thing, so indeed we're looking at finding a balance; 2 WUs per core would be fine, I think. With the longer WUs there is not much of a problem; the short WUs keep the system waiting for work as things stand now.

Another problem at my site is that the LAN connection around here is overloaded and sometimes does not respond for more than 30 seconds, which makes the scheduler connection, upload, or download fail and leaves the system waiting for work even longer.
Joined: 17 Feb 07 Posts: 86 Credit: 968,855 RAC: 0

MilkyWay@home has a system with a minimum quota. If a WU is returned it will be validated, and if it is OK, your daily quota will increase by one. So if you return a lot of validated (good) results, your daily quota increases. I now have a DQ much, much higher than my PC can run, so it will never be out of work unless the project has no jobs for a period of time. They don't give out more than 12 WUs at once per graphics processor (in my case), so abuse is prevented. Perhaps this is an idea to use here as well.

A fair share for all participants is of course nice, but the main thing is that the science gets done; that is the reason for BOINC. And if that is done on crunching farms, then is that OK, or not?

Greetings from,
TJ
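For comparison, the daily_result_quota mechanism described earlier in the thread is broadly similar in spirit: the per-host quota falls for hosts that return errors and recovers as validated results come back. A hedged sketch of how a MilkyWay-style setup might be approximated with the standard options, using purely illustrative numbers:

    <!-- Illustrative only; numbers are not taken from MilkyWay@home or LHC@home. -->
    <daily_result_quota>200</daily_result_quota>   <!-- generous daily ceiling; faulty hosts are throttled automatically as their per-host value drops -->
    <max_wus_in_progress>12</max_wus_in_progress>  <!-- caps how many tasks any one host can hold at a time, independent of the daily quota -->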