Message boards : Number crunching : Daily quota



Profile jujube

Joined: 25 Jan 11
Posts: 179
Credit: 83,858
RAC: 0
Message 23188 - Posted: 22 Sep 2011, 0:20:09 UTC - in response to Message 23186.  
Last modified: 22 Sep 2011, 0:21:17 UTC

Eric,

Having some kind of a quota is nice because it spreads the work around and prevents badly configured systems from chewing up a lot of iterations. Why not increase the daily quota to 120, per core if possible, but limit the number of tasks a machine can cache to 3 or 4 per core? Also, you should decrease a machine's daily limit by 10 (down to a minimum of 1) every time it turns in a crashed task and increase it by 10 for every completed task, up to a limit of 120. That will spread the work around without restricting those who can do lots of tasks per day. It will also put the brakes on machines that turn in nothing but crashed tasks.
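Roughly, the adjustment rule I have in mind would behave something like the sketch below (just Python to illustrate the idea, not actual BOINC server code; the constants and names are made up):

QUOTA_MAX = 120   # proposed cap on the per-core daily quota
QUOTA_MIN = 1     # never drop below one task per day
STEP = 10         # adjustment applied per reported task

def adjust_daily_quota(current_quota, task_crashed):
    """Return the host's new per-core daily quota after it reports a task."""
    if task_crashed:
        # Penalise hosts that keep returning crashed tasks.
        return max(QUOTA_MIN, current_quota - STEP)
    # Reward hosts that return completed tasks.
    return min(QUOTA_MAX, current_quota + STEP)

# Example: two good results, then three crashes.
quota = 60
for crashed in (False, False, True, True, True):
    quota = adjust_daily_quota(quota, crashed)
    print(quota)   # 70, 80, 70, 60, 50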

I'm pretty sure everything I've mentioned can be configured with existing server options though you might have to RTFM to find out what they're called and how to use them. Those options were not there before the recent server code upgrade.
Profile Tom95134

Joined: 4 May 07
Posts: 250
Credit: 826,541
RAC: 0
Message 23189 - Posted: 22 Sep 2011, 4:21:24 UTC
Last modified: 22 Sep 2011, 4:21:55 UTC

Frankly, I think the current Quota is probably a good number to keep working with until all the "undocumented features" get worked out. It pretty much guarantees that Tasks will get run on a variety of machines whereas, if you set a high quota, Tasks get sucked up by the Crunching Farms and may not see a variety of systems.

It appears that Tasks are getting processed fairly quickly. Earlier today the Server indicated it had about 6,500 Tasks and now it is down under 4,000.

The quota, as it is now, also lets us poor guys who don't have Crunching Farms get a "taste" of the work.
Profile jujube

Joined: 25 Jan 11
Posts: 179
Credit: 83,858
RAC: 0
Message 23190 - Posted: 22 Sep 2011, 5:23:23 UTC - in response to Message 23189.  
Last modified: 22 Sep 2011, 5:25:22 UTC

Frankly, I think the current Quota is probably a good number to keep working with until all the "undocumented features" get worked out. It pretty much guarantees that Tasks will get run on a variety of machines whereas, if you set a high quota, Tasks get sucked up by the Crunching Farms and may not see a variety of systems.


The way it is now a computer can request work and receive 60 tasks if the owner has set a large cache. That's what I call sucking up the tasks. If you also impose a limit of no more than 3 tasks per core in the cache (the server can impose that limit) then nobody can download 60 at a time. Please read my other post: I'm suggesting a higher daily limit AND a 3-task-per-core cache limit. If you don't understand that then you don't understand my proposal.

See, the problem with a small limit per day is that if they release a huge batch of tasks and everybody is limited to 60 per day then it takes longer to finish off the batch, because machines that want more and can do more are sitting there with no tasks. That's not good. By having a high daily limit the machines that can do 120 per day are allowed to do so, while everybody else is guaranteed work too because the 3-per-core cache rule ensures that the big dogs don't get in first and download all the tasks to their cache.

It appears that Tasks are getting processed fairly quickly. Earlier today the Server indicated it had about 6,500 Tasks and now it is down under 4,000.


That number indicates how fast the tasks are getting downloaded. It says nothing about how fast they're getting processed. The rise and fall in the Tasks in Progress number tells you how fast tasks are being processed.

The quota, as it is now, also lets us poor guys who don't have Crunching Farms get a "taste" of the work.


With a bigger quota AND a new 3 tasks per core in cache rule, even the little guys will get their share and completion of the batch won't be slowed down. Think about it. With my proposal everybody wins.
Profile Tom95134

Joined: 4 May 07
Posts: 250
Credit: 826,541
RAC: 0
Message 23191 - Posted: 22 Sep 2011, 6:49:40 UTC - in response to Message 23190.  

Frankly, I think the current Quota is probably a good number to keep working with until all the "undocumented features" get worked out. It pretty much guarantees that Tasks will get run on a variety of machines whereas, if you set a high quota, Tasks get sucked up by the Crunching Farms and may not see a variety of systems.


The way it is now a computer can request work and receive 60 tasks if the owner has set a large cache. That's what I call sucking up the tasks. If you also impose a limit of no more than 3 tasks per core in the cache (the server can impose that limit) then nobody can download 60 at a time. Please read my other post: I'm suggesting a higher daily limit AND a 3-task-per-core cache limit. If you don't understand that then you don't understand my proposal.

See, the problem with a small limit per day is that if they release a huge batch of tasks and everybody is limited to 60 per day then it takes longer to finish off the batch, because machines that want more and can do more are sitting there with no tasks. That's not good. By having a high daily limit the machines that can do 120 per day are allowed to do so, while everybody else is guaranteed work too because the 3-per-core cache rule ensures that the big dogs don't get in first and download all the tasks to their cache.

It appears that Tasks are getting processed fairly quickly. Earlier today the Server indicated it had about 6,500 Tasks and now it is down under 4,000.


That number indicates how fast the tasks are getting downloaded. It says nothing about how fast they're getting processed. The rise and fall in the Tasks in Progress number tells you how fast tasks are being processed.

The quota, as it is now, also lets us poor guys who don't have Crunching Farms get a "taste" of the work.


With a bigger quota AND a new 3 tasks per core in cache rule, even the little guys will get their share and completion of the batch won't be slowed down. Think about it. With my proposal everybody wins.

I think you are assuming that everybody sits there with their systems running 24x7. From previous experience with LHC, most of the time Tasks get pushed to the Server during CERN working hours. Crunching Farms tend to suck these all up so by the time systems wake up here in the States there is never anything left. Yes, you can modulate the number of Tasks you receive by tweaking the cache size but I would be willing to bet that most have their cache size set to a point where they download enough Tasks so that they can keep running through the times when the Server has run dry (which could be days).

I'm seriously hoping that LHC 1.0 will now get a lot more work to do so this isn't an issue.

I've thought about it, and with my proposal it may take a little longer to get work crunched but everybody gets a taste of the work. If Tasks start to build up a backlog and it is taking more time than the CERN scientists can tolerate then I'm sure they will make the problem known to LHC staff and they can tweak the Quota again.
Profile jujube

Joined: 25 Jan 11
Posts: 179
Credit: 83,858
RAC: 0
Message 23192 - Posted: 22 Sep 2011, 7:16:49 UTC - in response to Message 23191.  

Yes, you can modulate the number of Tasks you receive by tweaking the cache size but I would be willing to bet that most have their cache size set to a point where they download enough Tasks so that they can keep running through the times when the Server has run dry (which could be days).


No, you don't understand what I am proposing. Read carefully. The server can impose a limit on the number of tasks that a host can have in its cache. That limit applies even if the host's cache is set to the max. If the limit set by the server is 3 then that's the most a host can have in its cache even if the host requests 100 tasks. T4T uses a similar rule to limit hosts to 1 T4T task at a time even though hosts ask for more. I propose Sixtrack do the same except limit it to 3 tasks per core. Like I said earlier, such a rule would prevent hosts from grabbing 60 tasks all at once. You don't seem to understand how that works.

I'm seriously hoping that LHC 1.0 will now get a lot more work to do so this isn't an issue.


If tasks continue to take several hours to crunch then the quota won't be much of an issue.

I've thought about it


Well so far you haven't even understood what I propose so whatever you've thought doesn't apply. But that's OK because it's the admins that decide, not you, and I know they'll understand even if you can't understand.
Profile Tom95134

Joined: 4 May 07
Posts: 250
Credit: 826,541
RAC: 0
Message 23196 - Posted: 22 Sep 2011, 17:47:25 UTC - in response to Message 23192.  



Well so far you haven't even understood what I propose so whatever you've thought doesn't apply. But that's OK because it's the admins that decide, not you, and I know they'll understand even if you can't understand.

Frankly, I don't understand how you can get 60 WU when the following change was made to the Server.

---
> <max_wus_to_send> 10 </max_wus_to_send> NEW
> <max_wus_in_progress> 10 </max_wus_in_progress> NEW


Or am I missing something here?

This change was made after I suggested that the number of Tasks be increased (from what appeared to be 3) because the Server was developing a significant backlog.
Profile Krunchin-Keith [USA]
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester

Joined: 2 Sep 04
Posts: 209
Credit: 1,482,496
RAC: 0
Message 23197 - Posted: 22 Sep 2011, 18:16:19 UTC - in response to Message 23196.  



Well so far you haven't even understood what I propose so whatever you've thought doesn't apply. But that's OK because it's the admins that decide, not you, and I know they'll understand even if you can't understand.

Frankly, I don't understand how you can get 60 WU when the following change was made to the Server.

---
> <max_wus_to_send> 10 </max_wus_to_send> NEW
> <max_wus_in_progress> 10 </max_wus_in_progress> NEW


Or am I missing something here?

This change was made after I suggested that the number of Tasks be increased (from what appeared to be 3) because the Server was developing a significant backlog.

Most limits are per CPU. If you have 6 cores, multiply all numbers by 6.

<max_wus_to_send> N </max_wus_to_send>
Maximum jobs returned per scheduler RPC is N*(NCPUS + GM*NGPUS).

<max_wus_in_progress> N </max_wus_in_progress>
Limit the number of jobs in progress on a given host (and thus limit average turnaround time). Starting with 6.8, the BOINC client reports the resources used by in-progress jobs; in this case, the max CPU jobs in progress is N*NCPUS.

There is also a limit on how frequently RPCs can be made, which could be used to prevent 6 requests in a row by forcing them to be spaced farther apart.

<daily_result_quota> N </daily_result_quota>
Each host has a field MRD in the interval [1 .. daily_result_quota]; it's initially daily_result_quota, and is adjusted as the host sends good or bad results. The maximum number of jobs sent to a given host in a 24-hour period is MRD*(NCPUS + GM*NGPUS). You can use this to limit the impact of faulty hosts.
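To see how those formulas scale with a host's hardware, here is a rough sketch in Python (the function names are mine, not the scheduler's, and GPU handling is simplified):

def per_rpc_limit(max_wus_to_send, ncpus, gpu_mult=0, ngpus=0):
    """Maximum jobs returned in a single scheduler RPC: N*(NCPUS + GM*NGPUS)."""
    return max_wus_to_send * (ncpus + gpu_mult * ngpus)

def in_progress_limit(max_wus_in_progress, ncpus):
    """Maximum CPU jobs in progress on a host (clients from 6.8 on): N*NCPUS."""
    return max_wus_in_progress * ncpus

def daily_limit(mrd, ncpus, gpu_mult=0, ngpus=0):
    """Maximum jobs sent to a host in 24 hours: MRD*(NCPUS + GM*NGPUS),
    where MRD starts at daily_result_quota and stays in [1 .. daily_result_quota]."""
    return mrd * (ncpus + gpu_mult * ngpus)

# Example: a 6-core host with the settings quoted above
# (max_wus_to_send = 10, max_wus_in_progress = 10) and an assumed
# daily_result_quota of 100, purely for illustration.
print(per_rpc_limit(10, 6))       # 60 tasks per scheduler request
print(in_progress_limit(10, 6))   # 60 tasks in progress at once
print(daily_limit(100, 6))        # 600 tasks per day at most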

The point of the limits is to prevent faulty hosts from grabbing all the work and erroring it out, which would push a workunit over its error limit, mark it "too many errors" and stop any more tasks being sent for that workunit.

Limits are also used so that everyone attached can get their fair share of work.

As for the quick runs, maybe there is a way they could grant another task to be sent every time one is validated with credit between 0.01 and 0.99. This would not be counted against the quota, but I'm not sure exactly how this could be done. Just a thought.


See this page http://boinc.berkeley.edu/trac/wiki/ProjectOptions#Schedulingoptionsandparameters; there are some more advanced methods they can use as well.
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 23199 - Posted: 22 Sep 2011, 20:00:09 UTC - in response to Message 23188.  

Well, that seems like a solution to my problem (if I had read this I would have put it in my latest blog). We all agree hoarding is BAD. Who needs a quota if we have enough work? If you get a long job, good. If you get a very short job, well, you just get more until finally you get a long one. We could maybe even up the credit for these very short jobs as they are actually useful to us. I'll discuss this with Igor.
Thanks and Regards. Eric
Profile jujube

Joined: 25 Jan 11
Posts: 179
Credit: 83,858
RAC: 0
Message 23203 - Posted: 22 Sep 2011, 22:15:34 UTC - in response to Message 23196.  


Frankly, I don't understand how you can get 60 WU when the following change was made to the Server.

---
> <max_wus_to_send> 10 </max_wus_to_send> NEW
> <max_wus_in_progress> 10 </max_wus_in_progress> NEW


Or am I missing something here?


Yes, you were, but I think Keith might have cleared it up. If not, just ask.
Profile Igor Zacharov
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 16 May 11
Posts: 79
Credit: 111,419
RAC: 0
Message 23205 - Posted: 23 Sep 2011, 2:54:03 UTC - in response to Message 23203.  

Right now the limits are as follows:

80
1
2
1

Please, monitor and see how this works.
skype id: igor-zacharov
metalius

Joined: 3 Oct 06
Posts: 101
Credit: 8,985,206
RAC: 43
Message 23207 - Posted: 23 Sep 2011, 6:38:12 UTC - in response to Message 23205.  
Last modified: 23 Sep 2011, 7:15:52 UTC

Right now the limits are as follows:

<max_wus_in_progress> 1 </max_wus_in_progress>

Please, monitor and see how this works.


A limit of 1 task in progress per core will slow things down for every one of us (no doubt about that) and, maybe, for the whole project.
Why?
1. After a task is completed, BOINC needs some time to upload the result - at that moment the free core is captured by another project.
2. A completed task also remains in the "Ready to report" status for a while (depending on the BOINC version that status may last up to 24 hours - not sure here).
3. When a new task is finally received, processing does not start immediately (the core is already working for another project).
Please increase the limit on the maximum tasks in progress to 2, or at least explain your motivation for such an extremely hard limitation.
-Edit-
To keep the lost time to a minimum one could, of course, suspend all other projects while LHC@home has work. But then we would have to babysit our machines, because with "empty" (let's call them that) tasks a machine will burn through its daily quota in a few minutes and then sit idle until the user steps in again, which is absolutely unacceptable.
Profile jujube

Joined: 25 Jan 11
Posts: 179
Credit: 83,858
RAC: 0
Message 23210 - Posted: 23 Sep 2011, 9:56:45 UTC - in response to Message 23207.  


A limit of 1 task in progress per core will slow things down for every one of us (no doubt about that) and, maybe, for the whole project.
Why?
1. After a task is completed, BOINC needs some time to upload the result - at that moment the free core is captured by another project.


If you crunch other projects then they will get the core eventually anyway. What difference does it make whether Sixtrack gets the core now or later? Your resource shares, not which project gets the core first, determine how much time each project gets. Whether the other project gets the core now or later, it all works out the same.

2. A completed task also remains in the "Ready to report" status for a while (depending on the BOINC version that status may last up to 24 hours - not sure here).


If BOINC thinks your computer should have another Sixtrack task immediately after uploading one, then it will report the uploaded task immediately and ask for another.

3. When a new task is finally received, processing does not start immediately (the core is already working for another project).


True, but even if the processing started immediately, the Sixtrack task would eventually be suspended to give time to one of your other projects. Get CPU now or get CPU later... it doesn't matter, it all works out in the end... your projects get CPU time according to your resource shares. Having an additional Sixtrack task in your cache does not guarantee Sixtrack will immediately get more CPU time. That depends on the project debts.

Please increase the limit on the maximum tasks in progress to 2, or at least explain your motivation for such an extremely hard limitation.


This is not a hard limitation. It will not slow down this project. It will not affect how many Sixtrack tasks you run over the long term. That is determined by your resource shares.
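If it helps, here is a toy illustration (deliberately simplified; this is NOT BOINC's actual debt scheduler, and the shares are made-up numbers) of why the long-run split follows resource shares no matter which project grabs a freed core first:

shares = {"Sixtrack": 100, "OtherProject": 300}   # resource shares
cpu_time = {p: 0.0 for p in shares}

def next_project():
    # Run whichever project is furthest behind its fair share so far.
    total_share = sum(shares.values())
    total_time = sum(cpu_time.values()) or 1.0
    return min(shares, key=lambda p: cpu_time[p] / total_time - shares[p] / total_share)

for _ in range(1000):               # 1000 equal scheduling slices
    cpu_time[next_project()] += 1.0

print(cpu_time)   # roughly {'Sixtrack': 250.0, 'OtherProject': 750.0}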

metalius

Joined: 3 Oct 06
Posts: 101
Credit: 8,985,206
RAC: 43
Message 23211 - Posted: 23 Sep 2011, 10:44:22 UTC - in response to Message 23210.  
Last modified: 23 Sep 2011, 10:46:51 UTC

This is not a hard limitation.

Which limitation is hard then? Zero tasks in progress? :-)

It will not slow down this project. It will not affect how many Sixtrack tasks you run over the long term. That is determined by your resource shares.

This time your conclusion is too quick and not well argued, sorry. Also, resource share is not a panacea. It never worked properly, and it behaves absolutely crazily in the newest BOINC versions (later than 6.2.19). I will try to explain this later (with examples).
diederiks

Joined: 25 Jul 05
Posts: 19
Credit: 670,692
RAC: 0
Message 23212 - Posted: 23 Sep 2011, 11:28:32 UTC

On a dual-core CPU running only LHC@home 1.0 the machine will receive 2 WUs; after one is completed, that core will sit unoccupied until another WU has been downloaded. This slows down crunching and will not keep both cores fully occupied, especially with the work units that only take a few seconds to complete.
I would like to see the <max_wus_to_send> 2 </max_wus_to_send> changed to reflect 2 times the available cores of the machine (4 WUs for a dual-core machine, 8 for a quad-core, etc.); this way every core is occupied and 1 WU per core is available for starting.

Profile Krunchin-Keith [USA]
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester

Joined: 2 Sep 04
Posts: 209
Credit: 1,482,496
RAC: 0
Message 23213 - Posted: 23 Sep 2011, 11:48:53 UTC - in response to Message 23212.  
Last modified: 23 Sep 2011, 11:52:43 UTC

On a dual-core CPU running only LHC@home 1.0 the machine will receive 2 WUs; after one is completed, that core will sit unoccupied until another WU has been downloaded. This slows down crunching and will not keep both cores fully occupied, especially with the work units that only take a few seconds to complete.
I would like to see the <max_wus_to_send> 2 </max_wus_to_send> changed to reflect 2 times the available cores of the machine (4 WUs for a dual-core machine, 8 for a quad-core, etc.); this way every core is occupied and 1 WU per core is available for starting.

You are forgetting that these numbers are per core, so 2 means 2 * N cores; it already is set as you just asked. So it sends 2, 4, ... 16, 24, not just 2. This is also per RPC request; it will get more on the next request.

From boinc documentation:
<max_wus_to_send> N </max_wus_to_send>
Maximum jobs returned per scheduler RPC is N*(NCPUS + GM*NGPUS)
diederiks

Joined: 25 Jul 05
Posts: 19
Credit: 670,692
RAC: 0
Message 23214 - Posted: 23 Sep 2011, 11:55:13 UTC - in response to Message 23213.  
Last modified: 23 Sep 2011, 12:09:39 UTC

I'm running 2 dual-core machines at the moment and it's not possible to download WUs once there are 2 WUs on a machine; you will receive this notice from the server instead of WUs: "23-9-2011 12:56:24 LHC@home 1.0 (reached limit of 2 tasks in progress)".
Or am I understanding you incorrectly?

EDIT:
Sorry, my mistake, I think it's the <max_wus_in_progress> 1 </max_wus_in_progress>
that's my problem: once you have 1 WU in progress per core you will not be able to download any WUs until one is finished and uploaded, and then the next scheduler request will give you a new WU. The time to upload, make a scheduler request and download a WU is around 30 seconds for me, and with some WUs only lasting a few seconds this will not keep my machine fully occupied.
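To put a rough number on it (back-of-the-envelope only; the 30 second round trip is what I see here, and the task lengths are just examples):

round_trip = 30.0   # seconds: upload + scheduler request + download

for task_runtime in (10.0, 600.0, 7200.0):   # a few-second WU versus longer ones
    idle_fraction = round_trip / (task_runtime + round_trip)
    print("%7.0f s task -> core idle %.0f%% of the time" % (task_runtime, idle_fraction * 100))
# roughly 75% idle for a 10 s task, 5% for 10 minutes, under 1% for 2 hours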
Profile Krunchin-Keith [USA]
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester

Joined: 2 Sep 04
Posts: 209
Credit: 1,482,496
RAC: 0
Message 23215 - Posted: 23 Sep 2011, 12:03:30 UTC - in response to Message 23214.  

I'm running 2 dual-core machines at the moment and it's not possible to download WUs once there are 2 WUs on a machine; you will receive this notice from the server instead of WUs: "23-9-2011 12:56:24 LHC@home 1.0 (reached limit of 2 tasks in progress)".
Or am I understanding you incorrectly?

Yes, there is a misunderstanding here as that is based on this setting:
<max_wus_in_progress> 1 </max_wus_in_progress>

Yes, this may be a problem and was set too low. Igor needs to re-examine this choice. It will be hard to get 80*2=160 in a day with a limit of 2 sent at a time and running, which must be returned before 2 more can be received. Of course, if you get long-running tasks it may not be such a big deal; the key is finding a balance here.

I think at least the previous 3 setting should be tried. That would give 1 running, 1 complete and 1 waiting per core, and at least a chance for BOINC to refresh before the third is run, but maybe 5 is better so there is more waiting. ??


Remember also, Igor, not all participants have instant-on connections; some may still be on dial-up, and this will cause excessive connections for that type of system.
metalius

Joined: 3 Oct 06
Posts: 101
Credit: 8,985,206
RAC: 43
Message 23216 - Posted: 23 Sep 2011, 12:18:26 UTC - in response to Message 23215.  

I think at least the previous 3 setting should be tried.

+100! :-) If not 3, then at least 2.
diederiks

Joined: 25 Jul 05
Posts: 19
Credit: 670,692
RAC: 0
Message 23217 - Posted: 23 Sep 2011, 12:19:44 UTC - in response to Message 23215.  
Last modified: 23 Sep 2011, 12:20:50 UTC

Thanks, we seem to be on the same level ;) Hoarding is also a bad thing, so indeed we're looking at finding a balance; 2 WUs per core would be fine I think.
With the longer WUs there is not much of a problem; the short WUs keep the system waiting for work as things stand now.
The other problem on my side is that the LAN connection around here is overloaded and sometimes does not respond for more than 30 seconds, which makes the scheduler connection, upload or download fail and keeps the system waiting for work even longer.
T.J.

Joined: 17 Feb 07
Posts: 86
Credit: 968,855
RAC: 0
Message 23219 - Posted: 23 Sep 2011, 12:44:50 UTC
Last modified: 23 Sep 2011, 12:46:44 UTC

MilkyWay@home has a system with a minimum quota. If a WU is returned it will be validated, and if it is OK your daily quota increases by one. So if you return a lot of validated (good) results your daily quota grows. I now have a daily quota much, much higher than my PC can run, so it will never be out of work unless the project has no jobs for a period of time. They don't give more than 12 WUs at once per graphics processor (in my case), so abuse is prevented.
Perhaps this is an idea to use here as well. A fair share for all participants is of course nice, but the main thing is that the science gets done; that is the reason for BOINC. And if that is done on crunching farms, then is that OK, or not?
Greetings from,
TJ