Message boards : Number crunching : 100 WU/host limitation gone?
ric

Joined: 17 Sep 04
Posts: 190
Credit: 649,637
RAC: 0
Message 4069 - Posted: 22 Oct 2004, 5:34:20 UTC
Last modified: 22 Oct 2004, 5:35:21 UTC

Cool!

First, sufficient work for all ;-)

And now it looks like the limit of 100 WUs per 24h per host is gone, or has been increased.

I now have several hosts with more than 100 WUs queued.
Current setting: 3.1 days.

Screenshot: http://img56.exs.cx/my.php?loc=img56&image=ScreenShot249.jpg

Total: 1'377 WUs



U2?
ID: 4069
sysfried

Joined: 27 Sep 04
Posts: 282
Credit: 1,415,417
RAC: 0
Message 4072 - Posted: 22 Oct 2004, 7:24:55 UTC - in response to Message 4069.  

That's not the case.

If you receive 100 WUs on one day, and your machine crunches only 50 a day, and you receive another 100 the next day, you'll end up having more than 100 WUs in your queue.

A simple calculation, backed by something I saw this morning on my main PC ("Daily quota exceeded")... And I have quite a workhorse at home ;-)
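
In rough Python, that arithmetic looks like this (a sketch with the numbers from this post, not BOINC's actual scheduler logic; in reality the client only requests enough work to fill its cache setting):

# Illustrative only: how a 100 WU/day quota can still leave
# more than 100 WUs queued when the host finishes only 50/day.
DAILY_QUOTA  = 100   # WUs the server hands out per host per day
DONE_PER_DAY = 50    # WUs this host actually finishes per day

queue = 0
for day in range(1, 5):
    queue += DAILY_QUOTA      # simplification: full quota requested every day
    queue -= DONE_PER_DAY
    print(f"end of day {day}: {queue} WUs queued")
# day 1: 50, day 2: 100, day 3: 150, day 4: 200 -- past 100
# without any change to the per-day limit.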

cheers,

sysfried

> Cool!
>
> First, sufficient work for all ;-)
>
> And now it looks like the limit of 100 WUs per 24h per host is gone, or has
> been increased.
>
> I now have several hosts with more than 100 WUs queued.
> Current setting: 3.1 days.
>
> Screenshot: http://img56.exs.cx/my.php?loc=img56&image=ScreenShot249.jpg
>
> Total: 1'377 WUs
>
> U2?
>
ID: 4072
ric

Joined: 17 Sep 04
Posts: 190
Credit: 649,637
RAC: 0
Message 4105 - Posted: 22 Oct 2004, 13:33:13 UTC - in response to Message 4069.  
Last modified: 22 Oct 2004, 13:34:14 UTC

Thanks, sysfried.

Sometimes the answer is so simple, right in front of us, that we just don't see it.

I was probably too excited to see, for the first time since running BOINC, a client with more than 100, even more than 200, work units to do.

Well, not all of them are tunescans.

So we still have this "problem" for too-fast CPUs.

ID: 4105
sysfried

Joined: 27 Sep 04
Posts: 282
Credit: 1,415,417
RAC: 0
Message 4110 - Posted: 22 Oct 2004, 14:37:36 UTC - in response to Message 4105.  

> Thanks, sysfried.
>
> Sometimes the answer is so simple, right in front of us, that we just don't
> see it.
>
> I was probably too excited to see, for the first time since running BOINC,
> a client with more than 100, even more than 200, work units to do.
>
> Well, not all of them are tunescans.
>
> So we still have this "problem" for too-fast CPUs.
>
As long as there are enough long-running WUs (tunescan A-E, IIRC), there's no problem at all. I have a dual Opteron that hasn't run out of WUs for more than a week now, and that's because of the long-running WUs. BOINC expects them to take about 9 hours, but they finish in 7 hours on my machine. And I'm getting 100 WUs a day on top of that, so I'm at least a week away from running out of WUs.
And if anyone wants to complain, here's something to bitch about:
I have set my LHC preferences to fetch as much as 20 days of workload. Complain about it or don't. I WILL have them done in time! :-)
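
A back-of-the-envelope check of that buffer, assuming the dual Opteron means 2 processors crunching LHC work around the clock (my assumption, not stated in the post):

wus_per_day  = 100    # daily quota mentioned above
hours_per_wu = 7.0    # actual runtime here (BOINC estimates ~9 h)
cpus         = 2      # assumed: dual Opteron, both on LHC 24 h/day

cpu_hours_received = wus_per_day * hours_per_wu   # 700 CPU-hours/day
cpu_hours_capacity = cpus * 24                    # 48 CPU-hours/day
print(f"one day's quota lasts {cpu_hours_received / cpu_hours_capacity:.1f} days")
# -> about 14.6 days of work from a single day's quota, so well over a week.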
ID: 4110
ric

Joined: 17 Sep 04
Posts: 190
Credit: 649,637
RAC: 0
Message 4117 - Posted: 22 Oct 2004, 17:02:08 UTC - in response to Message 4110.  
Last modified: 22 Oct 2004, 17:02:30 UTC

Complain?

I don't see a "valid" reason for complaining.
It's just normal to look for the best "settings".
This project is relatively new. Now we've got a lot of work.

I reduced my cache from 4.1 to 1.1 days (still a lot of work to do; the clients are staying busy).
Probably the same situation: we have a mix of long-running and fast-running WUs.
It's only a question of interpretation whether the 20 days * 100 max downloads/host are spread over the period or all downloaded on the same day.

The more important question, I think, is: are you running into a deadline problem?

Q1: Let's say you finish the WUs, but outside the given timeframe. What happens?

Q2: The work is done. Let's say you are the first of the 3(?) contributors in the validation process, and it was a long runner of 8 hours, due October 30th, and it's now the 29th. Is the time left (until the deadline) enough for the 2 other contributors to work on it?
(I assume the work process is sequential, not parallel.)

Q3: What's done with returned work that comes back, let's say, 3 weeks later than estimated?

Friendly,

ric

ID: 4117
sysfried

Joined: 27 Sep 04
Posts: 282
Credit: 1,415,417
RAC: 0
Message 4123 - Posted: 22 Oct 2004, 18:33:35 UTC - in response to Message 4117.  

> Complain?
>
> I don't see a "valid" reason for complaining.
> It's just normal to look for the best "settings".
> This project is relatively new. Now we've got a lot of work.
>
> I reduced my cache from 4.1 to 1.1 days (still a lot of work to do; the
> clients are staying busy).
> Probably the same situation: we have a mix of long-running and fast-running
> WUs.
> It's only a question of interpretation whether the 20 days * 100 max
> downloads/host are spread over the period or all downloaded on the same day.
>
> The more important question, I think, is: are you running into a deadline
> problem?
>
> Q1: Let's say you finish the WUs, but outside the given timeframe. What
> happens?
>
> Q2: The work is done. Let's say you are the first of the 3(?) contributors
> in the validation process, and it was a long runner of 8 hours, due October
> 30th, and it's now the 29th. Is the time left (until the deadline) enough
> for the 2 other contributors to work on it?
> (I assume the work process is sequential, not parallel.)
>
> Q3: What's done with returned work that comes back, let's say, 3 weeks
> later than estimated?
>
> Friendly,
>
> ric
>
It's hard to answer each of your questions separately, so I'm going to give you one combined answer.

My PC requests enough WUs to have 20 days of work. The amount of work requested depends on the benchmark results. Since that one PC does a 9-hour unit in 7 hours, you could say it takes only 7/9 of the time BOINC expects. And since it's a private machine, there's hardly anything going on on it but LHC stuff, so I will take 15.5 days to do 20 "days" of work. And I think that is in time.
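
That 15.5-day figure follows directly from the numbers in this post (a sketch of the arithmetic only, not client code):

cache_days      = 20.0   # the cache setting
estimated_hours = 9.0    # BOINC's runtime estimate per WU
actual_hours    = 7.0    # measured runtime per WU on this host

real_days = cache_days * actual_hours / estimated_hours
print(f"{cache_days:.0f} 'estimated' days of work take {real_days:.1f} real days")
# -> about 15.6 real days (the ~15.5 quoted above).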

Any questions?
ID: 4123
Ingleside

Joined: 1 Sep 04
Posts: 36
Credit: 78,199
RAC: 0
Message 4162 - Posted: 23 Oct 2004, 14:03:44 UTC - in response to Message 4123.  
Last modified: 23 Oct 2004, 14:07:51 UTC

>
> My PC requests enough WUs to have 20 days of work. The amount of work
> requested depends on the benchmark results. Since that one PC does a 9-hour
> unit in 7 hours, you could say it takes only 7/9 of the time BOINC expects.
> And since it's a private machine, there's hardly anything going on on it
> but LHC stuff, so I will take 15.5 days to do 20 "days" of work. And I
> think that is in time.
>

LHC has a 14-day deadline, and is now using min_quorum = 3 and target_nresults = 4. This means it needs 3 results before validation, but to speed things up it distributes the WU to 4 users.
Note: once min_quorum is validated, the WU is not re-distributed, regardless of whether target_nresults has been reached.

If all WUs are long ones, then by day 2 you've got a full 15.5-day cache, meaning on day 15 some WUs will be returned after the deadline, and from day 16 onwards all results will be returned after the deadline.
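
A toy check of that (my framing; it assumes WUs are crunched in the order they were downloaded, which the post doesn't state):

DEADLINE_DAYS    = 14.0
QUEUE_DRAIN_DAYS = 15.5   # real time to work through the full cache

for fraction in (0.25, 0.50, 0.90, 1.00):
    turnaround = fraction * QUEUE_DRAIN_DAYS
    verdict = "on time" if turnaround <= DEADLINE_DAYS else "LATE"
    print(f"WU at {fraction:.0%} of the queue: returned day {turnaround:.1f} -> {verdict}")
# Everything past 14/15.5 ~ 90% of the queue comes back after the deadline.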

A fairly typical WU distribution, validation & crediting process goes something like this:

day 1: results _0, _1, _2 & _3 are sent out to 4 different users. _3 is "mr. 20-day-cache".
day 2: result _1 errors out. A new result _4 is created.
day 3: result _4 is sent out.
day 7: result _0 returned "success".
day 8: result _4 returned "success".
day 10: result _2 returned "success".
Three "success" results trigger validation.
_0, _2 & _4 pass validation.
One validated result is chosen as the "canonical result", let's say _0.
_0, _2 & _4 are granted the same credit.
WU assimilated.

day 15: result _3 misses its deadline.
Since the WU is assimilated and all results are now accounted for, the file_deleter is triggered.
The file_deleter removes the WU & result files from the upload/download directory.

day 16: result _3 is returned.
Since _0 is deleted, _3 can never be validated.
No validation means no credit.
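
That timeline can be replayed as a toy script (illustration only; the control flow is my simplification of the validator/file_deleter behaviour described above, not the real BOINC back end):

MIN_QUORUM   = 3
DEADLINE_DAY = 15        # sent on day 1 with a 14-day deadline

events = [               # (day, result, outcome)
    (2,  "_1", "error"),     # replacement _4 is created & sent out
    (7,  "_0", "success"),
    (8,  "_4", "success"),
    (10, "_2", "success"),
    (16, "_3", "success"),   # "mr. 20-day-cache", past the deadline
]

successes = []
canonical = None
files_deleted = False

for day, res, outcome in events:
    # Once the WU is validated AND the last outstanding result has passed
    # its deadline, all results are accounted for: the file_deleter runs.
    if canonical and day > DEADLINE_DAY:
        files_deleted = True

    if outcome == "error":
        print(f"day {day}: {res} errors out, replacement issued")
    elif files_deleted:
        print(f"day {day}: {res} returned, but {canonical} is deleted -> no credit")
    elif canonical:
        print(f"day {day}: {res} validated against {canonical} -> credit")
    else:
        successes.append(res)
        print(f"day {day}: {res} returned OK ({len(successes)}/{MIN_QUORUM})")
        if len(successes) == MIN_QUORUM:
            canonical = successes[0]
            print(f"  quorum reached: {successes} credited, canonical = {canonical}, WU assimilated")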

Remember, anyone returning after their deadline can only get credit if at least one other cruncher has neither returned his result nor reached his deadline yet. OK, it's also possible if some of the back-end services aren't running or are backlogged, but personally I wouldn't gamble on that.

When min_quorum = target_nresults, as in SETI, the WU is normally not yet validated when someone misses their deadline, so a new result is created & sent out. Since this re-distribution can take a couple of days before a new user gets the WU & returns a result, the user who missed his deadline often manages to return before the other user, so everyone gets credit.

But with target_nresults > min_quorum, as in LHC, there's already at least one "extra" result that isn't really needed for validation, meaning the WU is very often already validated by the time someone reaches their deadline...
ID: 4162
ric

Joined: 17 Sep 04
Posts: 190
Credit: 649,637
RAC: 0
Message 4222 - Posted: 24 Oct 2004, 9:14:19 UTC - in response to Message 4162.  

This impressive walkthrough of the workflow is a worthwhile aid.

A large queue can "prevent" running out of work.

But a larger queue also means the time between getting a WU and returning it is high.

The "risk" of being too late for validation is high.

However, the returned, properly computed work can perhaps be used somewhere else.

Now there is the never-ending balancing act of finding the best individual settings.

If, and this is the big if, work were available and returnable 24 hours a day, the best setting would perhaps be "just" one work unit in process and one in the queue.

(not possible due to the design, server maintenance, ...)

Even when a work unit is returned after 2 days, in time, CAN there be a situation where it is returned fine but no credit is granted?

Sad for many of us, or for schools, still running PIII or AMD 1200/1400 MHz boxes or slower: they need a long part of the day (and night) to crunch a long tuaregg, while other contributors with faster CPUs have already returned the work and earned the credit?

Since a lot of people HAVE a mix of old and new nodes, I hope this "disadvantage" is reduced.

Depending on work and server availability, I mostly run something between 0.1 and 4.1 days; right now 1.8.

There are also the scientists looking wistfully at distributed work that comes back late or never :-(

Happy crunching, for science and fun!

ric

ID: 4222
Ingleside

Joined: 1 Sep 04
Posts: 36
Credit: 78,199
RAC: 0
Message 4226 - Posted: 24 Oct 2004, 13:05:23 UTC - in response to Message 4222.  
Last modified: 24 Oct 2004, 13:14:42 UTC

>
> The "risk" of being too late for validation is high.

There is no "risk" of being too late if you return before your 14-day deadline.

>
> Even when a work unit is returned after 2 days, in time, CAN there be a
> situation where it is returned fine but no credit is granted?

Maybe my example wasn't clear enough, but BOINC must wait until all results are accounted for. This means that as long as one result has neither been reported back nor reached its deadline, BOINC will wait for it and try to validate it if it is returned before the deadline.

If result _3 had instead been returned on day 14, this would have triggered validation, and _3 would have been compared to _0 and credited if it passed.
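
In the toy replay above (same simplified assumptions), that day-14 case is a one-line change:

events[-1] = (14, "_3", "success")   # returned on day 14 -> validated against _0, credited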

It's only for results returned AFTER their deadline that you risk not getting any credit.

There are also other ways a result can end up with no credit: it errors out, or it doesn't pass validation. Also, some short results are wrongly reported as taking 0 seconds to crunch, so they get no credit either.

> Sad for many of us, or for schools, still running PIII or AMD 1200/1400 MHz
> boxes or slower: they need a long part of the day (and night) to crunch a
> long tuaregg, while other contributors with faster CPUs have already
> returned the work and earned the credit?

Just make sure your cache setting is low enough that all work is returned before the deadline, and you shouldn't have any problems. :)
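
As a rule of thumb (my formula, not an official one; the one-day slack is an assumption):

deadline_days   = 14.0
longest_wu_days = 9.0 / 24.0   # ~9 h worst-case WU from this thread
slack_days      = 1.0          # assumed margin for outages etc.

max_cache_days = deadline_days - longest_wu_days - slack_days
print(f"keep the cache under about {max_cache_days:.1f} days")  # ~12.6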


ID: 4226
sysfried

Joined: 27 Sep 04
Posts: 282
Credit: 1,415,417
RAC: 0
Message 4245 - Posted: 24 Oct 2004, 16:40:23 UTC - in response to Message 4162.  

>
> LHC has a 14-day deadline, and is now using min_quorum = 3 and
> target_nresults = 4. This means it needs 3 results before validation, but
> to speed things up it distributes the WU to 4 users.

Oops... OK, my fault... I thought it was something like 21 days...
I changed my settings right away... :-)

so long,

sysfried
ID: 4245
