Message boards : Number crunching : only 10 k WU's left

Profile adrianxw

Joined: 29 Sep 04
Posts: 187
Credit: 705,487
RAC: 0
Message 9157 - Posted: 4 Aug 2005, 7:17:58 UTC

Since they reset the tables early this week, I have seen neither the "No work" problem nor the "There was work but it was allocated to other platforms" problem. In short, it seems to be running perfectly.

As for that long-term debt business: I have stated before that I do not like the way the BOINC core is developing, and hence I run 4.25, which does not suffer from any of these introduced problems/bugs.

Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
ID: 9157
Profile Chrulle

Joined: 27 Jul 04
Posts: 182
Credit: 1,880
RAC: 0
Message 9158 - Posted: 4 Aug 2005, 9:22:51 UTC - in response to Message 9150.  

> cache to 7 days and still have "normal" operation of boinc (since both LHC and
> SETI have "14" day deadlines). While running pred as well I could only safely


Hi Paul

Are you sure you are getting a 14-day deadline? Is everybody else getting this also?


Chrulle
Research Assistant & Ex-LHC@home developer
Niels Bohr Institute
ID: 9158
Antjest

Joined: 30 Sep 04
Posts: 21
Credit: 1,442,034
RAC: 0
Message 9159 - Posted: 4 Aug 2005, 10:02:02 UTC - in response to Message 9158.  


I am still getting a 14-day deadline, but the calculation time estimate has returned to its previous state. So basically, a 10-day cache is again about 3 days' worth of work.

You might consider changing to a 7-day deadline and limiting "connect to" to a maximum of 5 days if you want quick result turnover, so that queues will not be longer than 1 day, and reducing the number of initially sent results to 3 with a quorum of 2, as it was before.

Or any other combination you see fit for quick turn-around.

Tony

> are you sure you are getting a 14-day deadline? Is everybody else getting this
> also?
>
ID: 9159
Profile The Gas Giant

Joined: 2 Sep 04
Posts: 309
Credit: 715,258
RAC: 0
Message 9160 - Posted: 4 Aug 2005, 10:59:00 UTC - in response to Message 9158.  

> > cache to 7 days and still have "normal" operation of boinc (since both
> LHC and
> > SETI have "14" day deadlines). While running pred as well I could only
> safely
>
>
> Hi Paul
>
> are you sure you are getting a 14-day deadline? Is everybody else getting this
> also?
>
>
Erm... today is August 4 and the deadlines are showing August 18. I think the difference is 14 days, or actually 20,000 minutes ≈ 13.89 days to be pedantic. I think this is just right.
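
The arithmetic can be checked quickly. This is only a minimal sketch: the 20,000-minute figure and the dates come from the post, the issue time is taken from the post timestamp, and the variable names are made up for illustration.

```python
from datetime import datetime, timedelta

MINUTES_PER_DAY = 24 * 60

# 20,000 minutes is the deadline length quoted in the post.
deadline = timedelta(minutes=20_000)
deadline_days = deadline.total_seconds() / 60 / MINUTES_PER_DAY

# Assumption: work issued at the time of the post, 4 Aug 2005, 10:59:00 UTC.
issued = datetime(2005, 8, 4, 10, 59, 0)
due = issued + deadline

print(f"{deadline_days:.2f} days")  # deadline length in days
print(due.date())                   # calendar date the result is due
```

So a "14-day" deadline issued on August 4 does land on August 18, just a few hours short of a full two weeks.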

Live long and crunch.

Paul.
ID: 9160
Profile Chrulle

Joined: 27 Jul 04
Posts: 182
Credit: 1,880
RAC: 0
Message 9161 - Posted: 4 Aug 2005, 11:18:22 UTC

Well, I had implemented some scripts that analyse the database and choose the deadline that will give us the results as fast as possible. Unfortunately, that number was being overridden before the work was submitted. I have hopefully fixed this bug, and the new deadline should appear with the next bunch of jobs.

Chrulle
Research Assistant & Ex-LHC@home developer
Niels Bohr Institute
ID: 9161
Scottatron

Joined: 18 Sep 04
Posts: 28
Credit: 59,744
RAC: 0
Message 9176 - Posted: 4 Aug 2005, 22:31:59 UTC - in response to Message 9161.  

> Well, i had implemented some scripts that analyses the database and chooses
> the deadline that will give us the results as fast as possible. Unfortunately
> that number was being overridden before the work was submitted, i have
> hopefully fixed this bug and the new deadline should appear with the next
> bunch of jobs.
>
>

Chrulle,

What does that mean? Is the deadline remaining 14 days?
ID: 9176
John McLeod VII

Joined: 2 Sep 04
Posts: 165
Credit: 146,925
RAC: 0
Message 9177 - Posted: 5 Aug 2005, 2:54:39 UTC

Changing the deadline to be very short will not get a higher throughput of work. What will happen is that the client will note that you used the extra CPU time, and will not download any work for a bit. OTOH, if the problem is turnaround, then setting the deadline a bit shorter to force EDF (earliest deadline first) scheduling may be what you need.
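
For anyone unfamiliar with EDF, here is a minimal sketch of the ordering idea. It is not the BOINC client's actual scheduler; the task names, the numbers and the single-CPU assumption are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str                # hypothetical label
    days_to_deadline: float  # time left until the report deadline
    days_of_cpu_left: float  # remaining crunch time


def edf_order(tasks):
    """Return tasks sorted so the nearest deadline runs first."""
    return sorted(tasks, key=lambda t: t.days_to_deadline)


def late_tasks(tasks):
    """Names of tasks that finish late when run one at a time in EDF order."""
    elapsed, late = 0.0, []
    for task in edf_order(tasks):
        elapsed += task.days_of_cpu_left
        if elapsed > task.days_to_deadline:
            late.append(task.name)
    return late


queue = [
    Task("seti_a", 14.0, 2.0),
    Task("lhc_a", 5.0, 1.0),
    Task("lhc_b", 5.0, 1.5),
]
print([t.name for t in edf_order(queue)])
print(late_tasks(queue))
```

With a shorter deadline the LHC tasks jump to the front of the queue, which helps turnaround, but the total CPU time spent is the same, so throughput does not go up.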


BOINC WIKI
ID: 9177
Profile Chrulle

Joined: 27 Jul 04
Posts: 182
Credit: 1,880
RAC: 0
Message 9187 - Posted: 5 Aug 2005, 23:14:11 UTC

Our problem is exactly turnaround. We cannot do post-analysis until we get all the WUs in a study back. This may take as much as 6 weeks with a 14-day deadline. The idea with the automatic deadline estimator is that it will lower the deadline as long as that does not cause too big a dent in the throughput.
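
To make the trade-off concrete, here is a rough sketch. It is not the project's actual script. It assumes the 6-week figure arises from three successive issues each running to the 14-day deadline, and it uses a hypothetical rule (pick the shortest deadline that a chosen share of past results would still have met) with made-up turnaround data.

```python
def worst_case_days(deadline_days, rounds):
    """Days until a WU completes if each of `rounds` issues only resolves at its deadline."""
    return deadline_days * rounds


def pick_deadline(turnaround_days, candidates, min_on_time):
    """Shortest candidate deadline that at least `min_on_time` of past results met."""
    for deadline in sorted(candidates):
        on_time = sum(t <= deadline for t in turnaround_days) / len(turnaround_days)
        if on_time >= min_on_time:
            return deadline
    return max(candidates)  # nothing qualifies: fall back to the longest deadline


# Made-up turnaround times (in days) for 20 returned results.
history = [0.5, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 6, 7, 7, 7, 7, 9, 13]

print(worst_case_days(14, 3))
print(pick_deadline(history, [5, 7, 10, 14], min_on_time=0.92))
```

The `min_on_time` threshold stands in for "not too big a dent in the throughput": the lower it is set, the shorter the deadline the rule will accept.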

Chrulle
Research Assistant & Ex-LHC@home developer
Niels Bohr Institute
ID: 9187
Jayargh

Joined: 24 Oct 04
Posts: 79
Credit: 257,762
RAC: 0
Message 9188 - Posted: 6 Aug 2005, 0:23:09 UTC
Last modified: 6 Aug 2005, 1:39:34 UTC

Currently attached at 99% or so to your project, I am set at a 10-day cache with BOINC 4.45 and get roughly 3 days of work (slightly improved since some adjustment of run times on your part). I am pretty sure that a reduction in deadline would do ME no harm; however, others have other circumstances, and AGAIN it is currently RIDICULOUS that I set 10 days and get 3. Please solve this situation further BEFORE you change things (deadlines).

[EDIT] I would like to add: if you need more efficiency, rather than jack with the 2-week deadline, try sending out those results that are getting no returns more OFTEN instead of shortening the deadline. I have had a bunch in the past stuck with like 3 unsent workunits for up to 5 DAYS! [EDIT]
ID: 9188
Profile Chrulle

Joined: 27 Jul 04
Posts: 182
Credit: 1,880
RAC: 0
Message 9195 - Posted: 6 Aug 2005, 8:07:48 UTC - in response to Message 9188.  

> Please
> further solve this situation BEFORE you change things.(deadlines)[EDIT] Would
> like to add if you need more efficiency rather than jack with the 2 week
> deadline try sending out those results getting no returns more OFTEN than
> shortening the deadline. Have had a bunch in past stuck with like 3 unsent
> workunits for up to 5 DAYS![EDIT]


The deadline is in fact not a "deadline" as such; it determines when the server will send out new results if it does not hear back from a machine.

Chrulle
Research Assistant & Ex-LHC@home developer
Niels Bohr Institute
ID: 9195
Profile adrianxw

Joined: 29 Sep 04
Posts: 187
Credit: 705,487
RAC: 0
Message 9197 - Posted: 6 Aug 2005, 8:20:57 UTC

>>> We cannot do post analysis until we get all the WU in a study back.

As was discussed earlier in the thread, there are some people who collect a large number of WUs and store them locally, processing them over several days, and there are people like me who take one, crunch it, return it and get another one.

Can you wiggle the work allocator to send WUs at the start of a run to the hoarders, and the WUs that need fast turnaround to complete a study to the fast-turnaround crunchers?

Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
ID: 9197
Profile Ageless

Joined: 18 Sep 04
Posts: 143
Credit: 27,645
RAC: 0
Message 9198 - Posted: 6 Aug 2005, 13:15:12 UTC - in response to Message 9188.  

> I am set at a 10 day cache with BOINC 4.45 and get roughly 3 days of work(slightly improved since some adjustment on your part of run times.)I am pretty sure that a reduction in deadline would do ME no harm,however others have other circumstances and AGAIN it is currently RIDICULOUS that I set at 10 days and get 3. Please further solve this situation BEFORE you change things.(deadlines)

This is not a thing LHC has to fix. It's due to the way BOINC benchmarks your CPU.
If you use BOINC version 4.72 (an alpha version, which may be unstable in your use), it has an updated benchmark and learns how long work units run. It is far from perfect yet, but it guesses the estimated run time a lot better than CC 4.45 does.

But if you would like to test it, you can download it here. Do know that you will need to crunch a couple of units before the estimates go down. It won't do it from the first one.
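
To illustrate why a 10-day cache yields about 3 days of work, and why the estimates only improve after a few units, here is a rough sketch. It is not BOINC's actual code: the correction class, the 0.5 weight and the 10-hour/3-hour figures are assumptions chosen to match the ratio discussed in this thread.

```python
def real_days_of_work(cache_days, overestimate_factor):
    """Days of actual crunching fetched when run times are overestimated by this factor."""
    return cache_days / overestimate_factor


class RuntimeCorrection:
    """Hypothetical running ratio of actual to estimated run time."""

    def __init__(self):
        self.factor = 1.0  # start by trusting the raw estimate

    def update(self, estimated_hours, actual_hours, weight=0.5):
        # Move the factor part of the way towards the latest observed ratio.
        ratio = actual_hours / estimated_hours
        self.factor += weight * (ratio - self.factor)

    def corrected(self, estimated_hours):
        return estimated_hours * self.factor


# Estimates about 3.3x too high: a 10-day cache holds roughly 3 days of real work.
print(round(real_days_of_work(10, 10 / 3), 2))

corr = RuntimeCorrection()
history = []
for _ in range(4):
    # Each finished unit was estimated at 10 hours but took 3.
    corr.update(estimated_hours=10.0, actual_hours=3.0)
    history.append(round(corr.corrected(10.0), 2))
print(history)  # corrected estimate after each finished unit
```

The corrected estimate moves towards the real run time one unit at a time, which is why nothing changes after the first result alone.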
Jord

BOINC FAQ Service
ID: 9198
Antjest

Joined: 30 Sep 04
Posts: 21
Credit: 1,442,034
RAC: 0
Message 9202 - Posted: 6 Aug 2005, 19:22:55 UTC

As I see it, LHC is a project that needs fast turnaround, not long local queues for people obsessed with credit after everybody else has run dry.

So the ideal cruncher is one that downloads one WU, processes it, returns it and then downloads another. OK, perhaps three at a time. There is no use in someone sending a result back after 10 days, after 4 others have processed it. Some users have set "connect to" to 10 days and run multiple projects.

So I believe the server should be set so that nobody gets more than 0.5 days of work at a time. It could also be set to initially send to only three people; a quorum of two is enough, as identical results must be produced. With a deadline of 5 days or even less, throughput will dramatically improve.

And for those who only think about credit: we are doing this the way the program manager and the science require. If you don't like it, there are many other projects you can devote your CPU time to.

Tony
ID: 9202
Perle

Joined: 25 Oct 04
Posts: 83
Credit: 77,867,554
RAC: 39,507
Message 9208 - Posted: 7 Aug 2005, 16:56:03 UTC
Last modified: 7 Aug 2005, 16:56:59 UTC

I realise this is probably a flawed point of argument, and I did read most of the thread... but!

My resources are set at 50.25.25 / 3 days.
LHC / SETI / Einstein at a 3-day cache respectively.
Why must BOINC and these other ass-ociated parameters be involved?
Why can't I just crunch at the straightforward 50.25.25 / 3 ???
It just seems it should just be that simple.
I understand people need flexibility to match their systems.




ID: 9208
Profile Alex

Joined: 2 Sep 04
Posts: 378
Credit: 10,765
RAC: 0
Message 9209 - Posted: 7 Aug 2005, 17:40:18 UTC - in response to Message 9208.  

It's because the group of crunchers who like to have their two week caches 'full' and crunching 'all the time' are more vocal than the casual cruncher.
So, over at other forums, they requested that their caches be filled, and they got that cache-filling feature. Then another group complained that they had overdue work units, so the developers came up with 'panic mode' to ensure nearly expiring work units don't expire. Then they came up with other parameters like 'debt' to control how the dynamic system works.

So, the development cycle is:
Alpha Test - Proof of concept, prototype works, find some bugs.
Beta Test - find and fix bugs.
First Release - find a whole bunch of bugs that the alpha/beta guys should have found.
Second release - kinda works.
Third release - Finally, it works. (people refer to this as Boinc 4.19)
4th release - new gui! new bugs.
5th release - gui fixed. some bugs fixed. new bugs introduced.
6th release - feature request granted... new bugs introduced.
.
.
.
20,000,000th release - aliens land, they see the source code, shake their heads and leave. Seti project declared a success. ;)



> I realise this is prolly a flawed point of arguement and I did read most of
> the thread...but !
>
> My resources are set at 50.25.25 / 3 days.
> LHC / SETI / Einstein at a 3 day cache respectively.
> Why must BOINC and these other ass-ociatied parameters be involved?
> Why cant I just crunch at the straight forward 50.25.25 / 3 ???
> It just seems it should just be that simple.
> I understand people need flexibility to match their systems.
>
>
>
>
>
I'm not the LHC Alex. Just a number cruncher like everyone else here.
ID: 9209
Jayargh

Joined: 24 Oct 04
Posts: 79
Credit: 257,762
RAC: 0
Message 9211 - Posted: 8 Aug 2005, 2:05:25 UTC - in response to Message 9187.  
Last modified: 8 Aug 2005, 2:40:39 UTC

> Our problem is exactly turnaround. We cannot do post analysis until we get all
> the WU in a study back. This may take as much as 6 weeks with a 14 day
> deadline. The idea with the automatic deadline estimater is that it will lower
> the deadline as long as it will not cause too big a dent in the throughput.
First off, Chrulle, I would like to say I'm sorry if my last post seemed like jumping on you because I "perceived" you were changing the 2-week deadline... As I stated, I believe your "turnaround" is taking you tooooo long... I don't think quicker resends would hurt production by more than 5%. Chrulle, what is the ideal turnaround (realistically, not ideally), if 6 weeks is seemingly too long?

ID: 9211
Jayargh

Joined: 24 Oct 04
Posts: 79
Credit: 257,762
RAC: 0
Message 9213 - Posted: 8 Aug 2005, 3:01:51 UTC - in response to Message 9202.  

> As I see it LHC is a project that needs fast turn around and not long local
> queues for people obsesed with credit after everybody else run dry.
>
> So ideal cruncher is the one that downloads one WU, proces it, return it and
> then dowload another. OK, perhaps three at a time. There is no use that
> someone sends results back after 10 days and after 4 have procesed it. Some
> users have set connect to 10 days and do multiple projects.
>
> So, I believe that server should be set in the way nobody gets more than 0,5
> days at the time. It could be also set to initialy send to only three people
> and quorum of two is enough as identical results must be produced. With
> deadline like 5 days or even less, throughput will dramatically improve.
>
> And for those who only think about credit. We are doing this the way program
> manager and science requires. If you don't like it, there are many other
> projects you can devote your CPU time.
>
> Tony
Tony,
I would like to say, with all due respect, that you seem extremist in your view. First off, if you care about the "science", did you ever think that maybe a quorum of 3 also gives the scientists more accuracy (or less sometimes), making them question the validity of certain results? Also, the 2 results that may come in after the quorum might shed more light on a WU's problem. I keep a 3-day cache because I believe it is prudent given my situation. Using BOINC 4.72 is going to let me bring my cache close to 3 days and allow more work for other projects (because times are no longer skewed). I am not a hoarder, as you would make out, and I insist my results are just as valid and important to the LHC team as yours. (Some of us also don't have the TIME to edit granted WUs out of our work; that is a luxury, even if I agreed with your philosophy.)
ID: 9213
Profile The Gas Giant

Joined: 2 Sep 04
Posts: 309
Credit: 715,258
RAC: 0
Message 9214 - Posted: 8 Aug 2005, 4:30:43 UTC
Last modified: 8 Aug 2005, 4:43:56 UTC

My laptop does not connect to a network, and hence the internet, for basically 3 days, from 5pm Friday to 9am Monday. With 4.19 I could run LHC/SETI/Pred with the resource share I liked and the cache level I wanted and NEVER miss a deadline and ALWAYS return work within 4 days of receiving it. Since the move to the new style of WU scheduling, I cannot run the projects in the same manner and keep enough work to last the weekend outage. I have therefore been forced to dump Predictor from this computer. With the variable nature of LHC WU completion times, I can now set my cache at 10 days and get about 4 days of work. If you have a problem with this, then check out my hosts (it's the P4 Mobile machine) and make an informed argument! This is not hoarding but judicious use of BOINC's capabilities.

The problem LHC is having is with the resend feature of BOINC when a host does not return a WU. The 2-week wait for the deadline to pass before the result is re-sent is causing delays in getting the results of a whole run to the physicists for analysis. The only ways around this are to hard-limit hosts with failures to fewer WUs (oh, this is already happening; LHC just needs to update the back-end server version to get this working), or reduce the WU deadline from 14 days down to 10 or 7 days (please don't), or just make another result and resend IF there has been no quorum within 7 days (just please make sure the WU is not deleted until either all results are returned or there is a quorum and any outstanding results have passed their deadline).
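
A back-of-the-envelope comparison of the two resend policies, as a sketch only. The function is hypothetical and the 3-day turnaround of the replacement host is a made-up figure; it simply adds the wait before the resend to the time the replacement takes.

```python
def completion_day(reissue_after_days, replacement_turnaround_days):
    """Day a WU completes when one host never reports and a replacement is sent later."""
    return reissue_after_days + replacement_turnaround_days


at_deadline = completion_day(14, 3)  # wait for the full 14-day deadline
early = completion_day(7, 3)         # resend if there is no quorum within 7 days

print(at_deadline, early, at_deadline - early)
```

Under these assumptions the early resend saves a week per silent host, without shortening the deadline for anyone who is still crunching.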

BOINC V4.72 is starting to address the issues with WU completion times, but due to the extreme variability of LHC completion times the calculation is FUBAR'd most of the time. Still, it is a good start to resolving the WU completion time over-estimation problem.

Live long and crunch.

Paul.



ps. I hope this makes sense....I was working and trying to write this at the same time.
ID: 9214
Jayargh

Joined: 24 Oct 04
Posts: 79
Credit: 257,762
RAC: 0
Message 9215 - Posted: 8 Aug 2005, 4:49:53 UTC - in response to Message 9214.  

> BOINC V4.72 is starting to address the issues with wu completion times but due
> to extreme variability of LHC completion times the calculation is FUBAR'd for
> most of the time, but it is good start to resolving the wu completion time
> over-estimation problem.
>
> Live long and crunch.
>
> Paul.
>
>
> ps. I hope this makes sense....I was working and trying to write this at the
> same time.
No, Paul: first, it makes sense, and no, if you crunch enough units it works better and better (number of units, not time spent crunching). And thank you for your thoughtful response. 4.72 works perfectly :) LHC is my only production project and I am doing about 750 credits a day (750 total, this host about 270) with 4.72 on an HT machine (tough to adjust the work queue now with 2 out of 4 on 4.72). It is downloading more work than I want (3 days). Give it time... it seems to be working here. I have already adjusted down to a whole 9 days of work (for Colt) and hope to refine it to 1 or 2 days in future (at least now I have hope).
ID: 9215
Antjest

Joined: 30 Sep 04
Posts: 21
Credit: 1,442,034
RAC: 0
Message 9216 - Posted: 8 Aug 2005, 8:02:13 UTC - in response to Message 9213.  

> Would like to say in all due respect you are seemingly extremist in your
> view. 1st off if you care about the "science" did you ever think that maybe a
> 3 quorum also gives the scientists more accuracy ( or less sometimes) making
> them question the validity of certain results? Also the 2 results that may
> come in after the quorum might shed more light on a wu's problem.

Somebody had to take the opposite side from the majority on the BOINC project boards.
Alex explained very well why so many CC versions had to be made: many features only for queueing/deadlines, although I respect those who are doing it. The only features I find useful in the 4.19+ versions are "abort WU" and "no new work". Deadline problems can easily be adjusted with "connect to" (by the user or by admins).
And in case one project goes dark, there are others (the purpose of BOINC).

In LHC only 2 results can be used for quorum, because results must be equal (not just close enough as in SETI) or they don't validate. It was so at the beginning, and due to the 0-credit problem it was changed to quorum 3 / send 5.
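
A minimal sketch of what an "equal results" quorum check might look like. This is not the project's validator: results are represented here as plain strings (for example output checksums), which is an assumption, and the function name is made up.

```python
from collections import Counter


def canonical_result(results, quorum):
    """Return the output shared by at least `quorum` results, or None if there is none."""
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count >= quorum else None


# Hypothetical checksums of returned results.
print(canonical_result(["abc", "abc", "xyz"], quorum=2))  # two identical results
print(canonical_result(["abc", "xyz", "def"], quorum=2))  # no two results agree
print(canonical_result(["abc", "abc", "xyz"], quorum=3))  # quorum of 3 not reached
```

With strict equality, two matching results already identify the correct output; a quorum of 3 only adds waiting time for the third match.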

> (tough to adjust work que now with 2 out of 4 on 4.72)It is downloading more
> work than I want. (3 day) Give it time ...seems to be working here....have
> already adjusted down to a whole 9 days of work (for Colt) hope to refine it
> to 1 or 2 days in future(at least now I have hope)

This is exactly what is not wanted in LHC. At the moment it looks like the admins were trimming the amount of work via the estimated time. Now collectors can queue even more (not saying you are one of them). So the admins must find another way to stop that.

In the end, everybody is running BOINC projects by choice, and the admins must decide what's good for their project.

Tony
ID: 9216


©2024 CERN