1) Message boards : Number crunching : How long will the next batch last? (Message 14447)
Posted 26 Jul 2006 by Bill Hepburn
Post:
It might be interesting to know roughly how long the next batch of work units lasts. I have one machine that runs 24/7 and a couple more that run about 12 hours a day. I have missed the last couple of batches between "incremental backoffs".

Maybe people could reply to this thread with "got one at hh:mm". The first reply would give us the approximate start time.

A follow-up of "all out at hh:mm" would give us the approximate end time.
2) Message boards : Number crunching : Solution for LHC Long Term debt problem ? (Message 13944)
Posted 11 Jun 2006 by Bill Hepburn
Post:


Immediately, the caches filled again up to normal values (and for the record, "Connect to Network every X days" equals "Set local Cache size to X days worth of work", which more or less will work out, depending on how accurately BOINC guesstimated the machine's performance).



I guess it depends on what you mean by "more or less". It doesn't seem to work the other way, though. When you initially get work, all seems to work as you describe. But after a time, some of the work has been done, and the local cache doesn't get replenished until the next scheduled connect. There has been lots of discussion on various message boards about the need to separate the two functions. I seem to remember that a long time ago there were two settings, but I could be wrong. In the scheme of things, I think it's not worth worrying about -- it works pretty well now.



Looks like I'll just wait until I see LHC alive again, then temporarily switch LHC back on.



That's what I do. I do it mostly to keep from getting those "no new work... backing off for 1 minute" messages, though.
3) Message boards : Number crunching : Solution for LHC Long Term debt problem ? (Message 13942)
Posted 11 Jun 2006 by Bill Hepburn
Post:


You know what is fun? Go into the client_state.xml file and delete the negative signs from the debt values and then watch the scheduler freak out for a while as it tries to make all the values add up to zero again.

:)


I like that... Might be fun to enter some huge numbers too... Probably won't hurt anything that reformatting the hard drive won't cure ;)
4) Message boards : Number crunching : Solution for LHC Long Term debt problem ? (Message 13934)
Posted 11 Jun 2006 by Bill Hepburn
Post:

Precisely. It seems it ought to continue to accumulate debt beyond that, but it doesn't.

One of life's mysteries methinks.

I'm not actually sure how JMVII wrote the scheduler but the LTD is what is used to determine if it should ask for work. With your switch time of 180 minutes it will continue to ask for work until the LTD reaches -10800. Then it decides that there is too much debt when it goes beyond that value and prevents further d/l until the debt again rises above -10800. Then it starts asking for work again etc etc
:)

That makes as much sense as anything. I'm amazed at how well it works on the whole. I'm attached to some combination of five projects, all with equal resource shares, and most of the time, it merrily rotates through them.
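
The -10800 figure in the quoted explanation is just the 180-minute switch interval expressed in seconds. Here is a minimal sketch of that arithmetic, assuming the cutoff really is the negated switch interval as the quote suggests. The function names are hypothetical and are not taken from the BOINC source.

```python
def ltd_floor_seconds(switch_minutes: float) -> float:
    """Assumed LTD floor: the switch interval in seconds, negated."""
    return -switch_minutes * 60


def should_request_work(ltd: float, switch_minutes: float) -> bool:
    """Ask for work only while the debt is still above the assumed floor."""
    return ltd > ltd_floor_seconds(switch_minutes)


print(ltd_floor_seconds(180))            # -10800
print(should_request_work(-5000, 180))   # True
print(should_request_work(-10800, 180))  # False
```

With a 180-minute switch time, requests stop once the debt reaches -10800 and resume when it rises above that value again.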
5) Message boards : Number crunching : Solution for LHC Long Term debt problem ? (Message 13932)
Posted 11 Jun 2006 by Bill Hepburn
Post:

The description in the Wiki is confusing, at best. This is how I understand it. I could be wrong.

First, there really is no such thing as a cache setting. BOINC loads enough work to keep the machine busy until the next "Connect Interval". A long "connect interval" will, of course, mean more work is loaded at one time. There really is no "load x days work" setting.

Long term debt determines which project is queried for work. When BOINC needs to download (Download OK mode), it tries to get work from projects with a high LTD. It won't try to get work from projects with a low LTD. Later, if it decides that it must get work from somewhere because it is going to run out of work (Download Required mode), it will get work from anyplace that has work, regardless of LTD.

So, I think in the case when LHC is out of work, it typically will have a relatively large LTD. As a result, BOINC won't download work until the scheduler goes into "Download Required".


There is no "cache setting" - got it.

What seems not to match my experience is:
1) The earlier posts about LHC being limited to 10000 LTD - mine was much higher.
Some of my other projects went negative.

2) Your second-to-last paragraph seems to say that projects with a HIGH LTD will try to get work in Download OK mode,
while your last paragraph seems to say that it won't go to "LARGE LTD" (is large different from high?) until "Download Required" mode...

So, if I have a high (large?) LTD, shouldn't that attract work?

Sorry to be so dense.

Phil


As I understand it, the high LTD on LHC would attract work, but (in most cases) LHC has no work to give. Since LHC can't comply, and the other projects have a low LTD, no work comes in until BOINC decides that it is about to run out of work and will take it from any place it can get it.

The LTDs are periodically recomputed so that they sum to zero, which pushes some of them negative (a low debt). For some reason, LHC never seemed to accumulate the much larger numbers you mention, and I have no idea why.
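
One simple way debts could be made to sum to zero is to subtract the average from each of them. The sketch below assumes that approach. It is an illustration only, not the actual BOINC code, and the project names and numbers are made up.

```python
from typing import Dict


def normalize_debts(ltd: Dict[str, float]) -> Dict[str, float]:
    """Shift every debt by the mean so that the total becomes zero."""
    mean = sum(ltd.values()) / len(ltd)
    return {name: debt - mean for name, debt in ltd.items()}


# Hypothetical debts in seconds; nobody starts out negative.
raw = {"LHC": 12000.0, "ProjectA": 3000.0, "ProjectB": 0.0}
balanced = normalize_debts(raw)
print(balanced)  # {'LHC': 7000.0, 'ProjectA': -2000.0, 'ProjectB': -5000.0}
```

Two of the three projects end up negative even though none of them started there, which fits the observation that some debts get pushed below zero.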

I'm really not very good at explaining this whole thing, since my understanding is pretty fuzzy. It does sort of make sense in the perverse way of all things computer.

Cheers.
6) Message boards : Number crunching : Solution for LHC Long Term debt problem ? (Message 13929)
Posted 11 Jun 2006 by Bill Hepburn
Post:
...handles Long Term debt, it actually and continually shrinks the effective Cache of all machines....


Can someone explain the rationale behind this Cache shrinkage due to long term debt?




The description in the Wiki is confusing, at best. This is how I understand it. I could be wrong.

First, there really is no such thing as a cache setting. BOINC loads enough work to keep the machine busy until the next "Connect Interval". A long "connect interval" will, of course, mean more work is loaded at one time. There really is no "load x days work" setting.

Long term debt determines which project is queried for work. When BOINC needs to download (Download OK mode), it tries to get work from projects with a high LTD. It won't try to get work from projects with a low LTD. Later, if it decides that it must get work from somewhere because it is going to run out of work (Download Required mode), it will get work from anyplace that has work, regardless of LTD.

So, I think in the case when LHC is out of work, it typically will have a relatively large LTD. As a result, BOINC won't download work until the scheduler goes into "Download Required".
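
The two modes described above can be sketched roughly as follows. This reflects only my reading of the behaviour, not the real scheduler: the function name, the mode strings, and the rule that "high LTD" means a positive debt are all assumptions, and the project names are made up.

```python
from typing import Dict, Optional


def pick_project(ltd: Dict[str, float], has_work: Dict[str, bool],
                 mode: str) -> Optional[str]:
    """Return the project that would supply work, or None if nobody can."""
    # Ask projects in order of decreasing long term debt.
    ranked = sorted(ltd, key=lambda name: ltd[name], reverse=True)
    if mode == "download_ok":
        # Assumption: only projects with a positive debt are asked.
        ranked = [name for name in ranked if ltd[name] > 0]
    elif mode != "download_required":
        raise ValueError(f"unknown mode: {mode}")
    for name in ranked:
        if has_work[name]:
            return name
    return None


debts = {"LHC": 10000.0, "ProjectA": -4000.0, "ProjectB": -6000.0}
work = {"LHC": False, "ProjectA": True, "ProjectB": True}
print(pick_project(debts, work, "download_ok"))        # None
print(pick_project(debts, work, "download_required"))  # ProjectA
```

With LHC holding the only high debt but having no work, nothing is downloaded in Download OK mode, and work only arrives from another project once the mode changes to Download Required.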
7) Message boards : Number crunching : Solution for LHC Long Term debt problem ? (Message 13913)
Posted 10 Jun 2006 by Bill Hepburn
Post:
To get an LTD around 10000, you probably have a "switch between applications" time of 180 minutes.


Precisely. It seems it ought to continue to accumulate debt beyond that, but it doesn't.

One of life's mysteries methinks.
8) Message boards : Number crunching : Solution for LHC Long Term debt problem ? (Message 13911)
Posted 9 Jun 2006 by Bill Hepburn
Post:
Now, the Golden Question:
Does any newer version of BOINC fix this annoying problem?
I don't see a changed behaviour with 5.4.9 and there is no newer development version available AFAIK. It seems to work as designed. Try to post your question to the BOINC board.

Norbert

(Edit: Corrected link)


Interesting. I have noticed that for me, LHC seems to rack up only about 10000 seconds of Long Term Debt when it runs out of work. That is, of course, enough that BOINC only wants to get work from LHC unless it is about to run out of work. But when LHC comes back, it only runs LHC for a couple of hours, then starts cycling through the other projects.

The "exponential backoff" doesn't seem to work for me as advertised. It seems to back off only to about three hours. After the three-hour backoff runs out, it goes through a bunch of one-minute backoffs, then a bunch of two-minute ones, etc. I usually wind up suspending the project to avoid filling the logs with hundreds of messages telling me that LHC is out of work.

Nothing earth shattering, but enough to cause some moderate grumbling.
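
For comparison, a textbook capped exponential backoff looks like the sketch below. The one-minute starting delay and the three-hour cap are assumptions taken from what I see in my logs, not BOINC's actual constants, and the function name is hypothetical.

```python
def backoff_delay(attempt: int, base_s: int = 60, cap_s: int = 3 * 3600) -> int:
    """Delay in seconds before retry number `attempt` (counted from 0)."""
    # Double the delay on every failed attempt, but never exceed the cap.
    return min(cap_s, base_s * 2 ** attempt)


delays = [backoff_delay(n) for n in range(10)]
print(delays)
# [60, 120, 240, 480, 960, 1920, 3840, 7680, 10800, 10800]
```

In this scheme the delay stays at the cap once it gets there, which is different from the drop back to one-minute backoffs described above.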
9) Message boards : Number crunching : I think we should restrict work units (Message 13756)
Posted 26 May 2006 by Bill Hepburn
Post:


Maybe a little less name-calling and a little more focus on the topic might be in order.



I must agree with that.

If the project perceives a problem, they can readily change the deadline, or the maximum number of units issued, or whatever. They haven't seen the need.

Boinc stats shows about 68k hosts attached to LHC. If the project issues 50k work units and sends each one out to 5 hosts, each host's "fair share" would be a bit less than 4.
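
The "fair share" figure is simple division. Here is a small sketch of the arithmetic using the round numbers above (the function name is just for illustration):

```python
def fair_share(work_units: int, copies_per_unit: int, hosts: int) -> float:
    """Average number of results per host if work were spread evenly."""
    return work_units * copies_per_unit / hosts


share = fair_share(50_000, 5, 68_000)
print(f"{share:.2f}")  # 3.68
```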

If a user sets their preference to any value within the allowable range, that does not make them "a pig". It may indicate that they are unwise, but I don't think we can draw any further conclusions. They set their preferences to a value within the range that made sense to them.
10) Message boards : Number crunching : I think we should restrict work units (Message 13744)
Posted 25 May 2006 by Bill Hepburn
Post:


This is a bit of an aside to the original topic, but I hope people don't mind. I used to run only one project, so I'm trying to learn how to manage multiple projects effectively. If I give LHC a large resource share, it will get more CPU time. But will doing that cause LHC to eat up computer time even when there's no work available? I don't want projects with work to sit idle while LHC attempts to get non-existent work. Does resource share have an effect if you have no work to do? If not, increasing it now to prepare for next time should be fine and not change anything until LHC work units come in. I just want to check and make sure I understand how things work. Thanks for the help.


BOINC handles that situation pretty well, actually. When a project runs out of work, the scheduler effectively ignores it (it sets the "short term debt" to zero). When work becomes available, it starts paying attention again.

The BOINC Wiki has more information than you would ever want to know about the work scheduler.



