Solution for LHC Long Term Debt problem?
Joined: 2 Sep 04 Posts: 121 Credit: 592,214 RAC: 0
Since I attached LHC, all machines obviously check for new work periodically. However, as LHC reports "No work from project" most of the time, my clients continually build up tremendous amounts of Long Term Debt. That alone wouldn't be a problem, but because of the way BOINC v5.2.13 (I haven't bothered to upgrade yet) handles Long Term Debt, it continually shrinks the effective cache of all machines. In the long run, even with work from 2 other attached projects and "Connect to network" set to every 3 days, the actual cache eventually shrinks to a single work unit (which may be as little as 30 minutes). That causes my entire network to run dry within a few hours, no matter how high I set the cache to compensate. Now, the Golden Question: Does any newer version of BOINC fix this annoying problem? Right now, my only way to restore halfway normal caching is to suspend LHC on all machines and reset the project up to 10 times in a row (in order to kill the absurd Long Term Debts). If there's no stable, long-term solution to this problem, I'll eventually have to abandon LHC (rather than lose up to 50% of my computing power to machines running out of work).
Scientific Network: 45000 MHz - 77824 MB - 1970 GB
Joined: 26 Nov 05 Posts: 16 Credit: 14,707 RAC: 0
Quote: "Now, the Golden Question:"
I don't see any change in behaviour with 5.4.9, and there is no newer development version available AFAIK. It seems to work as designed. Try posting your question on the BOINC board.
Norbert
(Edit: Corrected link)
Joined: 29 Sep 04 Posts: 196 Credit: 207,040 RAC: 0
Quote: "Does any newer version of BOINC fix this annoying problem?"
Honestly, I don't know. The 5.4.x BOINC client updates deal with network connectivity, security, the screen saver and performance enhancements. My observations on 5.4.9:
1) I'm attached to 5 projects
2) LHC & Rosetta: resource share 10
3) Einstein, CPDN, and LHC-Alpha: resource share 1
4) "Connect to network" is set to 0.1 days
5) There's always work in my task list for every project that is issuing work, subject to project settings, WU deadlines, and LTD
6) When LHC issues WUs, BOINC works off the LTD as best it can, subject to project settings and WU deadlines
7) I haven't experienced the problem you're describing on this or earlier versions of BOINC
FWIW, you don't have to keep resetting your projects to make the LTD disappear. You can remove it manually if you choose. If you're running BOINC as a service, try this:
* Go to Start -> Run, type "net stop boinc" and click OK
* Close the BOINC Manager if it's open
* Find your BOINC folder in Program Files and locate client_state.xml
* Right-click it and choose Edit (it should open in Notepad)
* Find LHC@Home's section and locate <long_term_debt>xxxx</long_term_debt>
* Replace xxxx with 0.000000
* Save and exit Notepad
* Go to Start -> Run, type "net start boinc" *done*
If you're running BOINC only while the user is logged in, just exit the BOINC Manager and skip the net start/stop commands.
Joined: 2 Sep 04 Posts: 121 Credit: 592,214 RAC: 0
Hm, I forgot to mention that the network is connected to the Internet only infrequently; as long as there's a connection, there's of course enough work, but it needs to be downloaded frequently. I need a cache that keeps the machines busy during offline periods of up to approx. 30 hours (worst case). (Manually fixing the Long Term Debt on 24 machines is unfortunately out of the question :/ )
Joined: 28 Sep 04 Posts: 47 Credit: 6,394 RAC: 0
Or you could get BOINC Debt Viewer. It has a feature that allows you to reset your debts with the click of a button. But just like the other method, you need to shut down BOINC before doing so. And if you forget, it reminds you to. :)
98SE XP2500+ @ 2.1 GHz Boinc v5.8.8
Joined: 18 Sep 04 Posts: 10 Credit: 5,151,492 RAC: 0
Quote: "Now, the Golden Question: I don't see any change in behaviour with 5.4.9, and there is no newer development version available AFAIK. It seems to work as designed. Try posting your question on the BOINC board."
Interesting. I have noticed that for me, LHC only racks up about 10000 seconds of Long Term Debt when it runs out of work. That is, of course, enough that BOINC only wants to get work from LHC unless it is about to run out of work. But when LHC comes back, it only runs LHC for a couple of hours, then starts cycling through the other projects. The "exponential backoff" doesn't seem to work for me as advertised. It only seems to back off to about three hours; after the three-hour backoff runs out, it goes through a bunch of one-minute backoffs, then a bunch of two-minute ones, and so on. I usually end up suspending LHC to avoid filling the logs with hundreds of messages telling me that LHC is out of work. Nothing earth-shattering, but enough to cause some moderate grumbling.
Joined: 28 Sep 04 Posts: 47 Credit: 6,394 RAC: 0
Joined: 18 Sep 04 Posts: 10 Credit: 5,151,492 RAC: 0
Quote: "To get an LTD around 10000 you probably have a switch between applications time of 180 minutes"
Precisely. It seems it ought to keep accumulating debt beyond that, but it doesn't. One of life's mysteries, methinks.
Joined: 29 Sep 04 Posts: 196 Credit: 207,040 RAC: 0
Quote: "Or you could get BOINC Debt Viewer. It has a feature that allows you to reset your debts with the click of a button."
That utility sure does simplify the process. It's kinda cool for those who need it. :)
Joined: 2 Sep 04 Posts: 121 Credit: 592,214 RAC: 0
I'll also look into that. Right now, my LHC Long Term Debt is anywhere between 74000 and 467000 seconds...
Joined: 13 Jul 05 Posts: 55 Credit: 41,230 RAC: 0
Quote: "Or you could get BOINC Debt Viewer. It has a feature that allows you to reset your debts with the click of a button."
It only works under XP.
Do you want to get banned for 31 years, your account and credits deleted at a BOINC project? Predictor@home is your best choice.
Joined: 21 May 06 Posts: 73 Credit: 8,710 RAC: 0
Quote: "...handles Long Term Debt, it continually shrinks the effective cache of all machines..."
Can someone explain the rationale behind this cache shrinkage due to Long Term Debt?
Joined: 18 Sep 04 Posts: 10 Credit: 5,151,492 RAC: 0
Quote: "...handles Long Term Debt, it continually shrinks the effective cache of all machines..."
The description in the Wiki is confusing, at best. This is how I understand it; I could be wrong. First, there really is no such thing as a cache setting. BOINC loads enough work to keep the machine busy until the next "connect interval". A long connect interval will, of course, mean more work is loaded at one time. There really is no "load x days of work" setting. Long Term Debt determines which project is queried for work. When BOINC needs to download (Download OK mode), it tries to get work from projects with a high LTD; it won't try to get work from projects with a low LTD. Later, if it decides that it must get work from somewhere because it is going to run out of work (Download Required mode), it will get work from any project that has work, regardless of LTD. So, I think that when LHC is out of work, it will typically have a relatively large LTD. As a result, BOINC won't download work until the scheduler goes into Download Required mode.
Joined: 21 May 06 Posts: 73 Credit: 8,710 RAC: 0
Quote: "...handles Long Term Debt, it continually shrinks the effective cache of all machines..."
There is no "cache setting": got it. What doesn't seem to match my experience is:
1) The earlier posts about LHC being limited to 10000 LTD. Mine was much higher, and some of my other projects went negative.
2) Your second-to-last paragraph seems to say that projects with a HIGH LTD will try to get work in Download OK mode, while your last paragraph seems to say that BOINC won't download work when LHC has a "LARGE LTD" (is large different from high?) until Download Required mode...
So, if I have a high (large?) LTD, shouldn't that attract work? Sorry to be so dense.
Phil
Joined: 18 Sep 04 Posts: 10 Credit: 5,151,492 RAC: 0
As I understand it, the high LTD on LHC would attract work, but (in most cases) LHC has no work to give. Since LHC can't comply, and the other projects have a low LTD, no work comes in until BOINC decides that it is about to run out of work and will take it from anywhere it can get it. The LTDs are periodically recomputed so that they sum to zero; some get pushed negative (which is a low debt). For some reason, LHC never seemed to accumulate the much larger numbers you mention, and I have no idea why. I'm really not very good at explaining this whole thing, since my understanding is pretty fuzzy. It does sort of make sense, in the perverse way of all things computer. Cheers.
Joined: 28 Sep 04 Posts: 47 Credit: 6,394 RAC: 0
Quote: "To get an LTD around 10000 you probably have a switch between applications time of 180 minutes"
I'm not actually sure of all the details of the scheduler that JM VII wrote, but the LTD is what is used to determine whether it should ask for work. With your switch time of 180 minutes, it will continue to ask for work until the LTD reaches -10800. When the debt goes beyond that value, it decides that there is too much debt and prevents further downloads until the debt rises above -10800 again. Then it starts asking for work again, and so on. You will also find that as the value gets closer to -10800, it asks for less and less work each time, until it exceeds the limit of -10800. Depending on your switch time, you can substitute the values below into the statement above:
60 min = 3600 sec
120 min = 7200 sec
180 min = 10800 sec
240 min = 14400 sec
etc. :)
Joined: 18 Sep 04 Posts: 10 Credit: 5,151,492 RAC: 0
That makes as much sense as anything. I'm amazed at how well it works on the whole. I'm attached to some combination of five projects, all with equal resource shares, and most of the time, it merrily rotates through them. |
Joined: 28 Sep 04 Posts: 47 Credit: 6,394 RAC: 0
You know what is fun? Go into the client_state.xml file, delete the negative signs from the debt values, and then watch the scheduler freak out for a while as it tries to make all the values add up to zero again. I was one of the testers for JM VII when he was first writing the scheduler, and I did a lot of testing on it. It was lots of fun trying to break it, but I found it to be very robust. Even something like what I suggested above did not bother it too much; it would eventually get back to where it should be, with all the values adding up to zero. I know, geeks have a strange way of having fun, lol. :)
Joined: 26 Nov 05 Posts: 16 Credit: 14,707 RAC: 0
Quote: "That makes as much sense as anything."
Because it definitely is wrong. My switch time is 120 minutes and my LTDs are bigger than ±7200. I think Steve mixed up LTD and STD. LTD has no influence on how much work is asked for, only on the decision whether work is asked for at all. And to keep track of "long term debt", it has to be able to hold larger values.
Norbert
Joined: 2 Sep 04 Posts: 121 Credit: 592,214 RAC: 0
Since BOINC Debt Viewer is not really suitable for networked operation, I went ahead and suspended LHC and reset the project about 10 times. Immediately, the caches filled up to normal values again (and for the record, "Connect to network every X days" equals "Set local cache size to X days' worth of work", which more or less works out, depending on how accurately BOINC has estimated the machine's performance). Looks like I'll just wait until I see LHC alive again, then temporarily switch LHC back on manually. All in all, I assume this BOINC behaviour is an old but still unfixed scheduler bug...
©2024 CERN