Message boards : Number crunching : Scheduler ignoring LHC

themule

Joined: 11 Aug 05
Posts: 1
Credit: 4,777
RAC: 0
Message 11199 - Posted: 5 Nov 2005, 14:55:56 UTC

I am a member of five BOINC projects, all having equal weighting. My computer is always on this time of year and I have noticed that LHC is never scheduled for execution unless I intervene.

I usually try rebooting (XP) or resetting the LHC project in BOINC (5.2.6). What happens is that if I turn my computer off for a while, get close to the LHC WU deadline, and start up again, round-robin scheduling is turned off and earliest-deadline-first starts. LHC is then scheduled and gets most of the CPU time. As soon as it reverts to round-robin, it stops again.

Here is a recent example. Things were going fine until a large work download of Predictor. Then (summarized):


Computer becomes overcommitted:
11/4/2005 12:38:42 PM||Suspending work fetch because computer is overcommitted.
11/4/2005 12:38:42 PM||Using earliest-deadline-first scheduling because computer is overcommitted.
11/4/2005 12:38:42 PM|LHC@home|Resuming result xxxx using sixtrack version 467
(after using 3 time slices in a row)
11/4/2005 2:54:04 PM|LHC@home|Computation for result xxxx finished
(one time slice for CPDN, then back to a new LHC)
11/4/2005 3:34:04 PM|LHC@home|Resuming result yyyy using sixtrack version 467
(Round robin resumes, LHC is paused)
11/4/2005 4:14:04 PM||Allowing work fetch again.
11/4/2005 4:14:04 PM||Resuming round-robin CPU scheduling.
11/4/2005 4:14:04 PM|LHC@home|Pausing result yyyy (left in memory)

LHC is never scheduled again. As a side note, I also do not get credit for my completed units.

Any thoughts?

TIA, John
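
For anyone who wants to measure this from their own message log, here is a minimal Python sketch. It assumes the pipe-separated `timestamp|project|message` layout seen in the excerpt above and a US-style timestamp; the function names `parse_line` and `edf_window` are made up for illustration and are not part of BOINC.

```python
from datetime import datetime

# Log lines copied from the post above (annotations removed).
LOG = """\
11/4/2005 12:38:42 PM||Suspending work fetch because computer is overcommitted.
11/4/2005 12:38:42 PM||Using earliest-deadline-first scheduling because computer is overcommitted.
11/4/2005 12:38:42 PM|LHC@home|Resuming result xxxx using sixtrack version 467
11/4/2005 2:54:04 PM|LHC@home|Computation for result xxxx finished
11/4/2005 3:34:04 PM|LHC@home|Resuming result yyyy using sixtrack version 467
11/4/2005 4:14:04 PM||Allowing work fetch again.
11/4/2005 4:14:04 PM||Resuming round-robin CPU scheduling.
11/4/2005 4:14:04 PM|LHC@home|Pausing result yyyy (left in memory)
"""


def parse_line(line):
    """Split one log line into (datetime, project or None, message)."""
    stamp, project, message = line.split("|", 2)
    when = datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p")
    return when, project or None, message


def edf_window(lines):
    """Return (start, end) of the earliest-deadline-first period, if logged."""
    start = end = None
    for line in lines:
        when, _project, message = parse_line(line)
        if message.startswith("Using earliest-deadline-first"):
            start = when
        elif message.startswith("Resuming round-robin"):
            end = when
    return start, end


entries = LOG.splitlines()
start, end = edf_window(entries)
edf_seconds = (end - start).total_seconds()
print(f"EDF mode lasted {edf_seconds / 3600:.2f} hours")
```

The time spent in EDF mode is the time during which the deadline-pressed project is running ahead of its share, which is the figure the replies below are concerned with.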




Aurora Borealis

Joined: 18 Sep 04
Posts: 59
Credit: 317,857
RAC: 0
Message 11200 - Posted: 5 Nov 2005, 16:57:16 UTC - in response to Message 11199.  


Stop playing around with it. Every project gets its fair share eventually. The more you mess around with it, the more the scheduler gets confused and needs to rebalance. BOINC works properly in the long run. All you're doing is creating debt to the other projects, which has to be paid back before LHC will be given priority again.
Questions? Answers are in the BOINC Wiki.

Boinc V6.10.56 Recommended
WinXP C2D 2.1G 3GB
Aurora Borealis

Joined: 18 Sep 04
Posts: 59
Credit: 317,857
RAC: 0
Message 11201 - Posted: 5 Nov 2005, 17:12:06 UTC
Last modified: 5 Nov 2005, 17:18:21 UTC

Additional note

Predictor and LHC have shorter reporting deadlines. The net effect is that these projects tend to force BOINC into panic mode, crunching those projects first. This creates a debt that has to be repaid to the other projects, and no new WU will be downloaded until this debt is taken care of.

The other projects then get an equal amount of time. It may not be obvious in the short term, and you probably won't see each take their turn in an orderly manner, but over a period of a week or two it balances out.
Questions? Answers are in the BOINC Wiki.

Boinc V6.10.56 Recommended
WinXP C2D 2.1G 3GB
Travis DJ

Joined: 29 Sep 04
Posts: 196
Credit: 207,040
RAC: 0
Message 11202 - Posted: 5 Nov 2005, 18:11:08 UTC

Aurora made a good point. Heed his advice. If you *really* want to figure it all out, do the math yourself with the numbers straight from BOINC and you'll find out why it behaves that way. It will sort itself out eventually; the scheduler doesn't always offer instant gratification.

Gary Roberts

Joined: 22 Jul 05
Posts: 72
Credit: 3,962,626
RAC: 0
Message 11206 - Posted: 5 Nov 2005, 23:36:32 UTC - in response to Message 11199.  
Last modified: 5 Nov 2005, 23:37:34 UTC

themule wrote: "... I have noticed that LHC is never scheduled for execution unless I intervene."


Your "intervention" is one of the reasons why LHC is not being scheduled. You are trying to force BOINC to break your own rules and BOINC is resisting because it's committed to following the rules. If you want LHC to have a bigger slice of your resources then simply change the rules and give it a bigger resource share. If your intention is to share your resources equally between 5 projects then please leave BOINC alone and it will do exactly that for you.

themule wrote: "I usually try rebooting (XP), resetting the LHC project in BOINC (5.2.6). What happens is if I turn my computer off for a while and I get close to the LHC WU deadline and start up again, round-robin scheduling is turned off and earliest-first starts. LHC then is scheduled and gets most of the CPU time. As soon as it reverts back to round-robin, it stops again."


What you are doing is crazy. Let's analyze it bit by bit.

1. Rebooting. BOINC has certain numbers. BOINC shuts down. BOINC restarts. BOINC has exactly the same numbers. Do you really expect BOINC to do anything different?

2. Resetting LHC. BOINC trashes all existing LHC work. BOINC gets new LHC work. Other projects are still "owed" so other projects will still run. All you have achieved is the trashing of perfectly good work. Resetting should be regarded as a "last resort" option when there is a "real" problem.

3. Leave computer off until close to LHC deadline. Boy, that's a good one :). A recipe for how to really screw things up!! You've just wasted perfectly good crunch time when LHC's "debt" could have been partly "repaid" to other projects and then forced BOINC to run LHC and accumulate even more debt with even longer to wait before LHC could run again. You need to go read about work scheduling and the concept of debt in the Wiki. You could start here.

4. ... back to round-robin, work stops. Well of course it does because LHC now has an even larger negative LTD than it did before. It's now going to be even longer before it is allowed to start again unless it has to go into panic mode again, which it probably will.

How do you solve this? For starters, leave BOINC alone, as the others have suggested. The best thing you can do, however, is reduce your "connect to network" preference setting to 0.1 days or less for the moment and allow BOINC to drain all your excessive caches. BOINC will always stabilize more quickly if it's not constantly trying to deal with excessive work. As work completes, you can "update" particular projects to report the completed work if you wish, but BOINC will handle everything on its own if you allow it.

Once it has managed the crisis, BOINC will settle into normal round-robin scheduling according to your resource shares, and that is the only place you should make changes if you wish to give a particular project more or less work.

When things return to normal (which might take a week or two), you could then start gradually increasing your "connect to network" preference so as to keep a bit more work on hand. First go from 0.1 to 0.3 for two days and see how that looks. Then try 0.6 for a few more days and see if BOINC is able to comfortably maintain round-robin scheduling without having to resort to EDF. If BOINC has to start invoking EDF, then you have probably gone too far with your "connect to network" setting.

While you are waiting patiently for all the above to happen, start reading all the accumulated wisdom in the Wiki so you better understand what is going on.

Good luck!!



themule wrote: "Things were going fine until a large work download of Predictor."


A sure sign that your "Connect to Network" interval is way too large.

Another point. You have ended up with two computer IDs for your machine, probably as a result of your "resetting". At some stage you should merge these using the function at the bottom of the old computer's page on the website.

Your newer computer ID shows 16 results in progress, another sign that your cache is too large. Do you have the same list of results under the work tab of BOINC Manager as is visible on the website?


Cheers,
Gary.
The Gas Giant

Joined: 2 Sep 04
Posts: 309
Credit: 715,258
RAC: 0
Message 11217 - Posted: 6 Nov 2005, 2:35:49 UTC - in response to Message 11201.  


To ensure BOINC does not enter EDF mode too soon, set your connect preference to about 80% of half the shortest deadline among your projects. So if a project has deadlines of 7 days, set your connect preference to 80% of 3.5 days = 2.8 days.
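
That rule of thumb is easy to put into a few lines of Python. This is only a sketch of the arithmetic above; the function name `suggested_connect_interval` and the example deadlines other than the 7-day one are made up for illustration.

```python
def suggested_connect_interval(deadlines_days, safety=0.8):
    """Rule of thumb from the post: 80% of half the shortest deadline.

    deadlines_days: one typical deadline (in days) per attached project.
    safety: the 80% factor; treat it as a tunable guess, not a BOINC constant.
    """
    if not deadlines_days:
        raise ValueError("need at least one project deadline")
    return safety * min(deadlines_days) / 2


# Hypothetical mix of projects; only the shortest deadline (7 days) matters.
interval = suggested_connect_interval([7, 14, 30])
print(f"connect to network every {interval:.1f} days")
```

With a 7-day shortest deadline this reproduces the 2.8-day figure worked out above.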

Live long and crunch.

Paul
(S@H1 8888)
BOINC/SAH BETA
Krunchin-Keith [USA]
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester

Joined: 2 Sep 04
Posts: 209
Credit: 1,482,496
RAC: 0
Message 11228 - Posted: 6 Nov 2005, 17:44:04 UTC

Everybody else gave good answers.

Additional Points.

BOINC, especially recent versions like 5.2.6, reports your completed work in two stages. When your host completes a workunit, it uploads the result to the project server, and your message log shows an entry like "Finished upload of .....". The workunit then appears on the work tab with a status of "Ready to Report". The next time the client contacts that project, it reports that the upload was done earlier, with a message like "Reporting ## results" or "Requesting ### seconds of new work, and reporting ## results". You do not get credit until the results are reported, and then only after the project validator has 3 similar results (the "quorum").

Gary Roberts covers resetting in his point 2 but didn't fully spell it out: when you reset, all work and results on your host are trashed. This means that if results have been completed but not yet uploaded, or have been uploaded but not yet reported, they are trashed and you will not get credit for them.
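
To make the stages concrete, here is a small Python sketch of the lifecycle described above and of what a reset does to it. The `Stage` names, the `reset_project` function and the workunit names are invented for illustration; this is not how the BOINC client represents results internally.

```python
from enum import Enum


class Stage(Enum):
    COMPUTING = "still being crunched"
    COMPLETED = "completed, not yet uploaded"
    READY_TO_REPORT = "uploaded, ready to report"
    REPORTED = "reported, waiting for the quorum"


def reset_project(results):
    """Split results into (kept, lost) the way a project reset would.

    Only results the server has already been told about (REPORTED) can
    still earn credit; everything else on the host is thrown away.
    """
    kept = {name: s for name, s in results.items() if s is Stage.REPORTED}
    lost = {name: s for name, s in results.items() if s is not Stage.REPORTED}
    return kept, lost


# Hypothetical workunits in each stage at the moment of the reset.
results = {
    "wu_a": Stage.REPORTED,
    "wu_b": Stage.READY_TO_REPORT,
    "wu_c": Stage.COMPLETED,
    "wu_d": Stage.COMPUTING,
}
kept, lost = reset_project(results)
for name, stage in lost.items():
    print(f"{name} lost ({stage.value})")
```

In this sketch only `wu_a` survives the reset, which matches the point that finished work which has not yet been reported earns nothing.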

The BOINC core client tracks two internal numbers called Short Term Debt (STD) and Long Term Debt (LTD). These are not visible in BOINC Manager but are visible to programs like BoincView. Their job is to keep the projects running per your preferences. When the client enters Earliest Deadline First (EDF) mode for a project, it runs up the LTD for that project, which must be repaid to the other projects. For instance, if LHC did its normal share plus 6 extra hours to complete, then the other projects must also get 6 extra hours to catch up and even out the share. This is why LHC stops getting work and you got a large download of Predictor: Predictor now has to play catch-up. As the other projects run, they decrease the LTD that LHC has accrued, and when the number is back within the normal range you will get work again. A project's debt can be either a debit or a credit, meaning that the project needs extra work or has done too much. BOINC tries in the LONG RUN to maintain your sharing preferences. On any given day it may seem to favor one project or another, but over a few days, a week or a month it should average out to what you specified.
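
The 6-extra-hours example can be played through with a toy model in Python. This is a deliberately simplified sketch, not the client's actual STD/LTD algorithm: it assumes debt is just "hours owed under the resource share minus hours actually run", that the scheduler always runs the project with the largest debt for a one-hour slice, and that the two unnamed projects (`project4`, `project5`) are placeholders.

```python
def debts_after(shares, hours_run):
    """Debt per project: hours it was owed minus hours it actually ran."""
    total = sum(hours_run.values())
    return {p: shares[p] * total - hours_run[p] for p in shares}


def hours_until_scheduled(debts, shares, project, slice_hours=1.0, limit=1000):
    """Run the most-owed project each slice; return hours before `project` runs."""
    debts = dict(debts)  # work on a copy
    for elapsed in range(limit):
        running = max(debts, key=debts.get)
        if running == project:
            return elapsed * slice_hours
        for p in debts:
            debts[p] += shares[p] * slice_hours  # everyone accrues their share
        debts[running] -= slice_hours            # the runner spends the slice
    return None


# Five projects with equal shares, as in the original post.
names = ("LHC", "Predictor", "CPDN", "project4", "project5")
shares = {p: 0.2 for p in names}

# A 10-hour stretch in EDF mode: LHC got its 2-hour share plus 6 extra hours.
hours_run = {"LHC": 8.0, "Predictor": 0.5, "CPDN": 0.5, "project4": 0.5, "project5": 0.5}

debts = debts_after(shares, hours_run)
wait = hours_until_scheduled(debts, shares, "LHC")
print(f"LHC debt: {debts['LHC']:+.1f} h, waits about {wait:.0f} h before running again")
```

In this toy setup LHC ends up 6 hours "in the red" and sits idle for well over a day while the other four projects catch up, which is the behaviour the original poster saw as LHC never being scheduled.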



©2024 CERN