EDF oddity
Author | Message |
---|---|
Send message Joined: 13 Jul 05 Posts: 456 Credit: 75,142 RAC: 0 |
I thought I understood all the circumstances that put a box into EDF mode, but this one has me beaten. Linux, client v5.2.7 started from the command line (inet.d script) and monitored from BOINCview on a Windows system on the same LAN. Two projects: CPDN with 53% resource share, LHC with 47%. No other projects. One CPDN WU, 33 days run so far, 49 days to go, deadline October 2006. Three LHC WUs, each showing between 9 and 10 hours to run before they started, deadline 1 week from today. It refuses to come out of EDF mode. Why? There is plenty of time for both projects to finish all current work even on a single CPU, but it has 2 CPUs. And this is the real annoyance: the box was loaded up with 6 days of LHC work before a 4-day net outage, and it has run LHC on both CPUs ever since and will run out of work on one CPU before the connection comes back. The only way to make it run one LHC and one CPDN is to suspend all but one of the LHC WUs manually, but I don't really have time for such micromanagement. I tried editing the client state (with the client stopped, of course) to give CPDN a day's worth of STD (+86400) and LHC -86400, but this had no effect, as it still thinks it is in EDF mode. I tried restarting BOINC and rebooting the machine in case it was stuck in an old mode, and still no joy. Will this box ever run the CPDN result again without manual intervention? Any ideas gratefully received. R~~ |
Send message Joined: 27 Aug 05 Posts: 50 Credit: 24,055 RAC: 0 |
Maybe this will help... Work scheduler EDF (earliest deadline first) is caused by: 1) A deadline within 24 hours. 2) A deadline within 2 * the connect time. 3) A failure of the Round Robin simulator to finish a result within 90% of its deadline. A project not requesting work is caused by: 1) A host that is in NWF (no work fetch). 2) A project that has enough work on a host that has enough work. 3) A project that has an LTD that is negative enough. NWF (no work fetch) is caused by: 1) A failure of the Round Robin simulator to get a result done within 90% of a deadline when the resource share of the next project to request work from is added to the Round Robin simulation. If there is an idle CPU and a network connection, work will always be requested from somewhere, even if that somewhere has a very negative LTD and/or the host is in NWF. |
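The first two EDF triggers listed above are simple enough to check by hand. The following is a minimal sketch of those two rules only, as stated in this post, and not the BOINC client's actual code. The `Result` class, the `edf_reasons` function, and the 6-day connect interval are hypothetical and chosen to match the situation described in the thread. The third trigger (the Round Robin simulation) is left out because it needs a full scheduling simulation.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """Hypothetical record for one work unit on the host."""
    name: str
    hours_to_deadline: float


def edf_reasons(results, connect_days):
    """Return (name, reason) pairs for results that would trigger EDF.

    Only rules 1 and 2 from the post are checked; rule 3 (Round Robin
    simulator missing 90% of a deadline) is not modelled here.
    """
    reasons = []
    for r in results:
        if r.hours_to_deadline < 24.0:
            reasons.append((r.name, "deadline within 24 hours"))
        elif r.hours_to_deadline < 2 * connect_days * 24.0:
            reasons.append((r.name, "deadline within 2 * connect time"))
    return reasons


# Assumed figures: LHC deadline 7 days away, CPDN deadline months away.
work = [Result("lhc", 7 * 24.0), Result("cpdn", 200 * 24.0)]

print(edf_reasons(work, connect_days=6.0))  # cache turned up to 6 days
print(edf_reasons(work, connect_days=1.4))  # cache turned back down
```

With the connect interval assumed at 6 days, twice that is 12 days, which is longer than the 7-day LHC deadline, so rule 2 alone would hold the box in EDF for as long as LHC work is present.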
Send message Joined: 1 Sep 04 Posts: 275 Credit: 2,652,452 RAC: 0 |
LHC normally has 7-day deadlines. To avoid EDF mode, your maximum queue length is 40% of the shortest deadline divided by the number of projects, or 1.4 days with just LHC and CPDN. BOINC WIKI BOINCing since 2002/12/8 |
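The rule of thumb in this post works out as follows. This is only a sketch of the arithmetic as the poster states it; the function name `max_queue_days` is hypothetical, and the 40% factor is taken from the post rather than from any BOINC documentation.

```python
def max_queue_days(shortest_deadline_days, n_projects, fraction=0.4):
    """Largest queue (cache) setting, in days, that should stay out of EDF.

    Rule of thumb from the post: 40% of the shortest deadline,
    divided by the number of attached projects.
    """
    if n_projects < 1:
        raise ValueError("need at least one project")
    return fraction * shortest_deadline_days / n_projects


# LHC (7-day deadline) plus CPDN: two projects.
print(f"{max_queue_days(7, 2):.1f} days")
```

For a 7-day deadline and two projects this gives 0.4 * 7 / 2 = 1.4 days, the figure quoted above.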
Send message Joined: 2 Sep 04 Posts: 165 Credit: 146,925 RAC: 0 |
There are a couple of possibilities: 1) You have a large cache (> 3.5 days in this case). This will keep the system in EDF as long as there is LHC work on the system. 2) The time stats indicate that you do not have as much CPU time as you think that you do. BOINC WIKI |
Send message Joined: 13 Jul 05 Posts: 456 Credit: 75,142 RAC: 0 |
Thanks to all of you for those replies. Yes, it was because I'd turned up the cache size to fill up with work before the outage and not turned it back down while the box was disconnected from the Net. I'm glad it makes sense; it had me totally befuddled at the time... R~~ |
©2024 CERN