Message boards : Number crunching : Actual runtime far exceeds calculated runtime
Paul van Dijken

Joined: 31 Aug 12
Posts: 3
Credit: 836,904
RAC: 0
Message 25829 - Posted: 14 Sep 2013, 8:55:43 UTC

I run LHC WUs on 2 laptops (Vista and W7). On both, the LHC WUs use more than twice the calculated runtime: if the calculated runtime is 6 hours, a WU usually takes more than 14 hours. None of my other 8 projects show this.

The problem is not the runtime itself, but that the WUs take so long that some do not even start before their report deadline; running them then is useless, as they are already invalid.

How do I correct this?
Ray Murray
Volunteer moderator

Joined: 29 Sep 04
Posts: 281
Credit: 11,859,285
RAC: 0
Message 25831 - Posted: 14 Sep 2013, 18:33:26 UTC
Last modified: 14 Sep 2013, 18:40:39 UTC

Hi Paul,
The actual wall-clock runtime will almost always be more than the CPU time. The estimate is also rarely going to be accurate, as it depends on so many variables (such as the duration of previous work), so this batch of longer-than-average WUs confuses BOINC a bit.
As for WUs not starting before their deadline, you can fix this by reducing your work buffer so that BOINC only fetches as much work as you can safely crunch within the deadline. I have mine set to 0.3 and 0.1 days, as I know my machine can finish that in a day or so; I don't like having days and days of work waiting.
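To see why the buffer size matters here, a back-of-the-envelope sketch (not BOINC's actual scheduler logic, and the numbers below are made up from Paul's figures): when the client fills its buffer based on a low per-task estimate, the real time needed to clear the queue scales by the ratio of actual to estimated runtime.

```python
def real_queue_days(buffer_days, est_hours, actual_hours):
    """Days of real crunching needed to clear a buffer that was
    filled assuming each task takes est_hours, when tasks really
    take actual_hours. (The core count cancels out of the ratio.)"""
    return buffer_days * actual_hours / est_hours

# A 2-day buffer of tasks estimated at 6 h but really taking 14 h
# needs roughly 4.7 days of crunching -- easily past a short deadline.
print(real_queue_days(2, 6, 14))
```

So with a 2-day buffer and tasks running at more than twice their estimate, the queue silently becomes almost 5 days deep.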
Paul van Dijken

Joined: 31 Aug 12
Posts: 3
Credit: 836,904
RAC: 0
Message 25832 - Posted: 16 Sep 2013, 5:17:47 UTC - in response to Message 25831.  

I know wall-clock time can exceed CPU time; I am talking about elapsed time. Why is the estimate always wrong for LHC and not for the other 8 projects?

My buffer is set to 2 days.
henry

Joined: 15 Sep 13
Posts: 73
Credit: 5,763
RAC: 0
Message 25833 - Posted: 16 Sep 2013, 10:29:04 UTC - in response to Message 25832.  

Every task your host receives carries information that allows your host to estimate its elapsed time. Unfortunately, some algorithms are non-deterministic, which means it is impossible to predict their elapsed time with any degree of accuracy. The Sixtrack algorithm in use at this project is such an algorithm, and nothing can be done about it except to set a small cache.

Now the question is: how small should your cache be? Ultimately that is for you to decide, but I suggest that if your host is missing deadlines, your cache is too big. Ask yourself why you are keeping such a large cache: is it really necessary, and does it make any sense if it causes your host to miss deadlines?

BOINC has a mechanism that enables it to learn to make better estimates of elapsed time. In time, BOINC will realize it is underestimating elapsed time for this project, and the estimates will become more accurate. However, that takes time, perhaps a month or two; until then you would be wise to reduce your cache. There are two cache settings: for the minimum setting I use 0.1 days, and for the additional setting I use 0 days. My host is never out of work and it never misses a deadline. What more could anyone want?
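The learning mechanism described above is the client's per-project duration correction factor. A rough sketch of how such a factor could be updated (the real client's rules are more involved; the asymmetric 10% downward step here is an assumption for illustration, not BOINC's exact code):

```python
def update_dcf(dcf, estimated_secs, actual_secs):
    """Raise the correction factor immediately when a task overruns
    its estimate; let it drift down slowly when a task finishes early.
    The asymmetry keeps the client cautious about underestimating."""
    ratio = actual_secs / estimated_secs
    if ratio > dcf:
        return ratio                      # jump up to the observed ratio
    return dcf + 0.1 * (ratio - dcf)      # close 10% of the gap per task

dcf = 1.0
dcf = update_dcf(dcf, 6 * 3600, 14 * 3600)   # one 14 h task against a 6 h estimate
print(dcf)                                   # factor jumps to about 2.33
```

With an update rule like this, a single badly overrun task corrects future estimates upward at once, while a run of short tasks only eases the factor back down gradually, which is why the learning takes a while in the other direction.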
Nuadormrac

Joined: 26 Sep 05
Posts: 85
Credit: 421,130
RAC: 0
Message 26098 - Posted: 5 Dec 2013, 13:30:52 UTC
Last modified: 5 Dec 2013, 13:41:44 UTC

Actually, I think part of what affects people is that there doesn't seem to be any update to the WU duration correction factor. For instance, on the current batch I see an estimated completion time of 5 hours 44 minutes 45 seconds, but on my i7 quad-core (3610QM) it takes more like 9 hours.

Now, most projects adjust the WU correction factor, so the projected runtime is extended as tasks take longer and shortened as they take less. Not a single task has taken less time than estimated so far, yet when one is reported, the estimated seconds on the remaining tasks don't increase by even 1. I've watched it right as a task first uploaded: no adjustment was made, at least in core client 7.0.28 for LHC here.

Now, in my case I'm not missing deadlines like the OP, but it means having to micro-manage task suspends and the like should climate models download (they have also been rarely available), when running alongside a third project, etc., because the estimates don't update. If they'd look at this one factor, so the estimate is allowed to vary based on previously returned work (that function doesn't seem very functional at the moment for this project's current app and version), that would remedy any such issues people might be seeing. Having a dual-threaded quad core helps, but the client also estimates work for all 8 threads, which matters if one also runs POEM OpenCL limited to 4 cores via an app_info file.

All in all, that's probably why he isn't seeing an issue with his other projects: those projects are likely updating the work duration factor based on recent turn-ins. I don't know what he's seeing, but I'm seeing no such update with this project, meaning that no matter how many units one crunches, or how much longer than the estimate they take, the estimate never changes at all.
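For context on why a stuck factor freezes the display: the client's estimate is roughly the task's declared floating-point operation count divided by the host's benchmarked speed, scaled by that per-project correction factor. A simplified sketch (the field name `rsc_fpops_est` mirrors BOINC's workunit field; the formula here is an approximation, not the client's exact calculation):

```python
def displayed_estimate_hours(rsc_fpops_est, host_flops, dcf=1.0):
    """Simplified form of the client's runtime estimate: declared
    floating-point operations divided by benchmarked speed, scaled
    by the per-project duration correction factor."""
    return rsc_fpops_est / host_flops / 3600 * dcf

# With dcf stuck at 1.0, a task declared at 6 h of work shows 6 h
# forever, however long previous tasks actually ran.
print(displayed_estimate_hours(6 * 3600 * 1e9, 1e9))
```

Only a change in `dcf` (or in the declared FLOP count or benchmark) can move that number, which matches the observation that the estimate never budges between turn-ins.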

Also unknown is how many projects he runs at once. Bad estimates could be more of an issue if he's trying to run all 8 projects simultaneously (he mentioned 8, though I'm not sure how many are active rather than set to "no new tasks" while crunching here). Unless he has one of the quad cores with 8 threads (I at least haven't seen the 6-core i7 Extreme in a laptop, which is what he said he uses), the client may over-commit itself: it tries LHC and gets no work, fetches for the other projects, and then LHC work arrives on top of a queue that is already full for 8 projects, with few "cores" to crunch it on, even if his queue is relatively smallish. Some projects, like primaboinca (not sure if he runs that), also seem prone to grab a lot of work and keep themselves running on all cores for a while, balancing out via project debt over the long haul rather than running just a few of their tasks alongside other projects, which could add to the problem if one's seeing bad estimates. But I haven't checked what his computers are, or how many projects he runs concurrently, so I can't say whether this applies to him.



©2024 CERN