1) Message boards : Number crunching : Bugs - problems thread (Message 16989)
Posted 5 Jun 2007 by NJMHoffmann
Post:
While this isn't a huge deal, it DOES mean that stats sites like mine will have to change their import scripts to find the XML files.

IIRC the BOINC server code stores the actual location of the stats files (which may vary between servers) in a central XML file at a fixed location. So the scripts should not use a fixed location for the files but the info the projects give.
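
For the curious, a minimal sketch of such an import script, assuming the project publishes an index file listing the actual stats file URLs. The file name "tables.xml" and the <file> elements are illustrative assumptions, not a confirmed part of the BOINC export format:

    # Discover the stats files via a project-published index instead of
    # hard-coding their location. Index name and tag names are assumptions.
    import urllib.request
    import xml.etree.ElementTree as ET

    PROJECT_URL = "http://lhcathome.cern.ch/"  # example base URL

    def stats_file_urls(project_url):
        """Fetch the index and return the advertised stats file URLs."""
        with urllib.request.urlopen(project_url + "stats/tables.xml") as f:
            root = ET.parse(f).getroot()
        # Hypothetical layout: one <file> element per exported XML dump.
        return [elem.text.strip() for elem in root.iter("file")]

    for url in stats_file_urls(PROJECT_URL):
        print("would import:", url)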

Norbert
2) Message boards : Number crunching : New computer database entry created on each connect (Message 15556)
Posted 18 Nov 2006 by NJMHoffmann
Post:
Since there seems to be no post from someone _not_ experiencing the problem, I thought I'd point out that this is not happening to everyone. It might be interesting to see if there are any patterns to the host duplication...
I think the point is the hostid "0" stored at the client. There was a combination of a newer client version with an old server version that could create such an entry. In the meantime both the client and the server software have been updated further. Up-to-date software will correct the entry - but both sides have to be updated. Now LHC seems to be the project with the oldest server version still around. The only other servers where I've seen this error are SZTAKI and Leiden, which also run rather old server software.

Norbert
3) Message boards : Number crunching : Did everyone get work 02 Nov UTC? (Message 15391)
Posted 8 Nov 2006 by NJMHoffmann
Post:
C << 0.8 * D
Make that C < 0.8 * (D - (1 day + switch_interval)), because 1 day plus the switch interval is the margin BOINC reserves for sending the result back before the deadline.
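
A quick numeric check of that rule, as a sketch (all times in days; switch_int is the "switch between applications" preference, here one hour):

    # C = cache size, D = deadline, both in days. 1 day + switch_int is
    # roughly the margin BOINC keeps for reporting before the deadline.
    def cache_fits(C, D, switch_int):
        return C < 0.8 * (D - (1.0 + switch_int))

    print(cache_fits(C=3.0, D=7.0, switch_int=1/24))  # True:  3 < 4.77
    print(cache_fits(C=5.0, D=7.0, switch_int=1/24))  # False: 5 > 4.77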

Norbert
4) Message boards : Number crunching : Did everyone get work 02 Nov UTC? (Message 15386)
Posted 7 Nov 2006 by NJMHoffmann
Post:
Should the shares be recalculated to ignore that project, so that the other projects get more work?
A project is ignored as soon as it no longer has any WUs on the client. Not quite right, but better than working with the original shares.

Norbert
5) Message boards : Number crunching : Did everyone get work 02 Nov UTC? (Message 15385)
Posted 7 Nov 2006 by NJMHoffmann
Post:
But presumably you mean it is between <(connect intvl) * (resource share)> and <ditto + 1 WU> ?
It's even a bit more complicated, because "on fraction", "run fraction" and "CPU efficiency" must be (and are) taken into account.
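
Roughly, those factors turn queued CPU time into wall time. A small sketch of my reading of it (the exact combination inside the client may differ):

    # Effective CPU seconds delivered per wall-clock second.
    def wall_days_for_queue(cpu_days, on_frac, run_frac, cpu_eff):
        rate = on_frac * run_frac * cpu_eff
        return cpu_days / rate

    # A host that is on 90% of the time, runs BOINC 80% of that, at 95%
    # CPU efficiency needs ~1.46 wall days per CPU day of queued work.
    print(wall_days_for_queue(1.0, on_frac=0.9, run_frac=0.8, cpu_eff=0.95))
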
As you can tell I am still confused by what the 5.4 clients are doing...
What confuses me more is that I don't remember whether the 5.4 clients already had this download scheduler or whether it was introduced somewhere along the 5.5 line :-)

Norbert
6) Message boards : Number crunching : Did everyone get work 02 Nov UTC? (Message 15382)
Posted 7 Nov 2006 by NJMHoffmann
Post:
It also means that John's advice (unusually for him) is actually rather misleading. Set the interval to the typical downtimes of your ISP and you will typically run out of work halfway through an outage, having worked off half the cache before the ISP went down.

I think John is right if you're "always on". The download scheduler fetches work the moment the queue for a project falls below your "connect interval". So per project your queue is always between <connect interval> and <connect interval + 1 WU>.
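
In pseudo-Python, the per-project rule as I understand it (illustrative numbers, not the actual client code):

    # Ask for work as soon as the queued estimate drops below the connect
    # interval; each extra WU pushes it back up, overshooting by < 1 WU.
    def fetch_work(queued_days, connect_interval_days, wu_days):
        while queued_days < connect_interval_days:
            queued_days += wu_days   # server sends one more WU
        return queued_days

    print(fetch_work(queued_days=0.2, connect_interval_days=1.0, wu_days=0.3))
    # -> 1.1: just above the interval, at most one WU above it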

Norbert
7) Message boards : Number crunching : Did everyone get work 02 Nov UTC? (Message 15371)
Posted 6 Nov 2006 by NJMHoffmann
Post:
River~~ wrote:
You give Rosetta 0.1 of the overall resources, and have a connect interval of 3 days. This means a full cache would take 30 days elapsed, ...

Perhaps you should mention that this changed (will change) with BOINC versions > 5.4.x. The current development scheduler will in this case keep a queue of 0.3 days of CPU time for Rosetta, which will happily be crunched in 3 days.
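
The arithmetic behind that, as a tiny sketch:

    # The newer scheduler scales the connect interval by the project's
    # resource share instead of queueing the full interval per project.
    connect_interval = 3.0  # days
    rosetta_share = 0.1     # 10% of the overall resources

    queue_cpu_days = connect_interval * rosetta_share
    print(round(queue_cpu_days, 2))                  # 0.3 days of CPU time
    print(round(queue_cpu_days / rosetta_share, 2))  # crunched in 3.0 days at 10%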

Norbert

PS: You'll have to rework all your formulas ;-)
8) Message boards : Number crunching : New computer database entry created on each connect (Message 14800)
Posted 21 Sep 2006 by NJMHoffmann
Post:
Ben about duplicate host entries:
We will look at this - seems to be one of our very last problems after a hectic 2 days...
IIRC the logic to recognize a host changed a bit in recent BOINC versions. Newer server code (from 504??) should prevent this from happening with newer clients (from 5.4.x??).
9) Message boards : Number crunching : LHC@home server being reconfigured today !! (Message 14751)
Posted 20 Sep 2006 by NJMHoffmann
Post:
20/09/2006 17:49:06|lhcathome|Scheduler RPC succeeded [server version 502]
That's old. I think even the old LHC server had version 503 (that's old too).

Norbert
10) Message boards : Number crunching : Bye all! (Message 14603)
Posted 28 Aug 2006 by NJMHoffmann
Post:
FalconFly:
Having no 24/7 online connection and a mandatory cache size of at least 1.5 days to cover my standard offline periods, having LHC attached without setting it to "Suspended" will quickly result in a severe loss of CPU power (I've seen my entire network run dry after as little as 30 minutes, although it normally holds some cumulative ~1000 hours of CPU time worth of work to cover 1.5 days) and is overall just painful.

Could you test 5.5.16? I think you would be a valuable tester of the new work fetch code.

Norbert
11) Message boards : Number crunching : Bye all! (Message 14589)
Posted 24 Aug 2006 by NJMHoffmann
Post:
If I hadn't put LHC fully suspended, I'd have lost up to 75% of my computing power long ago, thanks to the associated scheduler bug.

What I did was: reset the project, or edit the LTD to a small value. Set the resource share to 1. The LTD will now grow very slowly, but the check for new work happens nonetheless.
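
If you go the editing route, something like this (run while the client is stopped) would do it. The <long_term_debt> tag is how I remember client_state.xml of that era, so treat it as an assumption and keep a backup:

    import re

    STATE = "client_state.xml"
    with open(STATE) as f:
        text = f.read()

    # Crude: resets the LTD of *all* projects to zero.
    text = re.sub(r"<long_term_debt>-?[\d.eE+]+</long_term_debt>",
                  "<long_term_debt>0.000000</long_term_debt>", text)

    with open(STATE, "w") as f:
        f.write(text)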

Norbert
12) Message boards : Number crunching : Solution for LHC Long Term debt problem ? (Message 13952)
Posted 11 Jun 2006 by NJMHoffmann
Post:
If you force it to get more work and cause BOINC to go into EDF, yes, it can go beyond the values that I stated. But if you leave it alone then what I said is true.
The server-side scheduler bug forces BOINC into EDF all by itself. (The bug is that the client asks for work in wall time, e.g. "I need work for one day", while the server sends work in CPU time, "I give you 24 hours of work" [even though a computer that runs BOINC only half the time needs just 12 CPU hours for that day], ignoring the resource shares of the other projects.)
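
With made-up numbers, the mismatch looks like this:

    request_wall_days = 1.0  # client: "I need work for one day"
    on_fraction = 0.5        # computer runs BOINC only half the time

    needed_cpu_days = request_wall_days * on_fraction  # 0.5 is all it can crunch
    sent_cpu_days = request_wall_days                  # 1.0 is what the buggy
                                                       # server sends anyway
    print(needed_cpu_days, sent_cpu_days)
    # The extra 0.5 CPU days push the client toward EDF, and the other
    # projects' resource shares were never considered at all.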

Norbert
13) Message boards : Number crunching : Solution for LHC Long Term debt problem ? (Message 13938)
Posted 11 Jun 2006 by NJMHoffmann
Post:
That makes as much sense as anything.
Because it definitely is wrong. My switch time is 120 minutes and my LTD values are bigger than +-7200. I think Steve mixed up LTD and STD. LTD has no influence on how much work is asked for, only on the decision whether work is asked for at all. And to keep track of "long term debt" it must be able to hold larger values.
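
To make the distinction concrete, a sketch of my understanding of the old client (the exact clamp is an assumption for illustration):

    switch_interval = 120 * 60  # 120 min in seconds

    # STD is clamped to roughly +-switch_interval and picks which result
    # runs next; LTD accumulates unbounded and only gates work requests.
    def clamp_std(std):
        return max(-switch_interval, min(switch_interval, std))

    ltd = 0.0
    for day in range(30):    # a project starved for a month...
        ltd += 8 * 3600      # ...gains debt every day it gets no CPU
    print(clamp_std(1e6), ltd)  # STD stuck at 7200; LTD ~864000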

Norbert
14) Message boards : Number crunching : Solution for LHC Long Term debt problem ? (Message 13906)
Posted 9 Jun 2006 by NJMHoffmann
Post:
Now, the Golden Question :
Does any newer Version of BOINC fix this annoying Problem ?
I don't see any changed behaviour with 5.4.9, and AFAIK there is no newer development version available. It seems to work as designed. Try posting your question on the BOINC board.

Norbert

(Edit: Corrected link)
15) Message boards : Number crunching : Work to be done! (Message 13854)
Posted 3 Jun 2006 by NJMHoffmann
Post:
If you ask me, the first thing to fix in BOINC would be to manage the workload correctly.

Add me to the list :-)

Norbert
16) Message boards : Number crunching : Work to be done! (Message 13708)
Posted 23 May 2006 by NJMHoffmann
Post:
>But having set the cache to 2 days also means that some other machines are downloading more than they need.
I don't quite understand that. Each WU has a time-to-complete estimate, and BOINC prevents downloading more WUs than fit in the cache or than can be completed before the deadline. Familiar with "computer overcommitted"?

That would be nice. But the reality is that (if you have multiple projects) the client asks for the full buffer of work (2 days) and the server forgets that there are other projects (so you get 2 days of work from each project).
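
Illustrated with made-up numbers:

    # With the old server logic every project fills the whole buffer,
    # so the combined queue scales with the number of projects.
    buffer_days = 2.0
    projects = ["LHC", "SETI", "Rosetta"]

    per_project = {p: buffer_days for p in projects}  # each sends 2 days
    print(sum(per_project.values()))  # 6.0 days queued, not the 2.0 asked for
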
Norbert


