1) Message boards : Number crunching : Something is wrong managed!!! (Message 20543)
Posted 26 Sep 2008 by Ingleside
Post:
If hosts contacted the server more often, then more redundant tasks would get canceled. The project managers can direct hosts to contact the server more frequently, but they are not doing so.


A question regarding this:

Would setting a project to "No New Tasks" (or whatever it is called in newer versions) end up making it so this would not work?

The reason I'm asking is that I'm pretty sure that if I do that (set to NNT) with 5.8.16, and I have a pending scheduler connect on a countdown, BOINC won't even attempt to connect when the countdown is over. It is either that, or it does the connect that time, but no more....

WCG is routinely connecting to the scheduling server once every 4 days, as this is the setting they're using, even though WCG is currently set to "No new work"... If set to "suspended", on the other hand, it should not connect, at least not if you're running v5.10.xx... Not sure, since it isn't connected just now, but BURP has been using a 1-hour delay before re-connect, so it should likely be a good project for testing out this feature. ;)
2) Message boards : Number crunching : Garfield is now available (Message 20377)
Posted 15 Sep 2008 by Ingleside
Post:
Garfield now seems to be available.

I just wonder where the LHC@home Alpha project is. I want to join it.

The Garfield application has been available for over a year... But has there ever been any work for Garfield? I don't think so...

3) Message boards : Number crunching : Ghost result (server thinks I got it, but I didn't) (Message 18216)
Posted 16 Oct 2007 by Ingleside
Post:
Pretty sure the server will resend it regardless of your client state file. I'm pretty sure I've seen SETI send me the same work units after doing a project reset on my machine. But I'm assuming LHC hasn't updated their code in so long that this option probably isn't available in their server. According to Neasan (or was it Alex?) they are going to update it within the next couple of weeks.

*fingers crossed*

v4.45 and later clients send a list of all tasks they currently have for a project in each scheduler request, and the server compares this against its database to see if anything is missing. Meaning, it doesn't matter whether a task ever made it to the client in the first place.

Server-side support was added 28.07.2005, but for a long time only Einstein@home was using the re-issue feature.
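
For illustration, the task list in a v4.45+ scheduler request looks roughly like this (a sketch from memory, with made-up result names; the exact tag layout may differ):

<other_results>
<other_result>
<name>wu_12345_0</name>
</other_result>
<other_result>
<name>wu_12346_0</name>
</other_result>
</other_results>

The server compares this list against the results it has marked "in progress" for the host, and can re-issue anything missing.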

4) Message boards : Number crunching : my client will never connect (Message 18215)
Posted 16 Oct 2007 by Ingleside
Post:
What I'm trying to point out is that LHC does not seem to be communicating the required backoff time to the client. Maybe this is because doing so requires the server code upgrade they've been mentioning.

No, the bug was client-side: v5.10.14 introduced a bug that means every time you ask for work but don't get any, the deferral message from the project is not used.

If you did get work, or didn't ask for work (like only reporting), even LHC@home will show "Reason: requested by project".

5) Message boards : Number crunching : cant detach from project as is suggested by msg (Message 16979)
Posted 5 Jun 2007 by Ingleside
Post:
any other suggestions?

Any project "attached" by an Account Manager should be detached by going to the AMS's website. If you detach in the client, the AMS will just re-attach you the next time you connect to it. For this reason, detach is greyed out in the client for these projects.

Anyway, it's still possible to manually delete the account_*.xml file for a project and restart the BOINC client.
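For example, the file name includes the project URL, so for LHC@home it would be something like account_lhcathome.cern.ch.xml — just an illustration; check the BOINC data directory for the exact name.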
6) Message boards : Number crunching : Past Due Date (Message 16104)
Posted 10 Jan 2007 by Ingleside
Post:
(I remember reading something about this, my apologies for not remembering where)

... like earlier in the thread... :)
7) Message boards : Number crunching : Past Due Date (Message 16031)
Posted 4 Jan 2007 by Ingleside
Post:

There is a plan to have BOINC server notify the client when a quorum is reached so it can abort the workunit.


Won't that mean that if your computer is too slow, you won't get any credit if the result isn't returned before a quorum is reached?

No, <result_abort_if_unstarted> will only abort results that haven't started yet; already-started results will continue to the end unless the user manually aborts them. This is planned for use on any WU that already has a canonical result, meaning the result won't be used for science, but already-started results can still be credited if returned before the deadline.

For cancelled WUs, errored-out WUs and so on, <result_abort> will be used, and this will immediately abort the result regardless of whether it has started. Since the result will be used neither for science nor for credit, continuing to run it would be a waste of time.
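
To illustrate, the relevant part of a scheduler reply could look something like this (a sketch using the tag names above with made-up result names; the exact layout in the server code may differ):

<result_abort>
<name>errored_wu_3</name>
</result_abort>
<result_abort_if_unstarted>
<name>validated_wu_2</name>
</result_abort_if_unstarted>

The client would abort the first result immediately, and the second only if it hasn't started crunching yet.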


The needed client-side support is included in v5.8.x, but the server side isn't implemented yet...
8) Message boards : Number crunching : New computer database entry created on each connect (Message 15683)
Posted 26 Nov 2006 by Ingleside
Post:
That thread also makes the useful point that even though the BAM is not to blame, use of the BAM can make the situation worse if it promotes more frequent access between client and server.


Well... Account Managers do play an important part in this problem...

The scheduler reply includes a hostid only when needed. This is either for a newly attached project, where the client reports <hostid>0</hostid>, or when the <rpc_seqno> in the scheduler request is lower than the rpc_seqno saved in the BOINC database, indicating a client installation copied to another computer.

For LHC@home, the relevant part of a "normal" scheduler reply, including the host id, looks something like this:

<project_preferences>
<resource_share>100</resource_share>
<project_specific>
<color_scheme>Tahiti Sunset</color_scheme>
</project_specific>
</project_preferences>
<hostid>1234567</hostid>

But if someone has used an Account Manager to change the resource share, the same part of the scheduler reply for LHC@home looks like this:

<project_preferences>
<resource_share>100</resource_share>
</project_preferences><hostid>1234567</hostid>

As you can see, there are two changes here. As some have probably noticed, at least in Rosetta@home, the <project_specific> part has been "destroyed" by the Account Manager, but at least for LHC@home this isn't really a problem.
The critical part is that the hostid is placed on the same line as </project_preferences>, and the client therefore does not parse it at all. Meaning, if the hostid was zero on the client before, it will still be zero, and with old server code, each time you make a scheduler request with <hostid>0</hostid>, you generate a new hostid...

With new server code, the scheduling server also looks at <host_cpid>, and if there's a link between <host_cpid> and <hostid>, it re-uses that hostid instead of generating a new one. Also, a quick check on SETI@home revealed that <hostid> was correctly placed on another line, even when an Account Manager has "destroyed" the <project_specific> part.


While waiting for LHC@home to upgrade the server software, the easy solution to this problem is to go to the project's own pages and change the resource share there; then you're back to the "normal" scheduler reply, and the client will have no problem reading any <hostid>.
9) Message boards : Number crunching : When you see work is around ... (Message 15279)
Posted 1 Nov 2006 by Ingleside
Post:
Hi Ingleside,

thanks for the useful injection of fact.

#1, time between successful send of work and the next, is clearly too low. At the smallest this should be about the time it takes to crunch a short task on a fast box, and on a project with sparse work this could afford to be longer (as all clients, fast and slow, should have other projects' work anyway if they are on LHC).

R~~


Well, with some work taking less than a minute, I'm not sure how long the delay should be...

Looking at other projects, the two longest delays between RPCs are WCG at 5 minutes and SETI at 10 minutes.
10) Message boards : Number crunching : Ghosts and Pending Credit (Message 15278)
Posted 1 Nov 2006 by Ingleside
Post:
If this setting is adopted, for how long does this work after the tasks were originally issued, and is the deadline the same as the original, or recalculated from the time of re-issue?

I can see problems either way. If the old deadline is kept, the client may have filled up with work from another project in the meantime. If the deadline is recalculated, some XXXX is going to figure out how to use the feature to extend the deadlines of their work.

Having said which, I'd still welcome this setting on this project.
R~~

Not quite sure about the deadline, but if I'm not mistaken it's sometimes the old deadline and sometimes a new one, though not necessarily as long as the normal deadline...

Re-issuing will only happen if you're asking for work, and if it looks like the project has been reset or detached/re-attached, no work will be re-issued.

Lastly, results that are no longer needed, because the WU has already validated or errored out, will not be re-issued.
11) Message boards : Number crunching : When you see work is around ... (Message 15268)
Posted 31 Oct 2006 by Ingleside
Post:
8 of my clients did get work that morning (one task / cpu) and each one went back to get work inside the same minute as it had already got the first WU. I had not seen a client go back in under 1 minute before and had assumed the client would prevent this. It seems that when work is issued this server lets the client come back as soon as the file downloads are complete - these took around 40sec on each of my clients. (btw - we are talking about a *different* minute for each client!)


For any failed connection, whether uploads, downloads or scheduler requests, the BOINC client uses a random backoff between 1 minute and 4 hours. Also, after 10 failed scheduler requests in succession, the client tries to re-download the project web page (master URL), and if this fails you'll get a 7-day deferral in old clients, but at least in v5.6.xx and later the deferral is only 24 hours.

For successful scheduler replies, on the other hand, there are some additional rules:
1; If the client gets work, or didn't ask for work: <min_sendwork_interval> * 1.01, the 1.01 being in case the user's clock runs a fraction faster than the server's.
2; If the client asked for but failed to get work: whichever is higher of <min_sendwork_interval> * 1.01 and the client's random backoff.
3; Various server-side error conditions: 1 hour.
4; Reached daily quota: midnight server time + up to 1 hour randomly.
5; Doesn't meet system requirements, like OS, min_boinc_client, usable disk space or memory: 24 hours.
6; If the expected run time is so long (for example due to low resource share, or a cache size larger than the deadline) that the client can't finish one more result: up to a 48-hour deferral, the size depending on cache size and so on.


For LHC@home, #2 is randomly between 1 minute and 4 hours, and is the common behaviour when no work is available.

#1, on the other hand, is only 7 seconds, meaning that if a client gets at least one task, but not enough to fill up its cache, the same client can re-ask for more work 7 seconds later...
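
For reference, this per-project limit is set in the <config> section of the project's config.xml, roughly like this (a sketch only; the 7-second value just mirrors LHC@home's setting described above):

<config>
<min_sendwork_interval>7</min_sendwork_interval>
</config>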
12) Message boards : Number crunching : Ghosts and Pending Credit (Message 15266)
Posted 31 Oct 2006 by Ingleside
Post:
Well, it won't stop the occasional instance where the scheduler reply never makes it back to the client, but a project can choose to re-issue any "lost" work by changing their config file, specifically by adding
<resend_lost_results/>

For this to work, users must also run BOINC client v4.45 or later.
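
If it helps, the option goes in the <config> section of the project's config.xml, roughly like this (a sketch; check the BOINC server documentation for the exact placement):

<config>
<resend_lost_results/>
</config>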
13) Message boards : Number crunching : Database Errors (Message 11976)
Posted 14 Jan 2006 by Ingleside
Post:
I did look at your hosts. :)

Unless you want your user account to be shorted 6,330.41 credits if the affected host were removed (because it cannot be merged), I wouldn't worry about it.


User credit is separate from host credit, so it's no problem to delete any old hosts where all results have been purged.
14) Message boards : Number crunching : new work eta? (Message 11436)
Posted 22 Nov 2005 by Ingleside
Post:
What I find odd is that the server does not check whether a computer can handle all the WUs that have been sent to it (I had 10*32.00 credits of work assigned to me). A small feedback from the client telling the server it does not have any work would be nice, so the server knows that the sent WUs are never going to be completed.


v4.45 and later clients already report back all currently cached results in each scheduler request, but projects must themselves enable the re-sending of any "missing" results, something it seems only Einstein@home has currently enabled.


Especially for a project like LHC@home that needs fast turnaround times, not having to wait for any "ghost" results would be an advantage...
15) Message boards : Number crunching : Host corruption solved? (Message 11058)
Posted 26 Oct 2005 by Ingleside
Post:
But this message:
26.10.2005 18:50:03|LHC@home|Deferring communication with project for 5 seconds
is in red!!! and appears after each request.


It only shows as red if you're running an old client, not if you're running v5.2.x ;)
16) Message boards : Number crunching : Host corruption solved? (Message 11053)
Posted 26 Oct 2005 by Ingleside
Post:
26.10.2005 18:50:03|LHC@home|Deferring communication with project for 5 seconds

IMO that is something unnecessary, or am I wrong?


To guard against clients flooding the scheduling server with requests, there has long been a project-specific limit on how often the scheduling server accepts connections.

A recent change is that the scheduling server now always sends this limit in all replies; this stops clients from immediately asking again and getting hit by a "too recent" message.


Of course, with projects like LHC@home that use a 5(?)-second limit, it's mostly superfluous to be told to wait 7 seconds... but since it's an informational message, meaning it doesn't show up as red, it shouldn't be a problem. ;)


BTW, the deferral is always a little longer than the limit, to guard against computers with too-fast clocks getting stuck in an infinite loop of asking, getting "too recent", waiting 1 minute, contacting after 59.99 seconds, and getting "too recent" again.
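
For the curious, the limit arrives as a delay value in the scheduler reply, roughly like this (a sketch from memory, with the value slightly above the 7-second limit as described above; the exact tag name may differ between server versions):

<request_delay>7.07</request_delay>

The client then defers its next scheduler request to that project for at least that many seconds.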
17) Message boards : Number crunching : Host corruption (Message 10904)
Posted 25 Oct 2005 by Ingleside
Post:
If your host is getting corrupted write here what version of the boinc client you are using, and when you connected last. Are all your hosts getting corrupted?


Haven't seen any corruption yet, but the detailed computer overview is missing one field:
"While BOINC running, % of time host has an Internet connection".

18) Message boards : Number crunching : Quicker finishes? (Message 10723)
Posted 13 Oct 2005 by Ingleside
Post:
Increase the delay between RPCs - I believe this is actually 2 settings, one how long the client will delay before an automatic RPC and second how long the server will refuse RPCs. They default to 1 min and 10 minutes. Those defaults don't make sense actually, the client setting should be set to 5 seconds longer than the server setting so a client doesn't get into a refusal loop.


The deferral time now included in all RPC replies does add a little extra time to make sure a host doesn't get into a refusal loop... but of course, for these things to work, projects must also upgrade their scheduling server occasionally...
19) Message boards : Number crunching : When Will LHC Server Be Updated to Handle BOINC 5.1.*? (Message 10660)
Posted 9 Oct 2005 by Ingleside
Post:
What about CPDN? Anyone know off hand what their intentions are/if they upgraded?

I'd go look myself but [insert lame excuse here].

Well, when tested last weekend, the only projects that worked were CPDN and SETI@home. Einstein@home has since added the missing server files and is now compatible, while Rosetta@home has upgraded to v5 and is also compatible.

While Predictor@home has also upgraded to v5, they're still missing some server files, so it's currently impossible to "attach" with v5 clients.

Most of the rest, like LHC@home, are running an outdated scheduling server, and so won't talk to v5 clients at all.
20) Message boards : Number crunching : thread closed (Message 5199)
Posted 17 Nov 2004 by Ingleside
Post:
> Ingleside,
>
> Looks to me as though it checks for a Macintosh host and Darwin and SunOS
> OS's, too. I believe that would make 20 different platforms, not nine.
>

Haven't been following the LHC forums lately, since 99% of the time I think of checking, LHC is having its nightly shutdown, so I'm late to answer.


LHC only has applications for Windows & Linux, so it may well create more platform entries, but those platforms will not get any WUs assigned to them, so they shouldn't slow down WU distribution or crediting in any way.

Therefore, the distributed WUs will at most be split into 9 different "homogeneous platforms" under LHC.

