Message boards : Number crunching : New version of Boinc

Paul D. Buck
Joined: 2 Sep 04
Posts: 545
Credit: 148,912
RAC: 0
Message 8093 - Posted: 16 Jun 2005, 4:52:42 UTC

The fix for this is to break the link between queue size and connect interval. This was proposed by John McLeod VII when he was making the new Work Scheduler. To this point Dr. Anderson was unsure that this was needed, but did ask for input on the question. I don't know if the proposal has succeeded or not ...

To be honest, I actually think we have more problems than that ... especially with very long work units. But the new 4.45 seems to run pretty well for me. Of course, I have my setting at 3.0 days right now. I don't know that it needs to be higher with having all 5 production projects assigned on all my machines ...

Anyway, check is in the mail ... :)

littleBouncer
Joined: 23 Oct 04
Posts: 358
Credit: 1,439,205
RAC: 0
Message 8096 - Posted: 16 Jun 2005, 17:03:02 UTC - in response to Message 8093.  
Last modified: 16 Jun 2005, 17:05:04 UTC

> The fix for this is to break the link between queue size and connect interval.
> This was proposed by John McLeod VII when he was making the new Work
> Scheduler. To this point Dr. Anderson was unsure that this was needed, but did
> ask for input on the question. I don't know if the proposal has succeeded or
> not ...
>
> To be honest, I actually think we have more problems than that ... especially
> with very long work units. But the new 4.45 seems to run pretty well for me.
> Of course, I have my setting at 3.0 days right now. I don't know that it needs
> to be higher with having all 5 production projects assigned on all my machines

Do you run a single project with 3.0 days? Then CC 4.45 seems to work right. But if you run multiple projects (more than 2): why doesn't the client at least get new work from a project when it reports that project's last WU??
No, the client runs all the other queued WUs first, then keeps getting work only from the project where the last WU was reported, and it stays on that project (even if the settings say otherwise)....
Each step forward with the dev clients (after CC 4.53) was worse than the previous one, and so on...
Is that the way distributed computing is going?

A more and more frustrated CC 4.45 user ;-(
Next weekend I will downgrade....
littleBouncer

Paul D. Buck
Joined: 2 Sep 04
Posts: 545
Credit: 148,912
RAC: 0
Message 8097 - Posted: 16 Jun 2005, 19:56:25 UTC

I am not sure I got the full drift :)

But the two machines that have 4.45 both have work from 4 of 5 projects, as LHC is not issuing work ...

The one I am running 4.44 has work from 3 of 5 projects (missing EAH)

and the one I am running 4.43 (mac - no choice or I would do 4.45), has work from 2 of 3 (also no EAH).

So, it has been running ok for me. Of course I do have fairly fast machines so the work is done pretty fast ... and as long as I have a reasonable amount on hand I guess I am fairly content with the operation.

The only thing I HAVE done recently was to force the uploads at PPAH today as they were back alive and I just wanted to get it up ... most of the time I don't even do that very much anymore.

Heffed
Joined: 2 Sep 04
Posts: 71
Credit: 8,657
RAC: 0
Message 8099 - Posted: 17 Jun 2005, 3:44:13 UTC - in response to Message 8092.  

> From what has been discussed over at SETI, it appears that the 2X connect time
> is built in to allow for modem users. So far the DEVs have not settled on a
> way to make it friendly for always connected folks.

Well, the funny bit is that the new scheduler isn't very friendly for modem users either. ;)
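
For readers following the "2X connect time" point and the earlier proposal to break the link between queue size and connect interval, here is a minimal sketch of the difference. It assumes, purely for illustration, that a coupled client keeps up to twice the connect interval of work on hand; the function names and the separate queue setting are hypothetical, not the actual BOINC client code.

```python
SECONDS_PER_DAY = 86400


def coupled_work_target(connect_interval_days, factor=2.0):
    """Seconds of work to keep on hand when the queue size is tied to the
    connect interval (assumed rule: factor times the connect interval)."""
    return connect_interval_days * factor * SECONDS_PER_DAY


def decoupled_work_target(queue_days):
    """Seconds of work to keep on hand when the queue size is its own
    (hypothetical) preference, independent of how often the host connects."""
    return queue_days * SECONDS_PER_DAY


# An always-connected host that wants a large cache must, in the coupled
# model, also claim a long connect interval.
coupled = coupled_work_target(connect_interval_days=3.0)

# In the decoupled model the same host could connect often and still choose
# its cache size directly.
decoupled = decoupled_work_target(queue_days=3.0)

print(f"coupled target:   {coupled / SECONDS_PER_DAY:.1f} days of work")
print(f"decoupled target: {decoupled / SECONDS_PER_DAY:.1f} days of work")
```

In the coupled model a 3.0-day setting implies about 6 days of queued work, which is where long work units and short deadlines start to collide.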

littleBouncer
Joined: 23 Oct 04
Posts: 358
Credit: 1,439,205
RAC: 0
Message 8102 - Posted: 17 Jun 2005, 12:39:30 UTC

What does this message mean ???

17.06.2005 14:35:55||Resuming round-robin CPU scheduling.

greetz littleBouncer

Paul D. Buck
Joined: 2 Sep 04
Posts: 545
Credit: 148,912
RAC: 0
Message 8103 - Posted: 17 Jun 2005, 12:50:11 UTC - in response to Message 8102.  

> What does this message mean ???
>
> 17.06.2005 14:35:55||Resuming round-robin CPU scheduling.

It should be doing the project with the highest debt ... so it will be back in the mode of the older scheduler.
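
As a rough illustration of what "highest debt" means, the sketch below assumes a simplified model: each project is owed CPU time in proportion to its resource share, and the scheduler runs whichever project is owed the most. The function names, the accounting rule and the share values are assumptions for illustration, not the actual BOINC client code.

```python
def update_debts(debts, shares, cpu_used, period):
    """Return new debts after a scheduling period of `period` seconds.

    debts:    {project: CPU seconds currently owed to the project}
    shares:   {project: resource share (any positive weight)}
    cpu_used: {project: CPU seconds the project actually got this period}
    """
    total_share = sum(shares.values())
    new_debts = {}
    for name, share in shares.items():
        entitled = period * share / total_share  # what the share would grant
        new_debts[name] = debts.get(name, 0.0) + entitled - cpu_used.get(name, 0.0)
    return new_debts


def pick_project(debts):
    """Pick the project that is owed the most CPU time."""
    return max(debts, key=debts.get)


# Hypothetical shares; one hour passes in which only SETI@home ran.
shares = {"LHC@home": 100, "SETI@home": 100, "Einstein@Home": 200}
debts = {name: 0.0 for name in shares}
debts = update_debts(debts, shares, {"SETI@home": 3600.0}, period=3600.0)

print(debts)
print("run next:", pick_project(debts))
```

Under this model a project that gets no CPU time keeps accumulating debt, so it is the one chosen next.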

STE\/E
Joined: 2 Sep 04
Posts: 352
Credit: 1,393,150
RAC: 0
Message 8104 - Posted: 17 Jun 2005, 14:02:16 UTC

should be doing the project with the highest debt
==========

That word debt is a pet peeve of mine, Paul. Why is there a debt? If I don't want to run a Project for 6 months, that's my decision, I feel. When I decide to run that Project again I don't feel I owe it anything because I didn't run it for 6 months ...

But the Client feels differently, I guess; it feels I've been a naughty boy for not running that Project for so long and now wants to force me to run it ... That's why I only stay attached to the Projects I want to run & I even suspend some of them to run exactly what I want to run & not what the Client wants me to run ...


Paul D. Buck
Joined: 2 Sep 04
Posts: 545
Credit: 148,912
RAC: 0
Message 8105 - Posted: 17 Jun 2005, 14:29:17 UTC - in response to Message 8104.  

> should be doing the project with the highest debt
> ==========
>
> That word debt is a pet peeve of mine Paul, why is there a debt, if I don't
> want to run a Project for 6 months that's my decision I feel. When I decide to
> run that Project again I don't feel I owe it anything because I didn't run it
> for 6 months ...
>
> But the Client feels differently I guess, it feels I've been a naughty boy for
> not running that Project for so long and now wants to force me to run it ...
> That's why I only stay attached to the Projects I want to run & I even
> suspend some of them to run exactly what I want to run & not what the
> Client wants me to run ...

I feel your pain ... :)

At least we do have the option to suspend... I remember when we did not have that control. Which is why I guess I am so upbeat most of the time. We have many features and options that we did not in the past. And I remember when it was hard to keep BOINC running all day every day for some people ...

littleBouncer
Joined: 23 Oct 04
Posts: 358
Credit: 1,439,205
RAC: 0
Message 8107 - Posted: 17 Jun 2005, 16:44:24 UTC - in response to Message 8103.  

> > What does this message mean ???
> >
> > 17.06.2005 14:35:55||Resuming round-robin CPU scheduling.
>
> should be doing the project with the highest debt ... so, it will be back into
> the mode of the older scheduler.
>
THX Paul.
But since then I have observed that, although both instances are running, once one WU finishes only one CPU remains active on a 2-CPU HT machine!
Very nice client, indeed! (sarcasm)

greetz littleBouncer

Alex
Joined: 2 Sep 04
Posts: 378
Credit: 10,765
RAC: 0
Message 8109 - Posted: 18 Jun 2005, 2:27:03 UTC

The Einstein and Seti forums show people having issues with
'Unhandled Exception' errors with the 4.45 client.

Seti:
http://setiathome.berkeley.edu/forum_thread.php?id=15853
Einstein: (multiple threads)
http://einstein.phys.uwm.edu/forum_forum.php?id=6

When you have multiple people reporting that issue on multiple projects you can start ruling out PC voodoo.


Liberto [Valencia]
Joined: 18 Sep 04
Posts: 34
Credit: 4,133
RAC: 0
Message 8111 - Posted: 18 Jun 2005, 5:52:24 UTC
Last modified: 18 Jun 2005, 5:53:06 UTC

The new version 4.45 of the BOINC CC is working perfectly OK (that is, after you leave it alone for 5 days), and then it handles the whole process without your having to worry about it.

What I do worry about - and I am patient - is that after one complete month there have been no announcements on the front page regarding any news and/or messages. It was once said that it does not cost a penny to have someone put a note there about the subject. Are we advancing? Are we just stuck somewhere? Well, let us know. If things continue this way I'll think twice before continuing with LHC.

Respectfully said. Truly yours.

STE\/E
Joined: 2 Sep 04
Posts: 352
Credit: 1,393,150
RAC: 0
Message 8112 - Posted: 18 Jun 2005, 6:27:40 UTC

Are we advancing? Are we just stuck somewhere? Well, let us know. If things continue this way I'll think twice before continuing with LHC.
==========

Such is Life at the LHC Project Liberto, it's a classic case of hurry up and wait ... ;)



Paul D. Buck
Joined: 2 Sep 04
Posts: 545
Credit: 148,912
RAC: 0
Message 8114 - Posted: 18 Jun 2005, 14:48:04 UTC - in response to Message 8109.  

> The Einstein and Seti forums show people having issues with
> 'Unhandled Exception' errors with the 4.45 client.
>
> When you have multiple people reporting that issue on multiple projects you
> can start ruling out PC voodoo.

There was a change checked in regarding double releasing some pointers, which, of course, causes these types of problems. Apparently, if I understood the change correctly, this has been a long-standing bug in the code. But it shows that other changes that are "unrelated" can cause a problem to surface.

So I, for one, am looking forward to 4.46 ... not that I am having problems with 4.43, 4.44, or even 4.45 ...

Then again, maybe I am just lucky ...

Travis DJ
Joined: 29 Sep 04
Posts: 196
Credit: 207,040
RAC: 0
Message 8115 - Posted: 18 Jun 2005, 15:41:33 UTC - in response to Message 8114.  

> Then again, maybe I am just lucky ...

Probably lucky - no doubt you've seen on the boinc_dev mailing list quite a few postings about 4.45 clients requesting '0' seconds of work.

If LHC has updated their scheduler to work like E@H and P@H, I'll upgrade my main machine from 4.19 to 4.25. I'm having problems with the way their schedulers interact with 4.19: communication is sometimes deferred for 24 hours after all work is complete, which is odd because I have 'connect to network' set to 0.1 days on both. It forces me to manually update those 2 projects twice daily. In any case, my point still stands: I don't trust the 4.4x versions to work properly once one is released as a 'recommended' version. They should call it beta until it has successfully met that version's development goal for a specified amount of time.

I need some coffee; I'm feeling awfully b*tchy this morning.

Alex
Joined: 2 Sep 04
Posts: 378
Credit: 10,765
RAC: 0
Message 8170 - Posted: 24 Jun 2005, 9:13:08 UTC

The BOINC web site now lists a bunch of updates. Looks like server-side stuff, client-side stuff, and a dial-up issue fix.

I'm guessing they will have something on the download page pretty soon.


littleBouncer
Joined: 23 Oct 04
Posts: 358
Credit: 1,439,205
RAC: 0
Message 8420 - Posted: 13 Jul 2005, 10:13:06 UTC

stupid CC 4.45 behaviour:
13.07.2005 12:01:25||Using earliest-deadline-first scheduling because computer is overcommitted.
13.07.2005 12:01:25|LHC@home|Deferring communication with project for 1 days, 22 hours, 0 minutes, and 22 seconds

and the deadlines for two WUs are: 13.07.2005 18:07:12 and 13.07.2005 20:29:07 !!!

so by the time it (CC 4.45) contacts the project again, it will have missed the deadlines.

Very well organized!

greetz littleBouncer
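
The arithmetic behind this complaint can be checked with a short sketch. It assumes the log timestamps and the deadlines are in the same local time zone and that no manual update happens before the deferral runs out; the variable names are illustrative only.

```python
from datetime import datetime, timedelta

FMT = "%d.%m.%Y %H:%M:%S"

# Time of the "Deferring communication" log line and the deferral it reports.
logged_at = datetime.strptime("13.07.2005 12:01:25", FMT)
deferral = timedelta(days=1, hours=22, minutes=0, seconds=22)
next_contact = logged_at + deferral

# Deadlines of the two work units mentioned in the post.
deadlines = [
    datetime.strptime("13.07.2005 18:07:12", FMT),
    datetime.strptime("13.07.2005 20:29:07", FMT),
]

print("next scheduled contact:", next_contact.strftime(FMT))
for deadline in deadlines:
    late_by = next_contact - deadline
    print(f"deadline {deadline.strftime(FMT)} would be missed by {late_by}")
```

Under those assumptions the next automatic contact falls on 15.07.2005, well after both deadlines, so the results could only be reported in time through a manual update.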


Paul D. Buck
Joined: 2 Sep 04
Posts: 545
Credit: 148,912
RAC: 0
Message 8421 - Posted: 13 Jul 2005, 13:27:20 UTC

All I can say is I have my queue set for 3 days, version 4.45 on all machines but the PowerMac which has 4.43 and I have a good selection of work on ALL machines ...

If I look at all 8 (used to be 7), I have 235 results in flight (includes about 10 that are pending upload to SETI@Home), giving me (according to BOINC View) 106 days of work ... granted I still have the problem of bad work times with SETI@Home being as much as double what my real times are ... and Einstein@Home being consistently lower ...

Anyway, all I am saying is give peace a chance ... oops ... wrong thought ... :)

Not trying to push anyone to try a version they are not ready for ... just commenting ...

Blank Reg
Joined: 17 Sep 04
Posts: 49
Credit: 25,253
RAC: 0
Message 8423 - Posted: 13 Jul 2005, 15:01:19 UTC - in response to Message 8420.  
Last modified: 13 Jul 2005, 15:05:52 UTC

> stupid CC 4.45 behaviour:
> 13.07.2005 12:01:25||Using earliest-deadline-first scheduling because
> computer is overcommitted.
> 13.07.2005 12:01:25|LHC@home|Deferring communication with project for 1
> days, 22 hours, 0 minutes, and 22 seconds

>
> and the deadline for two WU's are: 13.07.2005 18:07:12 and
> 13.07.2005 20:29:07 !!!
>
> so when he (CC 4.45) contacts at new, he will miss the deadline.
>
> Very well organized!
>
> greetz littleBouncer
>
>

Bouncer, you need to merge a few boxes; your account shows you have 10 ... Lower the connect-to time; it works for me, set at 2 ...
BOINC Wiki

littleBouncer
Joined: 23 Oct 04
Posts: 358
Credit: 1,439,205
RAC: 0
Message 8424 - Posted: 13 Jul 2005, 15:12:48 UTC - in response to Message 8423.  
Last modified: 13 Jul 2005, 15:14:04 UTC

>
> Bouncer you need to merge a few boxes acct shows you have 10......Lower the
> connect to time, works for me, set at 2.....
>
First I have to finish the WUs with the oldest deadlines (from when I was worried that we would run dry...), and it was the old settings that produced this effect.
BTW: those two WUs are crunched and reported, and credit has even been granted....


©2024 CERN