Thread 'Did everyone get work 02 Nov UTC?'

Author	Message
River~~ Joined: 13 Jul 05 Posts: 456 Credit: 75,142 RAC: 0	Message 15314 - Posted: 2 Nov 2006, 18:54:49 UTC

We had work on the server for 9 or 10 hours, from just before midnight UTC on 1st Nov till around 0930 UTC on 2nd, and no sign of connection issues.

Looking at Scarecrow's Graphs it is obvious that work was being added to the server at around 8000 tasks/hour from 2300 to 0100 and at over double that rate from 0100 to 0200. During those three hours the server was filling faster than we could take the work off it; and there was enough work for it to keep handing it out for another 7 hours.

The other interesting feature is that at around 0300 the rate of take up of work drops. Funnily enough, that is at around four hours after the work started to be issued, ie when every client on a 4 hour backoff has had one bite at the cherry. My interpretation of the slower rise after that is a combination of clients coming back after reporting completed work, and other boxes being added by users who don't like leaving their boxes asking for work from an empty project.

All clients that have been left long term on the 4hr standoff cycle should therefore have had the chance to get work.

Two reasons you might have missed out

If you had got into a 1 day backoff, or a 7 day backoff, you might have missed the work - these only happen (I think) after network problems.

Also if you had disabled LHC (suspended, detached, or set "no more work") you would have missed out if you sleep normal EU hours -- the lesson here is to leave the client asking for work, it costs almost nothing.

So, apart from those two reasons, did anyone else miss out this time?

River~~
ID: 15314

Andreas Joined: 2 Aug 05 Posts: 33 Credit: 2,332,514 RAC: 0	Message 15316 - Posted: 2 Nov 2006, 20:07:14 UTC - in response to Message 15314. Last modified: 2 Nov 2006, 20:07:26 UTC

[quote]So, apart from those two reasons, did anyone else miss out this time? River~~[/quote]

I missed out, mostly because my processor died :-/
ID: 15316

Rob Lilley Joined: 29 Nov 05 Posts: 8 Credit: 105,015 RAC: 0	Message 15317 - Posted: 2 Nov 2006, 22:02:42 UTC - in response to Message 15316.

[quote]So, apart from those two reasons, did anyone else miss out this time? River~~[/quote]

Yeah, 'cos I need to sleep and work and I don't have my computers on 24/7 ... I'm such a lightweight!
ID: 15317

Dronak Joined: 19 May 06 Posts: 20 Credit: 297,111 RAC: 0	Message 15318 - Posted: 2 Nov 2006, 23:42:01 UTC

I didn't completely miss out, no. But I only got 2 work units. With over 30000 still in progress right now, I would have hoped for more, but I suppose people with huge caches are hoarding the work (again, as usual).
ID: 15318

[B^S] ShanerX Joined: 14 Jul 05 Posts: 41 Credit: 1,788,341 RAC: 0	Message 15322 - Posted: 3 Nov 2006, 0:50:03 UTC

Not so true ... I just take the good advice of leaving computers connected to the project. I got lucky and was suspending other projects for a different one and got a good bunch of LHC wu's. Now if we could just see that reflected in the stats ... that would be awesome!
ID: 15322

Philip Martin Kryder Joined: 21 May 06 Posts: 73 Credit: 8,710 RAC: 0	Message 15327 - Posted: 3 Nov 2006, 2:19:53 UTC - in response to Message 15314.

[quote].... So, apart from those two reasons, did anyone else miss out this time? River~~[/quote]

Yup - my cache was full with other projects.... no work for me this time.
ID: 15327

Boogyman Munster Joined: 2 Sep 04 Posts: 4 Credit: 388,462 RAC: 0	Message 15328 - Posted: 3 Nov 2006, 2:30:43 UTC

I was able to snag a few wu's... still crunching! Yeah me!!!

Thanks, *Boogyman Munster*
ID: 15328

genes Joined: 29 Sep 04 Posts: 25 Credit: 77,910 RAC: 0	Message 15329 - Posted: 3 Nov 2006, 4:03:54 UTC

I got from 1 to 3 WU's on every machine! All have 0.1 day cache, and all work on a bunch of projects. Happy, happy, joy, joy!!!
ID: 15329

Webmaster Yoda Joined: 13 Jul 05 Posts: 12 Credit: 8,463 RAC: 0	Message 15330 - Posted: 3 Nov 2006, 5:37:07 UTC

I got a few of them, but until they sort out XML stats, I'm not going to do much work here, even if there's unlimited work available.
ID: 15330

darkclown Joined: 30 Sep 06 Posts: 9 Credit: 5,298 RAC: 0	Message 15331 - Posted: 3 Nov 2006, 6:21:02 UTC - in response to Message 15330.

[quote]I got a few of them, but until they sort out XML stats, I'm not going to do much work here, even if there's unlimited work available.[/quote]

In the same boat here. Got about 12 WUs, finished & returned them, and set lhcathome to No New Tasks, until stats is sorted.
ID: 15331

River~~ Joined: 13 Jul 05 Posts: 456 Credit: 75,142 RAC: 0	Message 15332 - Posted: 3 Nov 2006, 6:29:08 UTC - in response to Message 15330.

[quote]I got a few of them, but until they sort out XML stats, I'm not going to do much work here, even if there's unlimited work available.[/quote]

I look at this differently - the stats will be there eventually and the work I do now will show whenever they are finally exported - and, of course, if some people continue to do work and others don't then when the stats come back they will be to my advantage.

R~~
ID: 15332

River~~ Joined: 13 Jul 05 Posts: 456 Credit: 75,142 RAC: 0	Message 15333 - Posted: 3 Nov 2006, 6:51:08 UTC - in response to Message 15327.

[quote].... So, apart from those two reasons, did anyone else miss out this time? ...
Yup - my cache was full with other projects.... no work for me this time.[/quote]

John Keck advocates that with N projects the max cache to keep all the projects hungry at all times is 0.4 * Deadline / N; I suggest a somewhat larger setting of Deadline / (N+2).

Of course this is no help if you need a large cache for some other reason, nor for people on dial-ups that only connect a limited number of times in the day/week.

So the two reasons have grown to six:

1. Client in 1-day or 7-day standoff (probably due to net problems which may have been local to the box)
2. Project suspended etc
3. Machine powered down
4. Prefs set to prevent network access during the entire period that work was available (nobody has said this yet, but the work release must have covered roughly a working day for some timezone, even though this one missed both the US and EU working days)
5. Machine had a fault at just the time when the work came available
6. Machine in "No Work Fetch" mode due to work held for other projects.

My commiserations to anyone who did not get work, and did anyone else not get work for a reason not listed in those six, please?

R~~
ID: 15333
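The two cache rules of thumb quoted in the post above are easy to compare side by side. The following is only a rough sketch: the function names are invented for illustration, and the 6.5-day deadline is just an example value, so plug in your own figures.

```python
def keck_cache_days(deadline_days, n_projects):
    """Rule attributed to John Keck in the post: 0.4 * Deadline / N."""
    return 0.4 * deadline_days / n_projects


def river_cache_days(deadline_days, n_projects):
    """River~~'s suggestion from the post: Deadline / (N + 2)."""
    return deadline_days / (n_projects + 2)


if __name__ == "__main__":
    deadline = 6.5  # days; example value only, check your own projects
    for n in range(1, 6):
        print(f"N={n}: Keck {keck_cache_days(deadline, n):.2f} d, "
              f"River {river_cache_days(deadline, n):.2f} d")
```

Algebraically, Deadline / (N+2) is the larger of the two only when N is 2 or more; with a single project the 0.4 * Deadline / N rule gives the bigger cache.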

sysfried Joined: 27 Sep 04 Posts: 282 Credit: 1,415,417 RAC: 0	Message 15337 - Posted: 3 Nov 2006, 10:26:27 UTC - in response to Message 15333.

I got enough work .... :-) I wouldn't mind more; my clients were always set to receive work from LHC and I manually unlocked other projects to get a few WU's elsewhere.... <-- me = happy :-)
ID: 15337

FalconFly Joined: 2 Sep 04 Posts: 121 Credit: 592,214 RAC: 0	Message 15339 - Posted: 3 Nov 2006, 14:18:56 UTC - in response to Message 15337. Last modified: 3 Nov 2006, 14:22:20 UTC

I missed it because I have a 2-3 day backoff myself and all machines have LHC on "Suspended" for the obvious problems with LTD. Unless I manually find out or change the status of LHC, I won't even notice there is work until it really flows normally again (so far I haven't seen anything from the Project that would earn it being un-suspended again).

IMHO there should be an additional BOINC error code ;)
-error 9220 : Staff disconnected from Project - come back later

Scientific Network : 45000 MHz - 77824 MB - 1970 GB
ID: 15339

Philip Martin Kryder Joined: 21 May 06 Posts: 73 Credit: 8,710 RAC: 0	Message 15340 - Posted: 3 Nov 2006, 15:33:13 UTC - in response to Message 15333.

[quote].... John Keck advocates that with N projects the max cache to keep all the projects hungry at all times is 0.4 * Deadline / N; I suggest a somewhat larger setting of Deadline / (N+2). ....[/quote]

Can you explain why these magic numbers were chosen?

And what the difference between your suggestion and his is designed to optimize?

Also, where does one find the value of Deadline?

And finally, is N the number of currently ACTIVE (not suspended) projects or total projects?
ID: 15340

River~~ Joined: 13 Jul 05 Posts: 456 Credit: 75,142 RAC: 0	Message 15342 - Posted: 3 Nov 2006, 19:13:48 UTC - in response to Message 15340. Last modified: 3 Nov 2006, 19:15:12 UTC

[quote].... John Keck advocates that with N projects the max cache to keep all the projects hungry at all times is 0.4 * Deadline / N; I suggest a somewhat larger setting of Deadline / (N+2). ....
Can you explain why these magic numbers were chosen?[/quote]

I can explain mine. I've never understood his - but I have tested his in practice with the results as described. It may be that his summarises experience rather than comes from theory.

The theory behind my rule of thumb (D is the deadline, N the number of projects, and C the cache size, ie the connect interval):

Assume (contrary to fact) that we can vary D to keep out of NWF. We will drop this assumption later.

The No Work Fetch algorithm is designed to make sure a deadline is not exceeded in the worst case where a task completes a zillisecond after a connect, and when the next connect is the full interval away. So there needs to be a gap of C to allow for that.

There needs to be a gap of C to fit in a cachefull of new work we are about to download, as otherwise the client will refuse to ask for work, so we allow another C for that.

Then I allow a full cache for every project.

So we would expect to be in NWF if D < (N+2) * C. To stay out of NWF we want D > (N+2) * C.

Now it is time to drop the false assumption that we can vary D. In fact D and N are givens, and C is the value we can control. Rearranging as an algebra exercise gives C < D / (N+2).

Why then does my formula sometimes go into NWF when it is designed not to? Well, the expected run times don't exactly fit the cache size, and the client will always go for the extra task that straddles the cache limit rather than stop short. Also a task may actually run for longer than it claimed. In either event the client goes into NWF, but only for a few hours as my formula puts it right on the margin.

[quote]And what the difference between your suggestion and his is designed to optimize?[/quote]

Can't comment on the design - the practical outcome is that my formula dips briefly into NWF for a few hours quite often, and his doesn't. Whether John found his formula by experiment or theory I will leave him to say. John's formula allows more slack than mine, and this is clearly related to the fact that his is better than mine at keeping out of NWF entirely.

[quote]Also, where does one find the value of Deadline?[/quote]

Look at a task that has just been downloaded from a project, and subtract the current date from the deadline. Or look at a task that has not yet reported on the project website, and work out (deadline date) - (date sent).

On LHC it is currently just over 6.5 days, Rosetta currently 10 days (has been 7 and 14 in the last few months), Leiden 6 days -- but LHC and Rosetta do change from time to time so it is worth re-checking. As projects vary, D in the formula is the shortest of these.

[quote]And finally, is N the number of currently ACTIVE (not suspended) projects or total projects?[/quote]

The number is the same, active = total. This is because the formulae apply only when you leave the suspend button alone. Every time you suspend or unsuspend you break the long term debt / short term debt balance, and the client will behave oddly for a few days.

You can drive BOINC hands on, or hands off, but need to decide to stick to one or the other for at least 2D at a time. In particular if you use the suspend button to cure a NWF situation, you almost guarantee another in about C days time.

Using either formula, expect to see NWF a few times for the first 2D till the client settles down to work the way it is meant to.

Hope that helps. I appreciate the level of detail in your questions, here and in other threads, which indicate that you are really engaging with the points made.

R~~
ID: 15342
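As a worked illustration of the rule of thumb in the post above, here is a minimal sketch that takes D as the shortest deadline and checks the condition D < (N+2) * C. The helper names are hypothetical, the deadlines are the approximate figures quoted in the post (LHC is rounded to 6.5 days), and this models only the rule of thumb, not what the BOINC client really does internally.

```python
# Approximate deadlines in days, as quoted in the post; they change over time.
DEADLINES_DAYS = {"LHC": 6.5, "Rosetta": 10.0, "Leiden": 6.0}


def max_cache_days(deadlines):
    """Upper bound on cache C from C < D / (N + 2), D = shortest deadline."""
    d = min(deadlines.values())
    n = len(deadlines)
    return d / (n + 2)


def expect_nwf(cache_days, deadlines):
    """True if the rule of thumb predicts No Work Fetch: D < (N + 2) * C."""
    d = min(deadlines.values())
    n = len(deadlines)
    return d < (n + 2) * cache_days


if __name__ == "__main__":
    print(f"Keep the cache below {max_cache_days(DEADLINES_DAYS):.2f} days")
    for cache in (1.0, 1.5):
        print(f"cache {cache} d -> NWF expected: {expect_nwf(cache, DEADLINES_DAYS)}")
```

With these three projects D is Leiden's 6 days and N is 3, so the bound works out at 6 / 5 = 1.2 days. As the post notes, a cache sitting right at the bound can still dip into NWF briefly when run times overshoot.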

Colin Porter Joined: 14 Jul 05 Posts: 35 Credit: 71,636 RAC: 0	Message 15343 - Posted: 3 Nov 2006, 20:32:10 UTC

Got NO LHC work at all. Why - Well I think it was because - After months of being on 24/7, I decided to switch off while I was away on holiday this last week. Bloody typical - All that work and I missed it.
ID: 15343

m.mitch Joined: 4 Sep 05 Posts: 112 Credit: 2,319,063 RAC: 1	Message 15346 - Posted: 4 Nov 2006, 1:17:25 UTC - in response to Message 15343. Last modified: 4 Nov 2006, 1:19:15 UTC

[quote]Got NO LHC work at all. Why - Well I think it was because - After months of being on 24/7, I decided to switch off while I was away on holiday this last week. Bloody typical - All that work and I missed it.[/quote]

Colin, here are my two rules to give anyone the best chance at getting work units from any project:

(1) set the cache to 2 hours. (1 day / 12 = 0.0833333333333333333333333333333333334 days) It fits.
(2) leave the PCs attached to the project.

This way, you're banging on the project door every 2 hours and saying, "Are we there yet?" ;-)

The response is binary!
ID: 15346
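The cache setting is entered in days, so the "2 hours" in the post above has to be converted. A tiny sketch of that arithmetic (the helper name is made up for illustration):

```python
def hours_to_days(hours):
    """Convert a cache size in hours to the days value the preference expects."""
    return hours / 24.0


if __name__ == "__main__":
    # 2 hours is one twelfth of a day
    print(f"2 hours = {hours_to_days(2):.4f} days")
```

Four decimal places is plenty in practice; the long string of 3s in the post is only there for effect.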

Philip Martin Kryder Joined: 21 May 06 Posts: 73 Credit: 8,710 RAC: 0	Message 15348 - Posted: 4 Nov 2006, 3:34:31 UTC - in response to Message 15342. Last modified: 4 Nov 2006, 3:37:59 UTC

[quote].... Hope that helps. I appreciate the level of detail in your questions, here and in other threads, which indicate that you are really engaging with the points made. R~~[/quote]

Well thank you for the kind words and the effort to explain. A lot to contemplate.

Is it correct that Resource share is not considered at all in your formulas?

I have several projects. One of them, SZTAKI, seems not well behaved. Sometimes the WUs don't complete in anywhere near the estimated time. Sometimes they seem to not complete at all. So I suspend it "a lot" and run it every few weeks to see if it is doing any better.
ID: 15348

Philip Martin Kryder Joined: 21 May 06 Posts: 73 Credit: 8,710 RAC: 0	Message 15349 - Posted: 4 Nov 2006, 3:38:59 UTC - in response to Message 15346. Last modified: 4 Nov 2006, 3:41:01 UTC

[quote].... (1) set the cache to 2 hours. (1 day/12 hours = 0.0833333333333333333333333333333333334) It fits. ....[/quote]

Why 2 hours instead of say 70 minutes? (.05)
ID: 15349