Thread 'Long WU's'

Author	Message
Gary Roberts Send message Joined: 22 Jul 05 Posts: 72 Credit: 3,962,626 RAC: 0	Message 24660 - Posted: 20 Aug 2012, 7:34:45 UTC - in response to Message 24629. So it is definitely something with AP, and probably also something with v6.2.x, but probably only when in combination with AP. I've had other commitments since I wrote my original message last Friday, so I'm sorry I'm only just now able to find time to catch up with the various replies. Particular thanks to Richard for pointing out the implications of the client reporting zero run time. I know I'm using an old client but I have particular reasons for doing that. This old client version does report both run time and CPU time for tasks running normally. It's only when AP is invoked that the run time is set to zero. AP has now been removed from all my hosts. It took a while for the caches to drain on the last few of them as they had 100+ tasks on board at the time NNT was set last week. This is apparently a further artifact of the old client and AP combination. When running normally, the limit of 4 tasks per core is enforced. When running under AP the limit disappears and the client keeps receiving tasks whenever it makes a request. I would like to apologise to all those who received zero credit for tasks when paired with one of my AP hosts. Now that none of them ever seem to run the generic app, there is no reason for me to use AP here again. I'm sorry it took a while for me to realise the extent of the problem and that it could be worked around by getting rid of AP. There was one aspect that puzzled me until just now. When some tasks were receiving zero credit, why didn't all tasks, since all had zero run times? I just noticed that those tasks receiving zero credit all seem to be the _0 task, which in turn seems to become the canonical result in a two-task quorum. Seemingly, if my task didn't become the canonical result then all was fine. Cheers, Gary. ID: 24660 · Reply Quote

Eric Mcintosh Volunteer moderator Project administrator Project developer Project tester Project scientist Send message Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0	Message 24671 - Posted: 21 Aug 2012, 9:13:11 UTC - in response to Message 24660. Thanks for that; seems to fit in with Igor's post on Credits. See his latest post Credits, but he will look at all this again tomorrow. ID: 24671 · Reply Quote

[AF>FAH-Addict.net]toTOW Send message Joined: 9 Oct 10 Posts: 77 Credit: 3,727,865 RAC: 0	Message 24682 - Posted: 22 Aug 2012, 0:07:30 UTC I just got one more big WU ... estimated to run 260 hours on my old P4-m 1.4GHz ... deadline is on the 28th. ID: 24682 · Reply Quote

Eric Mcintosh Volunteer moderator Project administrator Project developer Project tester Project scientist Send message Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0	Message 24687 - Posted: 22 Aug 2012, 15:40:34 UTC - in response to Message 24682. I don't really understand why, but should be OK now. Keep me posted. Eric. ID: 24687 · Reply Quote

[AF>FAH-Addict.net]toTOW Send message Joined: 9 Oct 10 Posts: 77 Credit: 3,727,865 RAC: 0	Message 24725 - Posted: 26 Aug 2012, 10:33:49 UTC - in response to Message 24687. Unfortunately, it won't make it ... only two days remaining to the deadline, and still 145 hours of calculation to go :( ID: 24725 · Reply Quote
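
The arithmetic behind that estimate can be sketched in a few lines of Python. This is only an illustration: the function name `hours_short` and the `duty_cycle` parameter (the fraction of wall-clock time the host actually spends on this one task) are assumptions made for the example, not anything BOINC exposes under those names.

```python
def hours_short(hours_to_deadline: float,
                hours_of_work_left: float,
                duty_cycle: float = 1.0) -> float:
    """Wall-clock hours by which the task overshoots its deadline.

    A result <= 0 means the task should finish in time.
    duty_cycle is a hypothetical knob: 1.0 means the host crunches
    this task around the clock, 0.5 means half of the time.
    """
    if not 0 < duty_cycle <= 1:
        raise ValueError("duty_cycle must be in (0, 1]")
    wall_clock_needed = hours_of_work_left / duty_cycle
    return wall_clock_needed - hours_to_deadline


# Figures from the post: two days to the deadline, 145 hours still to go.
print(hours_short(2 * 24, 145))  # 97.0
```

Even running flat out, the task would come in roughly four days late, which matches the poster's conclusion.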

Eric Mcintosh Volunteer moderator Project administrator Project developer Project tester Project scientist Send message Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0	Message 24726 - Posted: 26 Aug 2012, 11:38:48 UTC - in response to Message 24725. Well, I am really not sure that the deadline won't be extended... If not, my apologies. I have got 99.9% of these back now, but then again maybe they were from "fast" machines. We will be doing a full post-mortem and are studying how to do better next time with lots of great feedback and suggestions. ID: 24726 · Reply Quote

Ano Send message Joined: 29 Nov 09 Posts: 42 Credit: 229,229 RAC: 0	Message 24739 - Posted: 30 Aug 2012, 14:28:50 UTC Report deadline: 30 Aug 2012, 8:55:53 UTC. Received: 30 Aug 2012, 14:16:05 UTC. Validate state: Valid. Looks like I got lucky on this one. So it was OK not to cancel it after all. ID: 24739 · Reply Quote
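
For anyone wondering how late that result actually was, the two timestamps in the post can be subtracted directly. A minimal Python sketch, assuming an English (C) locale so that the month abbreviation "Aug" parses; the variable names are illustrative only.

```python
from datetime import datetime

# Timestamp layout used on the result page quoted in the post.
FMT = "%d %b %Y %H:%M:%S"

deadline = datetime.strptime("30 Aug 2012 8:55:53", FMT)
received = datetime.strptime("30 Aug 2012 14:16:05", FMT)

late_by = received - deadline  # a datetime.timedelta
print(late_by)  # 5:20:12
```

So the result arrived a little over five hours past the deadline and was still marked Valid.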

Eric Mcintosh Volunteer moderator Project administrator Project developer Project tester Project scientist Send message Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0	Message 24740 - Posted: 30 Aug 2012, 18:58:46 UTC - in response to Message 24739. Well I am not so sure about luck. As I said we shall be looking at this next week and I'll get back to you (and others) ID: 24740 · Reply Quote

Ray Murray Volunteer moderator Send message Joined: 29 Sep 04 Posts: 281 Credit: 11,888,115 RAC: 0	Message 24741 - Posted: 30 Aug 2012, 19:23:29 UTC I seem not to have chosen my wingmen for the long WUs wisely 8¬( Of the 8 I have completed, only 2 wingmen have returned successfully with 1 still in progress. 2 errored out (pages and pages of errors for this host), 2 missed deadline (30 missed deadline on this host) and 1 aborted (aborted all his long ones). The 5 unreturned ones are all still listed as Unsent. Just wondering if they will be resent for my results to get validated or if the Long WU experiment is complete for now? ID: 24741 · Reply Quote

Magic Quantum Mechanic Send message Joined: 24 Oct 04 Posts: 1320 Credit: 99,944,916 RAC: 139,084	Message 24743 - Posted: 30 Aug 2012, 22:01:52 UTC I just sent in my long task and it is validated too. http://lhcathomeclassic.cern.ch/sixtrack/result.php?resultid=6347063 ID: 24743 · Reply Quote

jujube Send message Joined: 25 Jan 11 Posts: 179 Credit: 83,858 RAC: 0	Message 24745 - Posted: 31 Aug 2012, 4:34:49 UTC - in response to Message 24740. Last modified: 31 Aug 2012, 4:41:27 UTC Well I am not so sure about luck. As I said we shall be looking at this next week and I'll get back to you (and others) It has nothing at all to do with luck and everything to do with skill :-) Volunteers need to learn that it is extremely difficult, if not impossible, to schedule tasks properly for projects that have huge variations in run times; therefore it is of utmost importance to configure an extremely small cache of no more than 0.1 days. Volunteers also need to remember that it's not the end of the world just because a Sixtrack task goes into panic mode and suspends some of their other projects for a while. Panic mode simply borrows some time from other projects, but those projects get paid back and the project shares the volunteer specifies will be honored over the long run. But, for that to work and to ensure that the other projects' tasks don't miss their deadlines, volunteers absolutely MUST CONFIGURE A SMALL CACHE, ESPECIALLY IF THEY HAVE A SLOWER CPU. ID: 24745 · Reply Quote

tullio Send message Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0	Message 24749 - Posted: 31 Aug 2012, 8:56:49 UTC I should have two tasks running on my laptop. One is running and should meet its deadline. The other, workunit 2828795, does not appear on my BOINC manager. Was it a ghost unit? Tullio ID: 24749 · Reply Quote

tullio Send message Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0	Message 24753 - Posted: 31 Aug 2012, 15:30:05 UTC - in response to Message 24749. OK, I got it. The laptop's BOINC manager is in Italian and sometimes its messages are not clear to me. English is much clearer. Tullio ID: 24753 · Reply Quote

Eric Mcintosh Volunteer moderator Project administrator Project developer Project tester Project scientist Send message Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0	Message 24754 - Posted: 31 Aug 2012, 17:42:16 UTC - in response to Message 24745. Thanks for that; I think we shall have an interesting (long) discussion next week! ID: 24754 · Reply Quote

Richard Haselgrove Send message Joined: 27 Oct 07 Posts: 186 Credit: 3,297,640 RAC: 0	Message 24755 - Posted: 31 Aug 2012, 18:04:18 UTC - in response to Message 24754. Thanks for that; I think we shall have an interesting (long) discussion next week! The usual rule of thumb isn't quite as drastic as jujube suggests: Look at all your projects, find the one which had the shortest deadlines, and divide that deadline by the number of different projects you're attached to. So, with deadlines here being 7 days, that's probably your shortest. If you're attached to 3 or 4 projects, a 2 day cache might be OK: if you're attached to 7 or 8 projects, don't set a cache above 1 day. ID: 24755 · Reply Quote
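
The rule of thumb quoted above is simple enough to write down. The following Python sketch is just that rule, shortest deadline divided by the number of attached projects; the function name `max_cache_days` is made up for the example, and nothing here reflects how the BOINC scheduler itself decides anything.

```python
def max_cache_days(shortest_deadline_days: float, n_projects: int) -> float:
    """Rule of thumb from the post: shortest deadline / number of projects.

    Returns a suggested upper bound for the work cache, in days.
    """
    if n_projects < 1:
        raise ValueError("need at least one attached project")
    return shortest_deadline_days / n_projects


# With the 7-day deadlines mentioned for this project:
for n in (3, 4, 7, 8):
    print(f"{n} projects: cache of at most {max_cache_days(7, n):.2f} days")
```

For 3 or 4 projects this gives roughly 2 days, and for 7 or 8 projects roughly 1 day, in line with the figures in the post. It takes no account of host speed or of unusually long tasks.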

jujube Send message Joined: 25 Jan 11 Posts: 179 Credit: 83,858 RAC: 0	Message 24756 - Posted: 31 Aug 2012, 20:02:19 UTC - in response to Message 24755. Last modified: 31 Aug 2012, 20:11:53 UTC Thanks for that; I think we shall have an interesting (long) discussion next week! The usual rule of thumb isn't quite as drastic as jujube suggests: Look at all your projects, find the one which had the shortest deadlines, and divide that deadline by the number of different projects you're attached to. So, with deadlines here being 7 days, that's probably your shortest. If you're attached to 3 or 4 projects, a 2 day cache might be OK: if you're attached to 7 or 8 projects, don't set a cache above 1 day. Will that rule of thumb work for slow hosts with old P4 or Athlon64 processors? How about P3 machines? I mean you used the word "usual", which means it might not apply in all cases, so are there cases where it's advisable to set the cache even smaller, for example for P3 or P4 era hosts? Perhaps for slow hosts it would be advisable to cut the cache size in half again? Also, based on other volunteers' experiences and reports, it seems to me your "usual" rule of thumb tends to fail when one of their projects issues tasks that are very much longer than their usual tasks. I'm not sure exactly why but that seems to be how it works. Any comments on that? ID: 24756 · Reply Quote

Richard Haselgrove Send message Joined: 27 Oct 07 Posts: 186 Credit: 3,297,640 RAC: 0	Message 24757 - Posted: 31 Aug 2012, 22:09:41 UTC - in response to Message 24756. Will that rule of thumb work for slow hosts with old P4 or Athlon64 processors? How about P3 machines? I mean you used the word "usual", which means it might not apply in all cases, so are there cases where it's advisable to set the cache even smaller, for example for P3 or P4 era hosts? Perhaps for slow hosts it would be advisable to cut the cache size in half again? I think the rule of thumb - which I quoted, but did not originate - dates from even before the era when P4s and Athlons ruled the world. Also, based on other volunteers' experiences and reports, it seems to me your "usual" rule of thumb tends to fail when one of their projects issues tasks that are very much longer than their usual tasks. I'm not sure exactly why but that seems to be how it works. Any comments on that? All absurdly general assertions need an exception, and I think you've put your finger on it. That rather depends whether you regard Eric's occasional experiments as an excitement, warranting manual intervention and micromanagement: or whether you prefer a totally fail-safe configuration, where 'auto' mode can cope with every eventuality. And it depends how reliable the other projects are, too. I'm a close observer, willing to step in and micromanage when needed: so when I saw a 'long' task waiting eighth or ninth in line on a Q6600 with 2 days' cache, I bumped it to start running next: your strategy would have allowed BOINC to do that by itself. Horses for courses. ID: 24757 · Reply Quote

[AF>FAH-Addict.net]toTOW Send message Joined: 9 Oct 10 Posts: 77 Credit: 3,727,865 RAC: 0	Message 24759 - Posted: 1 Sep 2012, 10:42:55 UTC - in response to Message 24726. Last modified: 1 Sep 2012, 10:43:24 UTC The WU is still being processed : 170 hours done, still 16 to go. Luckily, the task has not been reassigned yet, so I might be able to return it and get full credits for it, and the work won't be duplicated :) ID: 24759 · Reply Quote

tullio Send message Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0	Message 24760 - Posted: 1 Sep 2012, 16:59:51 UTC My cache is very small, 0.25 days, and I rarely have more than one task of each of my 7 projects running. Only climateprediction.net gave me 2 tasks on the SUN WS, but it has deadlines which are biblical in time so that is not a problem. On this laptop, running 4 projects and hibernating at night, I have to set NNT when a project sends me another task before the preceding one is finished. All my projects are equally shared; only Test4Theory@home runs on both systems and has overtaken the RAC of deceased QuantumFIRE Alpha, which still appears in BoincStats. Tullio ID: 24760 · Reply Quote

jujube Send message Joined: 25 Jan 11 Posts: 179 Credit: 83,858 RAC: 0	Message 24763 - Posted: 2 Sep 2012, 4:58:52 UTC - in response to Message 24757. Will that rule of thumb work for slow hosts with old P4 or Athlon64 processors? How about P3 machines? I mean you used the word "usual", which means it might not apply in all cases, so are there cases where it's advisable to set the cache even smaller, for example for P3 or P4 era hosts? Perhaps for slow hosts it would be advisable to cut the cache size in half again? I think the rule of thumb - which I quoted, but did not originate - dates from even before the era when P4s and Athlons ruled the world. There have been many changes to the scheduler since then. Perhaps the rule of thumb needs to be revised. Also, based on other volunteers' experiences and reports, it seems to me your "usual" rule of thumb tends to fail when one of their projects issues tasks that are very much longer than their usual tasks. I'm not sure exactly why but that seems to be how it works. Any comments on that? All absurdly general assertions need an exception, and I think you've put your finger on it. That rather depends whether you regard Eric's occasional experiments as an excitement, warranting manual intervention and micromanagement: or whether you prefer a totally fail-safe configuration, where 'auto' mode can cope with every eventuality. And it depends how reliable the other projects are, too. I like to keep a close eye on BOINC but often I have to ignore it for a few days. I don't mind a little manual intervention, but some volunteers have even less time for BOINC watching than I do and I can appreciate their desire for a totally fail-safe configuration. Alas, there probably is no such thing as a totally fail-safe configuration given the level of unpredictability (chaos?) when attached to several projects. 
One never knows what curve ball Project XYZ is going to throw at the scheduler next, and that's why I recommend a very small cache of 0.1 days if one wants it to work as automatically as possible. I also strongly encourage projects to make sure task deadlines suit the maximum duration of their tasks. In that regard, I do feel Sixtrack dropped the ball with the very long tasks issued recently. I would ask that the admin(s) issuing the tasks also understand how deadlines are configured on the server and be willing to set up the deadline before issuing a batch of unusually long tasks. Issuing long tasks then emailing the other admin a request to adjust the deadline accordingly is not the way to do it, if that's what's been happening. Warn the other admin first and don't issue the long tasks until the other admin indicates appropriate adjustments have been made. The left hand needs to know what the right hand is doing at all times. As I said, from the client's perspective there is a great deal of chaos in the system when you're attached to several projects, so every project needs to make sure the info they send to hosts regarding their tasks is appropriate/accurate. Projects that cause problems get set to NNT very quickly and are returned to "active duty" not so quickly. I would also like to remind Eric that BOINC does not revise the deadline; there was some speculation that it does. BOINC does revise the estimated duration of a project's tasks when it discovers a project's tasks are longer or shorter than expected. The deadline, however, is sacred and is never adjusted by BOINC client or server. I'm a close observer, willing to step in and micromanage when needed: so when I saw a 'long' task waiting eighth or ninth in line on a Q6600 with 2 days' cache, I bumped it to start running next: your strategy would have allowed BOINC to do that by itself. Horses for courses. Indeed my strategy did exactly that. 
The downside of my strategy is that if I lose my Internet connection for very long I run out of work rather quickly. Fortunately my ISP is extremely reliable and power outages in my area are extremely rare. I appreciate that some volunteers' ISPs are very unreliable, so a small cache may not be appropriate for them. ID: 24763 · Reply Quote