Thread 'The project giveth then taketh away'

Author	Message
Kibble Joined: 14 Jan 07 Posts: 33 Credit: 255,657 RAC: 0	Message 19896 - Posted: 11 Aug 2008, 7:23:03 UTC Last modified: 11 Aug 2008, 7:27:38 UTC Unfortunately, all of the work units I've returned from the latest batch have been cancelled by the server and classed as redundant. There are four others waiting in the queue to be crunched, and I fully expect them to be "redundant" as well. I wonder whether River's statement, that the results are needed whenever work units are issued, is still true at this point. (cf. http://lhcathome.cern.ch/lhcathome/forum_thread.php?id=2261#14634)

Michael Karlinsky Joined: 18 Sep 04 Posts: 163 Credit: 1,682,370 RAC: 0	Message 19897 - Posted: 11 Aug 2008, 8:31:04 UTC This is a feature which came with the new server code recently installed. Might or might not have been consciously activated by the admins. Michael Team Linux Users Everywhere

Keck_Komputers Joined: 1 Sep 04 Posts: 275 Credit: 2,652,452 RAC: 0	Message 19898 - Posted: 11 Aug 2008, 11:53:45 UTC - in response to Message 19897. [quote]This is a feature which came with the new server code recently installed. Might or might not have been consciously activated by the admins. Michael[/quote] I hope it was activated on purpose. I have long stood behind the project's decision to issue extra tasks to speed up turnaround. However, I do not like my machines doing unneeded work. This feature allows the extra initial tasks without the unneeded ones. BOINC WIKI BOINCing since 2002/12/8

Kibble Joined: 14 Jan 07 Posts: 33 Credit: 255,657 RAC: 0	Message 19904 - Posted: 11 Aug 2008, 19:18:38 UTC - in response to Message 19899. [quote]But if you crunch a task to 90% and then it gets canceled as redundant then your computer has done unneeded work. And, if the cancels work the way they do on other projects I've seen, you don't get any credits for the work you did (which doesn't bother me but I have a hunch it might bother lots of other folks). The better way to get rapid completion without all the wasted effort that setting Initial Replication greater than Quorum entails is to make the deadlines shorter. If there are complaints that some people cannot handle shorter deadlines then tough, not every project is for every cruncher.[/quote]
Most people here are crunching for other projects as well. It would be unrealistic to expect otherwise. The time share for this project on my computer is the same as for all the others. BOINC adjusts things so that the work gets done by the time it is needed. An adjustment to the number of necessary work units issued would solve some problems, as would simply giving credit for all returned work units, eliminating this specific problem once and for all!
I see two obvious face-saving possibilities here. One is that the work is released by staff at CERN, but their calculations for the number of results needed to detail the dynamic aperture were faulty. The other is that the work units themselves were corrupted. (A distant possibility, since the server is labeling my results simply as redundant.) The timing for work unit return should be based on the project's needs as well as on when those work units can be reassigned to other volunteers. If at all possible, a fudge factor should be introduced to allow for the time it takes to fix corrupt work units or to ride out a server outage.
I submit that the project scientists and their staff at CERN are playing safe by dropping unnecessary work units into the system for some reason. Either that or the servers in the UK are broken. There are some other problems which seem to have crept up, but that is grist for another thread. Too bad problems have shown up when most of the staff is out on holiday. I suspect that things will be put right soon, though only time will tell. I, personally, am interested in how many others have not collected credit for work units. No one has said specifically that their work units are labelled as redundant. In my case it was all of the ones I received. This might be a case of staff not caring or being unwilling to fix a complete run of thousands of bad results manually, much like the unending pending problem. If communications between CERN and the UK are still working there may be a repeat of this run in the offing.

Ocean Archer Joined: 13 Jul 05 Posts: 143 Credit: 263,300 RAC: 0	Message 19906 - Posted: 12 Aug 2008, 3:11:29 UTC Kibble, Dagorath, et al. -- I too would be upset to find that the WUs my machines worked on had been designated "redundant", but looking back over several months, I don't find that to be the case. Of course, most of my machines have older P III (Coppermine) processors running at about 950 MHz, so I keep my queues to a minimum. This enables me to return the work in a timely manner. I suspect that those who run queues that are several days in size run a greater risk of having their packets cancelled because others complete and return them sooner. Like you both mention - the discussion of initial replication is one for another thread, but for a 'junior cruncher' like myself, it works ... If I've lived this long, I've gotta be that old

Kibble Joined: 14 Jan 07 Posts: 33 Credit: 255,657 RAC: 0	Message 19907 - Posted: 12 Aug 2008, 3:37:33 UTC - in response to Message 19906. I probably did not make it clear that all the work I got was turned back in on time, and that the last four were cancelled within hours of my receiving them. There are obviously some problems, but without being able to check the work unit field, http://lhcathome.cern.ch/lhcathome/forum_thread.php?id=2832, I can't confirm that the redundant designation is correct. One way or another the administrators will find out about the problems.

Ocean Archer Joined: 13 Jul 05 Posts: 143 Credit: 263,300 RAC: 0	Message 19912 - Posted: 12 Aug 2008, 13:58:27 UTC Dagorath -- I can only comment on the ones I received, and they were a mixture of packets running SixTrack 4.67. I say "mixture" because the WUs took anywhere from less than 3 seconds to over two hours, depending on machine and content. All in all, I received 12 packets on 10.8.08, 18 packets on 11.8.08, and so far, 5 packets today (spread across my six machines --> about 2 or 3 per machine per day). If I've lived this long, I've gotta be that old

yemonk Joined: 12 Feb 06 Posts: 4 Credit: 1,918,743 RAC: 0	Message 19915 - Posted: 13 Aug 2008, 1:34:00 UTC I got eight WUs labeled redundant and canceled. This one lived only 46 minutes, from creation to return time. Stderr says: <core_client_version>6.2.14</core_client_version>, but I crunch for SETI and E@H too, without redundant WUs so far. Any redundant WUs with the core client 6.2.15?

yemonk Joined: 12 Feb 06 Posts: 4 Credit: 1,918,743 RAC: 0	Message 19917 - Posted: 13 Aug 2008, 8:41:08 UTC Last modified: 13 Aug 2008, 8:55:27 UTC Only my two Linux machines running 6.2.14 are getting redundant WUs. My two Linux and four Windows boxes running older clients haven't got any. That, the stderr message, and the fact that 6.2.14 is a pre-release version, a "development version that may not function properly" as it warns at start, seem to point to a core client problem. But it could be just coincidence. I am switching those two machines to 6.2.15 today, once they have crunched their last cached WUs. Let's see if that gets rid of the problem. Am I the only one experiencing problems with the posting text box? Words disappear or get mangled, and backslashes get added to single and double quotes when I hit the preview button. It's pretty awkward.

Kibble Joined: 14 Jan 07 Posts: 33 Credit: 255,657 RAC: 0	Message 19923 - Posted: 13 Aug 2008, 16:10:55 UTC - in response to Message 19922. Last modified: 13 Aug 2008, 16:13:49 UTC [quote]No, you are not the only one. It is a bug in the forum code on the server. Avoid it by not using contractions in your post. P.S. Ooops! Avoiding contractions doesn't prevent the backslashes being added to quote characters, sorry.[/quote] Well, I see that you've discovered another problem, Dagorath. There is a workaround for this: use the edit feature to get rid of the unnecessary backslashes. It takes a little more time for your post, but makes it look a lot better. Kibble @yemonk: The BOINC development crew must be working overtime, since there is a newer version available, 6.2.16. I'm upgrading immediately!

EclipseHA Joined: 18 Sep 04 Posts: 47 Credit: 1,886,234 RAC: 0	Message 19925 - Posted: 13 Aug 2008, 16:53:36 UTC It seems that the WUs are all very short right now (hit the wall in a few seconds)... So, there are many cases of "canceled by server", as other users can start and finish in a matter of seconds. They'll only be canceled if they haven't started yet (that is my understanding at least), so with "real" WUs, the chance of being canceled should go down.

FalconFly Joined: 2 Sep 04 Posts: 121 Credit: 592,214 RAC: 0	Message 19926 - Posted: 13 Aug 2008, 21:36:20 UTC - in response to Message 19925. Seeing the same here: lots of work units "cancelled by Server" despite plenty of headroom before the deadline (and I'm working with a 0.75-day cache, so deadlines are normally never a factor). At least it looks like they're cancelled at 0%, which I don't have any problem with (apart from the odd look of it; I assume it's either a test or a server setting that still needs tweaking after the upgrade). Scientific Network : 45000 MHz - 77824 MB - 1970 GB

yemonk Joined: 12 Feb 06 Posts: 4 Credit: 1,918,743 RAC: 0	Message 19937 - Posted: 14 Aug 2008, 23:55:11 UTC My Windows machines are getting redundant WUs now, though not as many. Business as usual for the Linux ones, so this is not a client-related issue. Thank you for your comments.