Thread 'Damn wingman'

Author	Message
angler Joined: 25 Nov 06 Posts: 25 Credit: 4,686,113 RAC: 0	Message 24297 - Posted: 12 Jul 2012, 10:54:59 UTC http://lhcathomeclassic.cern.ch/sixtrack/workunit.php?wuid=1857346 I was looking to see whether my wingman had completed this result, and was somewhat surprised to find that the same machine had been assigned the validation task. At least the results seemed consistent :D

White Mountain Wes Joined: 1 Jan 09 Posts: 32 Credit: 1,106,567 RAC: 0	Message 24335 - Posted: 13 Jul 2012, 20:21:57 UTC - in response to Message 24297. I just noticed that I am my own wingman on this pair of tasks: http://lhcathomeclassic.cern.ch/sixtrack/workunit.php?wuid=1856175. This doesn't seem scientifically legitimate to me, and I have noticed that it has happened to others too. Are the administrators aware that this is happening, and are they OK with it?

angler Joined: 25 Nov 06 Posts: 25 Credit: 4,686,113 RAC: 0	Message 24339 - Posted: 13 Jul 2012, 22:02:28 UTC - in response to Message 24335. I guess in your case you can abort the 2nd instance and someone else should get it. I believe a few others have encountered the same situation recently as well.

White Mountain Wes Joined: 1 Jan 09 Posts: 32 Credit: 1,106,567 RAC: 0	Message 24340 - Posted: 13 Jul 2012, 22:13:57 UTC - in response to Message 24339. "I guess in your case you can abort the 2nd instance and someone else should get it. I believe a few others have encountered the same situation recently as well." I am planning to do that if I don't hear anything to the contrary from someone higher up. It just seems to me that there should be something in place to prevent this from happening in the first place. Just another bug that the project needs to work out, I guess.

Gary Roberts Joined: 22 Jul 05 Posts: 72 Credit: 3,962,626 RAC: 0	Message 24341 - Posted: 13 Jul 2012, 23:00:23 UTC - in response to Message 24340. "... Just another bug that the project needs to work out, I guess." No, it was a missing project config flag, as reported by Richard a couple of days ago. He didn't get a response (he tried to get Eric's attention a couple of times), but it seems to have been attended to, as I haven't seen any more recent examples. Yours date back to then as well. I guess the admins were too embarrassed to admit they goofed :-). Cheers, Gary.

White Mountain Wes Joined: 1 Jan 09 Posts: 32 Credit: 1,106,567 RAC: 0	Message 24342 - Posted: 13 Jul 2012, 23:33:04 UTC - in response to Message 24341. Ahhh, that explains it. Thanks for the feedback. I'll be aborting that 2nd task ASAP.

Igor Zacharov Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 16 May 11 Posts: 79 Credit: 111,419 RAC: 0	Message 24343 - Posted: 14 Jul 2012, 0:16:16 UTC - in response to Message 24342. The mechanism for redundancy is there; we always had the flag in the config.xml file. It seems that duplicates get sent to the same user/machine when we also enable the matchmaker scheduler. In my opinion that shouldn't happen, but it does. We switched on the matchmaker scheduler because with cache-job only, the scheduler is picky about which hosts it sends work to. We don't need homogeneous redundancy; in fact, we want to compare all possible combinations of computers to study the reproducibility of the sixtrack program. But they should indeed all be different computers. Anyway, in short, there is more experimentation to do with the job distribution algorithm we use for LHC@HOME. I will switch the matchmaker off now. Let's see whether any duplicates get sent to the same user over Saturday/Sunday. Igor. skype id: igor-zacharov
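A minimal sketch of the rule Igor describes (several replicas per workunit, each on a different computer and optionally a different user) follows. It is a hypothetical illustration, not the BOINC scheduler or matchmaker code: the `Host`/`Workunit` classes, the `can_send`/`assign` helpers and the `one_per_user` switch are invented here. For reference, BOINC's project config.xml does offer `one_result_per_user_per_wu` and `one_result_per_host_per_wu` options, but the thread does not say which flag LHC@home relies on.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Host:
    host_id: int
    user_id: int  # owner of the host; one user may run several hosts


@dataclass
class Workunit:
    wu_id: int
    target_replicas: int = 2  # hypothetical quorum of two wingmen
    assigned: list[Host] = field(default_factory=list)


def can_send(wu: Workunit, host: Host, one_per_user: bool = True) -> bool:
    """Return True if `host` may receive another replica of `wu`."""
    if len(wu.assigned) >= wu.target_replicas:
        return False  # enough replicas already handed out
    for other in wu.assigned:
        if other.host_id == host.host_id:
            return False  # never let a machine be its own wingman
        if one_per_user and other.user_id == host.user_id:
            return False  # optionally keep replicas on different users too
    return True


def assign(wu: Workunit, host: Host, one_per_user: bool = True) -> bool:
    """Record the replica if allowed; return whether it was sent."""
    if can_send(wu, host, one_per_user):
        wu.assigned.append(host)
        return True
    return False


if __name__ == "__main__":
    wu = Workunit(wu_id=1856175)
    print(assign(wu, Host(1, 10)))  # first replica → True
    print(assign(wu, Host(1, 10)))  # same machine again → False
    print(assign(wu, Host(3, 20)))  # different user → True
```

With a check like this in place, a second replica going back to the same machine (the situation reported above) is refused, while hosts belonging to other users are accepted until the replica quota is filled.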

rbpeake Joined: 17 Sep 04 Posts: 106 Credit: 36,549,147 RAC: 1	Message 24348 - Posted: 14 Jul 2012, 0:49:54 UTC - in response to Message 24343. "The mechanism for redundancy is there. ... We don't need homogeneous redundancy; in fact, we want to compare all possible combinations of computers to study the reproducibility of the sixtrack program. ... Igor." Is the goal to eliminate the need for redundancy in the future, to allow faster throughput of results (no repeats)? Regards, Bob P.

Igor Zacharov Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 16 May 11 Posts: 79 Credit: 111,419 RAC: 0	Message 24351 - Posted: 14 Jul 2012, 1:06:19 UTC - in response to Message 24348. "Is the goal to eliminate the need for redundancy in the future, to allow faster throughput of results (no repeats)?" Redundancy will still be needed in the future, since we cannot rule out faulty devices (leaving cheating aside). In particular, one side effect of a large-scale accelerator study is singling out hosts that produce wrong results. We have seen indications of that in the past with results from overclockers, but did not do a systematic study of these effects. With the latest executable this is within reach. One further explanation: the sixtrack program is used for the accelerator study. It needs bit-accurate reproducibility to map out the aperture accurately. If the results have artificial scatter, we cannot zoom in on the desired phase-space boundary, even by collecting an order of magnitude more statistics. Eric McIntosh should be able to explain this in more detail, along with the impact bit-reproducibility will have on the science. skype id: igor-zacharov
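The point about bit-accurate reproducibility can be illustrated with a hypothetical validator sketch: it compares replicas bit-for-bit (via a SHA-256 digest of the raw output), accepts the majority result once a quorum agrees with no tie, and lists hosts whose output differs as suspects. It assumes that correct hosts produce byte-identical output; it is not SixTrack's actual validator, and `validate`, `digest` and `quorum` are invented names.

```python
import hashlib
from collections import Counter
from typing import Optional


def digest(output: bytes) -> str:
    """Fingerprint a result file; any single-bit difference changes it."""
    return hashlib.sha256(output).hexdigest()


def validate(results: dict[int, bytes], quorum: int = 2) -> tuple[Optional[str], set[int]]:
    """Compare replicas bit-for-bit.

    `results` maps host_id -> raw output bytes. Returns the canonical digest
    (or None if no quorum agrees) and the host_ids whose output differs.
    """
    digests = {host: digest(out) for host, out in results.items()}
    ranked = Counter(digests.values()).most_common(2)
    if not ranked:
        return None, set()
    best, votes = ranked[0]
    # Require a quorum and no tie for first place before trusting a result.
    if votes < quorum or (len(ranked) > 1 and ranked[1][1] == votes):
        return None, set()
    suspects = {host for host, d in digests.items() if d != best}
    return best, suspects


if __name__ == "__main__":
    good, bad = b"\x00\x01\x02", b"\x00\x01\x03"  # hypothetical outputs
    canonical, suspects = validate({101: good, 102: good, 103: bad})
    print(suspects)  # → {103}
```

Because the comparison is exact, a single flipped bit from an unstable overclock is enough to separate a host from its wingmen; with only two disagreeing replicas, no canonical result is chosen and a further replica would be needed.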

Tex1954 Joined: 24 Apr 11 Posts: 37 Credit: 1,295,012 RAC: 0	Message 24361 - Posted: 14 Jul 2012, 14:39:51 UTC - in response to Message 24351. I can vouch for overclockers having problems. I catch most of the problems, but for a day or so two boxes had problems. One box was easily corrected with a voltage tweak that had been neglected after a major BIOS update. The other box had a bad motherboard, which was replaced. Both of these boxes would run Prime95 all day and night with no problem, but failed other tests and certain BOINC jobs. I have since proven the fixes stable, as before, and all is okay now... So, yes, overclocked machines can generate results that look good but are wrong from time to time, and one has to be painfully careful with that. Even machines set up purely stock with BIOS defaults can fail if the BIOS doesn't configure things properly... and I've experienced that as well. Also, it would be helpful if, in the future, the server sent a NOTICE to offending machines to alert their owners, should this prove to be a problem. However, BOINC usually starts doing random stupid things when there is a problem... like shutting down unexpectedly without an error, GPU drivers suddenly running more slowly... all kinds of hints that something isn't perfect. :)