Message boards :
Sixtrack Application :
Inconclusive, valid/invalid results
Author | Message |
---|---|
Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0 |
Just to explain a bit; hope to have a fix very very soon. Eric. (Copied from Number Crunching) Because a null/empty fort.10 is treated as Valid, we have a major problem. For some reason, somewhere in the SixDesk/BOINC servers at CERN and the BOINC clients, we are now getting many more of these than in the past. I do not know how bad it is or how many there are, as we still do not know where to find the archived assimilator and validator logs. This means that two null results can be validated and a possibly valid result invalidated. A real mess. Perhaps we could temporarily raise the number of copies of each WU to, say, 5: a horrible workaround and a waste of volunteer resources. It would be much better to invalidate null/empty fort.10 results to get some meaningful numbers. |
Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0 |
Awaiting the fix; I have the validator logs and I shall see what I can do, but probably not much. Perhaps some credit for wrongly invalidated results. (The logs are huge and I will have trouble with disk quotas :-( ) Eric. |
Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0 |
The fix should be applied very very soon. I don't think I can do much about credits as they are now centrally managed. However, I can thank you for your patience and understanding. This fix will greatly facilitate the analysis of errors, especially as we introduce the new SixTrack version. We should also have a fix for outliers to avoid the real time exceeded. I am now having to prioritise an investigation of a physics issue involving "wrong" but validated BOINC results. We shall see, especially when the empty/null fort.10 fix is applied. I will post news whenever I have any. Thanks again. Eric. |
Joined: 21 Aug 07 Posts: 46 Credit: 1,503,835 RAC: 0 |
You might want to look at All SixTrack tasks for computer 10452223. I am not sure if this is a good example of the thread's topic issue or just an example of one computer with problems. Obviously, I haven't done a thorough analysis, but I have noticed a lot of inconclusives when paired against Windows hosts. OTOH, it does have a number of valid results and, predominately, those seem to have been when paired against other x86_64-pc-linux-gnu hosts. |
Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0 |
Thanks, I am watching this host. I'll let you know. Eric. |
Joined: 15 Jun 08 Posts: 2541 Credit: 254,608,838 RAC: 34,609 |
@Eric Another 2 candidates that may be checked: https://lhcathome.cern.ch/lhcathome/result.php?resultid=147981028 https://lhcathome.cern.ch/lhcathome/result.php?resultid=147981018 |
Joined: 16 Jul 05 Posts: 84 Credit: 1,875,851 RAC: 0 |
I have some WUs here showing this: I spent many hours of crunching time, but the other host trashed the WU after a few seconds. Interestingly, all the quick finishers were running Intel i5 processors. Might they have triggered a processor bug? https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=70976515 https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=70999827 https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=71368170 Linux Users Everywhere @ BOINC http://lhcathome.cern.ch/team_display.php?teamid=717 |
Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0 |
Indeed, I am desperate to find this problem. Interesting comment on the i5, but so far I do not have enough statistics. Thanks a million. Eric. |
Joined: 21 Aug 07 Posts: 46 Credit: 1,503,835 RAC: 0 |
Deleted and reposted after edit. |
Joined: 21 Aug 07 Posts: 46 Credit: 1,503,835 RAC: 0 |
I now have 6 inconclusives, all paired with x86_64-pc-linux-gnu hosts. Two of them are the kind Desti reported (1 long runtime, 1 very short): https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=71263850 https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=70924203 Both are from the same i5 processor - the one I reported earlier. |
Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0 |
OK, looks like a genuine rogue system. I'll try and have a look if I can stay awake. Eric. |
Joined: 27 Sep 08 Posts: 850 Credit: 692,823,409 RAC: 77,584 |
I have inconclusives with these Linux hosts: 10486162 = 30% on this PC = i5-7500 (Ananon); 10484663 = 20% on this PC = G4600; 10485156 = 96% on this PC = i5-7500 (Ananon); 10485911 = 91% on this PC = i5-7500 (Ananon); 10485912 = 90% on this PC = i5-7500 (Ananon); 10452223 = 95% ..... = i5-7500 (Ananon). I have 8.8% total invalids. There seem to be a couple of rogue systems that cause them; they have plenty with Windows and other Linux hosts. It could be that all these i5s are owned by the same person. |
Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0 |
I have banned the two hosts id=10485913 and id=10452223. I will try and do more, from your very helpful list, but only tomorrow. Eric. Suspects: 10486162 = 30% on this PC = i5-7500 (Ananon); 10484663 = 20% on this PC = G4600; 10485156 = 96% on this PC = i5-7500 (Ananon); 10485911 = 91% on this PC = i5-7500 (Ananon); 10485912 = 90% on this PC = i5-7500 (Ananon); 10452223 = 95% ..... = i5-7500 (Ananon). Reported a lot. #I have 8.8% total invalids #Could be all these i5's are owned by same person #I now have 6 inconclusives, all paired with x86_64-pc-linux-gnu hosts. #Two of them are the kind Desti reported (1 long runtime, 1 very short): #https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=71263850 #https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=70924203 10485913 EngLab User ID 371618; 10342612 Harris Notebook User ID 82208 #Both are from the same i5 processor - the one I reported earlier. #https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=70976515 #https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=70999827 #https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=71368170 #https://lhcathome.cern.ch/lhcathome/result.php?resultid=147981028 #https://lhcathome.cern.ch/lhcathome/result.php?resultid=147981018 |
Joined: 27 Sep 08 Posts: 850 Credit: 692,823,409 RAC: 77,584 |
I sent messages to the owners of 10484663 & 10405110 asking them to take a look. |
Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0 |
I have followed up; I will report soonest on my findings. Eric. |
Joined: 10 May 17 Posts: 4 Credit: 18,139,145 RAC: 3,568 |
Hi, I have not seen any Sixtrack validations since 24 Jun 2017, 23:25:52 UTC. Currently my account lists 432 tasks as validation pending. My top three machines that should have validations are: 10486289, 10480054, 10486369. Other members of my team (TeAm Anandtech) are reporting similar difficulties. I hope it is okay to post about this here. |
Joined: 7 May 17 Posts: 10 Credit: 6,952,848 RAC: 0 |
Ditto. I had plenty of SixTrack tasks validated up until June 24, 9:09 UTC. Since then, only 3 (three) more validated. All other completed SixTrack tasks are either "validation pending" (1/3 of them) or "validation inconclusive" (2/3 of them), and more tasks are continuing to migrate from pending to inconclusive as we speak. (Edit: I downloaded SixTrack tasks between Wednesday, June 21 20:16 UTC and Saturday, June 24 14:33 UTC. Inconclusive tasks came from this entire timeframe.) The new validator appears to put a lot more tasks into "inconclusive" state --- for better or worse. |
Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0 |
Absolutely OK (I hope). We now reject a lot of dud results and this means running a 3rd task or more. Sadly these new Tasks go to the back of the queue (another issue/problem). I am quietly confident (or I'll eat my hat, resign or be fired). I will be posting again tomorrow after I have had a look in detail. It should mean that if your result is valid, it will eventually be validated (and you get your credit). Sorry about all this, but there was a mess before (at the ~2% level of all Tasks). I think I had better post a fuller explanation to help clarify, but not tonight. Eric. |
Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0 |
See my reply above. Eric. |
Joined: 21 Aug 07 Posts: 46 Credit: 1,503,835 RAC: 0 |
Don't know if this is good or bad news, but immediately after the validator change, my inconclusive count jumped from 6 to 11. And the new group is very different. Prior to the change, all 6 of my inconclusives were paired against tasks done by x86_64-pc-linux-gnu machines. Now, 4 out of the 5 new ones were pairings between my SixTrack v451.07 (sse2) windows_x86_64 tasks and a variety of machines running SixTrack v451.07 (pni) windows_x86_64. |
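
For readers following the thread, here is a rough Python sketch of why treating a null/empty fort.10 as valid causes trouble, and why rejecting it pushes more tasks into "inconclusive" until a third copy returns. It is not the actual SixTrack or BOINC validator: the function names, the exact-string comparison, the quorum of 2 and the `reject_empty` switch are all assumptions made for illustration only.

```python
from collections import Counter
from typing import Dict, Optional


def is_empty(fort10: Optional[str]) -> bool:
    """Hypothetical check: a missing or whitespace-only fort.10 counts as null/empty."""
    return fort10 is None or fort10.strip() == ""


def validate(results: Dict[str, Optional[str]],
             quorum: int = 2,
             reject_empty: bool = True) -> Dict[str, str]:
    """Return 'valid', 'invalid' or 'inconclusive' for each result id.

    Assumption: two results agree when their fort.10 text is identical.
    A real validator would compare the contents in its own way.
    """
    states: Dict[str, str] = {}
    candidates: Dict[str, str] = {}
    for rid, out in results.items():
        if reject_empty and is_empty(out):
            # The fix discussed in the thread: empty output never validates.
            states[rid] = "invalid"
        else:
            candidates[rid] = out if out is not None else ""

    # The canonical output is the most common one, if enough copies agree.
    canonical = None
    counts = Counter(candidates.values())
    if counts:
        output, n = counts.most_common(1)[0]
        if n >= quorum:
            canonical = output

    for rid, out in candidates.items():
        if canonical is None:
            states[rid] = "inconclusive"  # needs another copy of the WU
        else:
            states[rid] = "valid" if out == canonical else "invalid"
    return states


wu = {"a": "", "b": "", "c": "1.0 2.0"}
old_behaviour = validate(wu, reject_empty=False)
new_behaviour = validate(wu, reject_empty=True)
after_third_copy = validate({**wu, "d": "1.0 2.0"}, reject_empty=True)
```

With the old behaviour the two empty results form the quorum and the real result "c" is invalidated. With empty results rejected, "c" stays inconclusive until another matching copy ("d") comes back, which fits the jump in inconclusive counts reported after the validator change.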
©2024 CERN