
Inconclusive, valid/invalid results



Message boards : Sixtrack Application : Inconclusive, valid/invalid results

Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Send message
Joined: 12 Jul 11
Posts: 843
Credit: 1,446,391
RAC: 115
Message 30700 - Posted: 9 Jun 2017, 0:53:43 UTC

Just to explain a bit; hope to have a fix very very soon. Eric.
(Copied from Number Crunching)

Because a null/empty fort.10 is treated as Valid we have a major
problem. For some reason somewhere in SixDesk/BOINC servers at CERN
and BOINC clients we are now getting many more of these than in the
past. I do not know how bad or how many as we still do not know
where to find the archived assimilator and validator logs.
This means that two null results can be validated and a possibly valid
result invalidated. A real mess. Perhaps we could temporarily
raise the number of copies of each WU to, say, 5: a horrible
workaround and a waste of volunteer resources. It would be much better
to invalidate null/empty fort.10 to get some meaningful numbers.
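
To illustrate the proposal above, here is a minimal sketch of a validation pass that rejects null/empty output before any comparison. It is not the project's actual validator: the function names, the result ids, the quorum of 2, and the byte-for-byte comparison are all assumptions made for the example.

```python
from collections import Counter
from typing import Dict


def is_null_result(fort10: bytes) -> bool:
    """True if the output is empty or whitespace only (the null fort.10 case)."""
    return len(fort10.strip()) == 0


def validate(results: Dict[str, bytes], quorum: int = 2) -> Dict[str, str]:
    """Classify each result as 'valid', 'invalid' or 'inconclusive'.

    `results` maps a hypothetical result id to the raw fort.10 content.
    """
    verdict = {}
    candidates = {}
    # Step 1: null outputs are rejected up front and never count toward the quorum.
    for rid, data in results.items():
        if is_null_result(data):
            verdict[rid] = "invalid"
        else:
            candidates[rid] = data
    # Step 2: look for a group of identical non-null outputs that reaches the quorum.
    # (Exact byte equality is an assumption of this sketch.)
    counts = Counter(candidates.values())
    canonical = next((d for d, n in counts.most_common() if n >= quorum), None)
    for rid, data in candidates.items():
        if canonical is None:
            verdict[rid] = "inconclusive"  # no agreement yet, more copies needed
        elif data == canonical:
            verdict[rid] = "valid"
        else:
            verdict[rid] = "invalid"
    return verdict
```

With this ordering, two empty results can no longer agree with each other and push out a real one; the real result simply stays inconclusive until a matching copy arrives.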
____________

Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Send message
Joined: 12 Jul 11
Posts: 843
Credit: 1,446,391
RAC: 115
Message 30706 - Posted: 9 Jun 2017, 12:44:18 UTC

Awaiting the fix; I have the validator logs and I shall see what I can do,
but probably not much. Perhaps some credit for wrongly invalidated results.
(The logs are huge and I will have trouble with disk quotas :-(
Eric.
____________

Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Send message
Joined: 12 Jul 11
Posts: 843
Credit: 1,446,391
RAC: 115
Message 30756 - Posted: 12 Jun 2017, 10:10:13 UTC

The fix should be applied very very soon. I don't think I can do much about credits as
they are now centrally managed. However I can thank you for your patience and
understanding. This fix will greatly facilitate the analysis of errors especially
as we introduce the new SixTrack version. We should also have a fix for outliers to
avoid the real time exceeded.

I am now having to prioritise an investigation of a physics issue involving "wrong"
but validated BOINC results. We shall see especially when the empty/null fort.10
fix is applied.

I will post news as soon as I have any. Thanks again. Eric.
____________

Stick
Send message
Joined: 21 Aug 07
Posts: 40
Credit: 516,252
RAC: 132
Message 30918 - Posted: 21 Jun 2017, 17:23:25 UTC

You might want to look at All SixTrack tasks for computer 10452223. I am not sure if this is a good example of the thread's topic issue or just an example of one computer with problems. Obviously, I haven't done a thorough analysis, but I have noticed a lot of inconclusives when paired against Windows hosts. OTOH, it does have a number of valid results and, predominately, those seem to have been when paired against other x86_64-pc-linux-gnu hosts.

Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Send message
Joined: 12 Jul 11
Posts: 843
Credit: 1,446,391
RAC: 115
Message 30919 - Posted: 21 Jun 2017, 17:58:41 UTC - in response to Message 30918.

Thanks, I am watching this host. I'll let you know. Eric.
____________

computezrmle
Send message
Joined: 15 Jun 08
Posts: 347
Credit: 3,494,852
RAC: 1,536
Message 30963 - Posted: 23 Jun 2017, 6:34:14 UTC

@Eric

Another 2 candidates that may be checked:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=147981028
https://lhcathome.cern.ch/lhcathome/result.php?resultid=147981018

Desti
Send message
Joined: 16 Jul 05
Posts: 84
Credit: 1,053,743
RAC: 308
Message 30995 - Posted: 24 Jun 2017, 0:13:25 UTC

I have some WUs affected by this: I put in many hours of crunching time, but the other host trashed the WU after a few seconds. Interestingly, all the hosts that ended quickly were running Intel i5 processors. Might they have triggered a processor bug?

https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=70976515
https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=70999827
https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=71368170
____________
Linux Users Everywhere @ BOINC
http://lhcathome.cern.ch/team_display.php?teamid=717

Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Send message
Joined: 12 Jul 11
Posts: 843
Credit: 1,446,391
RAC: 115
Message 30997 - Posted: 24 Jun 2017, 0:16:22 UTC - in response to Message 30995.

Indeed, I am desperate to find this problem. Interesting comment
on i5, but so far I do not have enough statistics.....
Thanks a million. Eric.
____________

Stick
Send message
Joined: 21 Aug 07
Posts: 40
Credit: 516,252
RAC: 132
Message 31020 - Posted: 24 Jun 2017, 14:08:21 UTC
Last modified: 24 Jun 2017, 14:17:10 UTC

Deleted and reposted after edit.

Stick
Send message
Joined: 21 Aug 07
Posts: 40
Credit: 516,252
RAC: 132
Message 31021 - Posted: 24 Jun 2017, 14:15:23 UTC

I now have 6 inconclusives, all paired with x86_64-pc-linux-gnu hosts. Two of them are the kind Desti reported (1 long runtime, 1 very short):
https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=71263850
https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=70924203
Both are from the same i5 processor - the one I reported earlier.

Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Send message
Joined: 12 Jul 11
Posts: 843
Credit: 1,446,391
RAC: 115
Message 31022 - Posted: 24 Jun 2017, 14:17:27 UTC - in response to Message 31021.

OK, looks like a genuine rogue system. I'll try and have a look if
I can stay awake. Eric.
____________

Toby Broom
Volunteer moderator
Send message
Joined: 27 Sep 08
Posts: 375
Credit: 88,313,801
RAC: 172,927
Message 31025 - Posted: 24 Jun 2017, 16:37:09 UTC

I have inconclusives with these Linux hosts:

10486162 = 30% on this PC = i5-7500 (Ananon)

10484663 = 20% on this PC = G4600

10485156 = 96% on this PC = i5-7500 (Ananon)

10485911 = 91% on this PC = i5-7500 (Ananon)

10485912 = 90% on this PC = i5-7500 (Ananon)

10452223 = 95% ..... = i5-7500 (Ananon)

I have 8.8% total invalids

There seem to be a couple of rogue systems causing them; they have plenty of inconclusives with Windows and other Linux hosts.

It could be that all these i5s are owned by the same person.
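
For anyone wanting to produce the same kind of per-host figures, a small sketch of the tally follows. The input format (a list of wingman host id and outcome pairs, copied by hand from one's own task list) and the function name are assumptions for illustration only.

```python
from collections import defaultdict


def inconclusive_share(pairings):
    """Return {host_id: percent of pairings with that host that were inconclusive}.

    `pairings` is an iterable of (host_id, outcome) tuples, where outcome is a
    string such as "valid", "invalid" or "inconclusive" (hypothetical labels).
    """
    totals = defaultdict(int)
    inconclusive = defaultdict(int)
    for host_id, outcome in pairings:
        totals[host_id] += 1
        if outcome == "inconclusive":
            inconclusive[host_id] += 1
    return {h: 100.0 * inconclusive[h] / totals[h] for h in totals}


# Made-up example data, not taken from any real host:
sample = [(1, "inconclusive"), (1, "valid"), (2, "inconclusive"), (2, "inconclusive")]
print(inconclusive_share(sample))
```

A host that shows a high share against many different wingmen is a better suspect than one with a single bad pairing.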

Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Send message
Joined: 12 Jul 11
Posts: 843
Credit: 1,446,391
RAC: 115
Message 31028 - Posted: 24 Jun 2017, 20:20:56 UTC - in response to Message 31025.

I have banned the two hosts
id=10485913;
id=10452223;

I will try and do more, from your very helpful list, but only tomorrow.
Eric.

Suspects:
10486162 = 30% on this PC = i5-7500 (Ananon)
10484663 = 20% on this PC = G4600
10485156 = 96% on this PC = i5-7500 (Ananon)
10485911 = 91% on this PC = i5-7500 (Ananon)
10485912 = 90% on this PC = i5-7500 (Ananon)
10452223 = 95% ..... = i5-7500 (Ananon) Reported a lot
#I have 8.8% total invalids
#Could be all these i5's are owned by same person
#I now have 6 inconclusives, all paired with x86_64-pc-linux-gnu hosts.
#Two of them are the kind Desti reported (1 long runtime, 1 very short):
#https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=71263850
#https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=70924203
10485913 EngLab User ID 371618
10342612 Harris Notebook User ID 82208
#Both are from the same i5 processor - the one I reported earlier.
#https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=70976515
#https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=70999827
#https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=71368170
#https://lhcathome.cern.ch/lhcathome/result.php?resultid=147981028
#https://lhcathome.cern.ch/lhcathome/result.php?resultid=147981018
____________

Toby Broom
Volunteer moderator
Send message
Joined: 27 Sep 08
Posts: 375
Credit: 88,313,801
RAC: 172,927
Message 31029 - Posted: 24 Jun 2017, 21:24:24 UTC

I sent messages to the owners of 10484663 & 10405110 asking them to take a look.

Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Send message
Joined: 12 Jul 11
Posts: 843
Credit: 1,446,391
RAC: 115
Message 31048 - Posted: 25 Jun 2017, 15:31:52 UTC - in response to Message 31029.

I have followed up; I will report soonest on my findings. Eric.
____________

crashtech
Send message
Joined: 10 May 17
Posts: 1
Credit: 1,035,339
RAC: 3,971
Message 31049 - Posted: 25 Jun 2017, 16:11:41 UTC

Hi, I have not seen any Sixtrack validations since 24 Jun 2017, 23:25:52 UTC. Currently my account lists 432 tasks as validation pending.

My top three machines that should have validations are: 10486289, 10480054, 10486369.

Other members of my team (TeAm Anandtech) are reporting similar difficulties. I hope it is okay to post about this here.

xii5ku
Send message
Joined: 7 May 17
Posts: 8
Credit: 1,415,559
RAC: 0
Message 31050 - Posted: 25 Jun 2017, 16:35:14 UTC - in response to Message 31049.
Last modified: 25 Jun 2017, 16:56:15 UTC

Ditto.

I had plenty of SixTrack tasks validated up until June 24, 9:09 UTC. Since then, only 3 (three) more validated. All other completed SixTrack tasks are either "validation pending" (1/3 of them) or "validation inconclusive" (2/3 of them), and more tasks are continuing to migrate from pending to inconclusive as we speak.

(Edit: I downloaded SixTrack tasks between Wednesday, June 21 20:16 UTC and Saturday, June 24 14:33 UTC. Inconclusive tasks came from this entire timeframe.)

The new validator appears to put a lot more tasks into "inconclusive" state --- for better or worse.

Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Send message
Joined: 12 Jul 11
Posts: 843
Credit: 1,446,391
RAC: 115
Message 31055 - Posted: 25 Jun 2017, 19:50:27 UTC - in response to Message 31049.

Absolutely OK (I hope). We now reject a lot of dud results and this means running
a 3rd task or more. Sadly these new Tasks go to the back of the queue (another
issue/problem). I am quietly confident (or I'll eat my hat, resign or be fired). I will be
posting again tomorrow after I have had a look in detail. It should mean that if your result
is valid, it will eventually be validated (and you get your credit). Sorry about all this but
there was a mess before (~2% level of all Tasks). I think I had better post a fuller explanation
to help clarify, but not tonight. Eric.
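
As a rough illustration of why rejecting dud results leads to a 3rd task or more, the sketch below counts how many further matching results a workunit would need. The quorum of 2, the exact-match comparison, and the function name are assumptions, not the server's real logic.

```python
from collections import Counter


def additional_results_needed(outputs, quorum=2):
    """Minimum number of further results before a quorum could be reached.

    `outputs` is a list of raw fort.10 contents (bytes) returned so far;
    empty or whitespace-only outputs are treated as duds and ignored.
    """
    usable = [o for o in outputs if o.strip()]
    if not usable:
        return quorum  # nothing usable yet, start from scratch
    best = max(Counter(usable).values())  # size of the largest agreeing group
    return max(0, quorum - best)
```

For example, one dud plus one good result needs at least one more copy, which is the extra task that then waits at the back of the queue.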
____________

Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Send message
Joined: 12 Jul 11
Posts: 843
Credit: 1,446,391
RAC: 115
Message 31056 - Posted: 25 Jun 2017, 19:51:20 UTC - in response to Message 31049.

See my reply above. Eric.
____________

Stick
Send message
Joined: 21 Aug 07
Posts: 40
Credit: 516,252
RAC: 132
Message 31057 - Posted: 25 Jun 2017, 22:12:36 UTC

Don't know if this is good or bad news, but immediately after the validator change, my inconclusive count jumped from 6 to 11. And the new group is very different. Prior to the change, all 6 of my inconclusives were paired against tasks done by x86_64-pc-linux-gnu machines. Now, 4 out of the 5 new ones were pairings between my SixTrack v451.07 (sse2) windows_x86_64 tasks and a variety of machines running SixTrack v451.07 (pni) windows_x86_64.
