Message boards : Sixtrack Application : Inconclusive, valid/invalid results

Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 30700 - Posted: 9 Jun 2017, 0:53:43 UTC

Just to explain a bit; hope to have a fix very very soon. Eric.
(Copied from Number Crunching)

Because a null/empty fort.10 is treated as Valid we have a major
problem. For some reason somewhere in SixDesk/BOINC servers at CERN
and BOINC clients we are now getting many more of these than in the
past. I do not know how bad or how many as we still do not know
where to find the archived assimilator and validator logs.
This means that two null results can be validated and a possibly valid
result invalidated. A real mess. Perhaps we could temporarily
update the number of copies of each WU to say 5, a horrible work
around, and a waste of volunteer resources. It would be much better
to Invalidate null/empty fort.10 to get some meaningful numbers.
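To illustrate the problem described above, here is a minimal sketch of a quorum check. It is not the actual SixTrack/BOINC validator: the function name `validate`, the `quorum` of 2 and the `reject_empty` switch are assumptions made up for this example, and each output is simply the text of a returned fort.10.

```python
from collections import Counter


def validate(outputs, quorum=2, reject_empty=True):
    """Return one status per output: 'valid', 'invalid' or 'inconclusive'.

    Hypothetical sketch only; the real validator is not shown in this thread.
    """
    def is_empty(text):
        return text.strip() == ""

    # Optionally keep null/empty outputs out of the comparison altogether.
    candidates = [o for o in outputs if not (reject_empty and is_empty(o))]

    # A canonical result exists only if `quorum` identical outputs agree.
    canonical = None
    counts = Counter(candidates)
    if counts:
        best, n = counts.most_common(1)[0]
        if n >= quorum:
            canonical = best

    statuses = []
    for o in outputs:
        if reject_empty and is_empty(o):
            statuses.append("invalid")       # dud result, never validated
        elif canonical is None:
            statuses.append("inconclusive")  # no quorum yet, more tasks needed
        elif o == canonical:
            statuses.append("valid")
        else:
            statuses.append("invalid")
    return statuses
```

With `reject_empty=False`, two empty outputs agree with each other, reach the quorum and push out the one real result. With `reject_empty=True`, the empties are invalidated and the real result stays inconclusive until a matching one arrives.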
ID: 30700
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 30706 - Posted: 9 Jun 2017, 12:44:18 UTC

Awaiting the fix; I have the validator logs and I shall see what I can do,
but probably not much. Perhaps some credit for wrongly invalidated results.
(The logs are huge and I will have trouble with disk quotas :-(
Eric.
ID: 30706
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 30756 - Posted: 12 Jun 2017, 10:10:13 UTC

The fix should be applied very very soon. I don't think I can do much about credits as
they are now centrally managed. However I can thank you for your patience and
understanding. This fix will greatly facilitate the analysis of errors especially
as we introduce the new SixTrack version. We should also have a fix for outliers to
avoid the real time exceeded.

I am now having to prioritise an investigation of a physics issue involving "wrong"
but validated BOINC results. We shall see especially when the empty/null fort.10
fix is applied.

I will post news whenever there is any. Thanks again. Eric.
ID: 30756
Stick

Joined: 21 Aug 07
Posts: 46
Credit: 1,503,835
RAC: 0
Message 30918 - Posted: 21 Jun 2017, 17:23:25 UTC

You might want to look at All SixTrack tasks for computer 10452223. I am not sure if this is a good example of the thread's topic issue or just an example of one computer with problems. Obviously, I haven't done a thorough analysis, but I have noticed a lot of inconclusives when paired against Windows hosts. OTOH, it does have a number of valid results and, predominately, those seem to have been when paired against other x86_64-pc-linux-gnu hosts.
ID: 30918
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 30919 - Posted: 21 Jun 2017, 17:58:41 UTC - in response to Message 30918.  

Thanks, I am watching this host. I'll let you know. Eric.
ID: 30919
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert

Joined: 15 Jun 08
Posts: 2534
Credit: 254,090,686
RAC: 51,239
Message 30963 - Posted: 23 Jun 2017, 6:34:14 UTC

ID: 30963
Desti

Joined: 16 Jul 05
Posts: 84
Credit: 1,875,851
RAC: 0
Message 30995 - Posted: 24 Jun 2017, 0:13:25 UTC

I have some WUs here with this problem: I put in many hours of crunching time, but the other host trashed the WU after a few seconds. Interestingly, all the quick endings were on hosts running Intel i5 processors. Might they have triggered a processor bug?

https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=70976515
https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=70999827
https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=71368170
Linux Users Everywhere @ BOINC
http://lhcathome.cern.ch/team_display.php?teamid=717
ID: 30995
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 30997 - Posted: 24 Jun 2017, 0:16:22 UTC - in response to Message 30995.  

Indeed, I am desperate to find this problem. Interesting comment
on i5, but so far I do not have enough statistics.....
Thanks a million. Eric.
ID: 30997
Stick

Joined: 21 Aug 07
Posts: 46
Credit: 1,503,835
RAC: 0
Message 31020 - Posted: 24 Jun 2017, 14:08:21 UTC
Last modified: 24 Jun 2017, 14:17:10 UTC

Deleted and reposted after edit.
ID: 31020
Stick

Joined: 21 Aug 07
Posts: 46
Credit: 1,503,835
RAC: 0
Message 31021 - Posted: 24 Jun 2017, 14:15:23 UTC

I now have 6 inconclusives, all paired with x86_64-pc-linux-gnu hosts. Two of them are the kind Desti reported (1 long runtime, 1 very short):
https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=71263850
https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=70924203
Both are from the same i5 processor - the one I reported earlier.
ID: 31021
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 31022 - Posted: 24 Jun 2017, 14:17:27 UTC - in response to Message 31021.  

OK, looks like a genuine rogue system. I'll try and have a look if
I can stay awake. Eric.
ID: 31022
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 847
Credit: 691,995,272
RAC: 115,347
Message 31025 - Posted: 24 Jun 2017, 16:37:09 UTC

I have inconclusives with these Linux hosts:

10486162 = 30% on this PC = i5-7500 (Ananon)

10484663 = 20% on this PC = G4600

10485156 = 96% on this PC = i5-7500 (Ananon)

10485911 = 91% on this PC = i5-7500 (Ananon)

10485912 = 90% on this PC = i5-7500 (Ananon)

10452223 = 95% ..... = i5-7500 (Ananon)

I have 8.8% total invalids

There seem to be a couple of rogue systems that cause them; they have plenty of inconclusives with Windows and other Linux hosts.

Could be all these i5's are owned by same person
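The kind of filtering done by hand above can be sketched in a few lines. The host IDs and percentages are the ones listed in this post; the 50% cut-off and the name `flag_suspects` are assumptions chosen only for illustration, not a project rule.

```python
# Inconclusive rate (%) seen when paired against each Linux host,
# copied from the list in this post.
inconclusive_pct = {
    10486162: 30,
    10484663: 20,
    10485156: 96,
    10485911: 91,
    10485912: 90,
    10452223: 95,
}


def flag_suspects(rates, threshold_pct=50):
    """Return host IDs at or above the (hypothetical) threshold, worst first."""
    flagged = [host for host, pct in rates.items() if pct >= threshold_pct]
    return sorted(flagged, key=lambda host: rates[host], reverse=True)


suspects = flag_suspects(inconclusive_pct)
```

With the assumed 50% threshold this picks out the four hosts at 90% or more and leaves the 30% and 20% hosts for a closer look by hand.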
ID: 31025
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 31028 - Posted: 24 Jun 2017, 20:20:56 UTC - in response to Message 31025.  

I have banned the two hosts
id=10485913;
id=10452223;

I will try and do more, from your very helpful list, but only tomorrow.
Eric.

Suspects:
10486162 = 30% on this PC = i5-7500 (Ananon)
10484663 = 20% on this PC = G4600
10485156 = 96% on this PC = i5-7500 (Ananon)
10485911 = 91% on this PC = i5-7500 (Ananon)
10485912 = 90% on this PC = i5-7500 (Ananon)
10452223 = 95% ..... = i5-7500 (Ananon) Reported a lot
#I have 8.8% total invalids
#Could be all these i5's are owned by same person
#I now have 6 inconclusives, all paired with x86_64-pc-linux-gnu hosts.
#Two of them are the kind Desti reported (1 long runtime, 1 very short):
#https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=71263850
#https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=70924203
10485913 EngLab User ID 371618
10342612 Harris Notebook User ID 82208
#Both are from the same i5 processor - the one I reported earlier.
#https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=70976515
#https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=70999827
#https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=71368170
#https://lhcathome.cern.ch/lhcathome/result.php?resultid=147981028
#https://lhcathome.cern.ch/lhcathome/result.php?resultid=147981018
ID: 31028
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 847
Credit: 691,995,272
RAC: 115,347
Message 31029 - Posted: 24 Jun 2017, 21:24:24 UTC

I sent messages to the owners of 10484663 & 10405110 asking them to take a look.
ID: 31029
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 31048 - Posted: 25 Jun 2017, 15:31:52 UTC - in response to Message 31029.  

I have followed up; I will report soonest on my findings. Eric.
ID: 31048
crashtech

Joined: 10 May 17
Posts: 4
Credit: 18,138,665
RAC: 12,833
Message 31049 - Posted: 25 Jun 2017, 16:11:41 UTC

Hi, I have not seen any Sixtrack validations since 24 Jun 2017, 23:25:52 UTC. Currently my account lists 432 tasks as validation pending.

My top three machines that should have validations are: 10486289, 10480054, 10486369.

Other members of my team (TeAm Anandtech) are reporting similar difficulties. I hope it is okay to post about this here.
ID: 31049
xii5ku

Joined: 7 May 17
Posts: 10
Credit: 6,952,848
RAC: 0
Message 31050 - Posted: 25 Jun 2017, 16:35:14 UTC - in response to Message 31049.  
Last modified: 25 Jun 2017, 16:56:15 UTC

Ditto.

I had plenty of SixTrack tasks validated up until June 24, 9:09 UTC. Since then, only 3 (three) more validated. All other completed SixTrack tasks are either "validation pending" (1/3 of them) or "validation inconclusive" (2/3 of them), and more tasks are continuing to migrate from pending to inconclusive as we speak.

(Edit: I downloaded SixTrack tasks between Wednesday, June 21 20:16 UTC and Saturday, June 24 14:33 UTC. Inconclusive tasks came from this entire timeframe.)

The new validator appears to put a lot more tasks into "inconclusive" state --- for better or worse.
ID: 31050
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 31055 - Posted: 25 Jun 2017, 19:50:27 UTC - in response to Message 31049.  

Absolutely OK (I hope). We now reject a lot of dud results and this means running
a 3rd task or more. Sadly these new Tasks go to the back of the queue (another
issue/problem). I am quietly confident (or I'll eat my hat, resign or be fired). I will be
posting again tomorrow after I have had a look in detail. It should mean that if your result
is valid, it will eventually be validated (and you get your credit). Sorry about all this but
there was a mess before (~2% level of all Tasks). I think I had better post a fuller explanation
to help clarify, but not tonight. Eric.
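As a rough sketch of why rejecting dud results means "running a 3rd task or more": the helper below counts how many returned tasks are needed before two non-empty results agree. The name `tasks_until_quorum` and the quorum of 2 are assumptions for illustration; this is not the project's scheduler or validator code.

```python
def tasks_until_quorum(outputs, quorum=2):
    """Count returned tasks consumed until `quorum` non-empty outputs agree.

    `outputs` is the sequence of returned result texts, in arrival order.
    Returns None if the quorum is never reached (hypothetical sketch).
    """
    seen = {}
    for issued, text in enumerate(outputs, start=1):
        if text.strip() == "":
            continue  # dud result: rejected, so another task has to be issued
        seen[text] = seen.get(text, 0) + 1
        if seen[text] >= quorum:
            return issued
    return None
```

Two good results validate after 2 tasks, as before. One dud plus two good results needs 3 tasks, and a good result paired only with duds stays pending until a further matching result comes back.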
ID: 31055
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 31056 - Posted: 25 Jun 2017, 19:51:20 UTC - in response to Message 31049.  

See my reply above. Eric.
ID: 31056
Stick

Joined: 21 Aug 07
Posts: 46
Credit: 1,503,835
RAC: 0
Message 31057 - Posted: 25 Jun 2017, 22:12:36 UTC

Don't know if this is good or bad news, but immediately after the validator change, my inconclusive count jumped from 6 to 11. And the new group is very different. Prior to the change, all 6 of my inconclusives were paired against tasks done by x86_64-pc-linux-gnu machines. Now, 4 out of the 5 new ones were pairings between my SixTrack v451.07 (sse2) windows_x86_64 tasks and a variety of machines running SixTrack v451.07 (pni) windows_x86_64.
ID: 31057