Message boards : Number crunching : Host messing up tons of results

[TA]Assimilator1
Joined: 29 Nov 13
Posts: 47
Credit: 3,668,448
RAC: 1,199
Message 26688 - Posted: 18 Jul 2014, 22:18:03 UTC - in response to Message 26687.  

Yeah, I see I have 3 of those now. How can downloading a WU fail?? ;)
Team AnandTech - SETI@H, Muon1 DPAD, F@H, MW@H, A@H, LHC@H, POGS, R@H, DHEP, CPDN, E@H.
Main rig - Ryzen 3600, MSI B450 Gm ProACC, 32GB DDR4 3200, RX580 8GB, Win 10 64bit
2nd rig - i7 4930k @4.1 GHz, 16 GB DDR3 1866, HD 7870XT 3GB(DS), Win 7 64bit
ID: 26688
Ananas
Joined: 17 Jul 05
Posts: 102
Credit: 542,016
RAC: 0
Message 26692 - Posted: 19 Jul 2014, 3:00:33 UTC
Last modified: 19 Jul 2014, 3:07:30 UTC

The problem with host 10137504 lies in BOINC itself: the server-side BOINC software does not really reduce a host's daily quota unless fewer than 2% (!!!) of its results are valid*. But host 10137504 does return a valid result now and then.

I have reported this to several projects that had a similar problem, but it does not seem to have been fixed.

* The quota works like this :
Invalid => Quota -= 1
Valid => Quota *= 2

but it would be better as:

Invalid => Quota /= 2
Valid => Quota += 1

You can exclude a host completely by setting its quota to -1 by hand; any scheduler contact from it will then be rejected. But in that case it can no longer report even the results it already has.
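
For anyone who wants to see the difference between the two rules, here is a rough Python sketch. It is a toy model, not the BOINC server code: the function names, the cap of 100 and the floor of 1 are assumptions made for illustration, and the host is a made-up one that returns one valid result in every 20.

```python
def current_rule(quota, valid, max_quota):
    """Rule as described above: invalid => quota -= 1, valid => quota *= 2."""
    quota = quota * 2 if valid else quota - 1
    # Assumed clamp: keep the quota between 1 and the project maximum.
    return max(1, min(max_quota, quota))


def proposed_rule(quota, valid, max_quota):
    """Suggested alternative: invalid => quota /= 2, valid => quota += 1."""
    quota = quota + 1 if valid else quota // 2
    return max(1, min(max_quota, quota))


def simulate(rule, outcomes, max_quota=100):
    """Apply a rule to a sequence of True/False (valid/invalid) results."""
    quota = max_quota  # assumed: the host starts at the full quota
    for valid in outcomes:
        quota = rule(quota, valid, max_quota)
    return quota


# Hypothetical faulty host: 19 invalid results, then 1 valid, repeated 10 times.
outcomes = [(i % 20 == 19) for i in range(200)]

print("current rule :", simulate(current_rule, outcomes))
print("proposed rule:", simulate(proposed_rule, outcomes))
```

With these assumed numbers the doubling rule leaves the host at the full quota of 100, while the halving rule holds it at 2.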
ID: 26692
[TA]Assimilator1
Joined: 29 Nov 13
Posts: 47
Credit: 3,668,448
RAC: 1,199
Message 26698 - Posted: 19 Jul 2014, 13:08:54 UTC - in response to Message 26692.  
Last modified: 19 Jul 2014, 13:12:05 UTC

Afraid I don't get your maths.

A 2% threshold?? That's a ridiculously low limit!! wth?? It makes the quota almost pointless!

Re the 10137504 host: the problem is with that machine and the guy needs to sort it out. I hadn't realised it was the same host that was returning duds 2 months ago!!

I sent him a (polite) PM a few days ago, but I don't know if aqvario speaks English, seeing as he's Polish.

Anyone here speak Polish?
ID: 26698
waveybarrel
Joined: 15 Oct 13
Posts: 6
Credit: 28,625
RAC: 0
Message 26704 - Posted: 21 Jul 2014, 10:07:40 UTC - in response to Message 26698.  

Try this from Google Translate:

Your machine is causing a lot of problems for the LHC@home community. Please can you sort it out or disconnect it from the project? Thank you.

Urządzenie powoduje wiele problemów dla społeczności LHC @ home. Proszę można sortować ją lub odłącz go od projektu? Dziękuję.

Hope it's not obscene.
ID: 26704
[TA]Assimilator1
Joined: 29 Nov 13
Posts: 47
Credit: 3,668,448
RAC: 1,199
Message 26706 - Posted: 21 Jul 2014, 17:55:11 UTC - in response to Message 26704.  
Last modified: 21 Jul 2014, 18:04:01 UTC

Yea that was going to be my next option, but it is a 2nd choice.

Slightly modded it to this :-

Hi
Your PC 10137504 is creating a lot of invalid results & is causing problems for the LHC@home community. Please can you sort it out or disconnect that machine from the project.
Translated by Google, apologies for any errors!

Thank you.

Cześć
Komputer 10137504 tworzy wiele nieprawidłowych wyników i powoduje problemy dla społeczności LHC @ home. Proszę można rozwiązać to, że urządzenie lub odłączyć od projektu.
Tłumaczone przez Google, przepraszam za jakiekolwiek błędy!

Dziękuję.

******************************************************

And I'll add a link to here.
PM sent.
ID: 26706
Yacob
Joined: 1 Dec 12
Posts: 11
Credit: 5,844,526
RAC: 0
Message 26707 - Posted: 21 Jul 2014, 18:34:56 UTC - in response to Message 26706.  

Perfect,
thanks Assimilator.
Let's see if he checks his correspondence...

Yacob

ID: 26707
[TA]Assimilator1
Joined: 29 Nov 13
Posts: 47
Credit: 3,668,448
RAC: 1,199
Message 26708 - Posted: 21 Jul 2014, 19:07:29 UTC - in response to Message 26707.  

Perfect translation?
ID: 26708
Yacob
Joined: 1 Dec 12
Posts: 11
Credit: 5,844,526
RAC: 0
Message 26709 - Posted: 21 Jul 2014, 22:07:50 UTC - in response to Message 26708.  

Not at all, haha.
I meant: "perfect" for your work and effort :)
I guess the translation is as good as any other Google Translate translation.



ID: 26709
[TA]Assimilator1
Joined: 29 Nov 13
Posts: 47
Credit: 3,668,448
RAC: 1,199
Message 26710 - Posted: 22 Jul 2014, 5:37:51 UTC - in response to Message 26709.  

Ah ok :)
ID: 26710
jelle
Joined: 26 Sep 11
Posts: 37
Credit: 5,971,239
RAC: 1,720
Message 26958 - Posted: 8 Nov 2014, 7:06:35 UTC - in response to Message 26637.  
Last modified: 8 Nov 2014, 7:11:03 UTC

Looks like the same host is at it again and causing problems. I have 2 tasks marked validation inconclusive where this host is my wingman: one on which the other host spent no CPU time at all, and another that it "completed" in less than 1 second while I spent 1937 seconds on my Intel i7-3770.

As of now, he has 2 valid results, 1337 pending, and 4249 inconclusive. I suspect that he will have errors in all those inconclusive results.

Can somebody please block this guy from messing up results, or block him from getting any new tasks?

10137504 currently has 13152 inconclusive results and 16 valid ones.
ID: 26958
TomTom
Joined: 13 Sep 14
Posts: 6
Credit: 444,724
RAC: 0
Message 26962 - Posted: 8 Nov 2014, 11:09:57 UTC

Hi jelle.
Same problem with the same wingman. But don't worry, normally the WU will be sent to another wingman later in order to confirm the correct result.
Maybe the server managers use a threshold of invalid results to ban a user?
WU's on first
ID: 26962
jelle
Joined: 26 Sep 11
Posts: 37
Credit: 5,971,239
RAC: 1,720
Message 26965 - Posted: 8 Nov 2014, 11:43:14 UTC - in response to Message 26962.  

I do expect the tasks to eventually be validated with another wingman, so I'm not worried about that.

The strange thing is that the owner of the malfunctioning computer has 3 other machines that seem to be crunching away properly and getting good results. His total credit for LHC@home is 2,354,637, which is more than double what I have and entirely respectable. It's only 10137504 that throws off all the errors.
ID: 26965
[AF>FAH-Addict.net]toTOW
Joined: 9 Oct 10
Posts: 77
Credit: 3,623,712
RAC: 4
Message 27063 - Posted: 18 Jan 2015, 23:38:51 UTC

Here's another host messing up a lot of work: http://lhcathomeclassic.cern.ch/sixtrack/results.php?hostid=9996388 :(
ID: 27063
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 12 Jul 11
Posts: 852
Credit: 1,619,050
RAC: 0
Message 27064 - Posted: 19 Jan 2015, 16:19:21 UTC

I have contacted "aqvario". It must be said this host also delivers a lot of valid results. I guess we are suffering because the 3rd and later attempts to rerun the case are going to the back of a (very) long queue. Eric.
ID: 27064
Darren Jones
Joined: 22 Aug 09
Posts: 5
Credit: 192,011
RAC: 0
Message 27065 - Posted: 19 Jan 2015, 16:38:03 UTC - in response to Message 27063.  

Here's another host messing up a lot of work: http://lhcathomeclassic.cern.ch/sixtrack/results.php?hostid=9996388 :(


Just came here to post the same

Darren
ID: 27065
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 12 Jul 11
Posts: 852
Credit: 1,619,050
RAC: 0
Message 27066 - Posted: 19 Jan 2015, 19:34:23 UTC - in response to Message 27065.  

OK thanks; contacting him as well. Eric.
ID: 27066
antti
Joined: 2 Sep 04
Posts: 4
Credit: 867,126
RAC: 0
Message 27068 - Posted: 19 Jan 2015, 20:12:15 UTC

Would this help with the faulty hosts?

<daily_result_quota> N </daily_result_quota>
Each host has a field MRD in the interval [1 .. daily_result_quota]; it's initially daily_result_quota, and is adjusted as the host sends good or bad results. The maximum number of jobs sent to a given host in a 24-hour period is MRD*(NCPUS + GM*NGPUS). You can use this to limit the impact of faulty hosts.
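
As a quick worked example of that formula, here is a small Python sketch. The function name and the sample numbers are made up for illustration, and GM is taken here to be a per-GPU multiplier set in the project configuration, which is an assumption.

```python
def max_daily_jobs(mrd, ncpus, ngpus=0, gm=1):
    """Upper bound on jobs sent to one host in 24 hours: MRD * (NCPUS + GM * NGPUS)."""
    return mrd * (ncpus + gm * ngpus)


# Hypothetical 4-core host with no GPU and daily_result_quota = 100.
print(max_daily_jobs(mrd=100, ncpus=4))  # healthy host, MRD still at the maximum
print(max_daily_jobs(mrd=1, ncpus=4))    # faulty host, MRD driven down to 1
```

So, under these assumed numbers, a 4-core host pushed down to MRD = 1 would be limited to 4 jobs a day instead of 400.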
ID: 27068
Darren Jones
Joined: 22 Aug 09
Posts: 5
Credit: 192,011
RAC: 0
Message 27077 - Posted: 22 Jan 2015, 19:54:14 UTC

over 14,000 inconclusive results for the host now :/
ID: 27077
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 12 Jul 11
Posts: 852
Credit: 1,619,050
RAC: 0
Message 27086 - Posted: 28 Jan 2015, 12:28:46 UTC

Well, it's up to almost 38,000 inconclusive results on my side. I'll try and do something about this! Eric.
ID: 27086
Jesse Viviano
Joined: 12 Feb 14
Posts: 71
Credit: 1,789,447
RAC: 0
Message 27119 - Posted: 31 Jan 2015, 18:13:06 UTC

Could this be one of the reasons the upload server filled up? Normally, once a work unit is successfully validated, the results that did not match should be marked as invalid so that their files can be deleted. It seems that your validator fails to mark them. Results marked as validation inconclusive generally have to stay in storage so that they can be compared against the resends. Changing them to the invalid state clears them for deletion.
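
To show what I mean, here is a toy Python sketch of the state change. It is not the project's validator or any real BOINC API: the function name, the quorum of 2 and the idea of comparing a simple output fingerprint are all assumptions for illustration.

```python
from collections import Counter


def validate(outputs, quorum=2):
    """Toy quorum check. `outputs` maps a result id to its output fingerprint.

    Until `quorum` results agree, everything stays "inconclusive" and the
    files must be kept. Once a quorum agrees, the matching results become
    "valid" and the rest "invalid", so the invalid ones can be cleaned up.
    """
    if not outputs:
        return {}
    fingerprint, count = Counter(outputs.values()).most_common(1)[0]
    if count < quorum:
        return {rid: "inconclusive" for rid in outputs}
    return {
        rid: "valid" if out == fingerprint else "invalid"
        for rid, out in outputs.items()
    }


# Two results disagree: both have to wait for a resend.
print(validate({"r1": "abc", "r2": "zzz"}))
# The resend agrees with r1: r2 can now be marked invalid and its files deleted.
print(validate({"r1": "abc", "r2": "zzz", "r3": "abc"}))
```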
ID: 27119
©2020 CERN