Host messing up tons of results
Author | Message |
---|---|
Send message Joined: 29 Nov 13 Posts: 59 Credit: 4,012,100 RAC: 0 |
Yea I see I have 3 of those now, how can downloading a WU fail?? ;) Team AnandTech - WCG, Uni@H, F@H, MW@H, Ast@H, LHC@H, R@H, CPDN, E@H. Main rig - Ryzen 3600, MSI B450 Gm Pro C AC, 32GB DDR4 3200, RTX 3060 Ti 8GB, Win10 64bit 2nd rig - i7 4930k @4.1 GHz, 16 GB DDR3 1866, HD 7870XT 3GB(DS), Win7 64 |
Send message Joined: 17 Jul 05 Posts: 102 Credit: 542,016 RAC: 0 |
The problem with host 10137504 lies in BOINC itself: the server-side BOINC software does not really reduce a host's daily quota unless the host has fewer than 2% (!!!) valid results (*). But host 10137504 does return a valid result now and then. I have reported this in several projects that had a similar problem, but it does not seem to have been fixed. (*) The quota works like this: Invalid => Quota -= 1; Valid => Quota *= 2. It would be better as: Invalid => Quota /= 2; Valid => Quota += 1. You can exclude a host completely by setting its quota to -1 by hand, in which case any scheduler contact will be rejected. But then the host can no longer report even the results it already has. |
Send message Joined: 29 Nov 13 Posts: 59 Credit: 4,012,100 RAC: 0 |
Afraid I don't get your maths. 2% threshold?? That's a ridiculously low limit!! wth?? Makes the quota nearly utterly pointless! Re the 10137504 host, the problem is with that machine & the guy needs to sort it out, I hadn't realised it was the same host returning duds from 2 months ago!! I sent him a (polite) PM a few days ago but I don't know if aqvario speaks English seeing as he's Polish. Anyone here speak Polish? |
Send message Joined: 15 Oct 13 Posts: 6 Credit: 28,625 RAC: 0 |
Try this from Google Translate: Your machine is causing a lot of problems for the LHC@home community. Please can you sort it out or disconnect it from the project? Thank you. Urządzenie powoduje wiele problemów dla społeczności LHC @ home. Proszę można sortować ją lub odłącz go od projektu? Dziękuję. Hope it's not obscene. |
Send message Joined: 29 Nov 13 Posts: 59 Credit: 4,012,100 RAC: 0 |
Yea that was going to be my next option, but it is a 2nd choice. Slightly modded it to this :- Hi Your PC 10137504 is creating a lot of invalid results & is causing problems for the LHC@home community. Please can you sort it out or disconnect that machine from the project. Translated by Google, apologies for any errors! Thank you. Cześć Komputer 10137504 tworzy wiele nieprawidłowych wyników i powoduje problemy dla społeczności LHC @ home. Proszę można rozwiązać to, że urządzenie lub odłączyć od projektu. Tłumaczone przez Google, przepraszam za jakiekolwiek błędy! Dziękuję. ****************************************************** And I'll add a link to here. PM sent. |
Send message Joined: 1 Dec 12 Posts: 11 Credit: 5,844,526 RAC: 0 |
Perfect, thanks Assimilator. Let's see if he checks the correspondence... Yacob |
Send message Joined: 29 Nov 13 Posts: 59 Credit: 4,012,100 RAC: 0 |
Perfect translation? |
Send message Joined: 1 Dec 12 Posts: 11 Credit: 5,844,526 RAC: 0 |
Not at all, haha. I meant: "perfect" for your work and effort :) I guess the translation is as good as any other Google Translate translation. |
Send message Joined: 29 Nov 13 Posts: 59 Credit: 4,012,100 RAC: 0 |
Ah ok :) |
Send message Joined: 26 Sep 11 Posts: 37 Credit: 7,807,848 RAC: 44 |
Looks like the same host is at it again and causing more problems. I have 2 tasks with validation inconclusive where this host is my wingman: one task on which the other host spent no CPU time at all, and another that it "completed" in less than 1 second while I spent 1937 seconds on my Intel i7-3770. As of now, he has 2 valid results, 1337 pending, and 4249 inconclusive. I suspect that he will have errors in all those inconclusive results. Can somebody please block this guy from messing up results, or block him from getting any new tasks? 10137504 currently has 13152 inconclusive results and 16 valid ones. |
Send message Joined: 13 Sep 14 Posts: 6 Credit: 444,724 RAC: 0 |
Hi jelle. Same problem with the same wingman. But don't worry, normally the WU will be sent later to another wingman in order to confirm the correct result. Maybe the server managers use a threshold of invalid results to ban a user? WU's on first |
Send message Joined: 26 Sep 11 Posts: 37 Credit: 7,807,848 RAC: 44 |
I do expect the tasks to eventually be validated with another wingman, so I'm not worried about that. The strange thing is that the owner of the malfunctioning computer has 3 other machines that seem to be crunching away properly and getting good results. His total credit for LHC@home is 2,354,637; which is more than double what I have and entirely respectable. It's only 10137504 that throws off all the errors. |
Send message Joined: 9 Oct 10 Posts: 77 Credit: 3,671,357 RAC: 0 |
Here's another host messing up a lot of work: http://lhcathomeclassic.cern.ch/sixtrack/results.php?hostid=9996388 :( |
Send message Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0 |
I have contacted "aqvario". It must be said this host also delivers a lot of valid results. I guess we are suffering because the 3rd and later attempts to rerun the case are going to the back of a (very) long queue. Eric. |
Send message Joined: 22 Aug 09 Posts: 5 Credit: 192,011 RAC: 0 |
"Here's another host messing up a lot of work: http://lhcathomeclassic.cern.ch/sixtrack/results.php?hostid=9996388 :(" Just came here to post the same. Darren |
Send message Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0 |
OK thanks; contacting him as well. Eric. |
Send message Joined: 2 Sep 04 Posts: 4 Credit: 867,126 RAC: 0 |
Would this help with the faulty hosts? <daily_result_quota> N </daily_result_quota> Each host has a field MRD in the interval [1 .. daily_result_quota]; it's initially daily_result_quota, and is adjusted as the host sends good or bad results. The maximum number of jobs sent to a given host in a 24-hour period is MRD*(NCPUS + GM*NGPUS). You can use this to limit the impact of faulty hosts. |
Send message Joined: 22 Aug 09 Posts: 5 Credit: 192,011 RAC: 0 |
over 14,000 inconclusive results for the host now :/ |
Send message Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0 |
Well up to almost 38,000 inconclusive on my side. I'll try and do something about this! Eric. |
Send message Joined: 12 Feb 14 Posts: 72 Credit: 4,639,155 RAC: 0 |
Could this be one of the reasons the upload server filled up? Normally, when work units are successfully validated, the invalid results should be marked as invalid so that the invalid results' files can be deleted. It seems that your validator fails to mark the invalid results. Results marked as validation inconclusive generally have to stay in storage so that they can be compared to other results so that they can be validated against the resends. Changing them to the invalid state clears them for deletion. |
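
The two quota rules compared earlier in the thread, and the per-day job cap quoted with `<daily_result_quota>`, can be sketched roughly as below. This is only an illustration of the arithmetic as the posters describe it, not the actual BOINC scheduler code. The function names, the maximum quota of 100, the integer halving, and the clamping of the quota to [1, maximum] are all assumptions made for the example.

```python
def update_current(quota, valid, max_quota):
    """Rule described in the thread as today's behaviour:
    invalid => quota -= 1, valid => quota *= 2."""
    quota = quota * 2 if valid else quota - 1
    # Assumption: keep the quota inside [1, max_quota].
    return max(1, min(max_quota, quota))


def update_proposed(quota, valid, max_quota):
    """Rule suggested in the thread:
    invalid => quota /= 2, valid => quota += 1."""
    # Assumption: integer halving, since a quota is a job count.
    quota = quota + 1 if valid else quota // 2
    return max(1, min(max_quota, quota))


def simulate(update, outcomes, max_quota):
    """Apply one update rule to a sequence of valid/invalid results."""
    quota = max_quota  # the quota starts at its maximum
    for valid in outcomes:
        quota = update(quota, valid, max_quota)
    return quota


def daily_job_cap(mrd, ncpus, ngpus, gpu_multiplier):
    """Cap quoted in the thread: MRD * (NCPUS + GM * NGPUS)."""
    return mrd * (ncpus + gpu_multiplier * ngpus)


# Hypothetical faulty host: 1 valid result in every 50 (2% valid).
outcomes = ([False] * 49 + [True]) * 4
MAX_QUOTA = 100  # assumed value, chosen only for illustration

current = simulate(update_current, outcomes, MAX_QUOTA)
proposed = simulate(update_proposed, outcomes, MAX_QUOTA)

print("current rule, final quota:", current)
print("proposed rule, final quota:", proposed)
print("jobs/day on 4 CPUs, no GPU:",
      daily_job_cap(current, 4, 0, 1),
      "vs", daily_job_cap(proposed, 4, 0, 1))
```

Under these assumptions a single valid result doubles the quota back to its maximum after 49 invalid ones, so the host is barely throttled, while the halving rule drops it to the floor almost immediately and only lets it recover one step per valid result.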
©2024 CERN