Message boards : Number crunching : Host messing up tons of results

Ananas
Joined: 17 Jul 05
Posts: 102
Credit: 542,016
RAC: 0
Message 26637 - Posted: 9 Jul 2014, 7:56:20 UTC
Last modified: 9 Jul 2014, 7:58:20 UTC

Host 10137504 currently has 13152 inconclusive results and 16 valid ones.
ID: 26637
Toby Broom
Volunteer moderator
Joined: 27 Sep 08
Posts: 598
Credit: 378,136,779
RAC: 32,173
Message 26641 - Posted: 9 Jul 2014, 23:51:35 UTC

That computer was doing the same thing before; I believe someone was going to talk to the owner.
ID: 26641
[AF>FAH-Addict.net]toTOW
Joined: 9 Oct 10
Posts: 77
Credit: 3,623,712
RAC: 0
Message 26642 - Posted: 10 Jul 2014, 10:31:35 UTC
Last modified: 10 Jul 2014, 10:31:45 UTC

I have two WUs in validation inconclusive because of this host too ...

Someone should tell him about these errors :(
ID: 26642
Dennis
Joined: 10 Sep 08
Posts: 6
Credit: 6,333,429
RAC: 5
Message 26648 - Posted: 13 Jul 2014, 8:45:02 UTC

Now 21303 inconclusive - ridiculous!
ID: 26648
[TA]Assimilator1
Joined: 29 Nov 13
Posts: 47
Credit: 3,668,448
RAC: 0
Message 26650 - Posted: 13 Jul 2014, 12:20:50 UTC - in response to Message 26648.  

Yea! Weird thing is he has 0 errored atm!
Team AnandTech - SETI@H, Muon1 DPAD, F@H, MW@H, A@H, LHC@H, POGS, R@H, DHEP, CPDN, E@H.
Main rig - Ryzen 3600, MSI B450 Gm ProACC, 32GB DDR4 3200, RX580 8GB, Win 10 64bit
2nd rig - i7 4930k @4.1 GHz, 16 GB DDR3 1866, HD 7870XT 3GB(DS), Win 7 64bit
ID: 26650
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 12 Jul 11
Posts: 852
Credit: 1,619,050
RAC: 0
Message 26652 - Posted: 13 Jul 2014, 12:59:22 UTC

I'll look at this tomorrow. My IT support has some new
scripts (I hope) for monitoring errors. In any case I
shall turn off this host if that is the solution. Eric.


ID: 26652
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 12 Jul 11
Posts: 852
Credit: 1,619,050
RAC: 0
Message 26653 - Posted: 13 Jul 2014, 13:03:26 UTC

......and I have just realised that since we are "all" waiting for
a 3rd run to get validation.......these additional runs seem to
be scheduled at the end of the one million job queue!!!!!
This could explain a lot, especially why the return of results
seems slow. Eric.
ID: 26653
[TA]Assimilator1
Joined: 29 Nov 13
Posts: 47
Credit: 3,668,448
RAC: 0
Message 26654 - Posted: 13 Jul 2014, 14:00:22 UTC
Last modified: 13 Jul 2014, 14:29:20 UTC

I've just realised I've got 6 WUs held up by this host!

Looking at his times it's now obvious to me they will be errored results.

I've just looked through my pending results & found a user's PC with every single one of its 478 tasks errored! This one: http://lhcathomeclassic.cern.ch/sixtrack/results.php?hostid=9973913 (one of rhurlin's PCs).
And all but maybe 1 of this one's: http://lhcathomeclassic.cern.ch/sixtrack/results.php?hostid=10200322 (Kevin Arth)
And all of this one's: http://lhcathomeclassic.cern.ch/sixtrack/show_host_detail.php?hostid=10313550 ([AF>Libristes>Gentoo]JujuBickoille)
ID: 26654
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 12 Jul 11
Posts: 852
Credit: 1,619,050
RAC: 0
Message 26662 - Posted: 14 Jul 2014, 11:52:53 UTC - in response to Message 26654.  

Your results will not be invalidated, if correct........the other ones will be. Eric.
ID: 26662
[TA]Assimilator1
Joined: 29 Nov 13
Posts: 47
Credit: 3,668,448
RAC: 0
Message 26666 - Posted: 14 Jul 2014, 17:32:25 UTC - in response to Message 26662.  

Roger that, just at some point later on I assume?
Was mainly just pointing out some other dodgy hosts.
ID: 26666
Toby Broom
Volunteer moderator
Joined: 27 Sep 08
Posts: 598
Credit: 378,136,779
RAC: 32,173
Message 26671 - Posted: 16 Jul 2014, 0:14:42 UTC - in response to Message 26653.  

You can fix that Eric if you change the settings that we talked about in the other thread, accelerate re-tries I think it was called?
ID: 26671
Richard Haselgrove
Joined: 27 Oct 07
Posts: 185
Credit: 3,297,428
RAC: 0
Message 26674 - Posted: 16 Jul 2014, 10:18:58 UTC - in response to Message 26671.  

"You can fix that Eric if you change the settings that we talked about in the other thread, accelerate re-tries I think it was called?"

And as I said in that other thread - message 26567 - I don't think that accelerating retries would help bring additional tasks forward from the end of the queue (that's where they go), but it would help to make sure they're dealt with quickly and effectively when we do reach them.

At least the B1 injection run seems to have a low average runtime, so the queue is shrinking rapidly.
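
To put a number on the difference, here is a toy sketch assuming a plain first-in-first-out work queue. It is not BOINC's scheduler code; the function name `tasks_ahead_of_resend` and the queue model are made up purely for illustration. It counts how many tasks get handed out before a resend is reached, depending on whether the resend joins the tail or the head of the queue.

```python
from collections import deque


def tasks_ahead_of_resend(queue_length, resend_to_front):
    """Count tasks dispatched before the resend in a simple FIFO queue.

    queue_length:    number of ordinary tasks already waiting (assumed value)
    resend_to_front: True puts the resend at the head, False at the tail
    """
    queue = deque(f"job_{i}" for i in range(queue_length))
    if resend_to_front:
        queue.appendleft("resend")
    else:
        queue.append("resend")

    dispatched = 0
    while queue:
        task = queue.popleft()  # hand out the oldest task first
        if task == "resend":
            return dispatched
        dispatched += 1
    raise ValueError("resend was never queued")


if __name__ == "__main__":
    # Small queue for a quick run; the real queue was said to be ~1 million jobs.
    print(tasks_ahead_of_resend(1000, resend_to_front=False))
    print(tasks_ahead_of_resend(1000, resend_to_front=True))
```

In this model a resend at the tail waits behind every task already queued, which matches the delay people are seeing; how quickly it is processed once reached is a separate question.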
ID: 26674
Ray Murray
Volunteer moderator
Joined: 29 Sep 04
Posts: 252
Credit: 11,225,577
RAC: 2
Message 26676 - Posted: 16 Jul 2014, 19:49:22 UTC
Last modified: 16 Jul 2014, 19:55:35 UTC

I'm starting to see a few ..job_corr_bb..... _2 and _3 resends now so maybe we've got to the tail, of that batch anyway. I would hope that the resends come from the tail of each study rather than going all the way to the end of the whole queue.

(These are from ordinary errors, not inconclusives from that rogue host, but might be a sign that we will soon be working through those.)
ID: 26676
[TA]Assimilator1
Joined: 29 Nov 13
Posts: 47
Credit: 3,668,448
RAC: 0
Message 26677 - Posted: 16 Jul 2014, 22:08:36 UTC
Last modified: 16 Jul 2014, 22:16:40 UTC

That host now has 'Validation inconclusive (27935)'!
You'd think he/she would notice the huge dearth in points by now!

I just sent him a PM, see if he replies.....
ID: 26677
Toby Broom
Volunteer moderator
Joined: 27 Sep 08
Posts: 598
Credit: 378,136,779
RAC: 32,173
Message 26678 - Posted: 17 Jul 2014, 1:16:49 UTC - in response to Message 26674.  

Sorry, I thought it put them at the front of the queue.
ID: 26678
Ray Murray
Volunteer moderator
Joined: 29 Sep 04
Posts: 252
Credit: 11,225,577
RAC: 2
Message 26679 - Posted: 17 Jul 2014, 18:05:15 UTC
Last modified: 17 Jul 2014, 18:08:53 UTC

One would have hoped that any WUs needing to be resent would go to the front of the queue, but my earliest invalids have been waiting since 2nd July and have still not been resent to a third contributor.

Current batch of longer-running tasks (although good to see) might slow up the resends even more. I had a 15 hour job overnight.
ID: 26679
Yacob
Joined: 1 Dec 12
Posts: 11
Credit: 5,844,526
RAC: 0
Message 26680 - Posted: 17 Jul 2014, 19:33:40 UTC - in response to Message 26679.  

I can confirm:
I have 55 tasks waiting as "validation inconclusive", most of them if not all from host 10137504. The first of them has been waiting since 2nd of July too.
ID: 26680
[TA]Assimilator1
Joined: 29 Nov 13
Posts: 47
Credit: 3,668,448
RAC: 0
Message 26682 - Posted: 17 Jul 2014, 20:07:50 UTC - in response to Message 26662.  

Hi Eric, that host 10137504 (not the user) is becoming a menace, can't you cut it off?

Having asked that, it seems to only have 3 WUs in progress now........
ID: 26682
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 12 Jul 11
Posts: 852
Credit: 1,619,050
RAC: 0
Message 26687 - Posted: 18 Jul 2014, 14:36:23 UTC - in response to Message 26682.  

I have e-mailed and we shall see.
Situation now very complicated due to
"new" multiple "ERR_RESULT_DOWNLOAD" errors.
I am watching over the weekend and we shall see.
Eric.
ID: 26687