Message boards : Sixtrack Application : Inconclusive, valid/invalid results
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Send message
Joined: 12 Jul 11
Posts: 852
Credit: 1,619,050
RAC: 0
Message 31058 - Posted: 26 Jun 2017, 0:13:48 UTC - in response to Message 31057.  

Thanks for the feedback. Very useful. More news soonest. Eric.
ID: 31058 · Report as offensive     Reply Quote
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Send message
Joined: 12 Jul 11
Posts: 852
Credit: 1,619,050
RAC: 0
Message 31059 - Posted: 26 Jun 2017, 1:33:44 UTC - in response to Message 31058.  

Working Day and Night on this.
Problem seems to be Ubuntu Kernel/Version 4.8.0 specific.
(There may be other reasons for empty result files, though.
They have been masked until now.) Eric.
ID: 31059 · Report as offensive     Reply Quote
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Send message
Joined: 12 Jul 11
Posts: 852
Credit: 1,619,050
RAC: 0
Message 31064 - Posted: 26 Jun 2017, 6:18:59 UTC

Dear Volunteer, this morning, 26th June, at 08:00 CEST (06:00 UTC + 2),
I BANNED 17 hosts from new work/tasks (2 had already been banned).
These 17 Ubuntu Linux 4.8.0 systems have been consistently producing empty
result files with Exit Code 59. Hence the long delays in validation and the
large number of inconclusive results. I believe this will avoid further delays.
I am now trying to identify the reason for the Exit Code 59 produced on
these hosts. A simple problem but very very difficult to identify!
(If any of these 17 host owners have the time and the inclination, I
could send various tests while I set up my own Ubuntu system.)
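
The selection criterion described above (hosts that consistently return empty result files with Exit Code 59) could be sketched roughly as follows. This is only an illustration: the `TaskResult` record, its field names, and the `min_tasks` and `threshold` values are assumptions made for the example, not the project's actual server-side tooling.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TaskResult:
    # Hypothetical record; the field names are assumptions, not the BOINC schema.
    host_id: int
    exit_code: int
    output_bytes: int


def suspect_hosts(results, bad_exit_code=59, min_tasks=3, threshold=0.9):
    """Return sorted host IDs whose results are mostly empty with bad_exit_code."""
    total = defaultdict(int)
    bad = defaultdict(int)
    for r in results:
        total[r.host_id] += 1
        # Count only results that are both empty and carry the bad exit code.
        if r.exit_code == bad_exit_code and r.output_bytes == 0:
            bad[r.host_id] += 1
    return sorted(
        host for host, n in total.items()
        if n >= min_tasks and bad[host] / n >= threshold
    )
```

Requiring a minimum number of tasks per host (here `min_tasks`, an arbitrary choice) avoids flagging a host on the basis of a single bad result.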

I shall try and e-mail the individual owners.

I shall discuss with colleagues the suspension of WU submission until this
backlog is cleared.

It remains to be seen if there are further sources of empty result files; I
strongly suspect a Client/Server problem on Windows 10.

Thanks again for your patience and tolerance. This is a pretty complex system
involving many elements and factors, but not compared to the LHC operation itself! :-)
Eric.
ID: 31064 · Report as offensive     Reply Quote
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 15 Jun 08
Posts: 1515
Credit: 83,792,290
RAC: 82,451
Message 31070 - Posted: 26 Jun 2017, 7:27:45 UTC

If this host is not already included in the banlist it may be a candidate:
hostid=10486290
ID: 31070 · Report as offensive     Reply Quote
Profile John Hunt

Send message
Joined: 13 Jul 05
Posts: 133
Credit: 162,641
RAC: 0
Message 31072 - Posted: 26 Jun 2017, 8:51:20 UTC

Is my host 10414945 OK? Lot of pending and inconclusive results here....
ID: 31072 · Report as offensive     Reply Quote
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 15 Jun 08
Posts: 1515
Credit: 83,792,290
RAC: 82,451
Message 31074 - Posted: 26 Jun 2017, 9:26:46 UTC - in response to Message 31072.  

Is my host 10414945 OK? Lot of pending and inconclusive results here....

If I understand Eric's comments correctly you may do the following:
- check your invalid results. If 0 (or close to 0) your computer is most likely ok
- if you have a high number of inconclusive results, check your wingmen's computers
- if your wingmen's computers have a lot of invalid results, they are most likely candidates for the banlist

Your inconclusives will be resent to another computer and you will get credits once they are validated.
Pending does not indicate an error. Be patient.
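
The checks above can be written down as a small decision sketch. The function name, the 5% cut-off for "close to 0", and the 50% cut-off for "a lot of invalid results" are assumptions chosen only for illustration; the thread does not give numeric thresholds.

```python
def triage_host(my_invalid, my_total, wingmen_invalid_rates,
                ok_rate=0.05, bad_rate=0.5):
    """Apply the checks from the post to one host.

    my_invalid / my_total: invalid and total task counts of your own host.
    wingmen_invalid_rates: invalid fraction (0..1) for each wingman host.
    ok_rate and bad_rate are illustrative thresholds, not project values.
    """
    my_rate = my_invalid / my_total if my_total else 0.0
    # Wingmen whose invalid fraction reaches bad_rate are the likely culprits.
    suspects = [i for i, rate in enumerate(wingmen_invalid_rates)
                if rate >= bad_rate]
    return {
        "own_host_likely_ok": my_rate <= ok_rate,
        "suspect_wingmen": suspects,  # indices into wingmen_invalid_rates
    }
```

Note that the sketch looks at the wingmen's invalid rate, not their inconclusive rate, since a healthy host paired with a faulty one also accumulates inconclusive results.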
ID: 31074 · Report as offensive     Reply Quote
Profile John Hunt

Send message
Joined: 13 Jul 05
Posts: 133
Credit: 162,641
RAC: 0
Message 31078 - Posted: 26 Jun 2017, 11:24:37 UTC - in response to Message 31074.  

Thx for your help and advice! On checking the details of my 20 inconclusives, they are split amongst 7 different wingmen, all with high proportions of inconclusives.
ID: 31078 · Report as offensive     Reply Quote
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 15 Jun 08
Posts: 1515
Credit: 83,792,290
RAC: 82,451
Message 31080 - Posted: 26 Jun 2017, 11:44:14 UTC - in response to Message 31078.  

... amongst 7 different wingmen, all with high proportions of inconclusives.

Of course, but the more interesting value would be the wingmen's invalid rate.
ID: 31080 · Report as offensive     Reply Quote
[AF>HFR>MPT] Pink_Floyd

Send message
Joined: 6 Jan 08
Posts: 3
Credit: 526,002
RAC: 0
Message 31083 - Posted: 26 Jun 2017, 11:51:32 UTC

Hello, I seem to have this problem.

What should I do?
ID: 31083 · Report as offensive     Reply Quote
Proger

Send message
Joined: 25 Jan 06
Posts: 1
Credit: 280,392
RAC: 0
Message 31084 - Posted: 26 Jun 2017, 12:27:21 UTC - in response to Message 31083.  

Hello, I seem to have this problem.
...
What should I do?


To summarize the moderator's messages:
- Because some machines (Linux ones) have been systematically returning large numbers of wrong results, there is a long queue of tasks in "validation inconclusive".
- Be patient. Your work units will be resent to other machines for recomputation and validation.
- The wait may be long because these units are at the bottom of the server's send queue.

In short: wait for the WU to be resent to another host; see messages 31064 and 31074.
ID: 31084 · Report as offensive     Reply Quote
xii5ku

Send message
Joined: 7 May 17
Posts: 10
Credit: 6,952,848
RAC: 236
Message 31086 - Posted: 26 Jun 2017, 13:08:27 UTC
Last modified: 26 Jun 2017, 13:19:09 UTC

I have doubts that there are special hosts or special OSs causing the high rate of inconclusive results.

Why? Because the rate of inconclusive results is >99.9 % as far as I can see.

My current sixtrack stats:
all (9502)
in progress (755)
validation pending (1708)
validation inconclusive (4527)
valid (1256) --- only 3 after the validator changed, see below
invalid (1)
error (1255)

Comments:

Before the validator change, I think I had only a handful of inconclusive results. Browsing through my >4500 current inconclusives, they all seem to be from after the validator change.

The single invalid one is WU 69714861: 2x cancelled + 3x finished but with different results according to the validator (completed on June 3, June 10, and June 25, i.e. 2x with old validator and 1x with new validator). Therefore this invalid task really is more like inconclusive, because there were two guys who cancelled, and it remains unknown which of the three submitted results was the right one.

The errors are some user-aborted tasks, but typically "finish file present too long" errors.

Of the valid tasks, only 3 (three) have been validated by the new validator. All others had been validated before the new validator was brought online.
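
As a quick arithmetic check of the figures above, the sketch below sums the listed categories and computes the share of inconclusive results among tasks judged since the validator change. It assumes that the 4527 inconclusive results all date from after the change (the post says they "seem to be") and that the rate is taken over inconclusive plus newly valid tasks; the post does not state exactly how the >99.9 % figure was derived.

```python
# Task counts copied from the stats listed in the post.
stats = {
    "in progress": 755,
    "validation pending": 1708,
    "validation inconclusive": 4527,
    "valid": 1256,
    "invalid": 1,
    "error": 1255,
}
total_reported = 9502
valid_since_change = 3  # "only 3 after the validator changed"

# The categories should add up to the reported total.
categories_sum = sum(stats.values())

# Assumption: rate = inconclusive / (inconclusive + valid since the change).
inconclusive = stats["validation inconclusive"]
rate = inconclusive / (inconclusive + valid_since_change)

print(f"sum of categories: {categories_sum} (reported: {total_reported})")
print(f"inconclusive rate since validator change: {rate:.2%}")
```

Under these assumptions the categories add up to the reported total and the rate comes out just above 99.9 %, consistent with the claim.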

(BTW, all of my boxes are Xeon E5 and Xeon E3, all but one with ECC RAM, and they had earned my trust in their results before. Some of them are purpose-built compute nodes for engineering applications, doing Distributed Computing in downtimes. --- Edit: These are Linux boxes, except one Windows box which shows exactly the same picture as the Linux boxes.)
ID: 31086 · Report as offensive     Reply Quote
morgan

Send message
Joined: 1 Nov 12
Posts: 3
Credit: 555,769
RAC: 0
Message 31095 - Posted: 26 Jun 2017, 16:06:18 UTC - in response to Message 31086.  
Last modified: 26 Jun 2017, 16:14:00 UTC

Same here!
My last validated task was on 24 Jun, the date of the validator change....
The WUs start as Pending (some of them) before they move on to Validation inconclusive.

Before, I had maybe 5 or 6, but today the number of inconclusive tasks has increased to 96!
Going back to Theory simulation.
ID: 31095 · Report as offensive     Reply Quote
xii5ku

Send message
Joined: 7 May 17
Posts: 10
Credit: 6,952,848
RAC: 236
Message 31097 - Posted: 26 Jun 2017, 16:48:28 UTC - in response to Message 31086.  

PS:
Spot checks through my >4500 (and rising) inconclusive results show that the wingman and I both completed with exit status 0. So far I have not seen a single WU with a non-zero exit code in my or the wingman's task. (IOW, I have not seen the exit code 59 mentioned in message 31064 anywhere.)
ID: 31097 · Report as offensive     Reply Quote
[AF>HFR>MPT] Pink_Floyd

Send message
Joined: 6 Jan 08
Posts: 3
Credit: 526,002
RAC: 0
Message 31098 - Posted: 26 Jun 2017, 17:12:40 UTC - in response to Message 31084.  

Hello, I seem to have this problem.
...
What should I do?


To summarize the moderator's messages:
- Because some machines (Linux ones) have been systematically returning large numbers of wrong results, there is a long queue of tasks in "validation inconclusive".
- Be patient. Your work units will be resent to other machines for recomputation and validation.
- The wait may be long because these units are at the bottom of the server's send queue.

In short: wait for the WU to be resent to another host; see messages 31064 and 31074.

OK, so there is nothing in particular I need to do.
Thanks for your reply. :)
ID: 31098 · Report as offensive     Reply Quote
nairb

Send message
Joined: 1 May 07
Posts: 13
Credit: 1,150,687
RAC: 635
Message 31099 - Posted: 26 Jun 2017, 17:51:49 UTC

I have 2 computers working on SixTrack:
1 is an old dual Xeon machine running Fedora Core 10 and has Validation inconclusive (20) out of 41 tasks.
2 is an i5 processor running Windows 10 and has Validation inconclusive (99) out of 189 tasks with 56 still waiting to be processed. There are only 16 valid.

It would seem that old Linux with old CPUs has the same issues as an i5 CPU with the latest Windows 10.
ID: 31099 · Report as offensive     Reply Quote
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Send message
Joined: 12 Jul 11
Posts: 852
Credit: 1,619,050
RAC: 0
Message 31102 - Posted: 26 Jun 2017, 18:05:00 UTC

HIGHLY IMPORTANT
It looks as if my attempted fix is creating too many problems, MEA CULPA.
I expect we shall shortly withdraw it and return to the previous version of
the sixtrack_validator. Sorry for the hassle and I shall try and answer as many
of your posts to this thread as possible. The side effect seems to be I/we are
now NOT validating when we should, certainly my fault.

We shall return to a state where we will validate invalid empty results, but in this
case the attempted Cure is Worse than the Disease. (I tried.)

Hope to try again soon. Eric.
ID: 31102 · Report as offensive     Reply Quote
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Send message
Joined: 12 Jul 11
Posts: 852
Credit: 1,619,050
RAC: 0
Message 31104 - Posted: 26 Jun 2017, 18:09:19 UTC - in response to Message 31099.  

No, I am very sure not. I created another problem.
Hope all will be well soon. Eric.
ID: 31104 · Report as offensive     Reply Quote
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Send message
Joined: 12 Jul 11
Posts: 852
Credit: 1,619,050
RAC: 0
Message 31105 - Posted: 26 Jun 2017, 18:10:24 UTC - in response to Message 31098.  

Yes, but I made a mistake. You do not need to do anything. Eric
ID: 31105 · Report as offensive     Reply Quote
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Send message
Joined: 12 Jul 11
Posts: 852
Credit: 1,619,050
RAC: 0
Message 31106 - Posted: 26 Jun 2017, 18:11:46 UTC - in response to Message 31097.  

Indeed you would NOT see the -59. I created another problem
which we shall fix soonest.
ID: 31106 · Report as offensive     Reply Quote
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Send message
Joined: 12 Jul 11
Posts: 852
Credit: 1,619,050
RAC: 0
Message 31107 - Posted: 26 Jun 2017, 18:12:35 UTC - in response to Message 31095.  

Understood. Eric.
ID: 31107 · Report as offensive     Reply Quote

