Inconclusive, valid/invalid results
Author | Message |
---|---|
Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0 |
Yes, it is being done in the new SixTrack under test. This will be an enormous help in identifying problems and doing correct validation. HOWEVER, as I have said many times to deaf ears, this is NOT the cause of all the "transient" errors which created our current problems. Still, "no consensus" is down towards 330,000 now and we have over 700,000 validated. Eric. [quote]Is it possible to do the following workaround?[/quote] |
Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0 |
There are three (3) fundamental problems that we are addressing with priority. 1. Why were Tasks not being distributed to the volunteers during the Pentathlon, and why were too many tasks subsequently distributed to a volunteer? 2. Where did the 399,219 transient failures come from between 2017-06-25 02:35:53.4879 and 1st July? In addition, there are 399,233 matching "couldn't open" messages for the result files. The transients have disappeared since 4th July, and we have only 150 (new message) try_open failures during this period. 3. Why the inadequate configuration of upload/download directories for ALL the LHC@home subprojects? This is a possible cause of the transient failures and makes debugging impossibly slow. Several fixes and improvements have been made to the sixtrack_validator. The new SixTrack, which resolves issues around empty result files and includes support for AVX and MacOS amongst many other things, is under test. Anyway, Inconclusive/No Consensus are down to 20,734 in the last 24 hours and to 322,565 in the last seven days. Yet more patience is required. Eric. |
Joined: 18 Sep 04 Posts: 30 Credit: 5,100,929 RAC: 0 |
Is this topic related to the fact that the server has cancelled 8 tasks on 3 of my well-performing machines yesterday? Michael. |
Joined: 2 May 07 Posts: 2246 Credit: 174,088,323 RAC: 7,135 |
Have also eight canceled tasks by server yesterday. Saw that the quorum was two, so the third task was obsolete. |
Joined: 14 Jan 10 Posts: 1432 Credit: 9,595,867 RAC: 4,807 |
[quote]Have also eight canceled tasks by server yesterday.[/quote] I also had one cancelled, but the strange thing is that the original 2 were returned on the 20th and 25th of June, and the resend to me was sent on the 7th of July and cancelled a few hours later. https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=71160885 Was the SixTrack validator so far behind? |
Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0 |
With respect to cancelled tasks: I don't feel it is a problem. We had already got a validated result, so the task was no longer needed, I suspect. There are plenty more to run. Also, I believe too many tasks were distributed in response to the "WUs not being distributed" problem. I really think it will all sort itself out now, but I suspect there may be some tasks which will be sent 5 times and never validated. They will all, in this case, be very short though. This will be fixed in the next SixTrack release. Patience, patience, and I just pray that our transient errors are gone for good. Indeed, the sixtrack_validator is way behind due to the 300,000 or so "transient" errors a couple of weeks ago. [quote]Have also eight canceled tasks by server yesterday. I also had one cancelled, but the strange thing is that the original 2 were returned on the 20th and 25th of June, and the resend to me was sent on the 7th of July and cancelled a few hours later. https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=71160885 Was the SixTrack validator so far behind?[/quote] |
Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0 |
Yes, but not a problem I hope. Eric. [quote]Is this topic related to the fact that the server has cancelled 8 tasks on 3 of my well-performing machines yesterday?[/quote] |
Joined: 2 May 07 Posts: 2246 Credit: 174,088,323 RAC: 7,135 |
Really good things need time to grow ;-). |
Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0 |
Well, in this case I think the incubation period is a bit long! Eric. [quote]Really good things need time to grow ;-).[/quote] |
Joined: 15 Jun 08 Posts: 2561 Credit: 256,682,696 RAC: 105,774 |
[quote]Well, in this case I think the incubation period is a bit long! Eric.[/quote] This can only result in one conclusion: this "thing" will become really good. ;-) |
Joined: 2 May 07 Posts: 2246 Credit: 174,088,323 RAC: 7,135 |
Eric, is this possible - more than 40k sixtrack-tasks with such a small number of successful tasks: https://lhcathome.cern.ch/lhcathome/results.php?hostid=10388131 |
Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0 |
Sadly, I believe this is because of our over 300,000 infamous transient errors. Looking at a couple of stderr outputs, I do not see any XML, and we get nothing but an empty result file, which is rejected. However, for at least one Work Unit I see: (1) task 148257294, host 10486162: sent 24 Jun 2017, 11:52:54 UTC; reported 24 Jun 2017, 11:58:00 UTC; Validate error; run time 3.16 s; CPU time 2.21 s; credit ---; SixTrack v451.07 (sse2) x86_64-pc-linux-gnu. (2) task 148257295, host 10484663: sent 24 Jun 2017, 11:55:00 UTC; reported 24 Jun 2017, 11:57:42 UTC; Completed, validation inconclusive; run time 19.24 s; CPU time 17.50 s; credit pending; SixTrack v451.07 (pni) x86_64-pc-linux-gnu. (3) task 149752326, host 10138935: sent 4 Jul 2017, 9:06:50 UTC; reported 8 Jul 2017, 6:22:39 UTC; Completed, validation inconclusive; run time 76,873.79 s; CPU time 62,879.10 s; credit pending; SixTrack v451.07 (sse2) windows_x86_64. (4) task 150542647, host 10388131: sent 8 Jul 2017, 6:25:16 UTC; reported 8 Jul 2017, 6:34:07 UTC; Completed, validation inconclusive; run time 111.25 s; CPU time 108.87 s; credit pending; SixTrack v451.07 (sse2) i686-pc-linux-gnu. (5) task 150542796, host 10476113: sent 8 Jul 2017, 6:35:06 UTC; deadline 15 Jul 2017, 22:07:20 UTC; In progress; run time ---; CPU time ---; credit ---; SixTrack v451.07 (pni) windows_x86_64. The result with some 60,000 seconds should eventually be validated, if we are lucky, but we may exhaust the maximum of 5 attempts! More likely we are running into another Task Management problem giving us a null result, but from a science standpoint that is much, much better than validating two duds! Also, if a volunteer loses patience and aborts a task, I think it counts as 1 of our 5... This is all pretty horrible, but it is in the lap of the gods, or at least in the hands of my colleagues. Even worse, the WWW response times are so bad that I can't easily investigate further. I don't believe any particular host or hosts are responsible, although we surely have some "bad" hosts... I might be able in the future to analyse the past couple of weeks and even compensate long-running results which were wrongly invalidated. Best I can do for now. Down to 5,462 Inconclusive for the last 24 hours, but we still have 312,734 for the last seven days to be cleared out. Eric. [quote]Eric,[/quote] |
Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0 |
P.S. This host appears to have 40,353 results!!! I'll sleep on it. I shall look at the host again tomorrow, as there seem to be a huge number of tasks with very, very short run times... Eric, |
Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0 |
Well, in spite of many other things today, I have made some progress, but I am not at the end by any means. I have indeed found 38,477 tasks/results for this host. They "all" appear to be incredibly short (impossible for me) but also appear to have returned a "success" result which cannot be validated!!! I continue checking, and I am trying to find these results, or at least some of them, in the BOINC server upload directory. I don't really see how this can happen... but I'll find out. The tasks/results do not appear to be part of a particular workspace/study. The host does not appear to be anything special (Linux). We shall see. In the meantime we are down to 3,517 Inconclusive/No consensus for the last 24 hrs (down 10% in some hours) and 219,481 for the last 7 days. Slow but progressing. If I can find this, it will be a breakthrough. Very useful feedback. "Aborted by user" (98,116) doesn't help, but it is a user privilege. It may also avoid wasting your host's CPU time. Eric. [quote]P.S. This host appears to have 40,353 results!!![/quote] |
Joined: 6 Sep 13 Posts: 5 Credit: 1,286,288 RAC: 0 |
I ran a few WUs and they ran fine on Ubuntu-based Linux Mint. However, it was in Virtualbox guest OS and not bare metal. I did notice some WUs had 5 people crunching them... |
Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0 |
[quote]I ran a few WUs and they ran fine on Ubuntu-based Linux Mint. However, it was in Virtualbox guest OS and not bare metal.[/quote] Yes; this is a clear indication of a problem. Any chance of naming a few? I am missing something here... I must be able to find them myself from some field in the WU table in the database. Thanks. Eric. |
Joined: 2 May 07 Posts: 2246 Credit: 174,088,323 RAC: 7,135 |
AlphaC, please can you open your Computer-List. |
Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0 |
AlphaC, this "opening" would possibly be a great help. (No need to be shy about being a member of Overclock.net :-) Eric. |
Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0 |
AlphaC, I don't think it matters; I found the info in our database. I DID find a big source of Inconclusive/Invalid results which is NEITHER Linux Kernel 4.8.0 NOR AlphaC. Very encouraging. Will post again soonest. Eric. |