Message boards : Sixtrack Application : Inconclusive, valid/invalid results

Demis
Joined: 6 Mar 12
Posts: 7
Credit: 3,130,938
RAC: 19
Message 32146 - Posted: 30 Aug 2017, 9:14:24 UTC

Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1114
Credit: 49,503,029
RAC: 3,972
Message 32147 - Posted: 30 Aug 2017, 9:51:22 UTC

I got 48 WUs and they all finished fairly fast (avg. 1 hour).

32 Valid
15 Pending
1 Invalid
Volunteer Mad Scientist For Life
Alessio Mereghetti
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 29 Feb 16
Posts: 157
Credit: 2,659,975
RAC: 0
Message 32148 - Posted: 30 Aug 2017, 12:23:56 UTC - in response to Message 32146.  

Hello Demis and computezrmle,

Apologies for that. Those WUs contained a piece of input (including advanced settings) that can be correctly interpreted by sixtracktest, but not by sixtrack; the difference between the two lies in extensions to the physics and user interfaces. The user who submitted that work simply forgot to specify submission to sixtracktest.

The WUs belonging to those studies have been deleted, so that we do not waste further resources.

Thanks a lot in advance for your understanding,
Demis
Joined: 6 Mar 12
Posts: 7
Credit: 3,130,938
RAC: 19
Message 32159 - Posted: 31 Aug 2017, 10:08:39 UTC - in response to Message 32148.  

Ok. This is semaphore only. Thank you.
maeax
Joined: 2 May 07
Posts: 2071
Credit: 156,141,325
RAC: 105,216
Message 33127 - Posted: 24 Nov 2017, 8:30:47 UTC
Last modified: 24 Nov 2017, 8:32:12 UTC

Two tasks are finished and waiting:

https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=80281036

The server has more than 100k tasks waiting for validation or deletion.
Erich56
Joined: 18 Dec 15
Posts: 1686
Credit: 100,395,668
RAC: 102,181
Message 33492 - Posted: 24 Dec 2017, 7:29:21 UTC

In the past I've mainly crunched LHC VM jobs, so I am fairly new to Sixtrack.
Maybe someone could briefly explain to me how validation of finished and uploaded tasks works:

I received validation (and credit) for tasks that were uploaded 1 or 2 days ago, yet I have plenty of tasks uploaded 3 or 4 days ago that are still "validation pending".
What does validation depend on? Why are the time spans so different?
Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,616
RAC: 2,139
Message 33493 - Posted: 24 Dec 2017, 8:15:35 UTC - in response to Message 33492.  

Validation needs a quorum of 2.
That means 2 valid results from different clients (possibly from different users) must have been returned before the validator processes them both.
Why does it take that long?
Several reasons:
- Users have (over)filled their buffers, maybe because SixTrack jobs are rare.
- Because jobs are rare, BOINC's Recent Estimated Credit (REC) for LHC is low, so the client will request primarily LHC work until its REC equals that of other projects.
- The current jobs run rather long, depending on CPU speed: the average run time is 8.57 hours.
- Jobs that are never returned, that error out, or that are abandoned should be resent, but resends seem to go to the end of the feeder queue.
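The quorum rule above can be illustrated with a small Python sketch. It is a minimal model under stated assumptions, not the BOINC validator: each returned result is reduced to an opaque string, and two results count as agreeing only if they are identical, whereas a real project validator applies its own comparison. `Result`, `check_quorum` and `workunit_status` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    host_id: int  # which client returned the result
    output: str   # hypothetical: the returned output reduced to a comparable string


def check_quorum(results, min_quorum=2):
    """Return an output agreed on by at least `min_quorum` distinct hosts, else None."""
    hosts_by_output = {}
    for r in results:
        hosts_by_output.setdefault(r.output, set()).add(r.host_id)
    for output, hosts in hosts_by_output.items():
        if len(hosts) >= min_quorum:
            return output
    return None


def workunit_status(results, min_quorum=2):
    """Classify a workunit using the terms from this thread (simplified)."""
    if len(results) < min_quorum:
        # Not enough results back yet: the validator has nothing to compare.
        return "pending"
    if check_quorum(results, min_quorum) is not None:
        return "valid"
    # Enough results, but they disagree: another copy has to be sent out.
    return "inconclusive"


if __name__ == "__main__":
    print(workunit_status([Result(1, "abc")]))                    # pending
    print(workunit_status([Result(1, "abc"), Result(2, "abc")]))  # valid
    print(workunit_status([Result(1, "abc"), Result(2, "xyz")]))  # inconclusive
```

In this model a task stays "pending" for as long as its wingman's copy sits unreturned in someone else's buffer, which is why the waiting time varies so much from one task to the next.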
Erich56
Joined: 18 Dec 15
Posts: 1686
Credit: 100,395,668
RAC: 102,181
Message 33494 - Posted: 24 Dec 2017, 9:22:03 UTC - in response to Message 33493.  

Many thanks, Crystal Pellet, for the explanations :-)
maeax
Joined: 2 May 07
Posts: 2071
Credit: 156,141,325
RAC: 105,216
Message 36290 - Posted: 7 Aug 2018, 6:15:02 UTC

computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,954,549
RAC: 136,930
Message 36291 - Posted: 7 Aug 2018, 6:24:40 UTC - in response to Message 36290.  

Your wingman's computer has a high error rate.
If it's not a general problem, your result will be confirmed by the next computer.
Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1114
Credit: 49,503,029
RAC: 3,972
Message 36296 - Posted: 7 Aug 2018, 9:05:34 UTC - in response to Message 36291.  

Just checking mine, I see many, many wingmen with errors, or with hundreds of tasks on old single- and dual-core machines with x86 memory.

Before I even checked here, I was looking at many of mine that will sit in "Validation pending" for a while.

I think I will just finish my last 120 and switch back to the Theory and LHCb tasks.
Alessio Mereghetti
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 29 Feb 16
Posts: 157
Credit: 2,659,975
RAC: 0
Message 36460 - Posted: 16 Aug 2018, 14:13:38 UTC - in response to Message 36296.  

Hi Magic,
I think you got your credit recognised, right?
In your checks, have you noticed hosts regularly giving invalid results?
Thanks a lot in advance,
Cheers,
bronco
Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 36466 - Posted: 16 Aug 2018, 14:40:59 UTC - in response to Message 36460.  

Yes, many hosts regularly return many hundreds of invalid results each, and not just for Sixtrack but for ATLAS too.

Hosts that return nothing but invalid results are called cyclers. The BOINC server has an option to limit cyclers to 1 task per day. Either that option is broken, or it has been disabled for some reason, maybe accidentally.
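A per-host daily limit of the kind bronco describes can be sketched as an allowance that shrinks with every failed result and recovers on valid ones. This is a simplified, hypothetical Python model: the class name, the default limit of 100, lowering the quota by one per error, and doubling it on success are all assumptions for illustration, not BOINC's actual server code or configuration.

```python
class HostDailyQuota:
    """Hypothetical model of a per-host daily task limit (not BOINC's code)."""

    def __init__(self, max_per_day=100):
        self.max_per_day = max_per_day
        self.quota = max_per_day  # current allowance per day
        self.sent_today = 0

    def record_error(self):
        # Assumption: each error or invalid result lowers the quota by one,
        # but never below 1, so a cycler still gets one task per day.
        self.quota = max(1, self.quota - 1)

    def record_valid(self):
        # Assumption: a valid result lets the quota recover quickly.
        self.quota = min(self.max_per_day, self.quota * 2)

    def try_send(self):
        """Return True and count the task if the host is still under quota."""
        if self.sent_today >= self.quota:
            return False
        self.sent_today += 1
        return True

    def new_day(self):
        # Reset the daily counter; the quota itself carries over.
        self.sent_today = 0


host = HostDailyQuota(max_per_day=5)
for _ in range(10):
    host.record_error()
print(host.quota)                        # 1
print(host.try_send(), host.try_send())  # True False
```

With a limit like this in place, a host that only returns errors is throttled to one task per day until it sends back a valid result; without it, the second `try_send` would succeed too and the host would keep filling up with work.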
Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1114
Credit: 49,503,029
RAC: 3,972
Message 36471 - Posted: 16 Aug 2018, 19:57:31 UTC - in response to Message 36460.  
Last modified: 16 Aug 2018, 20:20:41 UTC

Yes, I did eventually get those credits, BUT I should have posted links to all those crooked hosts I found that day, since after a server update they all seem to have been removed.

I will look for a few more minutes, but I think all the evidence is gone by now, since it has been about 10 days.

BUT I have to agree with bronco about this situation.

It used to be that after a host returned errors all day, the server would not send it new tasks for 24 hours; then the host could try again. This was done in the hope that the user would check to see what the problem was.

But now they just keep getting tasks, so the errors never stop, and I imagine some users do not check for problems the way some of us always do.

Next time I find more evidence, I will post it here for Sixtrack; for VB problems like this, I will post in bronco's thread.

(Here is a quick example of computers that have hundreds of errors simply because they hold far too many tasks to finish on time, on both x86 and x64:)

https://lhcathome.cern.ch/lhcathome/hosts_user.php?userid=570301

I see some are running really old BOINC versions, but in this case it is mainly hosts getting more tasks than they can finish before the deadline. The tasks sit on these hosts until the deadline passes and the server removes them, yet the hosts keep getting more.
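Whether a host is holding more tasks than it can finish before the deadline is simple arithmetic. A rough Python sketch follows; the 8.57-hour average run time is borrowed from Crystal Pellet's earlier post, while the dual-core host, the 120-task queue and the 7-day deadline are hypothetical numbers chosen only for illustration.

```python
def finishable_before_deadline(n_tasks, hours_per_task, cores,
                               hours_to_deadline, availability=1.0):
    """Rough upper bound on how many queued tasks a host can complete in time.

    Assumes one task per core at a time, a fixed run time per task, and that
    the host crunches for `availability` (0..1) of the wall-clock time.
    """
    usable_hours = hours_to_deadline * availability
    tasks_per_core = int(usable_hours // hours_per_task)
    return min(n_tasks, tasks_per_core * cores)


queued = 120  # hypothetical queue size
done = finishable_before_deadline(queued, hours_per_task=8.57, cores=2,
                                  hours_to_deadline=7 * 24)
print(done, queued - done)  # 38 82
```

Under these assumptions, 82 of the 120 tasks would pass their deadline unfinished and have to be resent to other hosts, delaying validation for everyone paired with them.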
maeax
Joined: 2 May 07
Posts: 2071
Credit: 156,141,325
RAC: 105,216
Message 38881 - Posted: 17 May 2019, 13:19:39 UTC - in response to Message 36290.  

computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,954,549
RAC: 136,930
Message 38882 - Posted: 17 May 2019, 13:45:49 UTC - in response to Message 38881.  

Your wingman's computer has a huge rate of inconclusives/errors.
Just wait until the 3rd result returns and confirms your result.
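How a third result settles an inconclusive pair can be sketched as a simple majority check. This is a hypothetical Python illustration, assuming results are compared for exact equality and a quorum of 2 as described earlier in the thread; `resolve` is an invented name, not a BOINC function, and unlike the real server it does not check that the results come from distinct hosts.

```python
from collections import Counter


def resolve(outputs, min_quorum=2):
    """Return (canonical_output, verdicts) once some output has `min_quorum`
    matching copies, or (None, None) if another task still has to be sent.

    `outputs` lists the results returned so far, in arrival order; each is
    assumed to be directly comparable with ==.
    """
    if not outputs:
        return None, None
    best, count = Counter(outputs).most_common(1)[0]
    if count < min_quorum:
        return None, None  # still inconclusive
    verdicts = ["valid" if out == best else "invalid" for out in outputs]
    return best, verdicts


# Your result "A" and the wingman's "B" disagree, so a third copy goes out.
print(resolve(["A", "B"]))       # (None, None)
# The third host returns "A": your result is confirmed, the wingman's is invalid.
print(resolve(["A", "B", "A"]))  # ('A', ['valid', 'invalid', 'valid'])
```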
maeax
Joined: 2 May 07
Posts: 2071
Credit: 156,141,325
RAC: 105,216
Message 39315 - Posted: 8 Jul 2019, 5:37:11 UTC - in response to Message 38881.  

Alessio Mereghetti
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 29 Feb 16
Posts: 157
Credit: 2,659,975
RAC: 0
Message 39316 - Posted: 8 Jul 2019, 9:15:11 UTC - in response to Message 39315.  

https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=117806542


It seems your PC did not crunch the task correctly.
I ran it locally on an Ubuntu 18.04 machine, and it finished normally.
maeax
Joined: 2 May 07
Posts: 2071
Credit: 156,141,325
RAC: 105,216
Message 39372 - Posted: 17 Jul 2019, 6:56:38 UTC

A Sixtrack workunit with no result for either of the two computers:
https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=116951294
Alessio Mereghetti
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 29 Feb 16
Posts: 157
Credit: 2,659,975
RAC: 0
Message 39399 - Posted: 20 Jul 2019, 7:24:57 UTC - in response to Message 39372.  

Hello, maeax,

thanks a lot for spotting this. At first glance I feared we had run into a corner case of a calculation coded incorrectly, leading to two different results on different platforms. Then we checked by re-running the WU with the two executables: your result matches the Linux one, as expected, whereas the Windows one did not match the result from the other volunteer.
The Windows and Linux results match each other. Hence we concluded that the other host most probably experienced memory corruption unrelated to the code or the input files.
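The cross-check described above (re-running the workunit with both executables and comparing the outputs with the volunteers' results) can be sketched by grouping results whose output bytes are identical. This Python sketch assumes exact byte-for-byte agreement and uses SHA-256 digests only as a convenient fingerprint; the labels and byte strings are made-up placeholders, not real SixTrack output or the project's actual comparison procedure.

```python
import hashlib
from collections import defaultdict


def group_by_output(results):
    """Group result labels by the SHA-256 digest of their output bytes.

    Returns lists of labels, largest agreeing group first; a label left in
    a group of its own is the odd one out.
    """
    groups = defaultdict(list)
    for label, data in results.items():
        groups[hashlib.sha256(data).hexdigest()].append(label)
    return sorted(groups.values(), key=len, reverse=True)


# Placeholder outputs standing in for the four results being compared.
results = {
    "linux_rerun": b"output-A",
    "windows_rerun": b"output-A",
    "volunteer_1": b"output-A",
    "volunteer_2": b"output-B",
}
for group in group_by_output(results):
    print(group)
# ['linux_rerun', 'windows_rerun', 'volunteer_1']
# ['volunteer_2']
```

Because the Linux and Windows re-runs land in the same group, a platform-dependent bug is unlikely, and the single-member group points at the one host whose run went wrong.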

The wingman should confirm this.
Happy crunching!
A.

©2024 CERN