Message boards : Sixtrack Application : many tasks still not validated after 13 days

Erich56

Joined: 18 Dec 15
Posts: 1686
Credit: 100,335,921
RAC: 102,416
Message 33629 - Posted: 2 Jan 2018, 16:09:24 UTC

From my task list on the web page I can see quite a number of unvalidated tasks that were uploaded as far back as December 20 (the list ends at this date, so there are most probably even older tasks still waiting for validation).

What does this mean? Will I ever earn credit points for those, or will these old tasks be lost at some point?
ID: 33629
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,874,996
RAC: 137,157
Message 33630 - Posted: 2 Jan 2018, 16:54:31 UTC - in response to Message 33629.  

Be patient.
It looks like there is nothing wrong with your results except that they need a confirmation by a wingman's computer (quorum of 2, requested by the project).
Your first wingman gave up (for whatever reason) so the second task of the WU has to be rescheduled.
Unfortunately those resends are placed at the end of the RTS (ready to send) queue, which is currently very long.

Cheers
ID: 33630
JFK73

Joined: 10 Jan 12
Posts: 5
Credit: 1,047,462
RAC: 0
Message 33638 - Posted: 2 Jan 2018, 23:27:47 UTC - in response to Message 33630.  

Thanks for explaining !
ID: 33638
Erich56

Joined: 18 Dec 15
Posts: 1686
Credit: 100,335,921
RAC: 102,416
Message 33699 - Posted: 7 Jan 2018, 8:31:29 UTC - in response to Message 33630.  

Be patient. ...
Thanks for explaining. I just checked again: many of the still unvalidated tasks were uploaded as far back as December 20, 2017 (i.e. 18 days ago). So we'll see what's going to happen.
I hope all this work was not for nothing in the end :-(
ID: 33699
Alessio Mereghetti
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 29 Feb 16
Posts: 157
Credit: 2,659,975
RAC: 0
Message 33790 - Posted: 12 Jan 2018, 7:20:35 UTC - in response to Message 33699.  

Have you tried checking how many wingmen you have to wait for?
For example, yesterday I crunched the fourth or even fifth replica of the same task, e.g. https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=82065656
That poor guy has been waiting since 18 Dec. Unfortunately this is the result of a huge peak in WUs being submitted at about the same time, relatively long tasks (4-5 h on my PC), and storage issues (with people abandoning tasks or reporting results after the deadline...).
Hope this helps!
Cheers,
ID: 33790
Erich56

Joined: 18 Dec 15
Posts: 1686
Credit: 100,335,921
RAC: 102,416
Message 33796 - Posted: 12 Jan 2018, 11:52:03 UTC - in response to Message 33790.  

Have you tried checking how many wingmen you have to wait for?
Hm - honestly, I've no idea how to check this. Please let me know; many thanks.
ID: 33796
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,874,996
RAC: 137,157
Message 33797 - Posted: 12 Jan 2018, 12:30:37 UTC - in response to Message 33796.  

... I've no idea how I can make this check. Please let me know; many thanks.

From your main account page click on "Tasks view" to get your task list.
Then click on the ID in the workunit column.
The resulting page shows the distribution of the entire workunit and additional information, e.g. how many results are necessary to get a workunit validated.
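
If you already know the WUID, the page can also be opened directly. A minimal Python sketch, assuming the URL pattern seen in the workunit links posted in this thread; the helper name workunit_url is made up for illustration:

```python
from urllib.parse import urlencode

# Base address copied from the workunit links posted in this thread.
WORKUNIT_PAGE = "https://lhcathome.cern.ch/lhcathome/workunit.php"


def workunit_url(wuid: int) -> str:
    """Build the link to a workunit's overview page (hypothetical helper)."""
    if wuid <= 0:
        raise ValueError("WUID must be a positive integer")
    return f"{WORKUNIT_PAGE}?{urlencode({'wuid': wuid})}"


print(workunit_url(82065656))
```

The WUID used here is the one from the example link earlier in the thread.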
ID: 33797
Erich56

Joined: 18 Dec 15
Posts: 1686
Credit: 100,335,921
RAC: 102,416
Message 33798 - Posted: 12 Jan 2018, 12:36:34 UTC - in response to Message 33797.  

...From your main account page click on "Tasks view" to get your task list....
oh, many thanks, it's easy enough :-) Valuable information!
ID: 33798
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,874,996
RAC: 137,157
Message 33799 - Posted: 12 Jan 2018, 12:43:56 UTC - in response to Message 33798.  

... many thanks ...

Always glad to help.
:-)
ID: 33799
Erich56

Joined: 18 Dec 15
Posts: 1686
Credit: 100,335,921
RAC: 102,416
Message 33929 - Posted: 19 Jan 2018, 11:03:38 UTC

In my task list I just noticed a still unvalidated task (created on Dec. 23) which I uploaded on Jan. 13 - the details show that the two other crunchers had "Error while computing". Does this mean that this workunit will never get validated?
ID: 33929
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,874,996
RAC: 137,157
Message 33930 - Posted: 19 Jan 2018, 11:09:57 UTC - in response to Message 33929.  

In my task list I just noticed a still unvalidated task (created on Dec. 23) which I uploaded on Jan. 13 - the details show that the two other crunchers had "Error while computing". Does this mean that this workunit will never get validated?

Be so kind as to post a link to the WU or at least the WUID.
ID: 33930
Erich56

Joined: 18 Dec 15
Posts: 1686
Credit: 100,335,921
RAC: 102,416
Message 33931 - Posted: 19 Jan 2018, 11:19:13 UTC - in response to Message 33930.  

Be so kind as to post a link to the WU or at least the WUID.
https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=82213313
(computer 10388905: it's me)

and yet another one, with a whole mix of remarks: https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=82054142
(computer 10452404: it's me)
ID: 33931
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,874,996
RAC: 137,157
Message 33932 - Posted: 19 Jan 2018, 11:45:38 UTC - in response to Message 33931.  

https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=82054142

This WU may be critical.

So far your computer is the only one that has delivered a valid result, but that result has to be confirmed by a wingman's computer (min_quorum=2).
As 3 wingmen's computers failed (for different reasons), the server created a 5th (and last!) result after 2018-01-16 22:35:39 UTC.
This result was added at the end of the RTS queue and has not yet been sent out.

As soon as result #5 is reported, it either confirms your result and both computers get the credit,
OR result #5 fails and the whole WU is treated as failed, which means no credit for any computer.

https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=82213313

This WU is a bit less critical as there are only 4 results in the list.
If result #4 fails or does not confirm your result, a 5th one will be created.
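
The rule described above can be written down as a small Python sketch. It is a simplification, not the actual BOINC server code: min_quorum=2 is taken from the WU page, the limit of 5 results is read off the "5th (and last!)" remark, and the class and function names are made up for illustration. The result counts in the examples are my reading of the two WU pages.

```python
from dataclasses import dataclass


@dataclass
class WorkunitStatus:
    agreeing: int     # reported results that agree with each other
    failed: int       # results that errored, were aborted or missed the deadline
    in_progress: int  # results created by the server but not yet reported


def outlook(wu: WorkunitStatus, min_quorum: int = 2, max_total_results: int = 5) -> str:
    """Return what happens next to a workunit under the simplified rule."""
    created = wu.agreeing + wu.failed + wu.in_progress
    if wu.agreeing >= min_quorum:
        return "validated"   # quorum reached, credit is granted
    if wu.in_progress > 0:
        return "waiting"     # a wingman still has to report
    if created < max_total_results:
        return "resend"      # server creates another result
    return "failed"          # limit reached without a quorum, no credit


# WU 82054142: one valid result, three failures, result #5 still unsent.
print(outlook(WorkunitStatus(agreeing=1, failed=3, in_progress=1)))
# The same WU if result #5 also fails.
print(outlook(WorkunitStatus(agreeing=1, failed=4, in_progress=0)))
# WU 82213313 if result #4 fails: a 5th result gets created.
print(outlook(WorkunitStatus(agreeing=1, failed=3, in_progress=0)))
```

The sketch treats results that are created but not yet sent the same as results already running on a wingman's computer, since in both cases all you can do is wait.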

At the end there's nothing you or the project admins can/will do.
Just wait and see what will happen.
ID: 33932
Erich56

Joined: 18 Dec 15
Posts: 1686
Credit: 100,335,921
RAC: 102,416
Message 33937 - Posted: 19 Jan 2018, 14:21:55 UTC - in response to Message 33932.  

... At the end there's nothing you or the project admins can/will do.
Just wait and see what will happen.
Thanks once more for your good explanation. I am rather new to Sixtrack (so far, I had only crunched VM tasks).
Honestly, I am rather surprised by the complexity of these Sixtrack WUs. I wonder whether there is some reason it has to be that complicated, particularly making a positive validation of the result dependent on what several other crunchers are doing or are not doing. This may mean that crunching a given task was for nothing in the end. I really can't see the rationale behind this kind of procedure.

On one of my PCs I am still waiting for AVX tasks - so far I only got SSE2 tasks (I have read your recent explanation regarding this topic).
Should this not work out either pretty soon, I guess I will abandon Sixtrack and resume crunching VM tasks.
ID: 33937
AuxRx

Joined: 16 Sep 17
Posts: 100
Credit: 1,618,469
RAC: 0
Message 33939 - Posted: 19 Jan 2018, 17:04:13 UTC - in response to Message 33937.  
Last modified: 19 Jan 2018, 17:05:32 UTC

Particularly making a positive validation of the result dependent on what several other crunchers are doing or are not doing.


BOINC will take care of it, don't worry. Just step away from the computer.
It is estimated that in a worst case scenario, results can take up to four weeks to be validated. The team knows this. Obviously with the recent server issues, we are experiencing a worst case scenario. The longer it takes to reach the quorum, the more likely a "reliable" system will be given the task with priority. The chances of validation will improve as time goes on.

Validating the results among several volunteers is very much necessary. I can see the downside, but I also don't want a random flipped bit to cause major issues.

On one of my PCs I am still waiting for AVX tasks - so far I only got SSE2 tasks


I do get AVX tasks and see no improvement in run time so far (although it is hard to tell, given the closed nature of the tasks). They're also still ~180 GFLOPS.

Should this not work out either pretty soon, I guess I will abandon Sixtrack and resume crunching VM tasks.


Good luck successfully returning an ATLAS task at the moment. It is nearly impossible given the current server situation. I crunched four ATLAS tasks last weekend. Of those four, two are stuck with the common PID=-1 message.
I'll stick with 50KB SixTrack instead of 140MB ATLAS until the servers have been upgraded.
ID: 33939
Erich56

Joined: 18 Dec 15
Posts: 1686
Credit: 100,335,921
RAC: 102,416
Message 33940 - Posted: 19 Jan 2018, 17:14:51 UTC - in response to Message 33939.  

Good luck successfully returning an ATLAS task at the moment. It is nearly impossible given the current server situation. I crunched four ATLAS tasks last weekend. Of those four, two are stuck with the common PID=-1 message.
I'll stick with 50KB SixTrack instead of 140MB ATLAS until the servers have been upgraded.
I agree, uploading ATLAS tasks can take several days (that's why I wonder why such a high number of tasks is still being pumped into the mills, while it's clear that the infrastructure problems back at LHC still persist).
However, so far I have experienced no such problems with CMS, LHCb and Theory. So these subprojects could be recommended for the time being, until the problems at LHC are finally solved (hopefully).
ID: 33940
Erich56

Joined: 18 Dec 15
Posts: 1686
Credit: 100,335,921
RAC: 102,416
Message 33941 - Posted: 19 Jan 2018, 19:08:55 UTC

I just found out the following interesting thing when looking up a finished task in my list for which validation is still pending:

https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=83937128

what catches my eye is that my computer (10388905) got it as SSE2, the wingman's computer got it as AVX.
Can anyone explain to me how come?
ID: 33941
Harri Liljeroos
Joined: 28 Sep 04
Posts: 674
Credit: 43,147,992
RAC: 15,989
Message 33945 - Posted: 19 Jan 2018, 21:24:28 UTC - in response to Message 33941.  
Last modified: 19 Jan 2018, 21:25:49 UTC

I just found out the following interesting thing when looking up a finished task in my list for which validation is still pending:

https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=83937128

what catches my eye is that my computer (10388905) got it as SSE2, the wingman's computer got it as AVX.
Can anyone explain to me how come?

The tasks are free from the application optimization, they are just data files. But the applications can be different [sse2/pni/avx or windows/linux/mac] etc.
ID: 33945
Erich56

Joined: 18 Dec 15
Posts: 1686
Credit: 100,335,921
RAC: 102,416
Message 33946 - Posted: 20 Jan 2018, 6:18:35 UTC - in response to Message 33945.  

The tasks are free from the application optimization, they are just data files. But the applications can be different [sse2/pni/avx or windows/linux/mac] etc.
okay, but then the question is: why is the task being run as AVX on the other cruncher's computer, and as SSE2 on mine, although mine also offers AVX?
ID: 33946
Harri Liljeroos
Joined: 28 Sep 04
Posts: 674
Credit: 43,147,992
RAC: 15,989
Message 33948 - Posted: 20 Jan 2018, 6:31:29 UTC - in response to Message 33946.  

Sorry, I don't have an answer to that. Have you checked the messages BOINC prints when it starts up, to see whether it actually recognizes the AVX extension on the processor?
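
If you copy that line out of the event log, a few lines of Python can do the check for you. This is only a sketch under assumptions: it assumes the startup message lists the CPU flags after the text "features:", the sample line below is made up, and the function name is hypothetical.

```python
def has_cpu_feature(log_line: str, feature: str = "avx") -> bool:
    """Return True if `feature` appears as a whole word in the flag list."""
    marker = "features:"
    lowered = log_line.lower()
    start = lowered.find(marker)
    # Look only at the text after the marker; fall back to the whole line.
    flags = lowered[start + len(marker):] if start >= 0 else lowered
    return feature.lower() in flags.split()


# Made-up sample line, not copied from a real log.
sample = "Processor features: fpu sse sse2 pni ssse3 avx"
print(has_cpu_feature(sample))
```

Matching whole words matters here, because a flag such as avx2 should not be counted as avx.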
ID: 33948