Message boards : Number crunching : WU not being resent to another user

candido
Joined: 6 Dec 10
Posts: 9
Credit: 452,259
RAC: 0
Message 24789 - Posted: 6 Sep 2012, 23:59:15 UTC

I have a dozen WUs in which the wingman's task errored.
These WUs are not being resent, some of them since 26 August.
Could someone have a look at it?
Thanks
ID: 24789
Uffe F
Joined: 9 Jan 08
Posts: 66
Credit: 727,923
RAC: 0
Message 24790 - Posted: 7 Sep 2012, 1:00:18 UTC

This is simply how the project works. The project manager recently submitted a huge batch of work (around 500k WUs). When a WU fails and gets scheduled for resending, it goes to the back of the queue. It has taken many days to work through the huge amount of work that was posted, but you should see them being resent within the next 2-3 days (we are down to around 80k WUs as I write this).

So don't worry, it will solve itself soon.
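
A minimal sketch of the behaviour described above, assuming the server hands out tasks strictly first in, first out. The queue contents, the `dispatch` helper and the task names are made up for illustration; this is not the project's actual scheduler code.

```python
from collections import deque


def dispatch(queue, n):
    """Send out up to n tasks from the front of the queue and return them."""
    return [queue.popleft() for _ in range(min(n, len(queue)))]


# Hypothetical queue: five fresh tasks from a big batch are already waiting.
queue = deque(f"batch_task_{i}" for i in range(5))

# A wingman's task errors, so its resend joins the back of the queue.
queue.append("resend_of_failed_wu")

first_round = dispatch(queue, 3)   # only fresh batch tasks go out
second_round = dispatch(queue, 3)  # the resend goes out last of all

print(first_round)
print(second_round)
```

The resend is only dispatched once everything that was queued ahead of it has gone out, which is why a very large batch delays resends for days.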
ID: 24790
candido
Joined: 6 Dec 10
Posts: 9
Credit: 452,259
RAC: 0
Message 24791 - Posted: 7 Sep 2012, 1:26:26 UTC - in response to Message 24790.  

Many thanks Uffe F.
ccandido

ID: 24791
Richard Haselgrove
Joined: 27 Oct 07
Posts: 182
Credit: 3,295,818
RAC: 0
Message 24793 - Posted: 7 Sep 2012, 7:09:21 UTC

I think we're through the big batch now - I'm starting to see resends arriving on all my machines.
ID: 24793
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 12 Jul 11
Posts: 851
Credit: 1,616,240
RAC: 49
Message 24794 - Posted: 7 Sep 2012, 8:51:02 UTC

Right; thanks for all that. I am going to submit another big
batch, but I presume they will all go to the end of the queue.
I will also need to rerun a few incomplete tasks to finish some studies,
but I can wait for everything to be finished as I need to compare
all the studies. I hope I have got this right. If I wait till the queue
empties there would be no new work for a few days; I think I
would rather take advantage of the excellent progress and keep
everybody busy for another week or two. Eric.
ID: 24794
Uffe F
Joined: 9 Jan 08
Posts: 66
Credit: 727,923
RAC: 0
Message 24795 - Posted: 7 Sep 2012, 9:03:08 UTC

Sounds great Eric.

And yes, I have started receiving resends now too, so they should be on the way.
ID: 24795
Richard Haselgrove
Joined: 27 Oct 07
Posts: 182
Credit: 3,295,818
RAC: 0
Message 24796 - Posted: 7 Sep 2012, 10:22:53 UTC

My guess is that all the ~65K or so tasks ready to send on the server now are resends for one reason or another. That seems high, but possibly with so many WUs available recently, some hosts or users bit off more than they could chew and let the surplus time out.

If you start a new batch now, my understanding is that they'll go to the end of the queue, and not start appearing on our machines until the resends have all been sent out.

The potential problem is that inevitably some of the resends being sent out now will fail, and need to be resent yet again - and if you submit the batch first, they'll be backed up behind that. From the user point of view, it would be good to keep submitting smaller batches, so that the level of interest and engagement is kept high - people start drifting away if there's no work available. But I don't know how difficult or time-consuming it would be to keep drip-feeding us.
ID: 24796
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 12 Jul 11
Posts: 851
Credit: 1,616,240
RAC: 49
Message 24797 - Posted: 7 Sep 2012, 15:38:29 UTC - in response to Message 24796.  

OK; I've put a relatively small batch and I shall watch
over the weekend. Eric.
(I have plenty more work for the intensity scan.
I am holding off "very" long jobs until I implement
a scheme to run them as a sequential series of shorter WUs.)
ID: 24797
Uffe F
Joined: 9 Jan 08
Posts: 66
Credit: 727,923
RAC: 0
Message 24798 - Posted: 8 Sep 2012, 5:08:03 UTC - in response to Message 24797.  

I just got a few wlxu2 on my computer, so it seems that there are still some fresh long ones in the queue.
ID: 24798
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 12 Jul 11
Posts: 851
Credit: 1,616,240
RAC: 49
Message 24799 - Posted: 8 Sep 2012, 13:26:07 UTC

Well, wlxu2 are "only" 10**6 turns maximum, which I
now consider "long". The very long ones are in wlxu7 and, as
I said, I am holding off. There are very few resends
as I have plenty of work and I don't want to waste
resources by re-running until essential.
ID: 24799
Christoph
Joined: 25 Aug 05
Posts: 69
Credit: 306,627
RAC: 0
Message 24800 - Posted: 8 Sep 2012, 14:33:13 UTC - in response to Message 24799.  

I have one of the wlxu7. Estimated at nearly 100 hours. Not a re-send.
Christoph
ID: 24800
Uffe F
Joined: 9 Jan 08
Posts: 66
Credit: 727,923
RAC: 0
Message 24801 - Posted: 8 Sep 2012, 14:54:46 UTC - in response to Message 24799.  
Last modified: 8 Sep 2012, 15:00:07 UTC

I got 6 wlxu2: 2 of them are estimated at 7 hours, 2 at 61 hours and 2 at 76 hours.

It's not a problem, because I can easily do them in time; this is just so you know that there are some out there. They are not resends.

3121879 short
3123215 long
3123328 long
3123629 long
3125320 short
3125798 long

I don't know where to look for turns, but I can see the estimated computation sizes:
180000 GFLOPs
1800000 GFLOPs

The reason there are 3 different estimated times is that the sse2, sse3 and pni jobs don't have the same estimate.
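
As a rough cross-check of the numbers above, here is a small sketch. It assumes the 7-hour tasks are the 180000 GFLOPs ones and that the estimate is simply the computation size divided by an effective speed. That is a simplification for illustration: the real BOINC estimate also depends on per-application-version speeds and correction factors.

```python
# Estimated computation sizes quoted in the post, in GFLOPs.
SHORT_GFLOPS = 180_000
LONG_GFLOPS = 1_800_000


def implied_speed_gflops(size_gflops, hours):
    """Effective speed (GFLOPS) implied by a task size and an estimated run time."""
    return size_gflops / (hours * 3600)


def estimated_hours(size_gflops, speed_gflops):
    """Run time in hours for a task of the given size at the given speed."""
    return size_gflops / speed_gflops / 3600


# Speed implied by the short tasks' 7-hour estimate (assumed pairing).
speed = implied_speed_gflops(SHORT_GFLOPS, 7)
long_hours = estimated_hours(LONG_GFLOPS, speed)

print(round(speed, 2), "GFLOPS effective")
print(round(long_hours), "hours for a long task at that speed")
```

At that speed a long task comes out at about 70 hours, in between the 61 and 76 hours reported, so the ten-fold difference in size accounts for most of the gap and the rest would be down to the different application versions.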
ID: 24801
Zonar
Joined: 14 Jul 12
Posts: 2
Credit: 173,287
RAC: 12
Message 24802 - Posted: 8 Sep 2012, 18:41:10 UTC - in response to Message 24801.  

You can extract that from the name of the workunit. I've taken the workunits you have posted:

3121879 - wlxu2_nuebb1__41__s__64.31_59.32__4.5_5__6__49.5_1_sixvf_boinc52013
3125320 - wlxu2_nuebb1__48__s__64.31_59.32__4_4.5__6__1.5_1_sixvf_boinc53574

3125798 - wlxu2_nuebb0__46__s__64.31_59.32__5.5_6__7__37.5_1_sixvf_boinc53303
3123629 - wlxu2_nuebb0__43__s__64.31_59.32__4.5_5__7__9_1_sixvf_boinc52458
3123328 - wlxu2_nuebb0__41__s__64.31_59.32__5.5_6__7__88.5_1_sixvf_boinc52157
3123215 - wlxu2_nuebb0__41__s__64.31_59.32__5_5.5__7__7.5_1_sixvf_boinc52044

The normal workunits have a 6 in the sixth field of the name (fields are separated by double underscores), the longer ones have a 7.
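
The same check can be done in a few lines of code. This is only a sketch based on the names listed above: the helper names `length_field` and `classify` are made up, and it assumes every workunit name follows the same double-underscore layout.

```python
def length_field(wu_name):
    """Return the sixth double-underscore-separated field of a workunit name."""
    return wu_name.split("__")[5]


def classify(wu_name):
    """Label a workunit 'long' if that field is 7, otherwise 'normal'."""
    return "long" if length_field(wu_name) == "7" else "normal"


names = [
    "wlxu2_nuebb1__41__s__64.31_59.32__4.5_5__6__49.5_1_sixvf_boinc52013",
    "wlxu2_nuebb0__46__s__64.31_59.32__5.5_6__7__37.5_1_sixvf_boinc53303",
]
for name in names:
    print(classify(name), name)
```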
ID: 24802
Richard Haselgrove
Joined: 27 Oct 07
Posts: 182
Credit: 3,295,818
RAC: 0
Message 24822 - Posted: 13 Sep 2012, 6:47:03 UTC - in response to Message 24797.  

Eric Mcintosh wrote:
"OK; I've put a relatively small batch and I shall watch
over the weekend. Eric.
(I have plenty more work for the intensity scan.)"

Looks like it might be time for the next small batch, please.

Zero tasks ready to send on the server status page.
ID: 24822
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 12 Jul 11
Posts: 851
Credit: 1,616,240
RAC: 49
Message 24825 - Posted: 13 Sep 2012, 9:24:06 UTC - in response to Message 24822.  

On their way. Eric.
ID: 24825
[AF>FAH-Addict.net]toTOW
Joined: 9 Oct 10
Posts: 77
Credit: 3,539,386
RAC: 365
Message 24828 - Posted: 13 Sep 2012, 17:42:24 UTC - in response to Message 24825.  

I've seen a lot of very short WUs in this batch. They all validated fine.

Was that expected?
ID: 24828
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 12 Jul 11
Posts: 851
Credit: 1,616,240
RAC: 49
Message 24854 - Posted: 19 Sep 2012, 11:33:49 UTC - in response to Message 24828.  

Can't quite say "expected"; not surprising, but this is
what we are investigating, the onset of chaotic motion.
ID: 24854
UBT - Rick Horn
Joined: 4 Nov 06
Posts: 12
Credit: 376,810
RAC: 0
Message 24902 - Posted: 27 Oct 2012, 15:04:59 UTC

This is getting silly. PVs are stacking up; WUs over three weeks old are still awaiting new wingmen to complete.
Time to switch to another project.
ID: 24902
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 12 Jul 11
Posts: 851
Credit: 1,616,240
RAC: 49
Message 24903 - Posted: 28 Oct 2012, 10:49:49 UTC - in response to Message 24902.  

Well I am sorry about that; each task is sent only twice,
the minimum necessary for comparison. BOINC is supposed
to use deadlines to send an additional copy, but we have
other problems with deadlines. It can easily happen that
getting a second result takes some time as some
machines are slow, are switched off, or whatever. This means
we too have a problem with a "tail" of incomplete cases
for a particular study. Re-submitting too often is a
waste of your valuable resources. I'll discuss with
colleagues but I think this is how BOINC works. Eric.
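
The trade-off described above can be sketched as follows. This is a simplified model, not BOINC's actual server logic: the `Result` class, the `copies_needed` helper and the state names are hypothetical, and the quorum of two is taken from the statement that each task is sent only twice.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

QUORUM = 2  # two results are needed for comparison


@dataclass
class Result:
    deadline: datetime
    state: str = "in_progress"  # "in_progress", "success" or "error"


def copies_needed(results, now):
    """Return how many extra copies must be sent to still reach the quorum."""
    usable = 0
    for r in results:
        if r.state == "success":
            usable += 1
        elif r.state == "in_progress" and now <= r.deadline:
            usable += 1  # still expected to come back in time
        # errored or timed-out results do not count towards the quorum
    return max(0, QUORUM - usable)


now = datetime(2012, 10, 28)
results = [
    Result(deadline=now - timedelta(days=7), state="success"),
    Result(deadline=now - timedelta(days=7)),  # wingman never reported
]
print(copies_needed(results, now))
```

In this model a shorter deadline makes the extra copy go out sooner, but risks duplicating work that a slow machine would still have returned.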
ID: 24903
UBT - Rick Horn
Joined: 4 Nov 06
Posts: 12
Credit: 376,810
RAC: 0
Message 24904 - Posted: 28 Oct 2012, 13:34:56 UTC

It doesn't happen with other BOINC projects. Normally, once a WU is out of time, a new WU is automatically generated at once.
ID: 24904
©2019 CERN