Message boards : Number crunching : WU not being resent to another user

Richard Haselgrove

Joined: 27 Oct 07
Posts: 186
Credit: 3,297,640
RAC: 0
Message 24905 - Posted: 28 Oct 2012, 15:27:25 UTC - in response to Message 24904.  

It doesn't happen with other BOINC projects. Normally, once a WU is out of time, a new WU is automatically generated at once.

And the same happens here - the WU is generated.

But with a limited number of volunteers, and a lot of work to be done, it can take time for the task to reach the head of the queue and get sent out. The fewer volunteers there are, the longer it takes.
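
As a rough illustration (all numbers invented), the wait scales inversely with the number of active volunteers:

    # Rough wait for a task at the back of the send queue.
    backlog = 50_000        # tasks ahead of it in the queue (invented)
    per_volunteer = 2.0     # tasks returned per volunteer per hour (invented)
    for volunteers in (100, 500, 2000):
        wait_h = backlog / (per_volunteer * volunteers)
        print(f"{volunteers:5d} volunteers -> ~{wait_h:.0f} h wait")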
ID: 24905
Filipe

Joined: 9 Aug 05
Posts: 36
Credit: 7,698,293
RAC: 0
Message 24906 - Posted: 28 Oct 2012, 22:06:41 UTC

Now that the tasks ready to send have dropped to near 0, I'm getting only resent ones.
ID: 24906
Christoph

Joined: 25 Aug 05
Posts: 69
Credit: 306,627
RAC: 0
Message 24907 - Posted: 29 Oct 2012, 6:36:53 UTC

The 'problem' of this project is that it is not running a feeder which produces tasks when the queue reaches a low-water mark and stops when the high-water mark is reached.
In that case the resends would be trickled into the queue as they are produced and crunched when it is their turn to be sent out.
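
Roughly, such a generator would behave like this sketch (hypothetical Python, not actual BOINC code; names and numbers are invented):

    # Sketch of a low-water/high-water work generator: refill the
    # ready-to-send queue once it drains below `low`, stop at `high`,
    # and take resends before fresh batch work.
    def top_up(queue, resends, fresh, low=1000, high=2000):
        if len(queue) >= low:
            return
        room = high - len(queue)
        n_resend = min(room, len(resends))
        queue.extend(resends[:n_resend])   # resends go out with the very next refill
        del resends[:n_resend]
        n_fresh = min(room - n_resend, len(fresh))
        queue.extend(fresh[:n_fresh])
        del fresh[:n_fresh]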

Another problem, as I understand it, is the BOINC server code.
It should give the resends a higher priority so they are sent out earlier.
But it looks like that is not working properly here at LHC@home, for whatever reason.

Since the work gets submitted in big batches, the resends stack up behind the normal work and are crunched only after all the other jobs are done.

Over to Eric to clear up anything I didn't explain the right way.
Christoph
ID: 24907
jujube

Joined: 25 Jan 11
Posts: 179
Credit: 83,858
RAC: 0
Message 24911 - Posted: 31 Oct 2012, 18:28:22 UTC - in response to Message 24907.  

This was discussed extensively and investigated months ago. It is indeed a two-part problem, as suggested in this thread: first, the resends need to be marked as high priority (I believe there are actually two high-priority attributes one can set), and second, the resends need to be inserted into the middle of the queue. The investigation months ago indicated that the high-priority attributes are being set, but the tasks themselves are being placed at the very end of the queue. The questions are: 1) is that really a problem (i.e. does it cause some genuine harm), and 2) is it worth spending the manpower needed to fix it? I believe the answer to both questions is "no!", and my reasons follow.

1) Credits are worthless; 99% of crunchers know that and don't care if they have a big backlog of pending credit. Fix it if the admins/devs have nothing else to do.

2) There is a concern about "the tail" recently expressed by Eric. The tail has received a lot of bad press over the years, and while I don't deny it exists, I can't convince myself it is the unholy evil so many have made it out to be. I believe the tail effect causes harm at projects that generate tasks on the fly. Generating on the fly means that the results being crunched now are required to generate the tasks that will be crunched in the very near future, by which I mean within the next few hours. Obviously, if results don't get verified, more tasks cannot be generated and the train screeches to a halt. Tasks are not generated on the fly here, so the tail doesn't have the dreaded effect it might have elsewhere.

Also, when you think about it, a batch of tasks requires X CPU cycles to compute, and it doesn't matter in which order you crunch the tasks: the whole batch will still require X CPU cycles. Therefore, inserting a resend into the middle of the queue doesn't really affect the time required to complete the batch, because the insertion merely delays the sending and completion of some other task in the queue. The only thing that changes is which result gets verified first, and that is irrelevant to the production rate if tasks are not generated on the fly. So in the end, reducing pending credit is the only reason for inserting resends into the middle of the queue at a project that does not generate tasks on the fly.
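
A toy check of that argument, with one worker draining the queue in order (all durations invented):

    # Moving a resend from the back of the queue to the middle changes
    # *its* finish time, not the batch's total time.
    def finish_times(durations):
        t, out = 0, []
        for d in durations:
            t += d
            out.append(t)
        return out

    batch = [3] * 10                                    # ten 3-hour tasks
    back = finish_times(batch + [3])                    # resend appended at the end
    middle = finish_times(batch[:5] + [3] + batch[5:])  # resend inserted mid-queue

    print(back[-1], middle[-1])  # 33 33 - the batch takes the same time either way
    print(back[10], middle[5])   # 33 18 - only the resend itself verifies sooner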

Christoph has suggested resends are not being inserted into the middle due to some deficiency in the feeder (or something like that). The solution would then require more than just tweaking a setting or two; it would require revamping the feeder code itself, which might be very complicated. If all that is true, and if I were a project admin/dev subject to the time constraints they seem to be under at Sixtrack, I would treat the problem as low priority too, because there is no real harm in a large pending-credit number. Oh yes, a few crunchers would pack up their toys and flounce off in a hissy fit, but they are so few in number they would have a negligible effect on production. Notice I said negligible, not zero. Whatever the project's goals are, they will likely be achieved well before any deadline, so a handful fewer crunchers will have next to zero effect.
ID: 24911
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 24912 - Posted: 31 Oct 2012, 20:52:47 UTC

This is a very interesting and useful discussion; I'll try to get my head round it in the next few days. (I did propose a three-level priority scheme which seems to be partly implemented in BOINC.) It is difficult indeed to satisfy everyone! I'll come back to the tail: if one has six studies and runs all six in parallel, not one will be complete for, say, six weeks; if one runs them one after the other, then one is finished in one week, one in two weeks, etc. Which is better? Eric.
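
In numbers, assuming six equal one-week studies sharing a fixed total capacity (a toy model, not the actual workload):

    # Six equal studies on fixed capacity: parallel vs sequential completion.
    n = 6
    parallel = [n] * n                  # capacity shared: every study done at week 6
    sequential = list(range(1, n + 1))  # full capacity each: done at weeks 1..6
    print(max(parallel), max(sequential))          # 6 6  - last study finishes the same week
    print(sum(parallel) / n, sum(sequential) / n)  # 6.0 3.5 - but the average wait drops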
ID: 24912
jujube

Joined: 25 Jan 11
Posts: 179
Credit: 83,858
RAC: 0
Message 24914 - Posted: 1 Nov 2012, 5:04:06 UTC - in response to Message 24912.  
Last modified: 1 Nov 2012, 5:07:05 UTC

If one has six studies and runs all six in parallel, not one will be complete for, say, six weeks; if one runs them one after the other, then one is finished in one week, one in two weeks, etc. Which is better? Eric.


Better for you? It depends. If your hands are tied until all six are complete, then it doesn't matter whether you run them in parallel or sequentially. If the data produced by any given study lets you make progress in some other urgent endeavor, then maybe sequential is better. If you're being pulled in a thousand different directions on any given day, then it doesn't matter, because you never have nothing to do anyway and would still be swamped even if you could work 24/7/365 like a robot.

Better for us? It doesn't matter what's better for us. We're here to donate and do what's best for you and the science. That's all that matters.
ID: 24914
Christoph

Joined: 25 Aug 05
Posts: 69
Credit: 306,627
RAC: 0
Message 24918 - Posted: 3 Nov 2012, 20:40:37 UTC - in response to Message 24914.  

If one has six studies and runs all six in parallel, not one will be complete for, say, six weeks; if one runs them one after the other, then one is finished in one week, one in two weeks, etc. Which is better? Eric.


Better for you? It depends. If your hands are tied until all six are complete, then it doesn't matter whether you run them in parallel or sequentially. If the data produced by any given study lets you make progress in some other urgent endeavor, then maybe sequential is better. If you're being pulled in a thousand different directions on any given day, then it doesn't matter, because you never have nothing to do anyway and would still be swamped even if you could work 24/7/365 like a robot.

Better for us? It doesn't matter what's better for us. We're here to donate and do what's best for you and the science. That's all that matters.


+1
Christoph
ID: 24918
Christoph

Joined: 25 Aug 05
Posts: 69
Credit: 306,627
RAC: 0
Message 24919 - Posted: 3 Nov 2012, 21:10:47 UTC

In my previous post I gave an explanation of the situation the way I remember and understood it from reading older posts made by project staff (not necessarily Eric).

It was my intention to help other people understand the situation and stay patient.

It is nice to see that there is a discussion evolving out of this.

I personally don't care if resends are at the end of the queue. If it is helpful for Eric to wait a day or two after one batch completes, to give time for the resends to return and complete the study, that is also fine with me.

As Jujube said already: we are here to crunch what the project gives us. We don't care if it is one study or three at a time. And I don't really care about credits. I also signed up at the BOINC Vbox wrapper test project, which will never tell anybody about the earned credits.

About Jujube's point that we need to calculate X cycles to complete one study and that resends in the middle will delay other work: it is up to Eric. If he can 'sit out' a day or two (or whatever time frame is needed) of crying for new work in the message boards, and probably in his inbox, until the resends are returned, then leave everything as it is.

If he wants to keep the time without available work as short as possible / non-existent, then continuous feeding could be an option for him, because the resends will be trickled in and the studies completed faster than if he keeps feeding us batches while some old resends linger around, maybe still there after the 2nd or 3rd following study.

That is also the reason why I personally think that the low-water / high-water feeder system is a good idea.

I hope I didn't forget anything I planned to comment on. It's been a bit of a long working day and my brain doesn't want to work any longer.

Looking forward to what will come out of this.
Christoph
ID: 24919
jujube

Joined: 25 Jan 11
Posts: 179
Credit: 83,858
RAC: 0
Message 24921 - Posted: 4 Nov 2012, 7:21:24 UTC - in response to Message 24919.  

That is also the reason why I personally think that the low-water / high-water feeder system is a good idea.


The low-water / high-water feeder system is definitely the best way. I had overlooked the fact that if they start another batch before the tail runs out, further resends from the previous batch will go to the tail of the new batch, not the tail of the batch they actually belong to. My bad; thanks for correcting me.
ID: 24921
Christoph

Joined: 25 Aug 05
Posts: 69
Credit: 306,627
RAC: 0
Message 24935 - Posted: 5 Nov 2012, 19:45:21 UTC - in response to Message 24921.  

You are welcome. Good to see that my brain still produces good explanations when it is already nearly asleep.

@Eric: You wrote in your status update that you are the feeder. OK.
I just deleted my last five minutes of work on this post because I reckon I took the wrong approach.
On other projects it is not the feeder that determines the amount of work available for users to download, but the splitters.
At least at SETI and Einstein. But then they also have raw data files which need pre-processing and need to be split up into workunits to be sent out.
That means we are not really talking about the feeder process, but about the splitters, which are not used here.

So the question for Eric and crew is: is there a way for you to serve the work to the standard splitter / work generator that (maybe) exists in the BOINC server, so that it feeds the queue with a limited amount of work, with the resends lined up as they appear?

Maybe ask at the Collatz project for help if necessary. They have a process called a 'work generator' and also don't need a splitter.

Hope this gets us a bit closer to reducing the problem of the tail.
ID: 24935
jujube

Joined: 25 Jan 11
Posts: 179
Credit: 83,858
RAC: 0
Message 24936 - Posted: 6 Nov 2012, 0:52:59 UTC - in response to Message 24935.  

You are welcome. Good to see that my brain still produces good explanations when it is already nearly asleep.


Christoph is correct. Surely other projects have encountered exactly this same problem; if there aren't server-side options to handle it, then maybe there should be. WCG has been running multiple simultaneous sub-projects and/or queues for years, and so have other projects. I can't believe they would just live with this; in fact, I know they don't, because their resends take a matter of minutes to get out the door, not days or weeks. Somebody out there knows how to fix this; it's just a matter of asking around or digging through the email list archives.
ID: 24936
Tom95134

Joined: 4 May 07
Posts: 250
Credit: 826,541
RAC: 0
Message 24937 - Posted: 6 Nov 2012, 4:40:35 UTC

Other projects have experienced this same issue. It was a major issue when SETI@Home switched from a 3-client assignment to a 2-client assignment for a WU. WUs that completed with no error but initially failed to validate were pushed to the bottom of the assignment stack, and it took a very long time for them to bubble up, run, and then validate.

The SETI developers came up with something that took failed WUs and pushed them into the waiting stack at or near the top. The result is that if validation fails, the WU gets pushed for assignment to another client.

Someone from the developers might take the time to contact the SETI@Home people to see how they managed this.

ID: 24937
Richard Haselgrove

Joined: 27 Oct 07
Posts: 186
Credit: 3,297,640
RAC: 0
Message 24938 - Posted: 6 Nov 2012, 8:56:13 UTC - in response to Message 24937.  

Other projects have experienced this same issue. It was a major issue when SETI@Home switched from a 3-client assignment to a 2-client assignment for a WU. WUs that completed with no error but initially failed to validate were pushed to the bottom of the assignment stack, and it took a very long time for them to bubble up, run, and then validate.

The SETI developers came up with something that took failed WUs and pushed them into the waiting stack at or near the top. The result is that if validation fails, the WU gets pushed for assignment to another client.

Someone from the developers might take the time to contact the SETI@Home people to see how they managed this.

I don't think that's actually the case. I work quite intensively on the SETI project, and I'm in touch with the developers there: I've never heard of such a thing.

It's just a very, very busy project. They maintain a 'high water mark' of around 300,000 tasks ready to send, and turn over about 60,000 tasks per hour - so the end of the queue is never more than five or six hours away.

They have a separate Beta server, with the same server code but a much lower turnover: I've seen resent tasks wait for literally months there, at times when no application is under active testing.
ID: 24938
jujube

Joined: 25 Jan 11
Posts: 179
Credit: 83,858
RAC: 0
Message 24942 - Posted: 6 Nov 2012, 22:09:28 UTC - in response to Message 24938.  

It's just a very, very busy project. They maintain a 'high water mark' of around 300,000 tasks ready to send, and turn over about 60,000 tasks per hour - so the end of the queue is never more than five or six hours away.


That sounds simple enough that it could be made to work here too. The trick is having the code to do it and setting the high/low water marks appropriately. For example, if Eric has a batch of 179,249 tasks, just to pick a nice not-round number, then he ought not to dump all 179,249 into the queue at once. There needs to be a high-water mark of, say, 2,000 and a low-water mark of, say, 1,000. The batch starts with the feeder, splitter, or whatever its name is, dumping 2,000 tasks into the queue. When there are only 1,000 tasks left in the queue, the feeder/splitter/whatever dumps all the resends into the queue and follows those with enough tasks to fill the queue to the high-water mark. Eventually all 179,249 tasks have been put into the queue, with the resends sprinkled in at the beginning of each top-up. There should be only a short tail, which won't matter anyway, because even if Eric creates another big batch, the tasks in the tail will go into the queue before tasks from the new batch.
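
For scale, a back-of-envelope comparison of the two strategies (the send rate is invented):

    # How long a resend waits to be sent under each feeding strategy.
    BATCH = 179_249   # tasks in the batch
    RATE = 500        # tasks sent per hour (invented)
    HIGH = 2_000      # high-water mark

    dump_all = BATCH / RATE   # resend queued behind the whole batch
    watermark = HIGH / RATE   # resend waits at most one top-up cycle
    print(f"dump-all : ~{dump_all:.0f} h (~{dump_all / 24:.0f} days)")
    print(f"watermark: ~{watermark:.0f} h")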

Now the million-dollar question: is there server code available that does that? Is that code already installed? If so, what are the names of the options/config items that make it happen?
ID: 24942
Richard Haselgrove

Joined: 27 Oct 07
Posts: 186
Credit: 3,297,640
RAC: 0
Message 24944 - Posted: 6 Nov 2012, 23:38:07 UTC - in response to Message 24942.  

It's just a very, very busy project. They maintain a 'high water mark' of around 300,000 tasks ready to send, and turn over about 60,000 tasks per hour - so the end of the queue is never more than five or six hours away.

That sounds simple enough that it could be made to work here too. The trick is having the code to do it and setting the high/low water marks appropriately. For example, if Eric has a batch of 179,249 tasks, just to pick a nice not-round number, then he ought not to dump all 179,249 into the queue at once. There needs to be a high-water mark of, say, 2,000 and a low-water mark of, say, 1,000. The batch starts with the feeder, splitter, or whatever its name is, dumping 2,000 tasks into the queue. When there are only 1,000 tasks left in the queue, the feeder/splitter/whatever dumps all the resends into the queue and follows those with enough tasks to fill the queue to the high-water mark. Eventually all 179,249 tasks have been put into the queue, with the resends sprinkled in at the beginning of each top-up. There should be only a short tail, which won't matter anyway, because even if Eric creates another big batch, the tasks in the tail will go into the queue before tasks from the new batch.

Now the million-dollar question: is there server code available that does that? Is that code already installed? If so, what are the names of the options/config items that make it happen?

The keywords to look for are 'work generator' (as used at Einstein) or 'workunit generator'. 'Splitter' is a specialised version of a WU generator used at SETI; 'feeder' is a different animal altogether and doesn't belong in this list.
ID: 24944
mikey

Joined: 30 Oct 11
Posts: 26
Credit: 4,940,164
RAC: 0
Message 24949 - Posted: 9 Nov 2012, 14:38:01 UTC - in response to Message 24944.  

It's just a very, very busy project. They maintain a 'high water mark' of around 300,000 tasks ready to send, and turn over about 60,000 tasks per hour - so the end of the queue is never more than five or six hours away.

That sounds simple enough that it could be made to work here too. The trick is having the code to do it and setting the high/low water marks appropriately. For example, if Eric has a batch of 179,249 tasks, just to pick a nice not-round number, then he ought not to dump all 179,249 into the queue at once. There needs to be a high-water mark of, say, 2,000 and a low-water mark of, say, 1,000. The batch starts with the feeder, splitter, or whatever its name is, dumping 2,000 tasks into the queue. When there are only 1,000 tasks left in the queue, the feeder/splitter/whatever dumps all the resends into the queue and follows those with enough tasks to fill the queue to the high-water mark. Eventually all 179,249 tasks have been put into the queue, with the resends sprinkled in at the beginning of each top-up. There should be only a short tail, which won't matter anyway, because even if Eric creates another big batch, the tasks in the tail will go into the queue before tasks from the new batch.

Now the million-dollar question: is there server code available that does that? Is that code already installed? If so, what are the names of the options/config items that make it happen?

The keywords to look for are 'work generator' (as used at Einstein) or 'workunit generator'. 'Splitter' is a specialised version of a WU generator used at SETI; 'feeder' is a different animal altogether and doesn't belong in this list.


I think you are the pro coder, while the rest of us don't know how to use those terms the way you do, so we use terms that fit for us but are not technically accurate to you, the pro coder. I THINK what he is trying to say is: why aren't the resends automatically sent immediately to the available-workunit cache, instead of going into 'some other cache' first and then into the available-units cache?
ID: 24949
jujube

Joined: 25 Jan 11
Posts: 179
Credit: 83,858
RAC: 0
Message 24979 - Posted: 29 Nov 2012, 17:57:40 UTC - in response to Message 24949.  

I THINK what he is trying to say is: why aren't the resends automatically sent immediately to the available-workunit cache, instead of going into 'some other cache' first and then into the available-units cache?


No, you misunderstand; that's not what I was trying to say. I would try to unravel it for you, but I don't think I have that much time.
ID: 24979