Message boards : Number crunching : I think we should restrict work units

1 · 2 · 3 · 4 . . . 11 · Next

[B^S] MattDavis

Joined: 2 Oct 04
Posts: 9
Credit: 36,319
RAC: 0
Message 13376 - Posted: 15 Apr 2006, 15:12:31 UTC
Last modified: 15 Apr 2006, 15:17:15 UTC

I love LHC, and I realize it's different from the other BOINC projects in that it doesn't have continuous work to send out. It sends out work, and analyzes those results before sending out the next batch.

I notice that this is slowed down by a minority of users who set their caches to maximum. When the number of work units available hits zero, we still have to wait a week or more while the people who grab a maximum number of units empty their cache before the scientists can even begin the analyzing process.

That doesn't help the project - that's greed by people who want the most LHC units.

When the number of available units hits zero, the scientists shouldn't have to wait more than a day or two. I suggest that the project limit the number of work units per computer to 2-3 at any given time. That way, as soon as all the work is sent out LHC will get them all back very soon after. Once a work unit is sent back, that computer can have another.

This will speed up work-unit generation for all of us (my cache is set very low and every work unit I get is sent back within 12 hours, since I have other projects running too) since LHC scientists will get their work back faster and thus be able to create the next batch sooner.
-----
ID: 13376
senatoralex85

Joined: 17 Sep 05
Posts: 60
Credit: 4,221
RAC: 0
Message 13379 - Posted: 15 Apr 2006, 17:04:04 UTC

I do not think the problem lies with users caching the workunits. The project can easily address that by adjusting the deadlines as appropriate. Maybe someone can help me out here, but I recall that all the crunching work needed here is near completion. I would refer you to the Cafe boards, where I believe Ben Segal has mentioned several different applications that may be added to this website to create more work.

Anyway, not always having work can be a good thing: it allows us crunchers to contribute to other DC projects. Although this project is my favorite, I also crunch for Ufluids, and eventually I think I will join QMC. I think I could contribute to SCIENCE the most with those projects! (That is only my opinion; please do not take it as criticism.)
ID: 13379
YeshuaAgapao

Joined: 29 Nov 05
Posts: 9
Credit: 1,266,935
RAC: 0
Message 13380 - Posted: 15 Apr 2006, 18:05:14 UTC
Last modified: 15 Apr 2006, 18:14:15 UTC

My cache is set to MAX on all 3 machines for LHC because it usually doesn't have work, and it takes less than 2 days for it to run out. One machine is usually not connected (Alienware laptop), my old Dell is on crappy internet (semi-broadband cellphone modem competing with file sharing), and the other one is at work, so it's got a good connection. LHC is at 15% for me, but it's struggling to reach 10% of my total credit because it doesn't always have work. Oh yeah, one trick to force BOINC to download the most possible LHC workunits is to suspend all other projects, update on LHC, then resume all the other projects. Sometimes in a one- or two-day window when LHC has work, BOINC doesn't want to download any. The suspend-download-resume trick works really well, and LHC's short deadlines mean they get crunched first, and I don't care if I can't complete all the QMC work on time.
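The suspend-download-resume trick above can be scripted against the stock `boinccmd` command-line tool (the `--project URL {suspend,update,resume}` operations are standard BOINC client commands; the project URLs below are placeholders, not endorsements of any particular cache strategy). A rough sketch:

```python
import subprocess

def trick_commands(target, others):
    """Build the boinccmd invocations for the suspend-download-resume
    trick: suspend every other project, ask the target project for
    work, then resume the others."""
    cmds = [["boinccmd", "--project", u, "suspend"] for u in others]
    cmds.append(["boinccmd", "--project", target, "update"])
    cmds += [["boinccmd", "--project", u, "resume"] for u in others]
    return cmds

def run_trick(target, others):
    # Substitute the URLs of the projects you are actually attached to.
    for cmd in trick_commands(target, others):
        subprocess.run(cmd, check=False)
```

Whether this is fair play is, of course, exactly what the rest of this thread argues about.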

On all other projects except the CPDN ones, only the Alienware laptop is set to MAX. I also shut down (suspended) QMC because they lie to BOINC about crunch times, making BOINC crunch nearly 2 weeks straight in NDF mode even on a non-max-cache machine (their workunits take 40 hours, not 15 or 20). I might turn QMC back on in a few weeks, but if they send 8 40-hour work units again I will probably abort most of them and only crunch 2 or 3.

Ufluids is having problems and is only serving a master's thesis. Their database crash made me lose out on credit for 12 WUs - 5+2+5 across the machines - and they take 15 hours each to crunch.

My... LinkSite | Blog | Pictures
ID: 13380
John McLeod VII
Avatar

Joined: 2 Sep 04
Posts: 165
Credit: 146,925
RAC: 0
Message 13382 - Posted: 15 Apr 2006, 21:10:10 UTC - in response to Message 13380.  

My cache is set to MAX on all 3 machines for LHC because it usually doesn't have work, and it takes less than 2 days for it to run out. One machine is usually not connected (Alienware laptop), my old Dell is on crappy internet (semi-broadband cellphone modem competing with file sharing), and the other one is at work, so it's got a good connection. LHC is at 15% for me, but it's struggling to reach 10% of my total credit because it doesn't always have work. Oh yeah, one trick to force BOINC to download the most possible LHC workunits is to suspend all other projects, update on LHC, then resume all the other projects. Sometimes in a one- or two-day window when LHC has work, BOINC doesn't want to download any. The suspend-download-resume trick works really well, and LHC's short deadlines mean they get crunched first, and I don't care if I can't complete all the QMC work on time.

On all other projects except the CPDN ones, only the Alienware laptop is set to MAX. I also shut down (suspended) QMC because they lie to BOINC about crunch times, making BOINC crunch nearly 2 weeks straight in NDF mode even on a non-max-cache machine (their workunits take 40 hours, not 15 or 20). I might turn QMC back on in a few weeks, but if they send 8 40-hour work units again I will probably abort most of them and only crunch 2 or 3.

Ufluids is having problems and is only serving a master's thesis. Their database crash made me lose out on credit for 12 WUs - 5+2+5 across the machines - and they take 15 hours each to crunch.


You only get one cache setting per venue; you do not get a separate cache setting per project - the General Settings apply to all projects.

With the more recent versions of BOINC the problem with QMC would also resolve itself after the first batch of work.


BOINC WIKI
ID: 13382
Profile Keck_Komputers

Joined: 1 Sep 04
Posts: 275
Credit: 2,652,452
RAC: 0
Message 13384 - Posted: 16 Apr 2006, 10:01:55 UTC

I think the best way to crunch for this project is to keep a small queue and a high resource share. That way, when there is work available your hosts will switch over and run mostly LHC work until the well dries up again. The project can set a standard deferral so they are not swamped with requests in between. When the project is responding, the automatic deferrals should not go over 4 hours; it is only when the project does not respond at all that the automatic deferrals get huge.

A large queue increases turnaround time. There will always be lost tasks that add delay as well, but at least those are unintentional.
BOINC WIKI

BOINCing since 2002/12/8
ID: 13384
River~~

Joined: 13 Jul 05
Posts: 456
Credit: 75,142
RAC: 0
Message 13385 - Posted: 16 Apr 2006, 12:38:49 UTC - in response to Message 13376.  
Last modified: 16 Apr 2006, 12:43:04 UTC

I love LHC, and I realize it's different from the other BOINC projects in that it doesn't have continuous work to send out. It sends out work, and analyzes those results before sending out the next batch.

I notice that this is slowed down by a minority of users who set their caches to maximum. When the number of work units available hits zero, we still have to wait a week or more while the people who grab a maximum number of units empty their cache before the scientists can even begin the analyzing process.

Sorry, this is said often and has been answered often.

It is just plain untrue, in my opinion.

This project, more than any other, makes deliberate use of the deadline setting. In the past I have known LHC to give out WUs with deadlines as long as 15 days and as short as 3 days, sometimes at the same time.

If the scientists needed a quick turnaround they'd set a 3-day deadline, or even shorter. If they were not particularly bothered they'd set a 15-day deadline or even longer. If they know they will come back and analyse the results in about three weeks, they set a 7-day deadline, as they did here.

Please drop the fantasy that there are teams of CERN scientists poised, biting their pencils, waiting for the work to come back. That is not how they operate (or not usually - when it is, they will set a very short deadline). In fact they have *other* *work* to do, and whether the WUs are sitting in a cache or back on the server will not affect the timing of their analysis.

I have said this several times before, and asked to be corrected by a member of the project team if I am wrong. Until someone working for LHC says I am wrong, I will continue to assume the above analysis is correct.

There is a separate argument.

You think it is unfair for some people to use their cache to grab (say) 7 days work when those who download as they go get only (say) 3 days work from a batch. That is a reasonable argument, on which I'd be neutral - neither for nor against. If a reduction in deadlines was applied for that reason, I'd accept that cheerfully as a response to user demand.

Even better, in my opinion, would be a change to the scheduler code so that when a WU is re-issued (after an error, or after user abort, or after timeout) it could have a shorter deadline.

This would mean that where a result needs to be re-sent, it would be prioritised by the receiving client, and refused ("won't finish in time") by a client with a large queue. It would especially benefit projects like Einstein, where exactly the quorum is sent out initially, as these suffer most when a result is trashed.

But while the project makes the rules the way they do, and unless the project explicitly asks people to keep cache sizes down, then I don't see anything wrong in playing those rules to my best advantage. I assume most other cache-crammers feel the same.

River~~


ID: 13385
[B^S] MattDavis

Joined: 2 Oct 04
Posts: 9
Credit: 36,319
RAC: 0
Message 13388 - Posted: 16 Apr 2006, 17:42:08 UTC

I like that logic.

Until someone from LHC tells me I'm wrong, I'm going to assume the moon is made of cheese and that cows can fly.

Duck, here comes a cow!
-----
ID: 13388
River~~

Joined: 13 Jul 05
Posts: 456
Credit: 75,142
RAC: 0
Message 13389 - Posted: 16 Apr 2006, 19:02:52 UTC - in response to Message 13388.  

Duck, here comes a cow!


With the latest bird flu scares, maybe that should read

Cower, here comes a duck.

;-)

But yes, I do think that the LHC team have a better idea of what they want than some random participant on a mission against other people's crunching strategies.

Why would they be the ones to ask about the lunar dairy mining methodology? Or about bovine aviation accords?
ID: 13389
[B^S] MattDavis

Joined: 2 Oct 04
Posts: 9
Credit: 36,319
RAC: 0
Message 13390 - Posted: 17 Apr 2006, 1:48:49 UTC
Last modified: 17 Apr 2006, 1:49:15 UTC

If an LHC staff member told me I was wrong, then I'd listen.

I was making fun of the fact that you basically said "I'm right until someone from LHC says I'm wrong." I was using the same rationale to say "cows can fly until someone from LHC likewise says I'm wrong."

Meaning, the lack of an LHC member telling you that you're wrong doesn't necessarily mean that you're right - it could also mean LHC scientists don't read the board and haven't even seen your post.
-----
ID: 13390
KWSN - A Shrubbery
Avatar

Joined: 3 Jan 06
Posts: 14
Credit: 32,201
RAC: 0
Message 13392 - Posted: 17 Apr 2006, 2:56:53 UTC

I'd be inclined to believe River is on the right track. The project scientists are well aware of how to manipulate the deadlines to get the work they need.

Now, on a semi-related note: I see several work units timing out after a quorum has been met. Will those units be re-sent or is the quorum already good enough?
ID: 13392
Gaspode the UnDressed

Joined: 1 Sep 04
Posts: 506
Credit: 118,619
RAC: 0
Message 13394 - Posted: 17 Apr 2006, 5:19:03 UTC - in response to Message 13392.  


Those of us who were here at the time (middle of 2005) will recall the introduction of Chrulle's deadline optimiser, which looks at the base of available computers and their varying turnaround times and adjusts the deadline on the fly, with the object of minimizing the processing time for the study.

LHC might do well to reduce the initial replication. Since a quorum is reached at three results, around 40% of the work we do is redundant.
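The 40% figure follows from an assumed initial replication of five against a quorum of three - any validated result beyond the quorum is redundant work:

```python
def redundant_fraction(initial_replication, quorum):
    """Fraction of issued results beyond what validation needs."""
    return (initial_replication - quorum) / initial_replication

# With an initial replication of 5 and a quorum of 3:
# redundant_fraction(5, 3) == 0.4, i.e. around 40% redundant.
```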



Gaspode the UnDressed
http://www.littlevale.co.uk
ID: 13394
River~~

Joined: 13 Jul 05
Posts: 456
Credit: 75,142
RAC: 0
Message 13398 - Posted: 17 Apr 2006, 9:41:02 UTC - in response to Message 13390.  

Meaning, the lack of an LHC member telling you that you're wrong doesn't necessarily mean that you're right - it could also mean LHC scientists don't read the board and haven't even seen your post.


If it was the first time the point had been made, yes.

But, as I said at the start of that post, replies from a similar angle have been made repeatedly over the last 18 months - since before I joined this project in fact, if you go back and read the boards far enough.

And secondly, if you go back and read my post again you will see I also gave other supporting arguments for my view. There is no supporting argument for your examples.

Clearly you care about the project - and so do I. Though my main motive is co-operative, I also enjoy the competitive edge that the stats give. The combination of co-operation and competition seems a powerful motivator, and yes it motivates me towards both supporting the project and also towards competing for work when it is available.

When I am in competitive mode I will compete fairly according to the rules set out by the organisers. For me that includes automated rules like the deadlines, and it would also include abiding by informal requests from the project like "please don't use a cache bigger than X for this project". For me that does not include abiding by requests that come from elsewhere unless/until those requests are endorsed by the project.

River~~
ID: 13398
River~~

Joined: 13 Jul 05
Posts: 456
Credit: 75,142
RAC: 0
Message 13399 - Posted: 17 Apr 2006, 10:43:11 UTC - in response to Message 13394.  


Those of us who were here at the time (middle of 2005) will recall the introduction of Chrulle's deadline optimiser, which looks at the base of available computers and their varying turnaround times and adjusts the deadline on the fly, with the object of minimizing the processing time for the study.


Do we know that it is still in use? I assume it is.

My idea was to use differential deadlining *within* a WU - later results in the same WU would have shorter deadlines than the earlier ones - in contrast to Chrulle's, which sets deadlines that vary between WUs but remain the same length for each result within a given WU.

The project team would set two deadline lengths, and the database would need to hold two deadline values where at present it holds just one.

The first result issued for each WU is given the longer deadline, as are all results up to the initial replication figure.

Results issued later (after timeouts, errors, etc) would be given the shorter deadline reflecting the greater urgency caused by the delay.

The code that generates new results already knows how many results have been made (it must, in order to enforce the maximums), and I imagine it could easily select one deadline length or the other according to whether that count is less or more than the initial replication figure. Harder (I imagine) would be actually adding that extra column to the database, and the extra field to the admin interface so that the admins could fill it in.
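A minimal sketch of the selection rule described above (the names are illustrative, not actual BOINC scheduler code):

```python
def deadline_for_new_result(results_created, initial_replication,
                            long_deadline, short_deadline):
    """Pick a deadline for a freshly generated result: the initial
    batch gets the long deadline; any result created later (after a
    timeout, error, or user abort) is a re-issue and gets the short
    one."""
    if results_created < initial_replication:
        return long_deadline
    return short_deadline
```

With, say, a 7-day long deadline and a 3-day short one, the first few results of a WU would go out with 7 days and every re-send with 3, so re-issued work gets crunched with greater urgency.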

River~~

ID: 13399
Profile The Gas Giant

Joined: 2 Sep 04
Posts: 309
Credit: 715,258
RAC: 0
Message 13408 - Posted: 19 Apr 2006, 11:03:24 UTC

Why set deadlines at 7 days when you want them back in 4? If the project allows 7 days, then as long as the results are returned before their deadlines there is no argument. MalariaControl.net has its deadlines set at about 2.5 days because they want their results back quickly, so that they can develop new WUs based on the returned results. It's easy really.

Live long and crunch.

Paul
(S@H1 8888)
BOINC/SAH BETA
ID: 13408
Gaspode the UnDressed

Joined: 1 Sep 04
Posts: 506
Credit: 118,619
RAC: 0
Message 13409 - Posted: 19 Apr 2006, 12:27:19 UTC

My idea was to use differential deadlining *within* a WU - later results in the same WU would have shorter deadlines than the earlier ones - in contrast to Chrulle's, which sets deadlines that vary between WUs but remain the same length for each result within a given WU.


As long as the replication figure is higher than the quorum, this isn't needed. Since quorum can be reached before all results are returned, shortening the deadline on later results will affect completion of only a very few units.

For example, I had around 20 results in my cache when the last batch ran out of work. I checked the results table: around 75% of the results I had yet to crunch related to workunits that had already reached quorum. Deadlines notwithstanding, in the last 36 hours of work only 25% would actually contribute anything. As a result I have now shortened my cache from 3 days to 0.5 days, so that I get results back quickly. This way most of my results now go back in just a few hours rather than seven days.




Gaspode the UnDressed
http://www.littlevale.co.uk
ID: 13409
Nuadormrac

Joined: 26 Sep 05
Posts: 85
Credit: 421,130
RAC: 0
Message 13419 - Posted: 20 Apr 2006, 11:31:52 UTC - in response to Message 13392.  

Now, on a semi-related note: I see several work units timing out after a quorum has been met. Will those units be re-sent or is the quorum already good enough?


They wouldn't have to be, so I wouldn't expect them to be. As it is, the 4th result is a workaround for people who trash units, so a quorum can still be reached and the unit can still be validated. To my understanding, if the 4th result comes in it gets credit, but if it doesn't, that's it.

Unless, among the 3, a quorum can't be reached because the results are too different and the validator cannot determine which return was "correct". In that case, another would be sent until a quorum of results could be reached.
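The re-send rule described in the two paragraphs above can be summarised as follows (a sketch of the behaviour as understood in this thread, not actual BOINC validator code):

```python
def more_results_needed(valid_results, quorum, in_progress):
    """A WU needs another result only if validation hasn't reached
    quorum and no copy is still out in the field."""
    if valid_results >= quorum:
        return False  # quorum met; a timed-out extra copy is moot
    return in_progress == 0  # otherwise wait for outstanding copies
```

So a unit that times out after quorum is simply forgotten, while a unit whose returns disagree keeps spawning new copies until the validator can agree on a result.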

As to the queue size, if there were a problem for the project scientists, I'm rather certain that either they would adjust deadlines, as by some accounts has been done here, or they or a moderator would come in, mention the problem, and ask people point blank. This might even be announced on the front page. RALPH asks people not to have big queues, which is their need. They also state why: "it's a testing environment and we want to test over a variety of different computer configs".

It's also like when akosv's optimized app started showing up on Einstein@Home... There was no official word, and some assumed that perhaps Bruce Allen couldn't speak, because perhaps parts of the app were copyrighted or something, for which the code couldn't be open-sourced...

Anyhow, in the end, not only was a beta started based on akosv's optimizations, but when akosv made recommendations on how they could improve the beta, Bruce Allen didn't ignore him... But in any case, it's sort of like over there. One person, back in the earlier days before the announced beta, ended up saying: "well, it's not officially endorsed, but the project hasn't spoken against it either, and it is validating. Now, don't you think that if Bruce Allen came in here and asked people to stop using it, each person here would delete it from their computers, post haste?"

If the project has a problem, be it with deadlines or the like, they wouldn't necessarily sit there gritting their teeth in abject silence. They'd more than likely express their concerns and give their request to the volunteers.
ID: 13419
Profile Chrulle

Joined: 27 Jul 04
Posts: 182
Credit: 1,880
RAC: 0
Message 13423 - Posted: 21 Apr 2006, 5:49:18 UTC - in response to Message 13399.  



Do we know that that is still in use? I assume it is


Greetings from the States.

I believe it is still in use. I do not think they have changed anything since I left.


My idea was to use differential deadlining *within* a WU - later results in the same WU would have shorter deadlines than the earlier ones - in contrast to Chrulle's, which sets deadlines that vary between WUs but remain the same length for each result within a given WU.


That might be a good idea. Another good idea would be to give such results a higher priority, so that we are sure they are given out as fast as possible instead of waiting in the queue.

cheers,
Chrulle
Chrulle
Research Assistant & Ex-LHC@home developer
Niels Bohr Institute
ID: 13423
senatoralex85

Joined: 17 Sep 05
Posts: 60
Credit: 4,221
RAC: 0
Message 13429 - Posted: 23 Apr 2006, 4:44:58 UTC - in response to Message 13423.  

Thanks for your input, Chrulle. Nice to see you visit us once in a while!
ID: 13429
YeshuaAgapao

Joined: 29 Nov 05
Posts: 9
Credit: 1,266,935
RAC: 0
Message 13724 - Posted: 24 May 2006, 19:55:39 UTC
Last modified: 24 May 2006, 19:59:42 UTC

It is important to max-cache LHC because it takes only 18-30 hours for them to run out of work once they put it up (it looks like they put up 80,000-150,000 WUs at a time). I seem to be getting an edge by suspending all projects but LHC, not just until the downloads start, but for the entire 18-30 hour 'work window'. Then you turn all the other projects back on and let BOINC take its time. So you crunch LHC exclusively while there's more work to serve, but after that it doesn't matter.

So you crunch just LHC for the 18-30 hours, and then when it goes 'out of work' you still have the maximum possible cache, as if LHC were the only project registered in BOINC, and you turn all the other projects back on (LHC-exclusive is pointless when there's no more work left to serve).

My... LinkSite | Blog | Pictures
ID: 13724
Dronak
Avatar

Joined: 19 May 06
Posts: 20
Credit: 297,111
RAC: 0
Message 13725 - Posted: 24 May 2006, 22:07:46 UTC - in response to Message 13376.  

I notice that this is slowed down by a minority of users who set their caches to maximum. When the number of work units available hits zero, we still have to wait a week or more while the people who grab a maximum number of units empty their cache before the scientists can even begin the analyzing process.

That doesn't help the project - that's greed by people who want the most LHC units.


I'm just getting back into these sorts of projects after a break when my (old) computer really couldn't handle it. I'm new to this project, and I must say that I was definitely surprised to find people with lots of work units queued up in progress (I think some had 100-200!) while I was only able to get 3 units. And I actually changed my network connection time near the finish of my first unit to make sure I could get another one or two more before they ran out. I was afraid I wouldn't get any more if I waited, and apparently rightfully so. It is disappointing to have an empty queue and be ready to do work, but not be able to get it because other people are sitting on it. People like me who could do work and help get it done faster aren't able to.

I don't know the exact problem/cause or what a good solution is. Maybe deadlines are part of it. Maybe the WU/day limit is part of it (there is one, right?). Maybe etiquette is part of it. Perhaps users can help solve the problem, maybe the people behind the project can do something to help even things out. *shrug* I'll keep an eye on the boards to see what's going on. I hope something can be done. I want to do work for this project when it's here, but feel a bit cheated seeing others sit on a lot of work while my computer goes unused. It feels like I'm being prevented from doing my fair share of the work.

A kind of side, related question: what happens to units that miss the deadline if there is no quorum?
ID: 13725


©2024 CERN