Message boards : Number crunching : I think we should restrict work units
Joined: 2 Oct 04 Posts: 9 Credit: 36,319 RAC: 0
I love LHC, and I realize it's different from the other BOINC projects in that it doesn't have continuous work to send out. It sends out work and analyzes those results before sending out the next batch.

I notice that this is slowed down by a minority of users who set their caches to maximum. When the number of work units available hits zero, we still have to wait a week or more while the people who grab the maximum number of units empty their caches before the scientists can even begin the analysis. That doesn't help the project - that's greed by people who want the most LHC units. When the number of available units hits zero, the scientists shouldn't have to wait more than a day or two.

I suggest that the project limit the number of work units per computer to 2-3 at any given time. That way, as soon as all the work is sent out, LHC will get it all back very soon after. Once a work unit is sent back, that computer can have another. This will speed up work-unit generation for all of us (my cache is set very low, and every work unit I get is sent back within 12 hours, since I have other projects running too), since the LHC scientists will get their work back faster and thus be able to create the next batch sooner.

-----
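For what it's worth, a per-host cap like this is simple on the server side: the scheduler just refuses to send new results to a host that is already holding the limit. Below is a minimal sketch of that check in Python with made-up names - the real BOINC scheduler is C++, and I believe later versions of the server software grew a config option along these lines, so treat this as an illustration only.

```python
MAX_IN_PROGRESS = 3  # the 2-3 per-computer cap suggested above

def results_to_send(available_results, in_progress_count):
    """Cap a scheduler reply so a host never holds more than
    MAX_IN_PROGRESS unreturned results.  `in_progress_count` is the
    number of results already sent to the host but not yet reported
    (hypothetical bookkeeping; all names here are illustrative)."""
    room = MAX_IN_PROGRESS - in_progress_count
    if room <= 0:
        return []  # host must report a result before it gets another
    return available_results[:room]

# Example: a host already holding 2 results gets at most 1 more.
print(results_to_send(["wu_17a", "wu_17b", "wu_17c"], in_progress_count=2))
```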
Joined: 17 Sep 05 Posts: 60 Credit: 4,221 RAC: 0
I do not think the problem lies with users caching the workunits. This can be easily solved by the project adjusting the deadlines as appropriate. Maybe someone can help me out here, but I recall that all the crunching work needed here is near completion. I would refer you to the Cafe boards, where I believe Ben Segal has mentioned several different applications that may be added to this website to create more work. Anyway, not always having work can be a good thing: it will allow us crunchers to contribute to other DC projects... Although this project is my favorite, I also crunch for Ufluids. Eventually, I think I will join QMC. I think I could contribute to SCIENCE the most with those projects! (That is only my opinion; please do not take it as criticism.)
Joined: 29 Nov 05 Posts: 9 Credit: 1,266,935 RAC: 0
My cache is set to MAX on all 3 machines for LHC because it usually doesn't have work, and it takes less than 2 days for it to run out. One machine is usually not connected (Alienware laptop), my old Dell is on crappy internet (a semi-broadband cellphone modem competing with file sharing), and the other one is at work, so it's got a good connection. LHC is at 15% for me, but it's struggling to reach 10% of my total credit because it doesn't always have work.

Oh yeah, one trick to force your BOINC to download the most possible LHC workunits is to suspend all other projects, update on LHC, then resume all the other projects. Sometimes in a one- or two-day window when LHC has work, BOINC doesn't want to download any. The suspend-download-resume trick works really well, and LHC's short deadlines mean they get crunched first - and I don't care if I can't complete all the QMC work on time.

On all other projects except the CPDN ones, only the Alienware laptop is set to MAX. I also shut down (suspended) QMC because they lie to BOINC about crunch times, making your BOINC crunch nearly 2 weeks straight in EDF mode even on a non-max-cache machine (their workunits take 40 hours, not 15 or 20). I might turn QMC back on in a few weeks, but if they send 8 40-hour work units again I will probably abort most of them and only crunch 2 or 3. Ufluids is having problems and is only serving a master's thesis. Their database crash made me lose out on credit for 12 WUs - 5+2+5 across the three machines - and they take 15 hours each to crunch.
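The suspend-update-resume trick can be scripted rather than clicked through, for anyone who wants to try it. Here is a sketch driving the boinccmd command-line tool from Python; the project URLs are examples, and it assumes boinccmd is on your PATH and authorized to talk to the local client (depending on your setup it may need host/password arguments):

```python
import subprocess
import time

LHC = "http://lhcathome.cern.ch/"             # example project URL
OTHERS = ["http://www.example-project.org/"]  # your other attached projects

def boinccmd(*args):
    """Run one boinccmd operation against the local client."""
    subprocess.run(["boinccmd", *args], check=True)

for url in OTHERS:
    boinccmd("--project", url, "suspend")  # stop others competing for work
boinccmd("--project", LHC, "update")       # ask LHC for work right now
time.sleep(60)                             # give the downloads time to start
for url in OTHERS:
    boinccmd("--project", url, "resume")   # back to normal scheduling
```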
Joined: 2 Sep 04 Posts: 165 Credit: 146,925 RAC: 0
> My cache is set to MAX on all 3 machines for LHC because it usually doesn't have work... Oh yeah, one trick to force your BOINC to download the most possible LHC workunits is to suspend all other projects, update on LHC, then resume all the other projects.

You only get one cache setting per venue. You do not get a separate cache setting per project - the general settings apply to all projects. With the more recent versions of BOINC, the problem with QMC would also resolve itself after the first batch of work.

BOINC WIKI
Joined: 1 Sep 04 Posts: 275 Credit: 2,652,452 RAC: 0
I think the best way to crunch for this project is to keep a small queue and a high resource share. That way, when there is work available, your hosts will switch over and run mostly LHC work until the well dries up again. The project can set a standard deferral so they are not swamped with requests in between. When the project is responding, the automatic deferrals should not go over 4 hours. It is only when the project does not respond at all that the automatic deferrals get huge. A large queue slows turnaround. Although there will always be lost tasks that increase the delay, at least those are unintentional.

BOINC WIKI
BOINCing since 2002/12/8
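To make the two kinds of deferral concrete: a project that answers (even with "no work") earns a short, bounded retry delay, while a project that does not answer at all gets backed off exponentially. The sketch below is a simplified model of that behaviour, not the client's exact policy, and the constants are invented:

```python
import random

NO_WORK_DEFER = 3600     # 1 hour: project answered, it just had no work
MIN_BACKOFF = 600        # 10 minutes: first failed-contact backoff
MAX_BACKOFF = 24 * 3600  # the cap where the "huge" deferrals stop growing

def next_deferral(project_responded, prev_backoff):
    """Return (seconds to defer, new backoff state).  A response resets
    the backoff; silence doubles it up to the cap."""
    if project_responded:
        return NO_WORK_DEFER, 0
    backoff = min(max(2 * prev_backoff, MIN_BACKOFF), MAX_BACKOFF)
    # jitter so every host does not retry at the same instant
    return backoff * random.uniform(0.5, 1.0), backoff
```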
Joined: 13 Jul 05 Posts: 456 Credit: 75,142 RAC: 0
> I love LHC, and I realize it's different from the other BOINC projects in that it doesn't have continuous work to send out. It sends out work and analyzes those results before sending out the next batch.

Sorry, this is said often and has been answered often. It is just plain untrue, in my opinion.

This project, more than any other, is very familiar with the use of the deadline setting. In the past I have known LHC give out WU with deadlines as long as 15 days and as short as 3 days, sometimes at the same time. If the scientists needed a quick turnaround they'd set a 3-day deadline, or even shorter. If they are not particularly bothered they'd set a 15-day deadline or even longer. If they know they will come back and analyse the results in about three weeks, they set a 7-day deadline, as they did here.

Please drop the fantasy that there are teams of CERN scientists poised biting their pencils waiting for the work to come back. That is not how they operate (or not usually - when it is, they will set a very short deadline). In fact they have *other* *work* to do, and whether the WU are sat in a cache or sat back on the server will not affect the timing of their analysis. I have said this several times before, and asked to be corrected by a member of the project team if I am wrong. Until someone working for LHC says I am wrong, I will continue to assume the above analysis is correct.

There is a separate argument: you think it is unfair for some people to use their cache to grab (say) 7 days' work when those who download as they go get only (say) 3 days' work from a batch. That is a reasonable argument, on which I'd be neutral - neither for nor against. If a reduction in deadlines was applied for that reason, I'd accept that cheerfully as a response to user demand.

Even better, in my opinion, would be a change to the scheduler code so that when a WU is re-issued (after an error, a user abort, or a timeout) it could have a shorter deadline. This would mean that where a result needs to be re-sent, it would be prioritised by the receiving client, and refused ("won't finish in time") by a client with a large queue. It would especially benefit projects like Einstein where exactly the quorum is sent out initially, as these suffer most when a result is trashed.

But while the project makes the rules the way they do, and unless the project explicitly asks people to keep cache sizes down, I don't see anything wrong in playing those rules to my best advantage. I assume most other cache-crammers feel the same.

River~~
Joined: 2 Oct 04 Posts: 9 Credit: 36,319 RAC: 0
I like that logic. Until someone from LHC tells me I'm wrong, I'm going to assume the moon is made of cheese and that cows can fly. Duck, here comes a cow!

-----
Joined: 13 Jul 05 Posts: 456 Credit: 75,142 RAC: 0
> Duck, here comes a cow!

With the latest bird flu scares, maybe that should read: Cower, here comes a duck. ;-)

But yes, I do think that the LHC team have a better idea of what they want than some random participant on a mission against other people's crunching strategies. Why would they be the ones to ask about the lunar dairy mining methodology? Or about bovine aviation accords?
Joined: 2 Oct 04 Posts: 9 Credit: 36,319 RAC: 0
If an LHC staff member told me I was wrong, then I'd listen. I was making fun of the fact that you basically said "I'm right until someone from LHC says I'm wrong." I was using that same rationale to say "cows can fly until someone from LHC likewise says I'm wrong." Meaning: the lack of an LHC member telling you that you're wrong doesn't necessarily mean that you're right - it could also mean LHC scientists don't read the board and haven't even seen your post.

-----
Joined: 3 Jan 06 Posts: 14 Credit: 32,201 RAC: 0
I'd be inclined to believe River is on the right track. The project scientists are well aware of how to manipulate the deadlines to get the work they need. Now, on a semi-related note: I see several work units timing out after a quorum has been met. Will those units be re-sent, or is the quorum already good enough?
Joined: 1 Sep 04 Posts: 506 Credit: 118,619 RAC: 0
Those of us who were here at the time (middle of 2005) will recall the introduction of Chrulle's deadline optimiser, which looks at the pool of available computers and their varying turnaround times and adjusts the deadline on the fly, with the object of minimizing the processing time for the study.

LHC might do well to reduce the initial replication. Since a quorum is reached at three results, around 40% of the work we do is redundant.

Gaspode the UnDressed
http://www.littlevale.co.uk
Joined: 13 Jul 05 Posts: 456 Credit: 75,142 RAC: 0
> Meaning: the lack of an LHC member telling you that you're wrong doesn't necessarily mean that you're right - it could also mean LHC scientists don't read the board and haven't even seen your post.

If it was the first time the point had been made, yes. But, as I said at the start of that post, replies from a similar angle have been made repeatedly over the last 18 months - since before I joined this project, in fact, if you go back and read the boards far enough. And secondly, if you go back and read my post again you will see I also gave other supporting arguments for my view. There is no supporting argument for your examples.

Clearly you care about the project - and so do I. Though my main motive is co-operative, I also enjoy the competitive edge that the stats give. The combination of co-operation and competition seems a powerful motivator, and yes, it motivates me towards both supporting the project and competing for work when it is available.

When I am in competitive mode I will compete fairly, according to the rules set out by the organisers. For me that includes automated rules like the deadlines, and it would also include abiding by informal requests from the project like "please don't use a cache bigger than X for this project". For me that does not include abiding by requests that come from elsewhere, unless/until those requests are endorsed by the project.

River~~
Joined: 13 Jul 05 Posts: 456 Credit: 75,142 RAC: 0
Do we know that that is still in use? I assume it is.

My idea was to use differential deadlining *within* a WU - later results in the same WU would have shorter deadlines than the earlier ones - in contrast to Chrulle's scheme, which basically sets deadlines that vary between WUs but remain the same length for each result within a given WU.

The project team would set two deadline lengths, and the database would need to hold two deadline values where at present it holds just one. The first result issued for each WU is given the longer deadline, and so are all the results until we get to the initial replication figure. Results issued later (after timeouts, errors, etc.) would be given the shorter deadline, reflecting the greater urgency caused by the delay.

The code that grows new results knows how many results have been made already (it must, in order to enforce the maximums), and I imagine it could easily select one deadline length or the other according to whether that count is less or more than the initial replication figure. Harder (I imagine) would be actually making that extra column in the database and the extra field in the admin interface so that the admins could fill it in.

River~~
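The selection really is as easy as described. Here it is as a Python sketch with invented names and deadline values (the real transitioner is C++, and as noted above, the extra database column and admin field would be the hard part, not this logic):

```python
from datetime import datetime, timedelta, timezone

INITIAL_REPLICATION = 4             # first copies issued for each WU
LONG_DEADLINE = timedelta(days=7)   # for the initial copies
SHORT_DEADLINE = timedelta(days=3)  # for re-issues (errors, timeouts, aborts)

def deadline_for_new_result(results_already_created):
    """Pick the deadline for the next result of a workunit: copies up
    to the initial replication get the long deadline, and anything
    beyond that is a re-issue, so it gets the short one."""
    now = datetime.now(timezone.utc)
    if results_already_created < INITIAL_REPLICATION:
        return now + LONG_DEADLINE
    return now + SHORT_DEADLINE
```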
Joined: 2 Sep 04 Posts: 309 Credit: 715,258 RAC: 0
Why set deadlines at 7 days when you want them back in 4? If the project allows 7 days, then as long as the results are returned before their deadlines, there is no argument. MalariaControl.net has its deadlines set at about 2.5 days, since they want their results back quickly so that they can develop new WUs based on the returned results. It's easy, really.

Live long and crunch.

Paul (S@H1 8888)
BOINC/SAH BETA
Joined: 1 Sep 04 Posts: 506 Credit: 118,619 RAC: 0
> My idea was to use differential deadlining *within* a WU - later results in the same WU would have shorter deadlines than the earlier ones.

As long as the replication figure is higher than the quorum, this isn't needed. Since quorum can be reached before all results are returned, shortening the deadline on later results will affect completion of only a very few units.

For example, I had around 20 results in my cache when the last batch ran out of work. I checked the results table: around 75% of the results I had yet to crunch related to workunits that had already reached quorum. Deadlines notwithstanding, in the last 36 hours of work only 25% would actually contribute anything. As a result I have now shortened my cache from 3 days to 0.5 days, so that I get results back quickly. This way most of my results now go back in just a few hours rather than seven days.

Gaspode the UnDressed
http://www.littlevale.co.uk
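That 75% figure is easy to reproduce in a toy model: with initial replication 4 and quorum 3, a result that sits in a long queue while the other three hosts report is pure redundancy. The turnaround distribution below is invented (uniform 0.5-4 days), so only the trend matters, not the exact percentages:

```python
import random

def redundant_fraction(my_delay_days, trials=100_000,
                       replication=4, quorum=3):
    """Estimate the chance that a result held for `my_delay_days` is
    redundant, i.e. the other hosts reach quorum before we report.
    Other hosts' turnaround is an assumed uniform 0.5-4 days."""
    hits = 0
    for _ in range(trials):
        others = sorted(random.uniform(0.5, 4.0)
                        for _ in range(replication - 1))
        if others[quorum - 1] < my_delay_days:  # quorum met without us
            hits += 1
    return hits / trials

for days in (0.5, 1, 3, 7):
    print(f"{days:>3} day queue -> ~{redundant_fraction(days):.0%} redundant")
```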
Joined: 26 Sep 05 Posts: 85 Credit: 421,130 RAC: 0
> Now, on a semi-related note: I see several work units timing out after a quorum has been met. Will those units be re-sent, or is the quorum already good enough?

They wouldn't have to be, so I wouldn't expect them to be. As it is, the 4th unit is a workaround for people who trash units, so a quorum can still be reached and the unit can still be validated. In this way, and to my understanding, if the 4th unit comes in it gets credit, but if it doesn't, that's it - unless, among the 3, a quorum can't be reached because the results are too different and the validator cannot determine which return was "correct". In that case, another would be sent until a quorum of results could be reached.

As to the queue size, if there was a problem for the project scientists, I'm rather certain that either they would adjust deadlines, as has been done by some accounts here, or they, or a moderator, would come in, mention the problem, and point-blank ask people. This might even be announced on the front page. RALPH asks people not to have big queues, which is their need. They also state why: "it's a testing environment and we want to test over a variety of different computer configs".

It's also like when akosv's optimized app started showing up on Einstein@Home. There was no official word, and some assumed that perhaps Bruce Allen couldn't speak, because perhaps parts of the app were copyrighted or something, so the code couldn't be open-sourced. Anyhow, in the end, not only was a beta started based on akosv's optimizations, but when akosv made recommendations on how they could improve the beta, Bruce Allen didn't ignore him. But in any case, it's sort of like over there. One person back in the earlier days, before the announced beta, ended up saying: "Well, it's not officially endorsed, but the project hasn't spoken against it either, and it is validating. Now, don't you think that if Bruce Allen came in here and asked people to stop using it, each person here wouldn't delete it from their computers, post-haste?"

If the project has a problem, be it with deadlines or the like, they wouldn't necessarily sit there gritting their teeth in abject silence. They'd more than likely express their concerns and give their request to the volunteers.
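The validation flow described here - three matching results make a quorum, a late fourth that matches still earns credit, and disagreement triggers another copy - can be sketched as follows. This is a toy version with invented names, and it uses plain equality where real projects use a tolerant, project-specific comparison:

```python
def check_workunit(returned_results, quorum=3, max_total_results=10):
    """Toy validator pass over one workunit.  Each returned result is a
    (host_id, output) pair; group by output and look for a quorum that
    agrees."""
    groups = {}
    for host_id, output in returned_results:
        groups.setdefault(output, []).append(host_id)
    for output, hosts in groups.items():
        if len(hosts) >= quorum:
            # Canonical output found: these hosts validate and get
            # credit.  A straggler that later matches `output` is also
            # granted credit, but nothing needs to be re-sent.
            return "valid", output
    if len(returned_results) >= max_total_results:
        return "error", None        # give up on this workunit
    return "inconclusive", None     # issue another copy and wait
```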
Joined: 27 Jul 04 Posts: 182 Credit: 1,880 RAC: 0
Greetings from the States. I believe it is still in use. I do not think they have changed anything since I left.

That might be a good idea. Another good idea would be to give such results a higher priority, so that we are sure they are given out as fast as possible instead of waiting in the queue.

cheers,
Chrulle

Research Assistant & Ex-LHC@home developer
Niels Bohr Institute
Joined: 17 Sep 05 Posts: 60 Credit: 4,221 RAC: 0
Thanks for your input, Chrulle. Nice to see you visit us once in a while!
Joined: 29 Nov 05 Posts: 9 Credit: 1,266,935 RAC: 0
It is important to max-cache LHC because it takes only 18-30 hours for them to run out of work once they put it up (it looks like they put up 80,000-150,000 WUs at a time). I seem to be getting an edge by suspending all projects but LHC, not just to get work (until the downloads start), but for the entire 18-30 hour 'work window'. That way you crunch LHC exclusively while there's more work to serve, and when it goes 'out of work' you still have the maximum possible cache, as if LHC were the only project registered in BOINC. Then you turn all the other projects back on and let BOINC take its time (LHC-exclusive is pointless when there's no more work left to serve).
Joined: 19 May 06 Posts: 20 Credit: 297,111 RAC: 0
> I notice that this is slowed down by a minority of users who set their caches to maximum. When the number of work units available hits zero, we still have to wait a week or more while the people who grab the maximum number of units empty their caches before the scientists can even begin the analysis.

I'm just getting back into these sorts of projects after a break when my (old) computer really couldn't handle it. I'm new to this project, and I must say that I was definitely surprised to find people with lots of work units queued up in progress (I think some had 100-200!) while I was only able to get 3 units. And I actually changed my network connection time near the finish of my first unit to make sure I could get another one or two before they ran out. I was afraid I wouldn't get any more if I waited - and apparently rightfully so.

It is disappointing to have an empty queue and be ready to do work, but not be able to get it because other people are sitting on it. People like me who could do work and help get it done faster aren't able to. I don't know the exact problem/cause or what a good solution is. Maybe deadlines are part of it. Maybe the WU/day limit is part of it (there is one, right?). Maybe etiquette is part of it. Perhaps users can help solve the problem; maybe the people behind the project can do something to help even things out. *shrug*

I'll keep an eye on the boards to see what's going on. I hope something can be done. I want to do work for this project when it's here, but I feel a bit cheated seeing others sit on a lot of work while my computer goes unused. It feels like I'm being prevented from doing my fair share of the work.

A kind of side, related question: what happens to units that miss the deadline if there is no quorum?