Message boards : Number crunching : I think we should restrict work units



Dronak

Joined: 19 May 06
Posts: 20
Credit: 297,111
RAC: 0
Message 13873 - Posted: 4 Jun 2006, 6:02:36 UTC - in response to Message 13862.  

I notice that this is slowed down by a minority of users who set their caches to maximum. When the number of work units available hits zero, we still have to wait a week or more while the people who grab a maximum number of units empty their cache before the scientists can even begin the analyzing process.

That doesn't help the project - that's greed by people who want the most LHC units.


Thanks to your clear explanation, I raised my cache from 0.01 to 10 days.
And yup, as soon as there was work to do, I was able to get a bunch of it to work on.


I'm sure MattDavis or someone will correct me if I'm wrong, but I thought the original post, quoted in part here, was saying that you *shouldn't* max out your cache. Doing that means you get a lot of work, true. But it also means that the work gets done slower because you're sitting on work that other people with a lower cache (getting work as they complete it) could be doing. Leaving some computers dry is not the best way to get work done promptly. It slows down the process and makes everyone wait longer to get more work. Wasn't that the whole point behind the original post and subject of limiting work units? To make sure that everyone gets a fair share, not to have some people hogging work for themselves while others' computers get left dry?
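A toy model makes this argument concrete. The Python sketch below is hypothetical and deliberately crude: it assumes identical hosts that crunch a fixed number of WUs per day around the clock and top their queues up once a day in a fixed order, which is not how the real BOINC scheduler works, and it ignores dial-up and part-time hosts entirely.

```python
import math


def batch_days(n_units: int, n_hosts: int, rate: int, cache_days: float) -> int:
    """Days needed to finish one batch of work units in a toy model.

    Hypothetical assumptions (not the real BOINC scheduler):
    - n_hosts >= 1 and every host crunches exactly `rate` (>= 1) units per day, 24/7;
    - once per day, in a fixed order, each host tops its local queue up to
      max(1, floor(rate * cache_days)) units from the shared pool;
    - units fetched on a given day can be crunched that same day.
    """
    target = max(1, math.floor(rate * cache_days))
    pool = n_units
    queues = [0] * n_hosts
    days = 0
    while pool > 0 or any(queues):
        days += 1
        # Fetch phase: hosts earlier in the list get first pick of the pool.
        for i in range(n_hosts):
            grab = min(target - queues[i], pool)
            queues[i] += grab
            pool -= grab
        # Crunch phase: each host finishes up to `rate` of its queued units.
        for i in range(n_hosts):
            queues[i] -= min(rate, queues[i])
    return days


if __name__ == "__main__":
    # 100 WUs shared by 50 hosts that each finish 1 WU per day.
    print(batch_days(100, 50, 1, 0.01))  # 2: every host stays busy
    print(batch_days(100, 50, 1, 10))    # 10: 10 hosts hoard, 40 sit idle
```

With the same 100 WUs and 50 hosts, a 0.01-day cache keeps every host busy and the batch finishes in 2 days, while a 10-day cache lets the first 10 hosts take everything and the batch takes 10 days.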
Philip Martin Kryder

Joined: 21 May 06
Posts: 73
Credit: 8,710
RAC: 0
Message 13876 - Posted: 4 Jun 2006, 9:10:04 UTC - in response to Message 13873.  

Hmm, you mean that there may have been unintended consequences from starting this thread?

Even so, I'm thankful for the idea.



Osku87

Joined: 2 Nov 05
Posts: 21
Credit: 105,075
RAC: 0
Message 13878 - Posted: 4 Jun 2006, 14:10:38 UTC - in response to Message 13876.  

Thanks to your clear explanation, I raised my cache from 0.01 to 10 days.
And yup, as soon as there was work to do, I was able to get a bunch of it to work on.

Please, don't do that.

Encouraging people to keep that kind of cache on fast computers without a proper reason will only bring problems. As we have seen, the servers at CERN (or wherever they are) can't withstand the excessive number of downloads caused by around 80 computers each downloading ten days' worth of WUs.

So don't come here whining when the server is down and you can't download work. You know exactly why.
Philip Martin Kryder

Joined: 21 May 06
Posts: 73
Credit: 8,710
RAC: 0
Message 13882 - Posted: 4 Jun 2006, 20:25:45 UTC - in response to Message 13878.  

So, you are saying that you have direct knowledge that the servers are so badly configured that they FAIL rather than throttle back connections to a level that they can handle?

Can you tell us where and how you learned this?

If your assertion is true, shouldn't it be handled more directly by reconfiguring the servers rather than by expecting 65k crunchers to configure their machines in some special way?

And by the way, I was never intending to be "...Engouraging [sic] people to keep that kind of a cache..."
I think that YOU should set YOUR cache to .01 and leave it there.
You should NEVER raise YOUR cache above .1.

By the way, what do you mean by "...a proper reason???"








MB Atlanos

Joined: 14 Jul 05
Posts: 11
Credit: 81,274
RAC: 0
Message 13890 - Posted: 5 Jun 2006, 18:38:31 UTC - in response to Message 13882.  
Last modified: 5 Jun 2006, 18:53:45 UTC


First:
Calm down, don't take everything too personally, and stop shouting at and making demands of other people who express their dislike of your behavior.
Your reaction is also not very constructive. Please don't try to be sarcastic; it's not your best skill. ;)

Second:
Search the forum: there are several reports that too many connections have knocked the database-driven website, and the forum too, into nirvana. IIRC, 50+ connections became critical; we saw that a few days ago with the last batch of work.

If you not noticed:
In the last few months, the typical LHC user has seen very little admin activity; currently there is no project admin at all. Look at the relevant forum posts and the answers from chrulle, your former admin.

Oh, one more thing: please stop wasting space with your quotes; not everyone likes unnecessary scrolling. ;)

Side note: before you go ballistic over this post, observe the emoticons.
Osku87

Joined: 2 Nov 05
Posts: 21
Credit: 105,075
RAC: 0
Message 13891 - Posted: 5 Jun 2006, 19:46:16 UTC - in response to Message 13890.  

By the way, what do you mean by "...a proper reason???"

Like being a modem user (or anyone else on a dial-up line billed by the minute).

I'm sorry if anyone took the tone of my message too seriously. It was meant to wake some guys up... ;)

As MB_Atlanos said, there have been a couple of times when too much server load has turned the servers upside down.
ksba

Joined: 27 Sep 04
Posts: 40
Credit: 1,742,415
RAC: 0
Message 13897 - Posted: 7 Jun 2006, 22:25:35 UTC

Hi, I'm not the fastest user; I'm user no. 2 :)
I have about 100 PCs working on LHC. I have set LHC to 90% and Rosetta to 10%, and I set the cache to "0.1 Day".
If there is fresh work, I get 1 WU on a normal PC, 2 WUs on HT machines and 4 on very fast dual-cores.
If too many people fill their buffers up and make us wait many, many days, it's a shame.
If you have a flat-rate connection and are always online, please set it as low as possible!!

I'd like to get work on the second day too, and not only in the first 5 hours :)
Do I need to set it to 5 days? (joke!)
David Lahr

Joined: 27 Dec 05
Posts: 7
Credit: 461,367
RAC: 0
Message 13898 - Posted: 8 Jun 2006, 5:26:35 UTC - in response to Message 13897.  

Well, I'm sold, consider my cache increased to maximum size!

John Hunt

Joined: 13 Jul 05
Posts: 133
Credit: 162,641
RAC: 0
Message 13910 - Posted: 9 Jun 2006, 22:05:33 UTC

As I post this (23.00 hrs UK time), the front page of the LHC site shows that there are still 340 WUs out there somewhere, still unprocessed. This is days after I (and I'm sure many other people!) returned the final WU from the last batch to be issued.

So those 340 WUs are being held in the cache of some irresponsible crunchers who 'hoard' work. This must ultimately slow down the whole LHC project!

Jeez - when will some people learn that we are working for LHC, not LHC working for our egos.....


Philip Martin Kryder

Joined: 21 May 06
Posts: 73
Credit: 8,710
RAC: 0
Message 13914 - Posted: 10 Jun 2006, 7:15:42 UTC - in response to Message 13910.  

How do you know that the "hoarders" didn't have a "proper" reason, such as a slow modem or dial-up line?


John Hunt

Joined: 13 Jul 05
Posts: 133
Credit: 162,641
RAC: 0
Message 13916 - Posted: 10 Jun 2006, 8:15:24 UTC - in response to Message 13862.  

Matt - I want to thank you for taking the time to post this and start this thread.

Prior to your having done so, I was having difficulty getting work units to run for LHC.

Thanks to your clear explanation, I raised my cache from 0.01 to 10 days.
And yup, as soon as there was work to do, I was able to get a bunch of it to work on.

Again, thanks for your help in showing us how to get the maximum number of work units to process.

Phil



'Nuff said.......



Trog Dog

Joined: 25 Nov 05
Posts: 39
Credit: 41,119
RAC: 0
Message 13917 - Posted: 10 Jun 2006, 10:28:33 UTC - in response to Message 13914.  



how do you know that the "hoarders" didn't have a "proper" reason such as a slow modem or dial up line?



or an older machine, or one that doesn't crunch 24/7?
Dronak

Joined: 19 May 06
Posts: 20
Credit: 297,111
RAC: 0
Message 13918 - Posted: 10 Jun 2006, 14:15:28 UTC - in response to Message 13917.  



how do you know that the "hoarders" didn't have a "proper" reason such as a slow modem or dial up line?



or an older machine, or one that doesn't crunch 24/7?


That's possible, sure, and I don't think people would complain much if people have a legitimate reason for using a large cache. One way to check this would be to find some work units that are still pending, then look at the computer's details. Some of the information there must give you an idea what the computer is like in terms of age, speed, percent of the time BOINC runs, network connectivity, etc. That should help you decide if the person is keeping a large cache for a legitimate reason or not. The on and off work status of LHC is new to me, so it's tough for me to understand exactly what's going on, why, and how to solve the problems. I do know that it's disappointing to see so many work units in progress, waiting to be done, while my computer has been sitting dry for over 1 week. I think many others share this feeling, and that's one of the reasons for the complaints. We suspect that not everybody has a proper reason for maintaining a huge cache of work, and the people who don't have a proper reason are just being greedy, taking work from others for themselves and preventing work from being done promptly.
Philip Martin Kryder

Joined: 21 May 06
Posts: 73
Credit: 8,710
RAC: 0
Message 13919 - Posted: 10 Jun 2006, 15:04:15 UTC - in response to Message 13918.  




While I can acknowledge and appreciate your feelings, I'm surprised to find them so common in a scientific study.

Instead of casting wide nets of guilt by innuendo, based on intuition and emotion, why not gather the data that you describe?

Some folks might even consider gathering data to support a hypothesis **before** posting.

As for deciding, based on the attributes of the machine or network, whether any given cache size is "legitimate": who am I, or who is anyone, to decide what "legitimate" is?

I denigrate the underlying premise that "greed is bad", but that having a slow connection or a slow computer somehow makes it "proper" to have a large cache.
A person who was less "greedy" might buy a faster computer or a faster Internet connection or a second phone line.

Would that then make them "good"?

What if they bought a larger computer and then increased their cache, would they then be "greedy" or would they be "good?"


In summary:
1) LHC could control hoarding by limiting deadlines.
2) LHC could control server crashes by limiting max connections relative to known server capacity.
3) If the use of large caches is "improper" or "illegitimate", then this is the type of feedback that the BOINC folks need in order to establish the need for new and better controls and algorithms in BOINC.
4) There is lots of other great science to do on other projects when LHC runs dry.
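Point 2 can be sketched as a simple admission limit. This Python example is a hypothetical illustration, not LHC's actual server setup: `ConnectionLimiter` and its methods are invented names, and the capacity of 50 is an assumption borrowed from MB Atlanos's recollection that 50+ connections became critical. Requests beyond the limit are refused immediately, so a client can back off and retry instead of the server falling over.

```python
import threading


class ConnectionLimiter:
    """Admit at most `capacity` concurrent requests; refuse the rest.

    Hypothetical sketch: a refused client is expected to back off and retry
    later, instead of the server accepting everything and collapsing.
    """

    def __init__(self, capacity: int) -> None:
        self._slots = threading.BoundedSemaphore(capacity)

    def try_admit(self) -> bool:
        # Non-blocking: returns False immediately when every slot is taken.
        return self._slots.acquire(blocking=False)

    def release(self) -> None:
        # Must be called exactly once per successful try_admit().
        self._slots.release()


if __name__ == "__main__":
    limiter = ConnectionLimiter(capacity=50)  # 50 is an assumed threshold
    admitted = [limiter.try_admit() for _ in range(80)]
    print(admitted.count(True), admitted.count(False))  # 50 30
```

With 80 simultaneous requests (loosely echoing the "around 80 computers" mentioned earlier), 50 are admitted and 30 are told to retry, rather than all 80 hitting the database at once.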

crunch on!




KWSN - A Shrubbery

Joined: 3 Jan 06
Posts: 14
Credit: 32,201
RAC: 0
Message 13921 - Posted: 10 Jun 2006, 16:53:38 UTC - in response to Message 13914.  

how do you know that the "hoarders" didn't have a "proper" reason such as a slow modem or dial up line?


Proper reason or not, the WUs that are currently being processed are ones whose original deadline has already passed. Even the last WU sent from the recent batch would have timed out by now.

Granted, things happen and sometimes it's not possible to return results but I seriously doubt this is the case with the majority of the current backlog.

I don't have an issue with people having enough work to keep them running but manipulating the program to the point of work missing the deadline is excessive.

m.mitch

Joined: 4 Sep 05
Posts: 112
Credit: 2,068,660
RAC: 379
Message 13922 - Posted: 10 Jun 2006, 17:23:00 UTC
Last modified: 10 Jun 2006, 17:23:49 UTC


I don't know. I think it all may be a bit harsh. I fit into the category of unreturned results. It happens. BOINC restarted on one of my PCs and hung for a while, then came up empty. I shut it down and it seemed to restart okay. But later, when checking LHC, I found a number of unfinished WUs that should have timed out by now.

It wouldn't be the only scenario, and it would take too many users for that problem to add up, but that's why LHC has so much redundancy. If it were a problem, the project would be modified.

I'm doing the best I can, sometimes I make mistakes.




Click here to join the #1 Aussie Alliance on LHC.
Steve Cressman

Joined: 28 Sep 04
Posts: 47
Credit: 6,394
RAC: 0
Message 13931 - Posted: 11 Jun 2006, 5:42:11 UTC

If we were not still waiting for the stragglers to get the work done, we could possibly already be working on another batch. It is not right that there are still units to be done when others attached to the project have been idle for a week or more.

I think one solution the project could try is to decrease the daily quota. And I would bet that once the right quota was found, they would get all the results much faster than they do now. It might take a bit of trial and error to find the right value, but it would most definitely be better than having the majority of host computers sitting idle asking for work. When I say idle, I mean in regard to this project, because hopefully this is not the only project they are attached to.
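The quota idea can be sketched as a per-host, per-day counter. This is a hypothetical Python illustration, not the project's scheduler code; `DailyQuota`, `grant` and the limit of 4 are made-up names and values. It also shows the trade-off any fixed cap implies: once a fast host reaches its limit it gets nothing more that day, even if work is still waiting.

```python
from datetime import date


class DailyQuota:
    """Cap how many work units each host may receive per calendar day.

    Hypothetical sketch only; not the project's actual scheduler code.
    """

    def __init__(self, per_host_limit: int) -> None:
        self.per_host_limit = per_host_limit
        # (host_id, day) -> number of units already handed out that day
        self._issued: dict[tuple[str, date], int] = {}

    def grant(self, host_id: str, requested: int, day: date) -> int:
        """Return how many of the requested units this host may have today."""
        key = (host_id, day)
        already = self._issued.get(key, 0)
        granted = max(0, min(requested, self.per_host_limit - already))
        self._issued[key] = already + granted
        return granted


if __name__ == "__main__":
    quota = DailyQuota(per_host_limit=4)
    today = date(2006, 6, 11)
    print(quota.grant("fast-host", 40, today))  # 4
    print(quota.grant("fast-host", 40, today))  # 0: capped for the rest of the day
    print(quota.grant("slow-host", 1, today))   # 1
```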
98SE XP2500+ @ 2.1 GHz Boinc v5.8.8
Trog Dog

Joined: 25 Nov 05
Posts: 39
Credit: 41,119
RAC: 0
Message 13936 - Posted: 11 Jun 2006, 7:36:46 UTC - in response to Message 13931.  



I think a solution that the project could do is to decrease the daily quota.


I can see it now: all those super crunchers complaining that they have hit the daily limit (refer to the wailing and gnashing of teeth about the curse of 32 over at Einstein) and how it's slowing down the science to have their machines sitting dry. It's not fair that their super machines are being penalised because they are too fast.

Whichever way you cut it, slice it or dice it, the work is sporadic, and someone somewhere will always run out of work before someone else.
bowlingguy300

Joined: 1 Sep 04
Posts: 14
Credit: 3,857
RAC: 0
Message 13939 - Posted: 11 Jun 2006, 12:01:57 UTC

Spreading the project around to EVERYONE would speed up the process so much that I don't understand why they allow this.
A small number of people take all the work units, and we (the ones who don't have our machines turned up to receive more than we should) have to wait a week for them to finish all that they have. How can they say that's faster than having all of us doing the work? You can only process one work unit at a time per machine, and so many are sitting... waiting... without!
So come on, guys, please turn your settings back to default and let US ALL DO THIS PROJECT! Stop the greed, and share!
1fast6

Joined: 2 Jun 06
Posts: 5
Credit: 245,858
RAC: 0
Message 13940 - Posted: 11 Jun 2006, 14:03:37 UTC
Last modified: 11 Jun 2006, 14:07:06 UTC

Well, I joined the project over a week ago, just to watch all of my machines sit idle... very frustrating...

I wonder if the admins would consider setting it up to release work continuously, with the first 3 returned results that validate getting full credit (I assume this project uses a triple-validation credit system) and the stragglers getting less (or is that a BOINC-controlled issue?). It would seem that this would get them the results they need quicker and give us all something to do, while rewarding those who get the work in quickest (and not rewarding those who hoard work)...
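One way to express this proposal is sketched below in Python. It is hypothetical throughout: the quorum of 3 is only the assumption stated above about how validation works here, `assign_credit` is an invented helper, and the 50% straggler factor is arbitrary. Valid results are ranked by return time; the first `quorum` get full credit, later valid ones get reduced credit, and invalid ones get none.

```python
from typing import Dict, List, Tuple

# (host_id, hours between issue and return, passed validation?)
Result = Tuple[str, float, bool]


def assign_credit(results: List[Result], full_credit: float,
                  quorum: int = 3, straggler_factor: float = 0.5) -> Dict[str, float]:
    """Full credit for the first `quorum` valid results, reduced credit after.

    Hypothetical sketch; quorum=3 is only an assumption and straggler_factor
    is arbitrary. Invalid results earn nothing.
    """
    valid = sorted((r for r in results if r[2]), key=lambda r: r[1])
    credit: Dict[str, float] = {}
    for rank, (host_id, _, _) in enumerate(valid):
        factor = 1.0 if rank < quorum else straggler_factor
        credit[host_id] = full_credit * factor
    return credit


if __name__ == "__main__":
    results = [("d", 200.0, True), ("a", 10.0, True), ("b", 20.0, True),
               ("c", 30.0, True), ("e", 5.0, False)]
    print(assign_credit(results, full_credit=30.0))
    # {'a': 30.0, 'b': 30.0, 'c': 30.0, 'd': 15.0}
```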

It's painful to watch the servers wait for a timeout on fewer than 30 work units...


©2024 CERN