|
21)
Message boards :
Number crunching :
Because you asked....
(Message 16369)
Posted 18 Feb 2007 by River~~ Post: I would suggest that you need to identify and cure the cause of the proliferation of hostids before you do the real migrate, and in addition prune out the millions of unwanted hosts from the db before the real migrate, perhaps only migrating those hostids that have ever submitted work. It would be possible to transfer this bug across and then fix it later, but, in my opinion, that would cause you more difficulties in the long run than solving the issue first. Having a wildly excessive number of hosts in the db certainly skews any optimisations you may try to test, and may well cause other bugs that would not arise with a plausible number of hosts. One bug which may (or may not) be a knock-on from the mega-ghost-host issue is the stats export issue. Stats export is not going to be high up the list of practical priorities, so I don't anticipate you will spend a lot of time on stats export itself. On the other hand stats come high on the personal priorities of many crunchers, and if a solution to that came easily out of your ghostbusting of the mega-hosts then many crunchers would be delighted, and there would be a corresponding saving in whinges on these boards... Best regards, River~~ |
|
22)
Message boards :
Number crunching :
controlling a linux client
(Message 16366)
Posted 17 Feb 2007 by River~~ Post: There is a nice manager for Linux called KBoincSpy. I like it better than the manager that comes with Boinc. and FluffyChicken on Rosetta points out that There are also quite a few other options, one of them being KBoinc Manager, so no need to use another Windows box or WINE (or similar WINE clones) to get BoincView capabilities |
|
23)
Message boards :
Number crunching :
Please note: this project rarely has work
(Message 16365)
Posted 16 Feb 2007 by River~~ Post: ...(even as a mod I can't edit other people's posts which is a little strange) As a mod you can delete posts (which does not really delete them, just hides them from all but mods and other superusers). To get the effect of editing someone else's posting, delete it, reply to it, alter the quote as desired, and leave a note at the end saying why it was moderated (eg bad language removed/added/etc). I understand the reason for this arrangement is to stop evil mods from editing other people's work in ways that they would not like. Thanks for stickying this thread. River~~ |
|
24)
Message boards :
Number crunching :
controlling a linux client
(Message 16350)
Posted 15 Feb 2007 by River~~ Post: There is a misconception, which I suffered from at one time, that linux BOINC commits you to working in command line mode. It doesn't, though you do need a little command line Unix to get things running when you download the official distro. But after that it can be just as 'graphical' as on Windoze. For example, Alex recently said:
I was glad to discover some time ago that I was wrong in this, and am making this posting to help anyone else who is wondering about this. There are a number of options:
1. Run the official BoincManager from KDE or Gnome. The official download of the client includes a BM that works with both those desktops. My suggestion if you use this approach is to put a link ('shortcut') to the BM on your desktop.
2. Run BoincView under Wine as Alex suggested.
3. Run BoincView from a nearby Windows box (my own solution) (eg if you are already running a mixed network).
4. Run the windows BoincManager on a nearby windows box and connect remotely. (Ever wonder why you can load multiple instances of the BM? It is for the very few users who have one BM running for each of their other machines. I used to do this myself till I discovered option 3.)
5. Run the linux BoincManager on a nearby graphics-enabled linux box.
6. Run BoincView under Wine on a nearby linux box.
Solutions 3 - 6 all work even with a command-line only linux box - you don't even need an X-server on the box running BOINC as all the gfx happens on the 'viewing' box. 'Nearby' means that the box running BM or BV has to be able to address the box running the client. The viewer can be inside a firewall compared to the client, but not the other way round. In principle a BOINC client on a box with a public IP address (or with port forwarding) could be controlled from the other side of the world. Solutions 1, 3, and 4 have been used by me on a long term basis and work well. Alex uses option 2. Finally, suggestions 3 - 6 will all control windoze boxes as well. BV is available from http://boincview.amanheis.de/. For some people the fact that it is not open source is an issue, and it certainly means there will never be a native Linux BV :( The BV binaries for windows are freeware and in my opinion if you have more than one box you'd be better off with BV than BM. I have no idea if BV is Vista-ready. Hope that helps. River~~ |
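For options 3 - 6, 'nearby' comes down to whether the viewing box can open a connection to the client box. A rough way to check that, sketched here in Python, is to try a TCP connect. The port number 31416 is an assumption (it is the usual default for the BOINC client's GUI RPC interface, but it is not stated in the post), and `can_reach` is a made-up helper name. A successful connect only shows the port is addressable; it does not show that the client will accept remote control.

```python
import socket

# Assumption: the BOINC client listens for manager connections on this port.
GUI_RPC_PORT = 31416


def can_reach(host: str, port: int = GUI_RPC_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and applies the timeout for us.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable names.
        return False


if __name__ == "__main__":
    # Checks the local machine; substitute the address of the crunching box.
    print(can_reach("127.0.0.1"))
```

Run it from the box where BM or BV will live, not from the crunching box, since the firewall point above means reachability is not symmetric.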
|
25)
Message boards :
LHC@home Science :
Close Account
(Message 16349)
Posted 15 Feb 2007 by River~~ Post: That just detaches the client. I don't think it removes the account information from the server. Correct, and you can change your email address to something non-existent but with plausible syntax. Also I'd suggest deleting your boxes if possible (which means waiting till all the work from them is deleted, which will be a long wait on LHC). Which is as far as you can go in 'closing' your account. The project keeps the a/c id forever
there are at least six ways to use a gui to control a linux box, please see this thread Hope that helps. R~~ edit: moved gui info to new thread on NC |
|
26)
Message boards :
Number crunching :
LHC credits not showing up in cross-project stats
(Message 16348)
Posted 15 Feb 2007 by River~~ Post: Where is my pony? :*( it doesn't work if you tell your wish before you get it |
|
27)
Message boards :
Number crunching :
Fairer distribuiton of work(Flame Fest 2007)
(Message 16347)
Posted 15 Feb 2007 by River~~ Post: If I run any other application (Einstein or Rosetta for example) then the LHC application stops looking for work completely. Even if it had thought it was a 1 minute interval to the next "poll" - nothing - for hours. If I suspend the running applications LHC still doesn't look for work - I have to do an update on the project screen to get it back into its periodic polling.
There are four different things going on here, possibly in some combination.
1. The first is the normal polling backoff. For a few times it tries at 1min intervals, but then slowly degrades in random steps (so it can sometimes get quicker for one or two tries) till it is trying at random intervals with a max of 4hrs. This process is unaffected by other work on the box. Please do not interfere with this process; when people do this in significant numbers it kills the server - that is why there is a backoff.
2. The second is the cache size & no work fetch logic. This varies from version to version of the BOINC client, but the client will not ask for work when it thinks your box already has enough. That is why I recommend a cache roughly equal to one work unit of your other projects - except CPDN ;-)
3. If LHC downloaded a lot of work last time it had some, and then ran in earliest deadline first to get it finished, it may be in negative debt mode. You can look for your debt in the client_state.xml file or with a suitable debt reader program (remind me where to download it from please someone).
4. If the client could not contact LHC five times running, it then backs off for either 24hrs or 1 week depending on the client version. This backoff is to prevent network load on a server returning from downtime.
If you see this message in the message window, please check you can load the LHC front page and that it shows a plausible (possibly zero) value for work in progress - if you can't see the LHC front page or it complains of too many connections then please wait a while before trying the front page again. Please don't retry more than two or three times a day in this case or you become part of the problem. On the other hand if the long backoff is because of a known network problem local to yourself, then by all means retry after verifying that the LHC front page is OK. As you say, you can retry by clicking update. How do you know which is which? Well in cases 2 and 4 there will be no effect in suspending all other projects, whereas in cases 1 and 3 suspending all other projects will trigger an update request within a few minutes (not always immediately). You tell 2 and 4 apart by looking at the message window messages. You tell 3 by a message 'No Work Fetch' (remind me which window this is in please - in BoincView it is in the hosts window under network status, but I never use the official manager so can't remember where this message appears). Hope that helps R~~ |
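The polling backoff in case 1 can be pictured with a small Python sketch. This is not the BOINC client's actual algorithm, only an illustration of the behaviour described: a few tries at 1 minute, then random intervals whose ceiling grows up to 4 hours. The number of quick retries and the doubling of the ceiling are assumptions, and `next_delay` is a made-up name.

```python
import random

MIN_DELAY_S = 60          # 1 minute, as described in the post
MAX_DELAY_S = 4 * 3600    # 4 hours, as described in the post
QUICK_RETRIES = 3         # assumption: "a few times" at the 1-minute interval


def next_delay(failures: int, rng: random.Random) -> float:
    """Seconds to wait after `failures` consecutive requests that got no work."""
    if failures <= QUICK_RETRIES:
        return MIN_DELAY_S
    # Assumption: the ceiling doubles with each further failure, capped at 4 hours.
    ceiling = min(MAX_DELAY_S, MIN_DELAY_S * 2 ** (failures - QUICK_RETRIES))
    # A random draw below the ceiling means a later wait can be shorter than
    # the one before it, while the overall trend is towards longer waits.
    return rng.uniform(MIN_DELAY_S, ceiling)


if __name__ == "__main__":
    rng = random.Random(2007)
    for n in range(1, 13):
        print(n, round(next_delay(n, rng)))
```

The random element is what stops thousands of clients from hitting the server in step with each other, which is why forcing updates by hand in large numbers defeats the purpose.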
|
28)
Message boards :
Number crunching :
LHC credits not showing up in cross-project stats
(Message 16306)
Posted 12 Feb 2007 by River~~ Post: it's S@H and CPDN that appear {actually, they don't appear} to be missing their updates ... no change for ages, now If it is a CPID issue the way to fix it is to have at least one box that has all three projects on it, and let it run all three projects till they converge on a single CPID. When that happens BoincStats will either show all the projects or none. If none, then you have to go into the search page and find your new ID. I know that some other stats sites use the same CPID mechanism so this advice might be relevant. Or it might be a total red herring if your stats site does things differently. |
|
29)
Message boards :
Number crunching :
Machine with a 40 Day Cache that doesn't Timeout?????
(Message 16305)
Posted 12 Feb 2007 by River~~ Post: It is very easy for someone to alter the DCF in the client_state.xml file with the result that boinc thinks each wu may only take minutes and not hours, which results in many many wu's being downloaded. The DCF will be reset to the correct number after the first wu is completed. I wish you hadn't said that. In general, if someone is going to cheat like mad at least let them do the intellectual bit for themselves... The same effect can happen naturally if you get a run of very short jobs - which I always think is delayed karma for the disappointment of downloading 8 hours work and seeing it all complete in a few seconds. I agree with Gary that to get 40 days worth would be a bit over the top for that mechanism. I've had 2x or 3x my cache by the workings of this karma but don't think I have seen more than that. R~~ |
|
30)
Message boards :
Number crunching :
"Dummy" Work units complete after less than 0,1sec ?
(Message 16304)
Posted 12 Feb 2007 by River~~ Post: Hi experts, Hi Jochen, I think you were just unlucky - LHC gives out work that is very short sometimes but not as a deliberate 'dummy' run. As you know the software looks at how stable the orbits of the particles are in the simulated LHC, and it does this by following the particle round 100,000 or 1,000,000 turns of the machine. Sometimes the particle hits the wall on the first or second turn. Then there is no point carrying on. Unfortunately, jobs come in series with similar parameters, so when you get one of these short runs you tend to get several in the same download. How you could tell for sure is that a real dummy job would have an estimated time to completion of a few seconds before it was run; whereas a real job that runs short would have a more reasonable estimate, but then finish well before the estimate. Hope that helps, and better luck next time! R~~ |
|
31)
Message boards :
Number crunching :
Machine with a 40 Day Cache that doesn't Timeout?????
(Message 16262)
Posted 6 Feb 2007 by River~~ Post: How to solve this isn't a simple problem either as you may know and hopefully it will be tackled very quickly once we have the service fully under control. I'd point you to the suggestions made in the fairer distribuition [sic] of work thread, including some of mine and of John Keck. Either set of suggestions would (in my partisan opinion) do more to even things out than sorting out this particular bug here, because they would spread the work out more fairly amongst the users asking for it. So somebody trying deliberately to get a week's work at one go would not be able to, let alone getting 40 days work. But I am sure you will consider all suggestions and go away and implement some of them and implement some other even better ideas of your own... Good luck and you have my full support even if you don't accept any of my points. R~~ |
|
32)
Message boards :
Number crunching :
That was fast
(Message 16254)
Posted 6 Feb 2007 by River~~ Post: Weird. I too find that the world is a much better place when I rely on my assumptions. All the problems only start when this disillusionment stuff comes in ... |
|
33)
Message boards :
Number crunching :
Machine with a 40 Day Cache that doesn't Timeout?????
(Message 16253)
Posted 6 Feb 2007 by River~~ Post:
Well, I reckon lower than getting the code working on the new Debian servers (ie this bug waits till after the code is working at least as well as on the CERN servers), lower than getting the ghost host issue sorted as this is creating problems with the size of the db, lower than getting some kind of issue limit working as if people only got a limited number of WU then the damage done by all kinds of abuse would reduce rather than just fixing a single kind, lower than export of stats because so many people *really* want that, and arguably even lower than getting a second app going here (as then there will be plenty of work for everyone and people will not be tempted to exploit loopholes). I would change my mind if there was evidence of folk exploiting it en masse and not exploiting other loopholes. btw, I feel the need for a quick disclaimer: I am only expressing a view, of course, and so is Gary. Neither of us is going to be offended if N & A take a different view on the priorities. I've made my points here (and perhaps Gary will refute them all) in the hope that my thoughts will be helpful to the new admins, not to tell them their job. R~~ |
|
34)
Message boards :
Number crunching :
Machine with a 40 Day Cache that doesn't Timeout?????
(Message 16241)
Posted 6 Feb 2007 by River~~ Post: ps (too late to edit) It is also worth saying that, in my opinion, this was almost certainly not a deliberate exploit. For one thing, only one of this user's boxes seems to have been affected, and some of his other boxes have got and returned reasonable amounts of LHC work since. The run length of work on this project is only loosely connected with the estimates from the server - my boxes tend to have DCFs (duration correction factors) of around 1.7 to 1.9 most of the time, but sometimes there is a run of short-running work and every so often one box will slip to a figure well under 1. This means that it will pick up more than my intended cache size next time it fills up, as the chances are the next work will put it back to a DCF of 1.7. R~~ |
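The effect of a low DCF on how much work gets fetched can be shown with a little arithmetic. The Python sketch below uses a deliberately simplified model (the real client's work-fetch logic is more involved and differs between versions): the client predicts each job's run time as the server estimate times the DCF, and asks for as many jobs as fit in the cache. The 2-hour server estimate and 0.5-day cache are made-up figures for illustration.

```python
import math


def units_fetched(cache_days: float, server_estimate_hours: float, dcf: float) -> int:
    """Jobs that fit in the cache when each is predicted at estimate * DCF."""
    predicted_hours = server_estimate_hours * dcf
    return math.floor(cache_days * 24 / predicted_hours)


def real_hours(units: int, server_estimate_hours: float, true_factor: float) -> float:
    """Actual crunching time if jobs really take estimate * true_factor."""
    return units * server_estimate_hours * true_factor


if __name__ == "__main__":
    cache_days, estimate = 0.5, 2.0   # hypothetical cache size and server estimate
    normal = units_fetched(cache_days, estimate, dcf=1.8)
    slipped = units_fetched(cache_days, estimate, dcf=0.5)
    print("jobs fetched at DCF 1.8:", normal)
    print("jobs fetched at DCF 0.5:", slipped)
    print("real hours for the DCF 0.5 fetch:", real_hours(slipped, estimate, 1.8))
```

With these figures the box at DCF 1.8 fetches 3 jobs, while the box that has slipped to DCF 0.5 fetches 12, which is about 43 hours of real work against a 12-hour cache if the next jobs run at the usual length. That is a few times the intended cache, in line with the 2x or 3x seen in practice, and nowhere near 40 days.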
|
35)
Message boards :
Number crunching :
Machine with a 40 Day Cache that doesn't Timeout?????
(Message 16239)
Posted 6 Feb 2007 by River~~ Post: ... [quibble] You might in theory get credit if you get the work back before the replacement work is returned. For this to happen after a month would need 3 or 4 other machines to have been issued the work in succession without returning it. The reason for this being allowed is that the user gets credit for a late result if it made a practical difference to the project. [/quibble] However that is not what is happening here. Looking at just this one WU for example, four results went back by 1st Jan, and SoulFly's result went back on 25th. It seems to me that they have switched off the test for a result being past the deadline. I think I know why - there was a bug in that test and around Oct? Nov? 2006 people were complaining that work sent back after quorum was being scored zero for being allegedly late even when it was not. Looks like they either made a mistake fixing that bug, or more likely just turned the test off entirely, not having time to look at it properly. If I am right, then chalk up another low priority task for our new admins. Good call Gary - & it's good to be seeing your posts again River~~ (you may remember me as Gravywavy on Einstein) |
|
36)
Message boards :
Number crunching :
That was fast
(Message 16230)
Posted 5 Feb 2007 by River~~ Post: another spill just minutes ago and i didn't get some either. and I got just 1, on a third box, making a total of 22 WU today, so 3/11 boxes got lucky in these 24hrs so far and avg over all my boxes is 2 wu each today. The thing about having lots of slow boxes (11 boxes all 0.5 to 0.866 GHz) is I don't get many but I usually get some... and in the meantime they keep my lounge nice and warm R~~ edit 0.5 MHz -> 0.5 GHz :) |
|
37)
Message boards :
Number crunching :
@ new admins
(Message 16226)
Posted 5 Feb 2007 by River~~ Post: Hi N & A, Many thanks for the info on the project front page. No need to apologise for a fortnight gap between messages on the front page, that is more than we have been getting for a while, and in my opinion is within the acceptable zone, though if you can manage to continue with fortnightly updates till everything is sorted I know a lot of us would appreciate it. If you want to live adventurously, I'd be curious to know what you intend getting sorted in the next fortnight, and what you are already working on but realistically feel will take more than a fortnight to do. On a 'no promises' basis of course :) River~~ |
|
38)
Message boards :
Number crunching :
That was fast
(Message 16224)
Posted 5 Feb 2007 by River~~ Post: I got nothing, cause i was sleeping and not aware that LHC will spill out work again. I hope there will be more soon. I was sleeping too - UTC is my home timezone & I got up in the night as we do sometimes and noticed that BoincView was showing me 2 boxes with LHC. But it all happened without my conscious involvement around 90mins earlier. My advice (should you choose to accept it) is as follows:
* sensible cache size (from 0.1 to 0.5 day unless you *really* need longer for a dialup etc)
* give LHC a double share of the resources so it is always "starved" by the time work is available. If all other projects are at the default resource level of 100, give LHC 200 for example. If your other projects vary in resources, give LHC double the average.
* leave LHC enabled all the time
* choose other projects that have run lengths of around your cache size (or on Rosetta choose a run length roughly matching your cache size)
That way if work is on the system continuously for >4hrs you will get a fair share, if work is on the system for <4hrs (as here) you get a fair go at the "lottery" for it. And you will get that work whether you are nursemaiding your boxes lovingly, or in bed pushing up the ZZZs or out on the town. Large caches mean that if you get work you get more, but also mean that your box will go look for work less often, so you'd be less likely to ask at just the right time. Large caches also mean you get heavy flak from those who like to shout about greedy users, whereas little and often gets the same result in the long term and without embarrassment in the forums (fora?) HTH R~~ |
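The "double the average" rule for the LHC resource share is easy to work out by hand, but here is a short Python sketch of it for boxes with several projects at different settings. The function names are made up for illustration, and the fraction shown assumes every project always has work, which is exactly what LHC usually does not.

```python
def lhc_share(other_shares: list[float]) -> float:
    """Resource share for LHC: double the average share of the other projects."""
    return 2 * sum(other_shares) / len(other_shares)


def lhc_fraction(other_shares: list[float]) -> float:
    """Fraction of the box LHC would get if every project had work."""
    lhc = lhc_share(other_shares)
    return lhc / (lhc + sum(other_shares))


if __name__ == "__main__":
    for others in ([100, 100], [100, 50, 150]):
        print(others, lhc_share(others), lhc_fraction(others))
```

With two other projects at the default 100 each, LHC gets a share of 200 and would take half the box; with three others at 100, 50 and 150 it still gets 200, which is 40% of the total. Since LHC rarely has work, the practical effect is only that the box is keen to fetch LHC jobs when a release happens.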
|
39)
Message boards :
Number crunching :
That was fast
(Message 16219)
Posted 5 Feb 2007 by River~~ Post: I was getting "no work from project" up until 01:59:46. Then I managed to grab 60 work units on one machine and by 02:02:51 it was all sold out and back to "no work from project" again. yes lucky you. I got some too, on 2 out of my 11 machines. Usually it is a bit of a lottery getting work when it is a small release like this one, as the program that puts the work onto the server does so gradually. So there might be 60 jobs waiting, then you get them all, and the next instant someone else gets none, but a few minutes later there are a few more waiting. This explains why my two boxes got work at 2:02:39 and 2:03:10 UTC on the LHC server clock, where one of yours failed to get any at 02:02:51. This kind of effect also explains why the faster machine out of my two lucky ones got less work than the even slower one - at 02:03:10 I emptied the barrel of all the jobs that had arrived since your box looked at 02:02:51 (less any that other people got). 12 jobs in 19 seconds, plus whatever was handed out to others in that 19sec. I am pleased as my boxes have been off for almost a week, and I got them back on only 10 hours before the work came :) R~~ edit: add: ps - you may be interested in Scarecrow's graphs - they have a time resolution of 1 hour so miss this kind of fine detail, but are good at seeing the longer term picture |
|
40)
Message boards :
Number crunching :
SOME greedy users
(Message 16102)
Posted 10 Jan 2007 by River~~ Post:
reminds me of this little tale - it's off topic so purists can skip over it. The linguistics lecturer was droning on as per usual, and today talking about double negatives. "It is interesting to note that languages are divided on the meaning of a double negative. In some, such as French, a double negative is simply a more emphatic negative, whereas other languages, including English, treat a double negative as a positive. Of course, double positives are always positive, there is no known case of a double positive denoting a negative". At this point a voice from the back of the lecture theatre was heard "Yeah. Right". R~~ |
©2026 CERN