11) Message boards : Number crunching : Why BOINC 6.X has issues with LHC@home and other things (Message 20767)
Posted 14 Nov 2008 by (banished: ID 70524)
Post:
the very least they can do and it wouldn't take much from their budget


The least they can do is nothing, and I have explained this: the budget is set. They asked for money to do certain things, they got the money to do those things, and LHC@home was not in that plan. THEY ARE NOT ALLOWED TO DIVERT FUNDS THAT HAVE BEEN ALLOCATED.


Correction => The very least apathetic sloths would do is nothing. Responsible people would eliminate the waste. Regarding diverting funds... there is someone who can divert funds, there always is. If funds haven't been diverted then you just haven't spoken to the right person, or else you have but they're pretending to be deaf. Either way there is a solution and I intend to help. More on that below.


And if they don't like that then let them buy the CPU time they pee down the drain. I have a hunch things would get fixed real quick if they had to do that.

The work just wouldn't get done in that case. There is no money for this, no money at all: zilch, zero, nada, nothing. If they had to buy the CPU time, this work would not get done. This is not the perfect scenario for the LHC, but that is what they would have to do.


You seem to be saying they didn't know the magnets would require tuning/alignment when they designed the machine, so they didn't budget for that. That would make them idiots, which would mean the entire venture has little chance of accomplishing anything. I can't believe that.

You also seem to be saying there is absolutely 0 contingency planning and funds. If so then how will they pay for fixing the recent coolant system failure?

You seem to think we relish "stealing CPU" and working on a best effort basis with no money.


If that's what you think then you've acquired the wrong impression.

I can assure you this project is very frustrating to work on without you telling me it is (by the way, thanks for the reminder; I was actually having a good day).


No problem. I was having a good time BOINCing until I came to understand what a wasteful, screwed up project this is and how it hurts so many other projects that are equally broke but make every effort to be efficient. (One snide remark deserves another, goose<=>gander)

Do you think we have a massive pile of money we control and are just sitting on it, laughing maniacally? If we could make money and more effort appear, we would have by now. All we can do is keep bidding for money from various bodies.


I am quite aware of how broke LHC@home is.

If the LHC has spent/allocated every penny and cannot afford to provide for its own needs, then it's doomed to be shut down soon by some unforeseen failure/circumstance because they just won't be able to afford to fix it. If that's the case then there's not much point in continuing the magnet tuning work, is there? No, Neasan, LHC has money. You know it and I know it. And we both know the only reason they haven't rubbed a little of it on this project's problems is because suckers here just keep donating CPU time, blissfully unaware of how much of it goes to waste. Preying on the ignorance of one's benefactors... is that how honorable men get jobs done?


Yes, what this project does is important, but when they look at the list of things to be done they see more important things, like actually fixing the machine after the failure in September.


The means to end the tragic waste of CPU time has been at hand for well over a year, long before the machine fired up and broke. So your argument is just spin, the shell game I referred to in an earlier post.

Your badgering doesn't "shame the powers that be"; it just winds me up. We know you have misgivings about the IR and minimum quorum, and we have been talking to the scientists about it; the new SixTrack they are working on should also drop both of these numbers. In this case the squeaky wheel does not get the oil, it merely makes the user consider walking.


Keep the quorum of 3 if that much assurance is needed; nobody has a problem with that, at least I don't. Just reduce the initial replication to 3 or whatever the minimum quorum is.

Now back to your rant...

So we have the shaming strategy versus educating thousands of users to the point where they decide to spend their resources on efficient projects rather than LHC@home. Seems to me that if the former does not work then the latter is bound to, because benefactors eventually tire of shameless beneficiaries and cut them off. That is the corner LHC has backed LHC@home into. Or maybe LHC@home put itself into the corner. Whatever. There is a graceful way out and LHC@home seems to be finally headed in that direction. Too bad it took the "collapse" of SixTrack under the "burden" of the v.6 BOINC API to light the necessary fires under the buttocks of the powers that be, but better late than never.

12) Message boards : Number crunching : Why BOINC 6.X has issues with LHC@home and other things (Message 20765)
Posted 13 Nov 2008 by (banished: ID 70524)
Post:
Well, $6 trillion or $7 billion, parent project or not, one would think they could ante up either money or manpower to set things right around here. Tuning the magnets is critical, not like it's to do with choosing the color of the paint on the bathroom walls. If they're going to leave it up to volunteers to provide the CPU time, then the very least they can do, and it wouldn't take much from their budget, is to make sure the app runs well enough to not require IR > minQ, or else adopt one of the newer strategies for getting the work back fast. If they can't make that minimal contribution then they should just be made to wait for whatever they get out of IR = minQ. And if they don't like that then let them buy the CPU time they pee down the drain. I have a hunch things would get fixed real quick if they had to do that.

I might not know how science funding works, but I do know a welfare bum when I see one, and it's obvious to me that the LHC is a welfare bum that bites the many generous BOINC hands that feed it. Worse than that, they "steal" CPU time away from other projects that are fighting just as hard for funding and are just as broke.

I detached my hosts from LHC@home some time ago when it became obvious there is little motivation to treat crunchers and the BOINC community with the respect they deserve. The ONLY reason I am here is to lobby and, if necessary, shame the powers that be into ending the needless waste going on here. I do that for the sake of all the other BOINC projects that need the spare CPU cycles LHC@home callously pees down the drain. Don't expect me to go away, though I will turn the volume down a notch or 3 now that someone is looking into doing something about the problem.

13) Message boards : Number crunching : Why BOINC 6.X has issues with LHC@home and other things (Message 20748)
Posted 7 Nov 2008 by (banished: ID 70524)
Post:

Nice try, Tomas, but you are wrong :)

I do run other projects. None of my computers are attached to LHC because more than 30% of the work done for LHC is wasted effort. It is wasted because they have set IR > minQ. Using IR > minQ to get results verified sooner was justifiable years ago, when the BOINC server did not have as many features as it does now. Modern versions of the BOINC server have features that permit efficient strategies for getting results verified quickly. Unfortunately, LHC steadfastly refuses to implement those strategies and use the CPU time donated to them efficiently, the way professionally run projects attempt to do.

The reason I complain about LHC's wasteful practices is that they steal CPU time away from other worthy projects.



Do you mean like what is suggested by the WCG in this document?
http://boinc.berkeley.edu/trac/attachment/wiki/WorkShop08/ServerManagement-BOINC2008.pdf?format=raw

Same doc as a powerpoint:
http://boinc.berkeley.edu/trac/attachment/wiki/WorkShop08/ServerManagement-BOINC2008.ppt?format=raw




That document mentions a number of new BOINC server features/options and offers several excellent suggestions. I read it several weeks ago but not this morning. Since the document's contents are not fresh in my mind, I won't say each and every suggestion and option in the document should be implemented here at LHC@home, but I will say some combination of those new options and suggestions should be tested, evaluated, tweaked and implemented soon, very soon, so that the long outdated and wasteful IR > minQ strategy can be abandoned. That strategy hurts not only this project but other projects as well.


14) Message boards : Number crunching : Why BOINC 6.X has issues with LHC@home and other things (Message 20745)
Posted 7 Nov 2008 by (banished: ID 70524)
Post:
Why should I work for them for free when their parent project has a $6 trillion budget and more money on the way? If they want me they can pay me a respectable wage.



So that's the reason why you complain time after time after time after time on the same issue instead of just choosing to run another project. You are one of those consultants who is looking for a job. :)


Nice try, Tomas, but you are wrong :)

I do run other projects. None of my computers are attached to LHC because more than 30% of the work done for LHC is wasted effort. It is wasted because they have set IR > minQ. Using IR > minQ to get results verified sooner was justifiable years ago, when the BOINC server did not have as many features as it does now. Modern versions of the BOINC server have features that permit efficient strategies for getting results verified quickly. Unfortunately, LHC steadfastly refuses to implement those strategies and use the CPU time donated to them efficiently, the way professionally run projects attempt to do.

The reason I complain about LHC's wasteful practices is that they steal CPU time away from other worthy projects.

15) Message boards : Number crunching : Why BOINC 6.X has issues with LHC@home and other things (Message 20743)
Posted 6 Nov 2008 by (banished: ID 70524)
Post:
I think they don\'t have much time to run the project.


You are correct, the existing staff don't have much time. However, LHC@home's parent project has a $6 trillion budget and they will be receiving more money to design/build upgrades to the collider. The proper professional solution to their problems is for them to hire more staff and get the job done properly instead of expecting us to pay for the needless waste they produce. The solution they have been using for the past 2 years or more is the welfare bum's solution... do next to nothing and let everybody else take care of you and cover for you.

Maybe you could volunteer your services to help setup the system for free?


Why should I work for them for free when their parent project has a $6 trillion budget and more money on the way? If they want me they can pay me a respectable wage.

16) Message boards : Number crunching : Why BOINC 6.X has issues with LHC@home and other things (Message 20741)
Posted 5 Nov 2008 by (banished: ID 70524)
Post:
Stephen,

It always amazes me when I see people whining about not getting enough work from this project when more than 30% of the CPU cycles donated to this project are spent on needlessly duplicated work, a deplorable situation you can blame on this project's IR > minQ strategy for getting results verified quickly. Not one word from you on that deplorable waste of your money, just a whine for more abuse.

It's not surprising nothing important ever changes at this project. The majority of crunchers here, like you, are either totally oblivious to the way this project abuses them, or else they just don't care, or else they love being abused. With such a blind, uncaring band of crunchers they can get away with just about anything they want.

If thousands of you sheep would just detach your hosts, they would be forced to fix their crap project. As long as you keep begging for more abuse they'll just rest on their long worn out laurels and do absolutely nothing about it.

They could spread the work around a little thinner and to more hosts simply by turning on a server option that limits each host to having no more than 2 or 3 tasks per core in the cache at any time. That alone would speed up the return and validation process. They could use homogeneous redundancy, which would reduce mismatches and speed up the validation process. Then they could reduce the IR to minQ, which would shrink their database and reduce the load on the server and bandwidth. But why should they even think about lifting their little finger to do anything when you ignore all that crap?
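For illustration, a per-host task limit of the kind described above is, on reasonably recent BOINC server code, a small addition to the project's config.xml. This is a sketch, not taken from LHC@home's actual configuration, and the option name and its exact semantics (per host vs. per core) should be checked against the BOINC server documentation for the version in use:

```xml
<config>
  <!-- Cap the number of in-progress tasks handed to any one host.
       The exact name and semantics vary between server versions;
       verify against the BOINC server documentation. -->
  <max_wus_in_progress> 2 </max_wus_in_progress>
</config>
```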

In a roundabout way, Stephen, your apathetic attitude and/or ignorance and/or love for abuse has bought you exactly what you deserve... frustration and abuse. The question is, are you going to beg for more abuse or are you going to do the only thing that will force them to stop? Hmmmm?


17) Message boards : Number crunching : Why BOINC 6.X has issues with LHC@home and other things (Message 20725)
Posted 30 Oct 2008 by (banished: ID 70524)
Post:
Thanks for sticking with us folks especially Dagorath who I have only contemplated killing on one or two occasions ;-)


Are you saying my truth telling and dragging issues out from under the carpet they've been swept under has finally embarrassed the muppet masters into doing something about the huge, deplorable and needless waste caused by the IR > minQ anachronism used at this project?

Or is this just another shell game wherein sleight of hand and obfuscation portrays the recent screensaver related compute errors as the villain and allows the IR > minQ devil to live?

18) Message boards : Number crunching : Getting Work Units? (Message 20712)
Posted 30 Oct 2008 by (banished: ID 70524)
Post:
According to the server status, there is about 17000 new work units being processed. I was not able to get 1 work unit. Is there any trick to get work units when they are available?


Of course there are tricks. To understand the tricks and use them you first have to understand the problem. The problem has 2 parts:

1) the work units are usually gone before your host contacts the LHC server again.

2) your computer might have all the work it can handle from other projects when work is available from LHC, so it won't download from LHC even though LHC has work available when your computer contacts the server.

One trick is to sit at your computer and click the update button every few minutes. That forces your computer to contact the LHC server. Eventually you will click it at just the right moment, when your computer wants more work and LHC has some available. Problem is, your finger will get sore and your boss will wonder why you're not at work. Impractical and probably problematic too.

A better trick is to use the boinccmd tool to automate the clicking of the update button for you. The --project operation, described in the Control Operations section of the boinccmd documentation, is the way. The command would be:

boinccmd --project http://lhcathome.cern.ch/lhcathome/ update

Put that line in a script and run it as a cron job if using Linux, or do the Windows equivalent (a batch (.bat) file run periodically by the Windows Task Scheduler).
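On Linux the automation can be sketched like this; the script name and schedule are just examples, and boinccmd must be installed with the BOINC client (depending on your setup you may also need its --host/--passwd options):

```shell
#!/bin/sh
# poll-lhc.sh -- nudge the local BOINC client into contacting LHC@home.
# Assumes boinccmd (shipped with the BOINC client) is on the PATH.
if command -v boinccmd >/dev/null 2>&1; then
    boinccmd --project http://lhcathome.cern.ch/lhcathome/ update
else
    echo "boinccmd not found -- is the BOINC client installed?" >&2
fi
```

Then a crontab entry such as `*/10 * * * * /home/you/poll-lhc.sh` clicks the button for you every 10 minutes.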

Mind you, that doesn't guarantee your computer will actually be wanting work when it contacts the LHC server, as mentioned in 2) above. To be absolutely sure it wants work you also need to manipulate your cache and/or "connect every" settings. That can be automated too with a little batch/script magic. It works very well; I know because I've used it myself. I haven't been using it lately because this project is the most screwed up project in the BOINC world and they waste about 33% of the CPU time donated to them. Never mind their lame "we don't have any money" excuses... this project's parent project has a $6 trillion budget with more funding on the way for collider upgrades. No money? Bullshit!
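As a sketch of the cache-settings half of the trick: the client's cache can be overridden locally with a global_prefs_override.xml file in the BOINC data directory. The element names below are from memory and should be double-checked against the client documentation for your BOINC version:

```xml
<global_preferences>
  <!-- Keep the cache tiny so the client asks for work frequently.
       Verify these element names against your client's documentation. -->
  <work_buf_min_days>0.01</work_buf_min_days>
  <work_buf_additional_days>0.25</work_buf_additional_days>
</global_preferences>
```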

Not trolling here, just explaining the facts and keeping the karma flowing to where it should be :)

19) Message boards : Number crunching : Please note: this project rarely has work (Message 20704)
Posted 28 Oct 2008 by (banished: ID 70524)
Post:
changed my mind
20) Message boards : Number crunching : Actual LHC data to crunch? (Message 20677)
Posted 19 Oct 2008 by (banished: ID 70524)
Post:
I don't think it is a technical problem but a mentality problem.


Wrong. It's a technical problem. We simply don't have the bandwidth. They have already explained that. When is it going to sink in?


My ADSL has 7 Mbit/s, but big cities in Italy reach 20 Mbit/s.


Apparently the IT people at CERN think that's not enough. Do you know something they don't know?

Fastweb is offering fiber to the home in selected cities. Once fiber is installed there is no limit to bandwidth.
Tullio


Wonderful! When enough BOINCers get fibre to their homes and a GPU in their box, CERN might change their policy. The reality of today's world is that CERN thinks our resources are inadequate, and they are probably right. So why not crunch some other worthy projects along with the LHC tasks you get and be happy about it?

21) Message boards : Number crunching : Actual LHC data to crunch? (Message 20675)
Posted 19 Oct 2008 by (banished: ID 70524)
Post:
I don't think it is a technical problem but a mentality problem.


Wrong. It's a technical problem. We simply don't have the bandwidth. They have already explained that. When is it going to sink in?

22) Message boards : LHC@home Science : Nvidia CUDA support for Boinc in future? (Message 20635)
Posted 6 Oct 2008 by (banished: ID 70524)
Post:
Is there any possibility for the Boinc/LHC@home community to write CUDA for number crunching support for LHC and/or other projects? It just seems like a treasure trove of threaded processing power waiting under the hood of some gaming boxes.

-Stu


Check the action on PS3GRID. There you can download a test version of BOINC with CUDA support and get work units that run on nVidia GPU.

The reasons why this project's current app will likely never run on a GPU have been discussed in at least 1 other thread already. Look around, you'll find it.

23) Message boards : LHC@home Science : no work units (Message 20602)
Posted 3 Oct 2008 by (banished: ID 70524)
Post:
anyone getting the problem of no work downloaded or is there somthing wrong with my connection lol


This project rarely has work. When it does have work the chances of getting some are slim.

There are plenty of other very worthwhile projects you can contribute your spare CPU cycles to while you wait for this project to send work. It appears you are new to BOINC so may I suggest attaching to an easy project, in addition to this one, that has a steady supply of work and very few decisions for you to make. Run that project for a few weeks until you see how it all works then maybe try a few other projects.

Some easy projects with steady work that are on the list in the Attach Project Wizard:

  • ABC@home
  • Einstein@home
  • Spinhenge@home
  • Quantum Monte Carlo at Home
  • Malariacontrol.net
  • Rosetta@home


Other projects that do not have steady work


  • SIMAP
  • LHC@home
  • Proteins@Home



P.S. For any project that rarely has work, you should set the Resource Share to a small percentage or else you'll eventually have problems getting work from any project due to a problem with the work scheduler.

24) Message boards : LHC@home Science : LHC @ Home on Playstation 3 (Message 20598)
Posted 2 Oct 2008 by (banished: ID 70524)
Post:
Another problem with the graphics cards/ps3 approach is that the cards are extremely fast single precision machines. Sixtrack however requires double precision. So the speed up is not as big.


The older graphics cards are single precision. I have heard the latest models from Nvidia are double precision but I have not verified that.

Another problem is with rounding errors. Sixtrack is very susceptible to incorrect rounding of floating point numbers. It took ages to work out the differences between the Intel and AMD architectures. It would be a lot of work to be certain that the GPU chips calculate correctly.


Maybe there is no need to do that work? The new server code LHC@home is using allows for homogeneous redundancy, which means it is possible to send all result replications for a given work unit to hosts that meet platform criteria. You can, for example, send a given work unit to just Linux hosts and/or hosts with AMD but not Intel. I'm not sure if it's possible to select only hosts with a particular graphics card. If that's not possible with current code then it probably wouldn't take much effort to make it so.

Thirdly, SixTrack as far as I remember relies on a lot of complex floating point math (sin/cos/exp/log), another area where the speed up of the graphics cards is not as impressive as when you can do simple vector operations.


I have heard the latest Nvidia cards do transcendentals too. Again, I have not confirmed it but the source is usually pretty reliable.


Cheers,
Chrulle
Ex-LHCatHome developer


Ex-developer, huh? Well, I don't know how much weight you carry now that you're "Ex", but if you could possibly light a few fires under a few frozen-in-place butts (we don't know their names but it's obvious they exist) and get someone to rub a little money on this project, it would do this project and the crunchers who make it happen a world of good. We just don't buy the "there's no money" excuse anymore, not when this project's parent project has a $6 billion construction budget and more $ earmarked for upgrades. Think about it... a +$6 billion project getting its magnet operating parameters computed on a 2 penny DC project.

25) Message boards : Number crunching : Segmentation violation (Message 20594)
Posted 2 Oct 2008 by (banished: ID 70524)
Post:
I restarted lhc@home and shall report. I am still thinking that the problem arises when the server deletes a redundant result.
Tullio


Could be. If it is, then it would likely happen less than a second after the client contacts the server. Can you correlate the times of result cancels with the times of the SEGVs? Too bad the messages in your first post don't have times for the lines below the "Scheduler request". We can't say from that how much time elapsed between the request and the SEGV. It might have been less than a second, it might have been several minutes; we can't be sure. Can you think of any other way to correlate the times?

Here\'s another idea....

I had a SEGV in a Linux app (not BOINC) today. The app died, or got killed, or whatever, and then came a popup showing the error message. The popup had the title K Stack Trace or something similar. The error was in one tab; the other tab was Stack Trace or something similar. So I clicked the stack trace tab, it thought for a few secs, then said something like "cannot trace because gdb not present".

Gdb, I think, is the GNU debugger used with gcc-compiled programs. If gdb had been installed on that machine, I might have got some solid clues as to where the error occurred. The app also needs to be compiled with "debug info" or symbol tables.

If you go to the index of all BOINC versions and find a version for Linux that ends with "_debug.sh", it will have the symbol tables. Install it, run it and attach to LHC. Install gdb and the K Stack Tracer (or whatever it's actually called) too. You might get some very helpful clues to put in a Trac bug report.

Sorry, I can't tell you exactly how gdb and stack traces work on Linux; I've never done it before. I'll install gdb and a debug version of BOINC here too. If we compare notes and get some hints/tips from people who know how, we can get to the bottom of it, I'm sure.
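Roughly, the core-dump route looks like this on Linux; the binary path below is illustrative, and the app must have been built with debug symbols for the backtrace to be readable:

```shell
# Allow core files to be written in this shell session
# (they are often disabled by default).
ulimit -c unlimited

# After the app dies with SIGSEGV, look for a file named "core" (or
# "core.<pid>") in its working directory, then load it into gdb, e.g.:
#
#   gdb /path/to/the_app core
#   (gdb) bt            # "bt" prints the stack trace
#
# or, non-interactively:
#
#   gdb -batch -ex bt /path/to/the_app core
```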
26) Message boards : Number crunching : Segmentation violation (Message 20591)
Posted 1 Oct 2008 by (banished: ID 70524)
Post:

No, it is not sending that message, only this:
Scheduler request succeeded, got 0 new tasks


Hmmm. Well, I attached a host running Fedora and BOINC 5.10.45 and turned on <work_fetch_debug> in cc_config.xml. Right now it is saying only
Wed 01 Oct 2008 02:03:15 PM MDT|orbit@home|[work_fetch_debug] work fetch: project not contactable; skipping
Wed 01 Oct 2008 02:03:15 PM MDT|lhcathome|[work_fetch_debug] work fetch: project not contactable; skipping

Maybe some clues will turn up.
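For anyone who wants to repeat the experiment, the flag goes in cc_config.xml in the BOINC data directory (the client reads it at startup, or when told to re-read the config file):

```xml
<cc_config>
  <log_flags>
    <work_fetch_debug>1</work_fetch_debug>
  </log_flags>
</cc_config>
```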

27) Message boards : Number crunching : Segmentation violation (Message 20587)
Posted 1 Oct 2008 by (banished: ID 70524)
Post:
It seems this happens also on other projects when the client asks for works and receives none. See this on the BOINC message boards:
BOINC client exits
Tullio


From the trials Jean-David reports in that thread, it really looks like Hydrogen@home was causing the SEGV. It seems he got the SEGV only (mostly?) when he requested work from Hydrogen but didn't get any.

Now, one thing that is peculiar about Hydrogen is that when that server has no work, the client log shows...
 
[Hydrogen@Home] Message from server: No work sent


...in addition to the standard
24-Aug-2008 06:52:57 Scheduler request succeeded: got 0 new tasks


Is the "Message from server: no work sent" line logged only when one has debug/logging options set in cc_config.xml?

I don't get that message from any of the projects I am attached to when they have no work. I am wondering if that message somehow triggers the SEGV, and does LHC send that message too?

28) Message boards : LHC@home Science : article about what we\'re doing? (Message 20571)
Posted 1 Oct 2008 by (banished: ID 70524)
Post:
Nope. It's not about LHC@home. It's mainly about the Grid which connects the computers that will crunch the data the collider collects. We are crunching data to tune the magnets that keep the beams in the center of the collider's tube. The article doesn't mention that, so it's not at all about us.

29) Message boards : Number crunching : Segmentation violation (Message 20556)
Posted 27 Sep 2008 by (banished: ID 70524)
Post:
The BOINC binary is not a static executable


[dagorath@Henry64 BOINC]$ file boinc
boinc: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.8, stripped


I could have sworn they were statically linked. In fact I even wrote an article in the wiki stating they are statically linked, and had that verified by someone who should know.

Well, I am sure they were static at one time. Guess they changed that policy.

I've searched the OpenSuse site for a BOINC installer but could not find it. Cheers.
Tullio


If there is a BOINC installer for SuSe (it might never have made it out of beta test for all I know), it may be in some other repository rather than the OpenSuSe site. SuSe's software installer/updater (is it apt?) may be able to locate it for you, though you may have to point it to some non-official repository.

30) Message boards : Number crunching : Segmentation violation (Message 20554)
Posted 27 Sep 2008 by (banished: ID 70524)
Post:
I installed from BOINC but I am running also SETI, Einstein, QMC, CPDN, CPDN Beta and I never had any problem. On SETI I am running also an optimized application and Astropulse and never had a compute error. Only LHC@home gives me some problems.
Tullio


About a year ago I was getting SIGSEGV on ABC@home tasks when I was running Berkeley's version of the client on SuSe. After receiving the same advice I'm giving you, I decided to try a client compiled on a SuSe system by the guys who were building and testing the BOINC installer for SuSe. That fixed my problem. I no longer run SuSe, but I think the BOINC installer for SuSe has been out of the testing phase for some time.

Your having no errors from your other projects proves only 1 thing... you get no errors from them. It proves nothing about this project's SixTrack app. Their apps may not use the libraries the way SixTrack does. The fact that you run the optimized SETI app and Astropulse is irrelevant.





©2024 CERN