61) Message boards : Number crunching : Each upload generates new computer-id (Message 13831)
Posted 2 Jun 2006 by Gaspode the UnDressed
Post:
See thread 2177
62) Message boards : Number crunching : New host id being created at each connect (Message 13830)
Posted 2 Jun 2006 by Gaspode the UnDressed
Post:
There is a problem with several projects where the system creates a new host id every time the user's computer connects to the project. From my reading of the thread on the Rosetta forum ( http://www.boinc.bakerlab.org/rosetta/forum_thread.php?id=1669 ), a patch can be installed to fix this.

Is it possible to install the patch on the LHC servers?

At least we can merge hosts on LHC in the meantime - Rosetta has the merge facility turned off until the Autumn :-)


There are no Admins at LHC at present. An update to the server side code is unlikely for the time being.

The problem seems to relate to those people using account managers like BAM. Maybe one should stick with the manual way of doing things until the bugs are ironed out and all the projects are updated.
63) Message boards : Number crunching : Work to be done! (Message 13824)
Posted 2 Jun 2006 by Gaspode the UnDressed
Post:

My honest opinion is that the level of work (not) available at LHC has more to do with how far CERN is prepared to put itself behind the power of the free distributed computing that BOINC offers than with anything else, such as reliability, security, or the somewhat unpredictable job completion times. Each of those criteria has a solution; what is missing is the commitment by CERN to fully exploit this awesome resource they have in us.



It's not actually that simple. Many of the applications that the physicists use have been developed and refined over a long period of time, generally in an environment where a large monolithic computer was the only viable resource for running them.

Fast forward a couple of decades and BOINC arrives. BOINC can deliver enormous processing power, but it can't deliver the large computing environment needed by some of these applications. It's a breeze to run a model that needs 2GB, 4GB, 256GB of RAM or more on a big mainframe; it just can't be done on a PC.

If the applications can be broken down into smaller modules, so that the data can be split up and pipelined through separate processes, then maybe we have a DC application, but doing that will require a rewrite from the bottom up.
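
To give a feel for what that restructuring means, here is a rough Python sketch. It has nothing to do with the real CERN codes and every name in it is made up: the point is only that the data gets cut into self-contained chunks that can be farmed out and crunched independently, which is roughly the shape a BOINC-friendly application has to take.

# Purely illustrative sketch, not any real CERN code: split the data into
# self-contained chunks and process them independently, the way a BOINC
# port would have to. All function names here are hypothetical.
from multiprocessing import Pool

def simulate_chunk(chunk):
    # Stand-in for one self-contained piece of work (one BOINC-style task).
    # Each chunk must carry everything it needs: no shared mainframe memory.
    return sum(x * x for x in chunk)

def split(data, n_chunks):
    # Partition the input so each piece can be crunched on its own.
    size = max(1, len(data) // n_chunks)
    return [data[i:i + size] for i in range(0, len(data), size)]

if __name__ == "__main__":
    data = list(range(100_000))
    chunks = split(data, 8)
    with Pool() as pool:                  # stands in for thousands of volunteer PCs
        partials = pool.map(simulate_chunk, chunks)
    print(sum(partials))                  # results merged back at the project server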

New applications don't have the legacy of development to contend with, so Einstein, Predictor, Rosetta, etc. can develop with DC in mind.


64) Message boards : Number crunching : Server problems (Message 13818)
Posted 2 Jun 2006 by Gaspode the UnDressed
Post:
I really wish that they would get someone in to look after the servers. When the system chokes on more than 40 concurrent connections... it makes it look like they don't care about the BOINC side of things any more and don't really need us to crunch the data for them. I don't really think this is true, but that is the way it appears.


LHC has always got a bit flustered above about 50 connections. New servers have been acquired, and one of the jobs being done by the people still working on the BOINC project is to integrate them into the CERN computer centre. This is a non-trivial exercise, as the installation of Linux and MySQL needs to be automated in a slightly different way from normal. (Once that is done, a machine can be restored much more quickly in the event of a serious failure - a worthwhile exercise in my view.)

Unfortunately, the people working on LHC@Home also have other jobs to do, so this will take longer than we might like.
65) Message boards : Number crunching : why so much of difference ??? (Message 13817)
Posted 2 Jun 2006 by Gaspode the UnDressed
Post:
So what you are saying (in effect) is that even if the work unit crashes, it's still meaningful information for the project?


If by 'crash' you mean the result terminates with an error then no: no meaningful data is returned. If the unit runs to completion very quickly (not what I would call crash) then yes.

66) Message boards : Number crunching : I think we should restrict work units (Message 13810)
Posted 1 Jun 2006 by Gaspode the UnDressed
Post:

What do you think the probability is of a single-bit (or any other) error causing the same incorrect answer in even TWO of the three members of the quorum?


Extremely small, I'd guess.

SixTrack suffers from this single-bit sensitivity because of the way it handles its numbers, and because it performs the operations repeatedly. A single-bit error in the first iteration of an algorithm will generate a different erroneous result from the same error occurring at, say, iteration 500,000. Given that a single-bit problem can creep in potentially anywhere (and anywhen), the chances of two different computers generating the same incorrect result are vanishingly small.

The same can't be said of the same computer running the same unit twice, however. It is possible that some sort of systematic failure could generate consistent errors at consistent points in the algorithm. Such a computer would probably never generate a valid LHC result, although it might work perfectly well in every other regard.
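
As a toy illustration of that sensitivity - this is not SixTrack, just an invented chaotic iteration - the Python sketch below flips one mantissa bit of the state either at the first iteration or at the last one. The two corrupted runs finish with different wrong answers, which is why two independently faulty machines almost never agree.

# Toy illustration, not SixTrack: a chaotic iterated map where a single
# flipped bit in the state gives a different final answer depending on when
# the flip happens. The map and its parameters are invented for the example.
import struct

def flip_low_bit(x):
    # Flip the least significant mantissa bit of a 64-bit float.
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    return struct.unpack("<d", struct.pack("<Q", bits ^ 1))[0]

def run(iterations, flip_at=None):
    x = 0.4                          # arbitrary starting state
    for i in range(iterations):
        x = 3.9 * x * (1.0 - x)      # repeated, error-amplifying arithmetic
        if i == flip_at:
            x = flip_low_bit(x)      # the stray single-bit upset
    return x

N = 500_000
print(run(N))                        # clean run
print(run(N, flip_at=0))             # bit flipped on the first iteration
print(run(N, flip_at=N - 1))         # same flip on the last iteration
# The two corrupted runs end with different wrong answers, so two
# independently faulty computers are very unlikely to agree with each other.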



67) Message boards : Number crunching : I think we should restrict work units (Message 13806)
Posted 1 Jun 2006 by Gaspode the UnDressed
Post:
Does anyone think that the reason the initial replication is 5 while the quorum is only 3 is to generate extra work for all the work hungry volunteers?


The five/three ratio is there to improve the chances of getting a quorum at the first attempt. It's down to SixTrack's extreme sensitivity to numerical accuracy: on even the most solid computer there can be the occasional single-bit error that will throw the result off. Sending five results improves the chance of reaching a quorum, and so reduces the completion time for the study.

From what I see on the results pages, most results reach quorum at three, so a replication of five is redundant. I'd like to know if the fourth and fifth results are still issued if a quorum has already been reached.
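
A back-of-the-envelope way to see the five/three trade-off, with invented per-result error rates since I don't know the real one, is sketched below in Python: it just sums the binomial probability of getting at least three good results out of the copies issued.

# Back-of-the-envelope sketch of why issuing 5 copies helps when a quorum of
# 3 matching results is needed. The per-result error rates are invented for
# illustration; the real failure rate is not known to me.
from math import comb

def p_quorum(issued, quorum, p_bad):
    # Probability that at least `quorum` of the `issued` results come back
    # good, assuming independent errors with probability p_bad per result.
    p_good = 1.0 - p_bad
    return sum(comb(issued, k) * p_good ** k * p_bad ** (issued - k)
               for k in range(quorum, issued + 1))

for p_bad in (0.01, 0.05, 0.10):
    print(f"p_bad={p_bad:.2f}:  3 issued -> {p_quorum(3, 3, p_bad):.4f},"
          f"  5 issued -> {p_quorum(5, 3, p_bad):.4f}")
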
68) Message boards : Cafe LHC : Anyone have some good song selections to share? (Message 13797)
Posted 31 May 2006 by Gaspode the UnDressed
Post:
Try this one...

Lion

69) Message boards : Number crunching : I think we should restrict work units (Message 13785)
Posted 29 May 2006 by Gaspode the UnDressed
Post:
My word - we are all sensitive today. No offence intended here, and none taken. However, I will say this:

If the cap fits, wear it!

Happy crunching!

70) Message boards : Number crunching : I think we should restrict work units (Message 13777)
Posted 29 May 2006 by Gaspode the UnDressed
Post:
Seriously though, I'm guessing saying nasty things to others makes you feel good, so fire away; I'm very thick-skinned and I like making others happy.


No offence intended. It was a gentle dig aimed at the 'credit-racers', and you claim you're not one of them. Judging by your sensitivity to it, though, I'd guess that you are 'racing' more than you let on.

71) Message boards : Number crunching : Work Units still pending? (Message 13775)
Posted 28 May 2006 by Gaspode the UnDressed
Post:
It is well known that there are 'issues' with the LHC@Home database. They've been discussed endlessly in these boards. It is also well known that there was a database crash earlier this year that corrupted machine data, and maybe WU data too.

It has been said by the admin team that manually fixing these orphaned units would probably cause more problems than it fixes.

And lastly, it is also well known that there is no administrator in control of this system at present. Chrulle does a bit here and there, but he no longer works for CERN.

So, if you have a few old results kicking around, please accept them gracefully - there won't be a solution coming along any time soon.

72) Message boards : Number crunching : I think we should restrict work units (Message 13770)
Posted 28 May 2006 by Gaspode the UnDressed
Post:
Personally, this is how I have started doing LHC. It has an over-50% resource share in BOINC. I generally run a 0.1-day cache as I'm on a DSL connection.
1. I'm sitting at my computer and notice LHC has work.
2. I suspend all other projects and bump the cache to 10 days.
3. Until the work runs out I leave the cache at 10 days and only run LHC.
4. LHC has no work.
5. The cache goes back to 0.1 days and I allow a very low priority project to fetch work.
6. I run out of LHC work and switch all the other projects back on.

On average my huge 10-day cache lasts... oh, about 2 days, so results still come back very quickly and I get the maximum work crunched for LHC whenever it has work to do.


Why on earth go through all this rigmarole? Why not set LHC at, say, an 80% resource share and the cache at 0.1 days, enable all projects, and leave well alone?

BOINC will manage the work fetch cycle based on your cache size and you'll get loads of work for LHC when it's there. There won't be any work hoarded, so others can do it. Work will get done quicker overall.

Oh - I see the problem now: you might not get as many of those rare and valuable LHC credits to spend on goodies in the local shopping mall. You'll have more credits from other projects but I guess they aren't worth as much.

What's that? You can't spend the credits? Oh my...
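
For anyone who wants to see why the 'set a share and leave it alone' approach works, here is a deliberately over-simplified Python toy. It is not BOINC's actual scheduler, and the share numbers are just examples: whenever LHC has work, an 80% share hands it most of the CPU anyway, and when it runs dry the other projects soak up the idle time.

# Deliberately over-simplified toy, not BOINC's real scheduler: CPU hours are
# split among the projects that currently have work, in proportion to their
# resource shares. The share numbers are just examples.
def allocate_hours(total_hours, shares, has_work):
    active = {p: s for p, s in shares.items() if has_work[p]}
    total_share = sum(active.values())
    return {p: round(total_hours * s / total_share, 1) for p, s in active.items()}

shares = {"LHC": 80, "Einstein": 10, "Rosetta": 10}

# LHC has work: it gets the lion's share of the day automatically.
print(allocate_hours(24, shares, {"LHC": True, "Einstein": True, "Rosetta": True}))
# LHC is dry: the other projects soak up the idle time instead.
print(allocate_hours(24, shares, {"LHC": False, "Einstein": True, "Rosetta": True}))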

73) Message boards : Number crunching : I think we should restrict work units (Message 13752)
Posted 26 May 2006 by Gaspode the UnDressed
Post:

Keep your cache as small as possible.

Maybe a little less name-calling and a little more focus on the topic might be in order.


74) Message boards : Number crunching : why so much of difference ??? (Message 13745)
Posted 26 May 2006 by Gaspode the UnDressed
Post:


The FAQ at http://lhcathome.cern.ch/FAQ.html#2.2 says that they give out work units of three different lengths (10,000, 100,000, and 1,000,000 turns around the accelerator). Your units seem to complete in three time ranges (hundreds of seconds, thousands of seconds, and tens of thousands of seconds). Maybe you've gotten work units of all three lengths and that's the difference?


There's more. SixTrack is studying the stability of the beam. It models 60 particles travelling around the accelerator, calculating their displacement from the nominal centre of the tube. A stable beam will run to completion, but some sets of parameters give an unstable beam, and particles fly off from it as they circulate. A result like this can complete in a few seconds, as all the particles hit the wall of the accelerator. In some ways these unstable beams are more important than the stable ones, as they help define the operating limits of the machine.
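
A crude way to picture it - a random-walk toy, not the real SixTrack physics, with invented kick sizes and aperture - is sketched below: unstable settings kill off all the particles after a few hundred turns so the run ends almost immediately, while stable settings survive the full turn count.

# A random-walk toy, not the real SixTrack physics: particles drift away from
# the beam centre each turn, and the run stops once every particle has hit
# the wall. The kick sizes and the aperture are invented numbers.
import random

def track(n_particles=60, max_turns=10_000, kick=0.001, aperture=1.0, seed=1):
    rng = random.Random(seed)
    offsets = [0.0] * n_particles          # displacement from the nominal centre
    for turn in range(1, max_turns + 1):
        for i in range(n_particles):
            if abs(offsets[i]) < aperture:  # only particles still inside the tube move
                offsets[i] += rng.gauss(0.0, kick)
        if all(abs(x) >= aperture for x in offsets):
            return turn                     # unstable: everything hit the wall early
    return max_turns                        # stable: survived the full turn count

print("small kicks (stable):  ", track(kick=0.0005), "turns")
print("large kicks (unstable):", track(kick=0.1), "turns")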


75) Message boards : Number crunching : why so much of difference ??? (Message 13739)
Posted 25 May 2006 by Gaspode the UnDressed
Post:
Why is there such a difference in the crunching times?

http://lhcathome.cern.ch/results.php?hostid=165488

Joseph yours.


The host you indicate seems to be behaving pretty normally. What's the problem?
76) Message boards : Number crunching : I think we should restrict work units (Message 13728)
Posted 25 May 2006 by Gaspode the UnDressed
Post:
It is important to max-cache LHC because it takes only 18-30 hours for them to run out of work once they put it up (it looks like they put up 80,000-150,000 WUs at a time).


What's important is getting the work done. As I write this, GreatInca, one of your computers still has eight results left to run, while my computers have been out of work for two or three days. Had you not filled your cache to excess, those eight results could have been assigned to other computers and the work completed days ago. Spread this across thousands of users and the tail of unreturned results drags out for days.

Each study generally requires analysis of the previous study to determine the appropriate parameters. If each study is completed more quickly, the next one can be released sooner. The message, therefore, is:

Keep your cache as small as possible.



77) Questions and Answers : Unix/Linux : Not getting new workunits (Message 13694)
Posted 22 May 2006 by Gaspode the UnDressed
Post:
There's no LHC client for the Mac so you won't ever get work. Best try a different project.

78) Message boards : LHC@home Science : disk space (Message 13693)
Posted 22 May 2006 by Gaspode the UnDressed
Post:
Try checking your General Preferences under 'Your Account' from the main menu. Make sure you haven't got 'Leave at least' set to 10GB, or something similarly huge.

BTW, this is probably better posted in 'number crunching'.
79) Message boards : Number crunching : Screensaver locking up mouse (Message 13690)
Posted 22 May 2006 by Gaspode the UnDressed
Post:
The latest recommended client (5.4.9) should save you from this problem. It is supposed to kill the graphics thread if it is not responding; this usually aborts the workunit as well. It is not really a fix, but it makes things a lot easier to live with.


I believe the recommended client is still 4.45 on the LHC site.


There is precious little resource available to maintain the BOINC system at LHC. At present there isn't an administrator (except that Chrulle does a bit sometimes, even though he's left), so this is just an oversight, and it should be updated. There are many users, myself included, using version 5 clients successfully. Version 5.4.9 works well on the only box I have installed it on; I haven't yet had time to upgrade my other systems.

80) Questions and Answers : Preferences : Report Deadline sooner than 10-day Network Preference (Message 13686)
Posted 22 May 2006 by Gaspode the UnDressed
Post:
If you miss the deadline for a result, the result will be issued to another participant. If you then return your result you might still get credit for it, provided it hasn't already been completed by the other participants. With a deadline of 7 days and a connection interval of 10, it's likely that the result will have been completed and credit granted long before you return your crunched data.


Different projects use different deadlines. LHC varies the deadlines to optimise the time required to complete a study; some results are issued with deadlines as short as four days. The variation you ask for effectively gives a deadline of seventeen days rather than seven. With the batch nature of LHC this would slow down their computing dramatically, so you're not likely to get this sort of change.

If you're not able to meet the deadlines for the project while you travel then it's better not to take the work. Consider a different project with a longer deadline and contribute to LHC again when you return.


