1) Message boards : Number crunching : Task resends are not working properly (Message 25458)
Posted 17 Feb 2013 by S. Dagorath
Post:
Some months ago it was decided to configure the server to issue task resends to fast, reliable hosts and decrease the deadline on resends. That does not seem to be working as it should. It seems to work in some cases but not others. Below I give you a case where the deadline was shortened followed by a case where the deadline was not shortened.

worked properly

The evidence is work unit 6486652 which began with the usual 2 replications on Feb. 4: 14113520 and 14113521. From the difference between the sent and expired datetime stamps we see the two tasks were issued with the standard 8 day deadline.

One of the resends was 14298947 which also timed out and from the difference between its two datetime stamps we see the deadline was ~4 days 8 hrs. This shows the deadline was reduced from the standard 8 days as it should be.

did not work properly

The evidence is work unit 6568028 which began with the standard 2 replications, one of which erred. The resend is task 14476672 and from its issued datetime stamp of 15 Feb 2013, 21:27:20 UTC and deadline datetime stamp of 23 Feb 2013, 12:59:34 UTC we see its deadline is roughly 8 days (about 7 days 15.5 hours), which is not proper since it is a resend.

From the above two cases it seems the deadline is reduced for resends due to a "timed out-no response" result but not for resends due to a "compute error" result. Is that intended behavior, a bug or a misconfiguration?
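
For anyone who wants to check other tasks the same way, here is a minimal sketch of the arithmetic. It assumes GNU date (for the -d option), and the function name deadline_span is just something made up for illustration. The two timestamps are the ones quoted above for task 14476672.

```shell
#!/usr/bin/env bash
# Sketch: time between a task's "sent" and "deadline" stamps.
# Assumes GNU date; deadline_span is a hypothetical helper name.
deadline_span() {
    local sent deadline secs
    sent=$(date -u -d "$1" +%s)        # sent time as epoch seconds
    deadline=$(date -u -d "$2" +%s)    # deadline as epoch seconds
    secs=$((deadline - sent))
    printf '%dd %02dh %02dm\n' \
        $((secs / 86400)) $((secs % 86400 / 3600)) $((secs % 3600 / 60))
}

# Task 14476672 from the "did not work properly" case
deadline_span "2013-02-15 21:27:20 UTC" "2013-02-23 12:59:34 UTC"
# prints: 7d 15h 32m
```

A resend with a properly shortened deadline should come out well under that, closer to the ~4 days 8 hrs seen on task 14298947.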


2) Message boards : News : Forum restrictions (Message 25457)
Posted 17 Feb 2013 by S. Dagorath
Post:
I believe the Red X was disabled because it was abused by tons of people clicking it for posts that were not offensive and were simply posts that expressed an opinion they did not like. IMHO that was the wrong reaction. Rather than disable the Red X they should suspend the abusers' posting privileges for a couple of weeks or banish them.

The bottom line is that the Red X requires manual intervention by an admin and they have been traditionally scarce at this project. Months pass before they bother to check in and years pass before they lift a finger to do anything.
3) Message boards : News : Production 2013 (Message 25456)
Posted 17 Feb 2013 by S. Dagorath
Post:
Does this project really need the high horsepower computing capability of a GPU?

Just asking.

And what is the ROI of taking the time to develop and test GPU capability?


Good questions and I believe the answer is clear from this project's history. We know that the only time it has work is when they need to recalibrate/refocus the magnets/beams for higher energies and/or different particles. It appears we (the crunchers) have never failed to get the job done in time using good ol' fashioned CPUs so yah... what's the sense in porting to GPU? I just don't see the need myself, don't see a positive ROI.

Let's not forget that recent updates to include use of sse2/3 and pni extended instruction sets are a significant optimization in and of themselves. Also, reduction of the min quorum to 2 has resulted in a huge speed up too compared to what was in place 5 years ago so again I see no pressing need for GPUs at this project.

Good old FORTRAN. It'll never die and for good reason... it's popular amongst people writing code for crunching numbers. I see no reason to stray from FORTRAN either. They say it compiles into code that is just as fast as C so why bother.

CUDA? Well, AMD users don't like to admit it but CUDA is the more mature platform and the more capable platform when it comes to crunching numbers. That's what the critics and reviewers say and I believe them. OpenCL will catch up one day, perhaps, I hope it does because I own an AMD GPU too.
4) Message boards : Number crunching : 13-14 Feb 2012 - WU restart - but some short-lived (Message 25426)
Posted 14 Feb 2013 by S. Dagorath
Post:
Greetings!!

I noticed that WU have restarted - Thanks!

I started to worry when the first 4 out of 5 had short run-times. (the 5th is still running.)

I am not sure how to interpret stderr out. Is this an error? Or just close-of-file?
All 4 say the same thing for stderr output (time stamps different..):
<core_client_version>6.10.58</core_client_version>
<![CDATA[
<stderr_txt>
20:00:52 (4370): called boinc_finish
</stderr_txt>
]]>


Boinc_finish means the task has ended.

Here is other data:
userid=81571
hostid=10169683 - AMD 64
wu             task                run time
wuid=6566397   resultid=14306839    20.20
wuid=6566039   resultid=14306118   121.79
wuid=6565907   resultid=14305855     8.17
wuid=6565400   resultid=14304840    10.84

each of these 4 were validated with another user - with a short run-time.


If they validate then either the tasks completed properly and the results are good, or another computer made exactly the same errors and produced a "bad" result that matches your "bad" result. The latter is very, very unlikely.

Is it just chance that all 4 were, naturally, short-lived?
Or was there an error - somewhere?


Yes, just chance. Most Sixtrack tasks run longer, some are very short like these. Yes, it seems unlikely you would receive 4 short ones so close together but it happens.
5) Message boards : Number crunching : Errors while computing on CONDOR Cluster (Message 25425)
Posted 14 Feb 2013 by S. Dagorath
Post:
The oldest CPU is an "Intel Core 2 6600" all others are "Intel Core i5-2400S", I guess the Core 2 6600 has the smaller Instruction set


Ack!!! And I swallowed it, hook, line and sinker. Sorry, I should have known better.

Glad it works but I am really sorry to hear you've attached it to SETI, a project whose chances of success are so close to zero we may as well just call it 0, when there are so many projects whose chances are so much higher and who will give us something useful if they succeed. The most SETI can ever do is confirm what we already intuitively "know"... that there are or have been other intelligent life forms. This is a fact, not an opinion: SETI's methodology is flawed to the point of being asinine. If there was nothing else to spend unused CPU cycles on then do SETI but that is not the case.

Thanks for the info on CONDOR clusters.
6) Message boards : Number crunching : Errors while computing on CONDOR Cluster (Message 25421)
Posted 13 Feb 2013 by S. Dagorath
Post:
I'm not a CONDOR expert. Never used one, never even seen one. I saw a giant condor bird in a zoo once and that's the closest I've been to a CONDOR ;-)

In spite of that limitation on my part I think we can arrive at a few answers strictly from first principles without being experts, for example the basic need for each BOINC instance to run in its own dir.

Now this thing with instruction sets... hmmmm.... Christoph might be right, I'm not sure. I only want to correct one small misunderstanding and that is about the tasks. I think the tasks are probably all the same. In other words, it doesn't matter which instruction set the CPU has, it will get the same tasks. The difference is in the various applications. Those are listed here.

At that point my understanding is deficient so now I am asking more than I am suggesting. I don't think we can use an app_info.xml to tell the server which app version to send. I might be completely mistaken and someone please correct me if I am wrong but I think what Athmos must do is manually download the standard application (the one that is not optimised for any instruction set), place it in the proper directory where BOINC will find it (probably .../BOINC/projects/lhcathomeclassic.cern.ch_sixtrack) then use the app_info.xml to tell BOINC to use that app instead of the app the project server sends. Otherwise I think BOINC client reports the CPU instruction set extensions to the project server and the server then uses that info to determine which app to send.
7) Message boards : Number crunching : Who hoards the wu's? (Message 25420)
Posted 13 Feb 2013 by S. Dagorath
Post:
It is truly unfortunate that someone with as much to contribute as you have (I read your posting on the CONDOR Cluster) at the same time can stoop to the level you have in this thread.


But you stoop to the same level. I quote from your previous message:

Tom95134 wrote:
...why don't you take it to the Cafe and then you can toss barbs back and forth to your heart's content without burdening the majority of us with having to sort through your meanderings in the faint possibility that you have something to contribute to the Project.


If those words aren't denigrating then what are they? Why is it that you can stoop to that level but I cannot? I didn't start the denigrating, you started it. What makes you so much better than me? Why can you not take what you dish out?

Yes, I may not be the top of the stack in understanding of subjects being discussed but isn't it the purpose of these boards to provide resources who can help educate those that are trying to attain a better grasp of what is being done?


Of course that's one of the purposes and a noble purpose it is! But when you choose to cut me up like you did rather than ask what the heck I'm talking about or somehow indicate you wish to be educated and get a better grasp then you can expect me to cut you back. If that's what you want then bring it...

Your approach of denigrating anybody who doesn't measure up to your own high opinion of yourself just turns off people to participating in any project.


I warned you about getting all pissy but unfortunately for you, you chose to ignore it. Now your pissyness is causing you to make stuff up. Show us some numbers. Show us hundreds of posts from people who were participating who then stated they were quitting due to me and whose stats show they actually quit and stayed away for more than a day. You don't have that. You don't have it from this project or any other project and you will never have it. So stop with your BS and spin-doctoring because IMHO it turns people off even faster than my truth telling. Why? Because if you post that crap here then you obviously believe we are stupid enough to believe it and that insults each and every one of us. The forum rules prohibit insults.

Remember Dirty Harry... "a man's got to know his limits". That's probably not exactly the way Clint delivered that line but close enough. You don't know your limits. You think you can walk in here and rap off a few trite little cliches you copy 'n paste out of your file full of snappy comebacks you've saved and get somewhere but if you're gonna verbally joust with me then what you need to do is ponder and think, RTFM, do your homework and yes set a high standard for yourself and be not afraid to know when you have achieved it and even harder... be not afraid to know and admit when you have not. Else get your nose rubbed in your own crap again.
8) Message boards : Number crunching : Who hoards the wu's? (Message 25405)
Posted 13 Feb 2013 by S. Dagorath
Post:
As a matter of fact I am attempting to contribute something to the project ...


S. Dagorath
...
Posts: 11
Credit: 0
RAC: 0


I see...


No you don't. Not that it matters. The people who have influence here see and that's what matters. You have no influence here so it's not important that you see. At least not important to me.
9) Message boards : Number crunching : Who hoards the wu's? (Message 25402)
Posted 12 Feb 2013 by S. Dagorath
Post:
Considering the way that this "discussion" (Thread) has been going, why don't you take it to the Cafe and then you can toss barbs back and forth to your heart's content without burdening the majority of us with having to sort through your meanderings in the faint possibility that you have something to contribute to the Project.



You mean like the way we have to sort through your meanderings looking for something that makes sense or somehow contributes? Yah, you posted an insult and now you got one back.

As a matter of fact I am attempting to contribute something to the project and the fact that you're too thick to see it doesn't take away from that. In addition to telling everybody what happens here with hoarded tasks I'm also trying to tell Eric about how to clean up the tail. Now you may not like the way I am doing that but other approaches that I and others have tried over the 5 years that this has been going on have been completely ignored, and I mean not just given the thumbs down, I mean treated like they never heard us. I've known for a long time, from what you post here as well as what you post at T4T, that you're a little slow to catch on to those sorts of technical things so now I'm telling you point blank what's going on and why. Receive that gift in the spirit in which it is given or get all pissy about it, whatever creams yer twinky.
10) Message boards : Number crunching : Who hoards the wu's? (Message 25401)
Posted 12 Feb 2013 by S. Dagorath
Post:
Dagorath - Must be lonely for you being the only smart person in the world.


Sarcastic accusations with nothing to back them up are just drivel. Anyone who follows my posts knows I make tons of mistakes and am the first to admit when I'm wrong. Perhaps you're just jealous because you have nothing to say that could hint at a brain cell sparking away betwixt your ears? Or you just don't have the cojones to say anything? Whatever.

Getting all huffy about WU's on this is such a first world problem.


Huffy? Define that word please. I think it means angry or upset but I'm not sure. If that's what it means then trust me I'm not angry or upset. Why would I be? What have I got to be angry or upset about? The fact that I get more than my share of the tasks? That makes me happy. The fact that Eric is bothered by the tail? I'm not bothered by the tail. It's a problem that irks him and causes him grief not me. Huffy because Fragrant Stool is coming out and admitting he's a troll? Hey, I'm happy for him/her, not huffy. But again maybe I just don't understand what you mean by huffy. Do explain.
11) Message boards : Number crunching : Who hoards the wu's? (Message 25390)
Posted 11 Feb 2013 by S. Dagorath
Post:
The project is voluntary as is the decision to be a troll.


I'm proud of you for admitting your trolling is something you decided to do. It shows you're ready to take responsibility for your actions and that you're ready to finally deal with it. Also, the project is voluntary as you said but you needn't opt out unless of course you feel the pressure of participating is what's driven you to become a troll. If that's the case then perhaps a brief holiday would help your recovery. Just highlight the project in BOINC manager and click No New Tasks or Remove. Then go over to the Trolls Anonymous website and register. There you'll meet other trolls. Eventually you'll be able to step up to the mic and say "Hi. My name is <insert your name here> and I am a troll." Good luck. Remember you are loved and we're all very proud of you.

12) Message boards : News : More work (Message 25382)
Posted 9 Feb 2013 by S. Dagorath
Post:
We are also planning a new test server with better CERN user BOINC communication. Some new CERN users should be starting.


There is talk of a "new era" at T4T. Is that what you're talking about too? Is Sixtrack gonna run alongside Pythia and the other CERN-T4T apps? Under Copilot? That would be nothing less than splendiferous awesomeness!!! Whatever it is I hope you do something about the tail problem. Damn! It's been going on for more than 5 years and next to nothing has been done.

If you're not going to run alongside Pythia et al then you have to bite the bullet and configure the server to insert resends into the front of the queue and send them to fast, reliable hosts. You gotta bite the bullet and just do it even if it means upgrading the Sixtrack server.
13) Message boards : Number crunching : Who hoards the wu's? (Message 25364)
Posted 7 Feb 2013 by S. Dagorath
Post:
BTW, this method was recently discussed at a popular BOINC project dealing with asteroids. It was implemented there and is proven to work. That's all the clues you get. And you don't have to fire Igor, just make him do it.
14) Message boards : Number crunching : Errors while computing on CONDOR Cluster (Message 25363)
Posted 7 Feb 2013 by S. Dagorath
Post:
I think each BOINC instance needs to have its own data directory. Each instance maintains a state file named client_state.xml so if you have more than 1 instance pointed at the same data directory then they all try to read/write the same state file which of course does not work. You could try something like:

# start BOINC instance #1
./boinc --dir /boinc/instance_1/ --gui_rpc_port <port> --no_gui_rpc --allow_multiple_clients --exit_after_app_start 864000

# start BOINC instance #2
./boinc --dir /boinc/instance_2/ --gui_rpc_port <port> --no_gui_rpc --allow_multiple_clients --exit_after_app_start 864000

# start BOINC instance #3
./boinc --dir /boinc/instance_3/ --gui_rpc_port <port> --no_gui_rpc --allow_multiple_clients --exit_after_app_start 864000


If you know bash script then you would want to use a loop to increment the instance number and port number (more on port number below).
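
Such a loop could look roughly like the sketch below. It only prints the command lines so you can inspect them before launching anything. The starting port of 31416 (BOINC's default GUI RPC port) and the one-port-per-instance numbering are assumptions of mine, and boinc_cmd is a made-up helper name. I left out --no_gui_rpc and --exit_after_app_start here because both are optional and --no_gui_rpc would make the port setting pointless.

```shell
#!/usr/bin/env bash
# Sketch: build one BOINC client command line per instance.
BASE_DIR="/boinc"     # parent of the per-instance data directories
BASE_PORT=31416       # assumption: first instance uses the default GUI RPC port
N_INSTANCES=3         # raise this for a bigger cluster

# boinc_cmd is a hypothetical helper: prints the command for instance $1
boinc_cmd() {
    local i="$1"
    local port=$((BASE_PORT + i - 1))   # instance 1 -> 31416, 2 -> 31417, ...
    echo "./boinc --dir ${BASE_DIR}/instance_${i}/ --gui_rpc_port ${port} --allow_multiple_clients"
}

for i in $(seq 1 "$N_INSTANCES"); do
    boinc_cmd "$i"    # dry run: only prints the command
done
```

Once the printed lines look right, the echo could be swapped for an actual launch in the background, with each data directory created beforehand.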

3 other things to consider:

1) The --exit_after_app_start parm might not be necessary. Each task received by BOINC client is tagged with a maximum duration. If the task reaches that maximum, BOINC will automatically abort it. Note that it aborts the task, it does not kill itself.

2) This project is frequently "dry" which means it frequently has no tasks to send. The dry spells can last days or even weeks. You should consider attaching it to at least one other project that has a steady supply of tasks. Since you would have about 200 instances running I would suggest a very stable, mature, trouble free project such as Numberfields@home to reduce the amount of babysitting. ABC@home is also very stable but they have many dry spells too. The T4T project is this project's sister project. It is sponsored by CERN and assists the work at the LHC. But be careful with T4T because there have been some problems with the VM, more so on Windows than Linux. They have recently solved the worst problems and it is far more stable than it was. Be sure to research it first and try it on a few instances of BOINC before deploying across 200 instances. If it will work for you it will work very well and be very stable. Like I said less trouble with it on Linux than Windows. T4T has a constant supply of work too.

3) The --no_gui_rpc parm.... if you don't allow GUI RPC then you have no way to control the clients. If you use this for security considerations then look at the use of the GUI RPC password in the gui_rpc_auth.cfg file and also implement the remote_hosts.cfg file in which you list the IP addresses of remote hosts that are allowed to connect to BOINC client. Any address not in the list is blocked. You can also put hostnames in remote_hosts.cfg but of course they must resolve somehow to an address, usually by being included in /etc/hosts. Each client instance would need to be instantiated with its own port number as illustrated in the example script above. The least recommended method of monitoring/controlling the client instances would be BOINC manager. A recommended way would be using the boinccmd tool as it is CLI and therefore scriptable. There is also a highly recommended ncurses based app named boinctui which you might find very useful for monitoring but perhaps not so much for controlling (scripted boinccmd for that). Also, there is a Windows GUI app named BOINCtasks that is like BOINC manager but it can monitor and control multiple clients simultaneously. I have never used it but they say it runs very well on wine.
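
To give an idea of what scripted boinccmd monitoring might look like, here is a rough sketch that asks each local instance for its task list. The port numbering (instance N on 31416 + N - 1) is an assumption carried over from nothing more than convenience, poll_instance is a made-up helper name, and I am relying on boinccmd picking up the password from gui_rpc_auth.cfg in the current directory, which is how I understand it to behave. By default the script only prints what it would run.

```shell
#!/usr/bin/env bash
# Sketch: poll several local BOINC client instances with boinccmd.
BASE_DIR="/boinc"          # parent of the per-instance data directories
BASE_PORT=31416            # assumption: instance N listens on BASE_PORT + N - 1
DRY_RUN="${DRY_RUN:-1}"    # 1 = only print the commands, 0 = really run them

# poll_instance is a hypothetical helper: list the tasks of instance $1
poll_instance() {
    local i="$1"
    local port=$((BASE_PORT + i - 1))
    local dir="${BASE_DIR}/instance_${i}"
    if [ "$DRY_RUN" = "1" ]; then
        echo "cd ${dir} && boinccmd --host localhost:${port} --get_tasks"
    else
        # run from the data directory so boinccmd can find gui_rpc_auth.cfg there
        (cd "$dir" && boinccmd --host "localhost:${port}" --get_tasks)
    fi
}

for i in 1 2 3; do
    poll_instance "$i"
done
```

For remote monitoring the --host argument would name the cluster node instead of localhost, and that node's remote_hosts.cfg would need to list the machine you are connecting from.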

Are you aware of the official BOINC wiki?

15) Message boards : News : More work (Message 25347)
Posted 7 Feb 2013 by S. Dagorath
Post:
I would hate to see work units being generated just for the sake of keeping the queue filled but not having any real value to CERN.


The question is... do these tasks have any real value to anybody? Mostly they are to develop Eric McIntosh's dream of identical results from OSX, Linux and Windows so he can publish a paper. That was a worthy goal years ago but it isn't anymore. If your project has trouble getting results from the 3 major platforms to verify then the new, modern, sensible way to do it is use a VM and have all platforms run the same app in the VM.

What Eric is doing is like one of us spending thousands of hours learning to hunt and kill elephants with spears... not an easy job, takes a lot of skill if you want to walk away from it alive. Why not just get a rifle, spend a few hours at the shooting range to learn how it shoots and be done with it.
16) Message boards : Number crunching : GPU Advice welcome. (Message 25346)
Posted 7 Feb 2013 by S. Dagorath
Post:
The problem with the heat from a GPU is like any other problem: if you solve it the stupid way then you'll not be happy. If you think about the problem instead of just rushing out and buying the latest trendy gizmo solution then you'll be much happier. The stupid way to solve the heat problem is to let the hot air from the GPU mix into the room air and heat the room up and then try to cool the GPU with that hot air. Think about it, does trying to cool a hot thing by blowing hot air at it make any sense? If you think it does then you're lost already.

Of course you can turn on air conditioning to cool the hot air but that costs money. If you didn't have to spend that money on AC you could buy a better GPU.

So think.... how can you prevent the hot air from the GPU from heating up the room? In the winter maybe it's not a problem. In the summer, however, position the computer below a window, build a duct that prevents the hot air from mixing into the room and causes it to travel up to the window where you have a fan that sucks the hot air in and blows it out the window. You can make a duct out of cardboard and tape or you can buy cheap 4" (100mm) aluminum clothes dryer duct which you can cut with a good pair of scissors. Or you can raise the computer up on a high table and position it directly in front of the fan intake which eliminates the need for a duct. Very simple, very inexpensive, works very, very well.

Another solution is liquid cooling but run the coolant lines through the wall and place the radiator and fan outside. Then the heat doesn't stay in the house.

Anyone who claims there isn't much difference between Linux and Windows is a few bricks shy of a load. One big difference is that Linux works while Windows is garbage. The other major difference is that Linux is free whereas Windows is not and you keep paying and paying and paying and paying.... for more garbage. Use Linux and take the money you save and buy a better GPU. AMD and nVIDIA both have good drivers for Linux, Ubuntu is almost to the point where you can install them from repositories and in fact they may actually be there now.

What you won't find with Linux is GUI apps like GPU-Z but there are CLI apps for monitoring and setting up the GPU, GUI apps are on the way.

Contrary to what Tom says, the real power hogs are the older GPUs, if you consider bang for your buck, in other words flops per dollar. The new GPUs use 28nm technology which means the transistors are smaller which in turn means they use less power per operation. On the other hand, because the transistors are smaller they cram more of them into the same size GPU and the thing ends up using more power but it does far more work with the same amount of power. So get a new GPU... an nVIDIA 600 series or AMD 7900 series and avoid the older ones. Talk to the guys at GPUgrid and get their advice, they'll tell you that you cannot afford to run an old GPU. Right now the nVIDIA GTX-660Ti is the sweet spot... excellent performance at reasonable price.

Another reason for new GPU... GPUs are evolving very rapidly and older models are becoming obsolete. Already at GPUgrid a number of older nVIDIA cards are no longer usable. nVIDIA 4xx cards are on their way out at GPUgrid and are capable of running only the short version of the tasks, not the long version.

ATI vs. nVIDIA....

nVIDIA uses CUDA and OpenCL, AMD uses OpenCL. OpenCL is a general, non-optimized platform, CUDA is highly optimized for computation intensive applications like you find in BOINC world. The consensus in the general community and not just the BOINC community is that CUDA outperforms OpenCL for sheer computing power. AMD owners don't like to admit that but they are biased so look at unbiased reports from outside the community. Some projects have OpenCL apps for both AMD and nVIDIA, others have CUDA for nVIDIA and OpenCL for AMD. If you buy an nVIDIA based card you'll want to run the projects that use CUDA because in general and over the long run those will drive your card the hardest and use it most efficiently.

I have had an nVIDIA GTX-570 for over a year and recently bought an AMD 7970 which is the latest and best AMD has to offer. It was a mistake. I should have bought an nVIDIA instead. The number of projects offering OpenCL apps is dwindling. Poem has a good AMD app but their tasks are down to a trickle. Milkyway has an AMD app but I would not crunch that project if it was the last project standing, it's a rogue project run by misfits. The prime number projects have AMD apps but the biggest use for prime numbers and rainbow tables is for the banks to secure their transactions and given the fact that they have turned out to be the biggest crooks on the planet (read the latest scandal in the news: mortgage rip-offs, drug money laundering, trucking with terrorists) they can go pound sand, they take enough from my pockets already. WCG has an AMD app for their "help cure cancer" project but that will be finished soon. For me the sensible way is nVIDIA because if a project has a CUDA app then it's right up nVIDIA's alley and if they only have an OpenCL app then nVIDIA will work just as well as AMD with that.

I don't plan on buying any more high end CPUs. My strategy from here on is to buy high end GPUs plus just enough CPU and motherboard to drive the GPU. Yes, that is a consideration... your CPU and motherboard must be fast enough to feed the GPU the data it is to compute and remove the results of the computations and store them to disk or whatever. If your CPU and motherboard aren't fast enough then the GPU will not run optimally. The latest batch of GPUs use PCI-E x16 3.0 which moves data across the PCI-E bus much faster than PCI-E x16 2.0. If you're going to get a new motherboard and a new generation GPU then you would be wise to get PCI-E x16 3.0. Not every app can use it at this point in time but you can be sure that in the future you will want to have it.

So my next system will be a mobo with 4 PCI-E x16 3.0 slots that run at x16 speed even if all slots are occupied (some mobos drop x16 back to x8 if more than 1 slot is occupied) and a fast 4 core CPU to feed the 4 GPUs I will put onto the mobo. I'm not sure where I'll find a power supply big enough for 4 high end GPUs but I will build one or modify off the shelf PSUs if I need to. The UPS?... any UPS you buy off of a shelf is junk (because the battery sits for months/years in a warehouse with no voltage applied to it so it discharges which causes the sulphur in the acid to deposit onto the plates in the battery and then it's toast) so I have my own custom UPS which works very well. Don't buy a UPS from a store, waste of money.
17) Message boards : Number crunching : Who hoards the wu's? (Message 25345)
Posted 7 Feb 2013 by S. Dagorath
Post:
The tail is annoying to me too but...


...but you never do anything about it so obviously it isn't annoying you very much.

There has been at least one proven method for handling the tail discussed right here in this very forum. But you pretend like it never happened for some reason. WTF??

Now I have an even better solution for the tail but I'm not going to give it to you until you say "Please, pretty please. I promise I will make Igor implement it and if he doesn't I'll run his butt outta here."
18) Message boards : Number crunching : Who hoards the wu's? (Message 25344)
Posted 7 Feb 2013 by S. Dagorath
Post:
Are people able to hoard wu's?


The one who pulls my strings hoarded ~500 on 2 i7 machines. There was a limit of 3 or 4 per core at one time but do realize that's 3 or 4 per core per BOINC installation. See how it's done, son? They say it doesn't work on Win but I never tried it on Win. Works fiendishly well on Lin though.



