LHC uses very little L2 cache
Joined: 13 Jul 05 Posts: 82 Credit: 6,336 RAC: 0
I just got a replacement computer for my wife's old Pentium 133: I built her a Celeron 2.93. I have LHC running on it, and it is doing as well as (or better than) the P4s and AMDs. The only thing that would cause this is that LHC does not use much of the L2 cache. Good to know that you don't need an expensive CPU for LHC, as you do for some other projects. Rosetta and CPDN also run well on the Celeron; others slow down a lot. Ray Pizza@Home - Rays Place - Rays place Forums
Joined: 2 Sep 04 Posts: 309 Credit: 715,258 RAC: 0
Quote: "...it is doing as well as (or better than) the P4s and AMDs." But it only does one WU at a time, unlike the Hyper-Threading (HT) P4s.
Joined: 13 Jul 05 Posts: 82 Credit: 6,336 RAC: 0
True, but when I got the processor for only $31.95 after rebates, I can't complain. Most of the extra cash from the retirement check is going into work on the house, but in a year or so I will probably get an HT chip for that machine, or put one in the P4 (2.4) that I am using now. Pizza@Home - Rays Place - Rays place Forums
Joined: 2 Sep 04 Posts: 209 Credit: 1,482,496 RAC: 0
The amount of L2 cache (and how it's used) has something to do with it, but not entirely. It also depends on the architecture of the processor and how its floating-point and integer sections are balanced, the hard disk's speed and cache, the front-side bus speed, and memory bandwidth and amount. You can have the fastest processor, but if some other part of your computer is lacking, you will have a bottleneck and not get the best performance. It's a combination of all the components in your system. Certain projects do more floating-point operations than integer operations, hence "others slow down a lot".
Joined: 13 Jul 05 Posts: 82 Credit: 6,336 RAC: 0
Quote: "...It's a combination of all the components in your system." Very true; a lot of things make a difference in how it performs. I have a WD Caviar SE hard drive with an 8 MB cache, and 1 GB of PC2700 memory. The faster PC3200 memory probably would not help with the Celeron, but I should have gotten it for upgrading the CPU in a few years. Still, I was able to build it this way for under $200, and she is very happy with it now. It makes store-bought systems with the same processor look slow, since most prebuilt Celeron systems have PC2100 memory and slower-access hard drives. The better drive and memory cost very little extra; I don't know why preassembled systems don't have them, except that every cent they save is more profit. The Caviar SE was only $2 more than the standard Caviar with a 2 MB cache. Cheers, Ray Pizza@Home - Rays Place - Rays place Forums
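To put rough numbers on PC2100 vs PC2700 vs PC3200: DDR module names are (rounded) peak bandwidth figures, and peak bandwidth is simply the transfer rate times the 8-byte (64-bit) width of a memory channel. A minimal sketch; the function name and the `channels` parameter are illustrative, not from any tool:

```python
def peak_bandwidth_mb_s(transfers_mt_s: float, bus_bytes: int = 8, channels: int = 1) -> float:
    """Theoretical peak DDR bandwidth in MB/s.

    transfers_mt_s: data rate in megatransfers per second (e.g. 400 for DDR400)
    bus_bytes:      width of one memory channel in bytes (a 64-bit module = 8)
    channels:       number of populated channels (dual channel = 2)
    """
    return transfers_mt_s * bus_bytes * channels


for name, rate in [("PC2100 (DDR266)", 266), ("PC2700 (DDR333)", 333), ("PC3200 (DDR400)", 400)]:
    print(f"{name}: {peak_bandwidth_mb_s(rate):.0f} MB/s per channel")
```

These are theoretical peaks; sustained throughput is lower, and the CPU only benefits from faster memory if its front-side bus can move that much data.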
Joined: 1 Sep 04 Posts: 137 Credit: 1,691,526 RAC: 4
For a lot of projects, the main bottleneck is memory bandwidth between the CPU and system memory, so CPUs with more L2 cache complete work units faster than the same CPU with less cache. It has been a while since I looked at this, but if I'm remembering correctly, I have observed the same thing with LHC, although from the other end. I have a 1.3 GHz Pentium M laptop with 1 MB of L2 cache. On SETI it beats the socks off my AMD 2400+s, but on LHC it lags slightly behind them. - A member of The Knights Who Say NI! My BOINC stats site
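A standard way to see why extra L2 helps a program that keeps going out to main memory is the average memory access time (AMAT) formula, AMAT = hit time + miss rate × miss penalty, applied level by level. It models latency rather than bandwidth, and the latencies and miss rates below are made-up illustrative numbers, not measurements of any CPU in this thread:

```python
def amat(levels: list[tuple[float, float]], memory_latency: float) -> float:
    """Average memory access time, in cycles, for a cache hierarchy.

    levels: (hit_time, local_miss_rate) for each cache level, L1 first.
    memory_latency: cycles to fetch from main memory once the last level misses.
    """
    time = memory_latency
    # Work outward from memory: each level's miss penalty is the AMAT of the level below it.
    for hit_time, miss_rate in reversed(levels):
        time = hit_time + miss_rate * time
    return time


# Hypothetical numbers: same L1, 200-cycle memory,
# a smaller L2 that misses 50% of the time vs a larger L2 that misses 25%.
small_l2 = amat([(3, 0.10), (20, 0.50)], memory_latency=200)
large_l2 = amat([(3, 0.10), (20, 0.25)], memory_latency=200)
print(f"small L2: {small_l2:.1f} cycles, large L2: {large_l2:.1f} cycles")
```

If a workload's data already fits in the smaller L2, the two miss rates end up about equal and the larger cache buys little.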
Joined: 26 Sep 05 Posts: 85 Credit: 421,130 RAC: 0
The thing with L2 cache (and I'm not sure how their data is set up with respect to this) is that it helps when the active data set (the data the CPU is working on at any given time) is small enough to fit in the L2 cache. When the active data set grows too large, the CPU ends up thrashing the cache far too often (data has to be discarded from it and new data reloaded) before it can continue processing. In cases like that, the limited size of the L2 cache can become a bigger bottleneck than if one didn't use it as much, because one is constantly loading data into the cache only to discard it to make room for what one really needs at the moment. On the other hand, if the working data set is small enough that, say, a 256 KB L2 cache can hold all of it, then one wouldn't necessarily see much performance benefit from putting 512 KB of L2 in there. This is similar to RAM: the main advantage of adding more RAM is that it reduces paging out to the swap file (hard drives being slow). However, if one already has so much RAM that everything fits with a ton of room to spare, then adding more might not help much. For example, a person with 512 MB of RAM might see improved performance upgrading to 1 GB, whereas a person who already has 2 GB might not see much advantage in upgrading to 4 GB, unless they really are using more than 2 GB. Some CPUs offer a larger cache (though even 1, 2, or 4 MB of L2 isn't inexhaustible, depending on the circumstances), but many of these CPUs are typically used in servers (e.g. Opterons, Xeons). The thing with L2 cache is that, yes, it's faster, but it's also much smaller and can't hold as much data at any given time.
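The thrashing effect described above can be reproduced with a toy model: a fully associative cache with least-recently-used (LRU) replacement, and a program that sweeps over its working set in a loop. Real L2 caches are set-associative and only approximate LRU, so treat this as a simplified sketch; sizes are counted in cache lines, and the 64-byte line size is an assumption:

```python
from collections import OrderedDict


def lru_hit_rate(cache_lines: int, working_set: int, passes: int = 10) -> float:
    """Hit rate of a fully associative LRU cache holding `cache_lines` lines
    while a program sweeps cyclically over `working_set` distinct lines."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = accesses = 0
    for _ in range(passes):
        for line in range(working_set):
            accesses += 1
            if line in cache:
                hits += 1
                cache.move_to_end(line)  # mark as most recently used
            else:
                cache[line] = None
                if len(cache) > cache_lines:
                    cache.popitem(last=False)  # evict the least recently used line
    return hits / accesses


LINE = 64                     # assumed cache-line size in bytes
L2_256K = 256 * 1024 // LINE  # 4096 lines
L2_512K = 512 * 1024 // LINE  # 8192 lines

for ws_kb in (192, 384):
    ws = ws_kb * 1024 // LINE
    print(f"{ws_kb} KB working set: 256 KB L2 hit rate {lru_hit_rate(L2_256K, ws):.2f}, "
          f"512 KB L2 hit rate {lru_hit_rate(L2_512K, ws):.2f}")
```

With a cyclic sweep and LRU, a working set even slightly larger than the cache gets no hits at all, because each line is evicted just before it is needed again; a working set that already fits gains nothing from doubling the cache.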
Joined: 29 Sep 04 Posts: 196 Credit: 207,040 RAC: 0
To add to what Son Goku was saying: the L1/L2 cache on all Athlon XP and Pentium 4 CPUs runs at the speed of the CPU itself, so the faster the CPU, the faster the cache. Older CPUs that used Slot A or Slot 1 (older Athlon or Pentium II/III CPUs) had an external L2 cache which was integrated on the package, but not *in* the CPU. This also meant that on those architectures the L2 was clocked somewhat slower than the CPU itself, but it was still far faster than standard CPU-to-RAM access. L1 cache, AFAIK, always runs at CPU speed in all CPUs (including chips as far back as the Pentium, K6 and probably others). My "main" machine has an Athlon XP 3200+ with 512 KB L2 cache + 128 KB L1 cache (640 KB effective; don't get me started on Pentium 4s...). LHC seems to make use of it, from what I can tell from the LED diagnostic lights on my memory modules. Granted, it accesses memory more than Predictor does, but I'm sure some of it is working out of the L1/L2 caches, and the CPU is more of a limit on maximum processing speed than anything else. Now, if we're talking climateprediction.net, my RAM is 100% maxed out and is the major bottleneck for the HADSM application. The RAM I have is high-quality Corsair XMS PC3200 (DDR400), 2x512 MB modules with 2-2-2-5 2T timings, nearly the best you can get with DDR400 (I'd love to get 1T timing; the modules are capable of it... silly BIOS limitations). If I were able to overclock the RAM, the greater speed would improve crunching times with CPDN, but that's not an option for me, and it would probably have a lesser effect on LHC and Predictor. It would be cool if I had a program which could tell the state of the CPU caches and whatnot, and if I understood how to use that program and interpret its findings. Alas, I have no such software or knowledge. Last thing: I have a Pentium M 1.6 GHz laptop with a 2 MB L2, and I gotta say, if I were able to clock it to 2.2 GHz (the real speed of my Athlon XP), I'm quite sure it would outperform the Athlon due to the larger L2 and some minor architectural differences.
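For the "which caches do I have" part of that wish, Linux exposes each CPU's cache hierarchy under /sys/devices/system/cpu/cpu0/cache/index*/, with files such as `level`, `type` and `size`. A minimal sketch, assuming a Linux system with sysfs mounted; the function names are illustrative. It only reports sizes; seeing hits and misses needs hardware performance counters instead:

```python
import glob
import os


def parse_size(text: str) -> int:
    """Convert a sysfs size string such as '512K' or '2M' to bytes."""
    text = text.strip()
    units = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    if text and text[-1].upper() in units:
        return int(text[:-1]) * units[text[-1].upper()]
    return int(text)


def read_attr(index_dir: str, name: str) -> str:
    """Read one attribute file (e.g. 'level') from a cache index directory."""
    with open(os.path.join(index_dir, name)) as f:
        return f.read().strip()


def read_caches(cpu_dir: str = "/sys/devices/system/cpu/cpu0/cache"):
    """Return (level, type, size_in_bytes) for every cache index under cpu_dir."""
    caches = []
    for index_dir in sorted(glob.glob(os.path.join(cpu_dir, "index*"))):
        caches.append((int(read_attr(index_dir, "level")),
                       read_attr(index_dir, "type"),
                       parse_size(read_attr(index_dir, "size"))))
    return caches


if __name__ == "__main__":
    for level, kind, size in read_caches():
        print(f"L{level} {kind}: {size // 1024} KB")
```

On a machine without sysfs the glob simply matches nothing and the script prints nothing.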
Joined: 27 Jul 04 Posts: 182 Credit: 1,880 RAC: 0
Well, I wrote some automatic benchmarking tools for CPH STL (http://www.cphstl.dk/). They use PAPI (http://icl.cs.utk.edu/papi/), which lets you access the performance counter registers on the CPU. With this in your program, you can see what percentage of your memory accesses cause a cache miss, or how many page faults your program creates. Very useful stuff. Chrulle Research Assistant & Ex-LHC@home developer Niels Bohr Institute
Joined: 29 Sep 04 Posts: 196 Credit: 207,040 RAC: 0
Quote: "...With this in your program you can see how big a percentage of your memory accesses that causes a miss in the caches or how many pagefaults your program creates. Very useful stuff." I'm curious what kind of findings you have or will have. ;) Keep us posted!
©2024 CERN