Message boards : Number crunching : LHC uses very little L2 cache

[B@H] Ray

Joined: 13 Jul 05
Posts: 82
Credit: 6,336
RAC: 0
Message 11215 - Posted: 6 Nov 2005, 2:12:49 UTC

I just built a replacement for my wife's old Pentium 133: a 2.93 GHz Celeron. I have LHC running on it and it is doing as well as (or better than) the P4s and AMDs.

The only thing that would explain this is that LHC does not use much of the L2 cache. It's good to know that you don't need an expensive CPU for LHC the way you do for some other projects. Rosetta and CPDN also run well on the Celeron; other projects slow down a lot.

new computer

Ray

Pizza@Home - Rays Place - Rays place Forums
The Gas Giant

Joined: 2 Sep 04
Posts: 309
Credit: 715,258
RAC: 0
Message 11216 - Posted: 6 Nov 2005, 2:31:18 UTC - in response to Message 11215.  

I just built a replacement for my wife's old Pentium 133: a 2.93 GHz Celeron. I have LHC running on it and it is doing as well as (or better than) the P4s and AMDs.

The only thing that would explain this is that LHC does not use much of the L2 cache. It's good to know that you don't need an expensive CPU for LHC the way you do for some other projects. Rosetta and CPDN also run well on the Celeron; other projects slow down a lot.

new computer

Ray


But it's only doing one WU at a time, unlike the HT P4s.
[B@H] Ray

Joined: 13 Jul 05
Posts: 82
Credit: 6,336
RAC: 0
Message 11218 - Posted: 6 Nov 2005, 2:39:33 UTC - in response to Message 11216.  
Last modified: 6 Nov 2005, 2:40:26 UTC


But it's only doing one WU at a time, unlike the HT P4s.


True, but since I got the processor for only $31.95 after rebates I can't complain. Most of the extra cash from the retirement check is going into working on the house, but in a year or so I will probably get an HT chip for that machine, or put one into the P4 (2.4) that I am on now.


Pizza@Home - Rays Place - Rays place Forums
Krunchin-Keith [USA]
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester

Joined: 2 Sep 04
Posts: 209
Credit: 1,482,496
RAC: 0
Message 11229 - Posted: 6 Nov 2005, 18:06:04 UTC - in response to Message 11215.  

The amount of L2 cache (and the use made of it) has something to do with it, but not everything. It also depends on the architecture of the processor and how the floating-point and integer units are balanced, hard disk speed and cache, front-side bus speed, and the amount and bandwidth of memory. You can have the fastest processor, but if some other part of your system is lacking you will have a bottleneck and not get the best performance. It's a combination of all the components in your system.

Certain projects do more floating-point operations than integer operations. Hence "others slow down a lot".
[B@H] Ray

Joined: 13 Jul 05
Posts: 82
Credit: 6,336
RAC: 0
Message 11230 - Posted: 6 Nov 2005, 19:14:12 UTC - in response to Message 11229.  

The amount of L2 cache (and the use made of it) has something to do with it, but not everything. It also depends on the architecture of the processor and how the floating-point and integer units are balanced, hard disk speed and cache, front-side bus speed, and the amount and bandwidth of memory. You can have the fastest processor, but if some other part of your system is lacking you will have a bottleneck and not get the best performance. It's a combination of all the components in your system.

Certain projects do more floating-point operations than integer operations. Hence "others slow down a lot".


Very true, a lot of things make a difference in how it works. I have a WD Caviar SE hard drive with an 8 MB cache and 1 GB of PC2700 memory. The faster PC3200 memory probably would not have helped with the Celeron, but I should have gotten it anyway for upgrading the CPU in a few years.

But I was able to build it this way for under $200 and she is very happy with it now. It makes store-bought systems with the same processor look slow; most prebuilt Celeron systems have PC2100 memory and slower hard drives. The better hard drive and memory cost very little extra; I don't know why pre-assembled systems don't have them, except that every cent they can save is more profit. The Caviar SE drive was only $2 more than the standard Caviar with a 2 MB cache.

Cheers
Ray

Pizza@Home - Rays Place - Rays place Forums
Toby

Joined: 1 Sep 04
Posts: 137
Credit: 1,691,526
RAC: 4
Message 11245 - Posted: 7 Nov 2005, 0:45:11 UTC

For a lot of projects the main bottleneck is memory bandwidth between the CPU and system memory, so a CPU with more L2 cache completes work units faster than the same CPU with less cache. It has been a while since I looked at this, but if I'm remembering correctly I have observed the same thing with LHC, although from the other end: I have a 1.3 GHz Pentium-M laptop with 1 MB of L2 cache. On SETI it beats the socks off my AMD 2400+s, but on LHC it lags slightly behind them.
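
To get a rough feel for how limited that CPU-to-memory path is, here is a small C sketch of a STREAM-style "triad" sweep over arrays far too large for any L2 cache. The array size, pass count, and the GB/s figure it prints are only illustrative, not calibrated for any particular machine:

/* STREAM-style triad sketch: streams three large arrays through main memory
 * and prints a rough bandwidth estimate. Sizes are illustrative only. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (8 * 1024 * 1024)   /* 8M doubles = 64 MB per array, far larger than any L2 */

int main(void)
{
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);
    if (!a || !b || !c)
        return 1;

    for (size_t i = 0; i < N; i++) {   /* touch everything once so the pages are mapped */
        b[i] = 1.0;
        c[i] = 2.0;
    }

    const int passes = 10;
    clock_t t0 = clock();
    for (int p = 0; p < passes; p++)
        for (size_t i = 0; i < N; i++)
            a[i] = b[i] + 3.0 * c[i];  /* two reads and one write per element */
    clock_t t1 = clock();

    double secs  = (double)(t1 - t0) / CLOCKS_PER_SEC;
    double bytes = 3.0 * N * sizeof(double) * passes;
    printf("triad: %.2f s, roughly %.2f GB/s\n", secs, bytes / (secs * 1e9));

    free(a);
    free(b);
    free(c);
    return 0;
}

If the working set fit in cache instead, the same loop would report a much higher apparent bandwidth, which is exactly the advantage a large L2 buys.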
- A member of The Knights Who Say NI!
My BOINC stats site
Nuadormrac

Joined: 26 Sep 05
Posts: 85
Credit: 421,130
RAC: 0
Message 11251 - Posted: 7 Nov 2005, 4:09:04 UTC
Last modified: 7 Nov 2005, 4:21:25 UTC

The thing with L2 cache (and I'm not sure how their data is set up with respect to this) is that it helps when the active data set (the data the CPU is working on at any given time) is small enough to fit in the L2 cache. When the active data set grows too large, the CPU ends up thrashing the cache far too often (data has to be discarded from it and new data reloaded) to allow further processing. In cases like that, the limited size of the L2 cache can become a bigger bottleneck than if one didn't rely on it as much, because one is constantly loading data into the cache only to discard it to make room for what is actually needed at the moment.

On the other hand, if the working data set is small enough that, say, a 256 KB L2 cache can hold it all, then one wouldn't necessarily see much performance benefit from throwing 512 KB of L2 in there. It is similar to RAM: the main advantage of adding more RAM is that it reduces paging out to the swap file (hard drives being slow). But if one already has so much RAM that everything fits with room to spare, adding more might not help much. In other words, a person with 512 MB of RAM might see improved performance upgrading to 1 GB, while a person who already has 2 GB might not see much advantage in going to 4 GB unless they really are using more than 2 GB...

Some CPUs offer larger caches (although even 1, 2, or 4 MB of L2 isn't inexhaustible, depending on the circumstances), though many of these CPUs are typically aimed at servers (i.e. Opterons, Xeons, etc.). The thing with L2 cache is that, yes, it's faster, but it's also much smaller than main memory and can't hold as much data at any given time...
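
As a rough illustration of that working-set effect, here is a small C sketch that performs the same total number of reads over a 128 KB buffer (which fits comfortably in a 256 KB L2) and over an 8 MB buffer (which does not). The sizes and pass counts are arbitrary, and hardware prefetching will narrow the gap for a purely sequential sweep like this one:

/* Sketch: equal numbers of reads over a working set that fits in a 256 KB L2
 * versus one that spills to main memory. Sizes and pass counts are arbitrary. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static double sweep(const int *buf, size_t n, int passes)
{
    volatile long sum = 0;             /* volatile keeps the loop from being optimized away */
    clock_t t0 = clock();
    for (int p = 0; p < passes; p++)
        for (size_t i = 0; i < n; i++)
            sum += buf[i];
    clock_t t1 = clock();
    (void)sum;
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}

int main(void)
{
    size_t small_n = (128 * 1024) / sizeof(int);      /* 128 KB: fits in a 256 KB L2 */
    size_t big_n   = (8 * 1024 * 1024) / sizeof(int); /* 8 MB: spills to main memory */
    int *small_buf = calloc(small_n, sizeof(int));
    int *big_buf   = calloc(big_n, sizeof(int));
    if (!small_buf || !big_buf)
        return 1;

    /* Same total number of reads: the small set is simply swept 64x more often. */
    double t_small = sweep(small_buf, small_n, 6400);
    double t_big   = sweep(big_buf, big_n, 100);

    printf("128 KB working set: %.2f s, 8 MB working set: %.2f s\n", t_small, t_big);

    free(small_buf);
    free(big_buf);
    return 0;
}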
Travis DJ

Joined: 29 Sep 04
Posts: 196
Credit: 207,040
RAC: 0
Message 11255 - Posted: 7 Nov 2005, 8:15:44 UTC
Last modified: 7 Nov 2005, 8:19:11 UTC

To add to what Son Goku was saying:

The L1/L2 cache on all Athlon XP and Pentium 4 CPUs runs at the speed of the CPU itself, hence the faster the CPU, the faster the cache. Older CPUs that used Slot A or Slot 1 (older Athlon/Duron or Pentium II/III CPUs) had an external L2 cache that was integrated on the package but not *in* the CPU. On those architectures the L2 was therefore clocked somewhat slower than the CPU itself, though still way faster than standard CPU-to-RAM access. L1 cache, AFAIK, always runs at CPU speed in all CPUs (including chips as far back as the Pentium, the K6, and probably others).

My "main" machine is an Athlon XP 3200+ with 512 KB of L2 cache plus 128 KB of L1 (640 KB effective; don't get me started on Pentium 4s...). LHC seems to make use of it, judging from the LED diagnostic lights on my memory modules. Granted, there is more memory access than with Predictor; I'm sure some of the work is being done out of the caches, and the CPU is more of a limit on maximum processing speed than anything else. Now if we're talking climateprediction.net, my RAM is 100% maxed out and is the major bottleneck for the HADSM application. The RAM I have is high-quality Corsair XMS PC3200 (DDR400), 2x512 MB modules with 2-2-2-5 2T timings, nearly the best you can get with DDR400 (I'd love to get 1T timing, and the chips are capable of it... silly BIOS limitations). If I were able to use overclocked RAM, the greater speed would improve crunching times with CPDN, but that's not an option for me, and it would probably have a smaller effect on LHC and Predictor.

It would be cool if I had a program that could tell me the state of the CPU caches and whatnot... and if I were able to understand how to use that program and interpret its findings. Alas, I have no such software or knowledge.

Last thing: I have a 1.6 GHz Pentium-M laptop with a 2 MB L2, and I gotta say, if I were able to clock it to 2.2 GHz (the real speed of my Athlon XP), I'm quite sure it would outperform the Athlon due to the larger L2 and some minor architectural differences.

Chrulle

Joined: 27 Jul 04
Posts: 182
Credit: 1,880
RAC: 0
Message 11257 - Posted: 7 Nov 2005, 14:37:47 UTC

Well, I wrote some automatic benchmarking tools for the CPH STL (http://www.cphstl.dk/). They use PAPI (http://icl.cs.utk.edu/papi/), which lets you access the performance-counting registers on the CPU. With this in your program you can see what percentage of your memory accesses miss in the caches, or how many page faults your program generates. Very useful stuff.
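
For anyone who wants to try it, a minimal sketch using PAPI's classic high-level counter API might look like the following. It assumes the PAPI_L2_TCA and PAPI_L2_TCM preset events are supported on your CPU (not every chip exposes them), and the array loop is just a stand-in for a real workload:

/* Sketch of counting L2 cache accesses and misses around a workload with
 * PAPI's classic high-level API. Assumes the PAPI_L2_TCA / PAPI_L2_TCM
 * preset events are available on this CPU. Compile with -lpapi. */
#include <stdio.h>
#include <stdlib.h>
#include <papi.h>

#define N (4 * 1024 * 1024)

int main(void)
{
    int events[2] = { PAPI_L2_TCA, PAPI_L2_TCM };   /* L2 total accesses, L2 total misses */
    long_long counts[2];
    double *a = malloc(N * sizeof *a);
    if (!a)
        return 1;

    if (PAPI_start_counters(events, 2) != PAPI_OK) {
        fprintf(stderr, "PAPI: requested events not available on this CPU\n");
        return 1;
    }

    for (size_t i = 0; i < N; i++)      /* stand-in for the real workload */
        a[i] = (double)i * 0.5;

    if (PAPI_stop_counters(counts, 2) != PAPI_OK)
        return 1;

    printf("L2 accesses: %lld, L2 misses: %lld (%.1f%% miss rate)\n",
           counts[0], counts[1], 100.0 * counts[1] / counts[0]);
    free(a);
    return 0;
}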

Chrulle
Research Assistant & Ex-LHC@home developer
Niels Bohr Institute
Travis DJ

Joined: 29 Sep 04
Posts: 196
Credit: 207,040
RAC: 0
Message 11265 - Posted: 8 Nov 2005, 1:27:49 UTC - in response to Message 11257.  

Well, I wrote some automatic benchmarking tools for the CPH STL (http://www.cphstl.dk/). They use PAPI (http://icl.cs.utk.edu/papi/), which lets you access the performance-counting registers on the CPU. With this in your program you can see what percentage of your memory accesses miss in the caches, or how many page faults your program generates. Very useful stuff.


I'm curious what kind of findings you have/will have. ;) Keep us posted!

