How is the SSE3 thing coming along?
Author | Message |
---|---|
Send message Joined: 24 Apr 11 Posts: 37 Credit: 1,295,012 RAC: 0 |
Speaking of tricky things... I "sometimes" get SSE3 tasks on the i7-950 box but not on my AMD 1100T box. Both run Win-7 64b with 12Gig memory. Interestingly, this is what BOINC 7.0.28 reports on both boxes: i7-950: Win7-950 6 7/4/2012 1:39:55 PM Processor: 8 GenuineIntel Intel(R) Core(TM) i7 CPU 950 @ 3.07GHz [Family 6 Model 26 Stepping 5] 7 7/4/2012 1:39:55 PM Processor: 256.00 KB cache 8 7/4/2012 1:39:55 PM Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 syscall lm vmx tm2 popcnt pbe AMD 1100T: Win7-Compaq 7 7/7/2012 1:05:22 PM Processor: 6 AuthenticAMD AMD Phenom(tm) II X6 1100T Processor [Family 16 Model 10 Stepping 0] 8 7/7/2012 1:05:22 PM Processor: 512.00 KB cache 9 7/7/2012 1:05:22 PM Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 htt pni cx16 syscall nx lm svm sse4a osvw ibs skinit wdt page1gb rdtscp 3dnowext 3dnow Interesting that neither processor reports SSE3, however, they are both supposed to support it according to: http://en.wikipedia.org/wiki/SSE3 Does seem to me the SSE3 WU's run twice as fast... but who knows for sure why.. :) |
Send message Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0 |
Interesting and useful feedback. sse2/3 run faster because SixTrack was originally vectorised for the Cray (or pipelined), and sse2/3 are designed to be particularly effective on this kind of code on both Intel and AMD. Eric. |
Send message Joined: 24 Apr 11 Posts: 37 Credit: 1,295,012 RAC: 0 |
Interesting and useful feedback. Soo, I am guessing there may be a BOINC bug (server and/or client) when deciding to send the proper WU's to a system... or else, it's maybe balancing the tasks so the non-sse3 WU's also get done? In any case, I can't wait until things get tweaked a little better... better for everything! Thanks! :) PS: Here is another box actively running LHC and it doesn't receive SSE3 tasks either.. Win7-AZZA 7 7/7/2012 7:56:06 PM Processor: 6 AuthenticAMD AMD Phenom(tm) II X6 1090T Processor [Family 16 Model 10 Stepping 0] 8 7/7/2012 7:56:06 PM Processor: 512.00 KB cache 9 7/7/2012 7:56:06 PM Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 htt pni cx16 syscall nx lm svm sse4a osvw ibs skinit wdt page1gb rdtscp 3dnowext 3dnow |
Send message Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0 |
Even more interesting, I note that the "flags" do not include sse3. Still, you should get sse2 at least, which is good enough. Eric. |
Send message Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0 |
.........and I see ssse3 in another set of flags! (I have seen this on my AMD, but I thought it was cygwin.) |
Send message Joined: 24 Apr 11 Posts: 37 Credit: 1,295,012 RAC: 0 |
.........and I see ssse3 in another set of flags! LOL! The case of the missing BOINC Client flags! LOL! Oh well, I'm sure it will all get sorted out one of these days. :D Win7-R400 7 7/7/2012 1:31:38 PM Processor: 6 AuthenticAMD AMD Phenom(tm) II X6 1100T Processor [Family 16 Model 10 Stepping 0] 8 7/7/2012 1:31:38 PM Processor: 512.00 KB cache 9 7/7/2012 1:31:38 PM Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 htt pni cx16 syscall nx lm svm sse4a osvw ibs skinit wdt page1gb rdtscp 3dnowext 3dnow |
Send message Joined: 27 Oct 07 Posts: 186 Credit: 3,297,640 RAC: 0 |
Even more interesting, I note that the "flags" The flag list does contain 'pni', which is another name for SSE3. |
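When checking a feature string by hand, the alias can be handled with a small lookup. This is a minimal Python sketch, assuming the space-separated format shown in the logs in this thread; the helper name `has_sse3` is hypothetical and is not part of BOINC.

```python
# Hypothetical helper, not BOINC code: "pni" (Prescott New Instructions)
# and "sse3" are two names for the same instruction set.
SSE3_NAMES = {"sse3", "pni"}


def has_sse3(features: str) -> bool:
    """Return True if a space-separated feature string lists SSE3 under either name."""
    tokens = {tok.lower() for tok in features.split()}
    return not SSE3_NAMES.isdisjoint(tokens)


# Feature string reported for the Phenom II X6 1100T earlier in the thread.
amd_1100t = (
    "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat "
    "pse36 clflush mmx fxsr sse sse2 htt pni cx16 syscall nx lm svm sse4a "
    "osvw ibs skinit wdt page1gb rdtscp 3dnowext 3dnow"
)

print(has_sse3(amd_1100t))
```

The comparison is done on whole tokens, so `ssse3` on its own is not mistaken for `sse3`.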
Send message Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
444.01 is certainly faster on my Linux box, Opteron 1210 with pni in its flags. Tullio |
Send message Joined: 9 Oct 10 Posts: 77 Credit: 3,671,357 RAC: 0 |
The assignment is quite weird: for a given machine that supports both SSE2 and SSE3, I have a mix of SSE3 and SSE2 WUs ... I guess that's not the best assignment for a fast return of work; this machine should only get SSE3 work, shouldn't it? |
Send message Joined: 24 Apr 11 Posts: 37 Credit: 1,295,012 RAC: 0 |
Even more interesting, I note that the "flags" I completely missed that "pni" flag thing... Soo, none of the AMD boxes here ever get SSE3 tasks, only the Intel 950 box... Hmmmmmm... Me thinks more tweaking to do in Boinc Land... :) |
Send message Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
444.01 is about 4 times faster than 443.07 on my Opteron 1210 running SuSE Linux 11.1, but it's slower by a factor of ten compared to an Intel i5 using SSE3. A factor of 3 could be reasonable. Am I using SSE3? Tullio |
Send message Joined: 27 Sep 04 Posts: 20 Credit: 23,880 RAC: 0 |
I've found this discussion on the BOINC developers forum. I have an AMD Phenom II 965 with Linux; in /proc/cpuinfo I don't have SSE3 or SSSE3 flags, but I can see PNI and SSE4a: # less /proc/cpuinfo | grep flags flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt hw_pstate npt lbrv svm_lock nrip_save Using BOINC 6.x I get only partial flags: lun 09 lug 2012 11:33:54 CEST | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid pni monitor cx16 popcnt But using BOINC 7.0.28 I can see all flags: 09-Jul-2012 11:25:46 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt hw_pstate npt lbrv svm_lock nrip_save So the problem of LHC not using SSE3 is not project related; it's due to the fact that Linux systems don't show an SSE3 flag... I think you should use PNI instead. ...however, my host doesn't download SSE2 apps either.... EDIT: look also at this and this... all confirm that in the Linux kernel pni=sse3 |
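For anyone who wants to check their own Linux host, the `flags` line can be read directly. This is a rough Python sketch, assuming the `flags : ...` layout shown in the /proc/cpuinfo excerpt above; the function name `cpu_flags` is made up for illustration, and none of this is BOINC's own detection code.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set:
    """Collect the tokens from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no 'flags' line found


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")  # only present on Linux
    if path.exists():
        flags = cpu_flags(path.read_text())
        # On Linux, SSE3 is expected to appear as 'pni' rather than 'sse3'.
        print("SSE3 available:", "pni" in flags or "sse3" in flags)
```

Only the first `flags` line is read, on the assumption that all cores report the same feature set.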
Send message Joined: 25 Aug 05 Posts: 69 Credit: 306,627 RAC: 0 |
I guess both should be specified, unless it is certain that older BOINC versions also report PNI instead of SSE3. Otherwise those hosts will be left out. Christoph |
Send message Joined: 27 Sep 04 Posts: 20 Credit: 23,880 RAC: 0 |
BOINC itself doesn't detect anything. It simply reports what the system (the kernel) detects (which you can see by reading /proc/cpuinfo). BOINC versions prior to 7.x have a bug that prevents all the flags from being read (see my example above). But, yes, PNI is also reported by older versions... SSE3 is never reported for AMD CPUs (only the latest CPUs based on the Bulldozer architecture have SSSE3 and SSE4.1, but even there SSE3 is reported as PNI). |
Send message Joined: 25 Aug 05 Posts: 69 Credit: 306,627 RAC: 0 |
It looks like I wasn't clear enough in my post. I had the server side in mind. Right now they have SSE3 specified there, and some hosts that report it get the right app. So, add PNI to the server-side specification for that app as well. Christoph |
Send message Joined: 25 Nov 06 Posts: 25 Credit: 4,686,113 RAC: 0 |
One suggestion - SSE4 and SSSE3 both imply the processor is capable of SSE3/pni, so maybe it would be easier to check for an "OR" of those flags? pni is definitely reported on old versions with Pentium D and Phenom processors on Windows. <p_model>AMD Phenom(tm) FX-5000 Quad-Core Processor [Family 16 Model 4 Stepping 2]</p_model> <p_features>fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 htt pni cx16 syscall nx lm svm sse4a osvw ibs skinit wdt page1gb rdtscp 3dnowext 3dnow</p_features> |
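The "OR" idea could look something like the following. This is only a sketch in Python, not the project's scheduler logic: the set `IMPLIES_SSE3` and the function `supports_sse3` are hypothetical names, and the assumption that every listed flag implies SSE3 on x86 CPUs (including `sse4a`, which goes beyond the flags named in the post) is mine.

```python
# Assumption: any one of these flags means the CPU can also run SSE3 code.
# Spellings cover the variants seen in this thread: lowercase with
# underscores (Windows/Linux clients) and dotted names (Mac OS X client).
IMPLIES_SSE3 = {
    "sse3", "pni",        # SSE3 itself, under both names
    "ssse3",              # Supplemental SSE3
    "sse4_1", "sse4_2",   # SSE4 as reported on Windows/Linux
    "sse4.1", "sse4.2",   # SSE4 as reported on Mac OS X
    "sse4a",              # AMD-only extension (assumption, see lead-in)
}


def supports_sse3(features: str) -> bool:
    """Return True if any token in the feature string implies SSE3 support."""
    tokens = {tok.lower() for tok in features.split()}
    return bool(tokens & IMPLIES_SSE3)


# The Phenom feature string from the post above qualifies via 'pni' and 'sse4a'.
phenom = "fpu sse sse2 htt pni cx16 syscall nx lm svm sse4a 3dnowext 3dnow"
print(supports_sse3(phenom))
```

Lowercasing the tokens first means the uppercase flag list from the Mac client would be matched by the same check.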
Send message Joined: 9 Jan 08 Posts: 66 Credit: 727,923 RAC: 0 |
pni is definitely reported on old versions with Pentium D and Phenom processors on Windows. As I wrote in http://lhcathomeclassic.cern.ch/sixtrack/forum_thread.php?id=3370, even an Intel i5 reports pni |
Send message Joined: 25 Nov 06 Posts: 25 Credit: 4,686,113 RAC: 0 |
7/8/2012 2:44:27 PM Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 htt pni cx16 syscall nx lm svm sse4a osvw ibs skinit wdt page1gb rdtscp 3dnowext 3dnow under 6.10.60 win7 x64 pni as well |
Send message Joined: 25 Nov 06 Posts: 25 Credit: 4,686,113 RAC: 0 |
From a Win2K8 R2 x64 Server which did get sse3 jobs: pni as well under 7.0.28 although ssse3 is present. Intel Quad Q6600 6/30/2012 5:20:24 PM | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 syscall nx lm vmx tm2 pbe |
Send message Joined: 14 Jul 05 Posts: 11 Credit: 81,274 RAC: 0 |
Just to throw in some info from a Mac mini Mid 2010: Tue 10 Jul 23:00:11 2012 | | Starting BOINC client version 6.12.43 for x86_64-apple-darwin Tue 10 Jul 23:00:11 2012 | | Processor: 2 GenuineIntel Intel(R) Core(TM)2 Duo CPU P8800 @ 2.66GHz [x86 Family 6 Model 23 Stepping 10] Tue 10 Jul 23:00:11 2012 | | Processor features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 DTES64 MON DSCPL VMX SMX EST TM2 SSSE3 CX16 TPR PDCM SSE4.1 XSAVE Tue 10 Jul 23:00:11 2012 | | OS: Mac OS X 10.6.8 (Darwin 10.8.0) |
©2024 CERN