Message boards : Number crunching : How is the SSE3 thing coming along?

Tex1954

Joined: 24 Apr 11
Posts: 37
Credit: 1,105,291
RAC: 0
Message 24132 - Posted: 8 Jul 2012, 1:33:58 UTC
Last modified: 8 Jul 2012, 1:34:55 UTC

Speaking of tricky things... I "sometimes" get SSE3 tasks on the i7-950 box but not on my AMD 1100T box. Both run Win7 64-bit with 12 GB of memory.

Interestingly, this is what BOINC 7.0.28 reports on both boxes:

i7-950: Win7-950
6 7/4/2012 1:39:55 PM Processor: 8 GenuineIntel Intel(R) Core(TM) i7 CPU 950 @ 3.07GHz [Family 6 Model 26 Stepping 5]
7 7/4/2012 1:39:55 PM Processor: 256.00 KB cache
8 7/4/2012 1:39:55 PM Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 syscall lm vmx tm2 popcnt pbe


AMD 1100T: Win7-Compaq
7 7/7/2012 1:05:22 PM Processor: 6 AuthenticAMD AMD Phenom(tm) II X6 1100T Processor [Family 16 Model 10 Stepping 0]
8 7/7/2012 1:05:22 PM Processor: 512.00 KB cache
9 7/7/2012 1:05:22 PM Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 htt pni cx16 syscall nx lm svm sse4a osvw ibs skinit wdt page1gb rdtscp 3dnowext 3dnow

Interestingly, neither processor reports SSE3, even though both are supposed to support it according to:


http://en.wikipedia.org/wiki/SSE3

It does seem to me that the SSE3 WUs run twice as fast... but who knows for sure why.

:)
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 852
Credit: 1,619,050
RAC: 0
Message 24133 - Posted: 8 Jul 2012, 1:39:45 UTC

Interesting and useful feedback.
SSE2/3 builds run faster because SixTrack was originally vectorised
for the Cray, or pipelined, and SSE2/3 are designed to
be particularly effective on this kind of code on both
Intel and AMD. Eric.
Tex1954

Joined: 24 Apr 11
Posts: 37
Credit: 1,105,291
RAC: 0
Message 24134 - Posted: 8 Jul 2012, 1:48:59 UTC - in response to Message 24133.  
Last modified: 8 Jul 2012, 1:52:57 UTC

> Interesting and useful feedback.
> SSE2/3 builds run faster because SixTrack was originally vectorised
> for the Cray, or pipelined, and SSE2/3 are designed to
> be particularly effective on this kind of code on both
> Intel and AMD. Eric.


So, I am guessing there may be a BOINC bug (server and/or client) in deciding which WUs to send to a system... or else maybe it's balancing the tasks so the non-SSE3 WUs also get done?

In any case, I can't wait until things get tweaked a little better... better for everything!

Thanks!

:)

PS: Here is another box actively running LHC, and it doesn't receive SSE3 tasks either:


Win7-AZZA

7 7/7/2012 7:56:06 PM Processor: 6 AuthenticAMD AMD Phenom(tm) II X6 1090T Processor [Family 16 Model 10 Stepping 0]
8 7/7/2012 7:56:06 PM Processor: 512.00 KB cache
9 7/7/2012 7:56:06 PM Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 htt pni cx16 syscall nx lm svm sse4a osvw ibs skinit wdt page1gb rdtscp 3dnowext 3dnow
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 852
Credit: 1,619,050
RAC: 0
Message 24135 - Posted: 8 Jul 2012, 3:35:55 UTC

Even more interesting, I note that the "flags"
do not include sse3. Still, you should get sse2
at least, which is good enough............Eric.
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 852
Credit: 1,619,050
RAC: 0
Message 24136 - Posted: 8 Jul 2012, 3:38:00 UTC

.........and I see ssse3 in another set of flags!
(I have seen this on my AMD, but I thought it was Cygwin.)
Tex1954

Joined: 24 Apr 11
Posts: 37
Credit: 1,105,291
RAC: 0
Message 24138 - Posted: 8 Jul 2012, 3:51:09 UTC - in response to Message 24136.  

> .........and I see ssse3 in another set of flags!
> (I have seen this on my AMD, but I thought it was Cygwin.)



LOL! The case of the missing BOINC Client flags! LOL!

Oh well, I'm sure it will all get sorted out one of these days.

:D



Win7-R400

7 7/7/2012 1:31:38 PM Processor: 6 AuthenticAMD AMD Phenom(tm) II X6 1100T Processor [Family 16 Model 10 Stepping 0]
8 7/7/2012 1:31:38 PM Processor: 512.00 KB cache
9 7/7/2012 1:31:38 PM Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 htt pni cx16 syscall nx lm svm sse4a osvw ibs skinit wdt page1gb rdtscp 3dnowext 3dnow
Richard Haselgrove

Joined: 27 Oct 07
Posts: 185
Credit: 3,297,428
RAC: 0
Message 24144 - Posted: 8 Jul 2012, 7:56:56 UTC - in response to Message 24135.  
Last modified: 8 Jul 2012, 7:57:22 UTC

> Even more interesting, I note that the "flags"
> do not include sse3. Still, you should get sse2
> at least, which is good enough............Eric.

The flag list does contain 'pni', which is another name for SSE3.
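As a minimal sketch of what this means for any check that decides whether a host can run the SSE3 build: a literal search for "sse3" misses hosts that only report "pni". The Python below is hypothetical (the names `parse_features` and `has_sse3` are illustrative, not part of BOINC or the LHC@home scheduler) and simply treats either spelling as SSE3.

```python
# Hypothetical helpers, not BOINC or LHC@home code. BOINC prints the CPU
# features as one space-separated list; SSE3 may appear as "pni" (Prescott
# New Instructions, the name used by Linux and the Windows client) or "sse3".
SSE3_NAMES = {"sse3", "pni"}

def parse_features(line: str) -> set[str]:
    """Lower-cased flags from a 'Processor features:' log line or a bare list."""
    head, sep, tail = line.partition("Processor features:")
    flags = tail if sep else head
    return {f.lower() for f in flags.split()}

def has_sse3(line: str) -> bool:
    """True if SSE3 is advertised under either name."""
    return not SSE3_NAMES.isdisjoint(parse_features(line))

# Abbreviated from the 1100T event-log line earlier in the thread.
amd_1100t = ("9 7/7/2012 1:05:22 PM Processor features: fpu vme de pse tsc msr "
             "pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr "
             "sse sse2 htt pni cx16 syscall nx lm svm sse4a")
print("sse3" in parse_features(amd_1100t))  # False: no literal "sse3" flag
print(has_sse3(amd_1100t))                  # True: "pni" is present
```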
tullio

Joined: 19 Feb 08
Posts: 606
Credit: 3,770,132
RAC: 1,261
Message 24153 - Posted: 8 Jul 2012, 15:05:45 UTC

444.01 is certainly faster on my Linux box, an Opteron 1210 with pni in its flags.
Tullio
[AF>FAH-Addict.net]toTOW

Joined: 9 Oct 10
Posts: 77
Credit: 3,623,712
RAC: 7
Message 24154 - Posted: 8 Jul 2012, 15:20:00 UTC

The assignment is quite weird: on a given machine that supports both SSE2 and SSE3, I get a mix of SSE3 and SSE2 WUs... I guess that's not the best assignment for fast return of work. This machine should only get SSE3 work, shouldn't it?
Tex1954

Joined: 24 Apr 11
Posts: 37
Credit: 1,105,291
RAC: 0
Message 24157 - Posted: 8 Jul 2012, 15:56:47 UTC - in response to Message 24144.  

>> Even more interesting, I note that the "flags"
>> do not include sse3. Still, you should get sse2
>> at least, which is good enough............Eric.
>
> The flag list does contain 'pni', which is another name for SSE3.


I completely missed that "pni" flag thing...

So, none of the AMD boxes here ever get SSE3 tasks, only the Intel 950 box...

Hmmmmmm...

Methinks there's more tweaking to do in BOINC land...

:)
tullio

Joined: 19 Feb 08
Posts: 606
Credit: 3,770,132
RAC: 1,261
Message 24167 - Posted: 9 Jul 2012, 7:22:21 UTC
Last modified: 9 Jul 2012, 7:23:55 UTC

444.01 is about 4 times faster than 443.07 on my Opteron 1210 running SuSE Linux 11.1, but it's slower by a factor of ten compared to an Intel i5 using SSE3. A factor of 3 could be reasonable. Am I using SSE3?
Tullio
Mattia Verga

Joined: 27 Sep 04
Posts: 20
Credit: 23,880
RAC: 0
Message 24169 - Posted: 9 Jul 2012, 9:35:26 UTC
Last modified: 9 Jul 2012, 9:47:11 UTC

I've found this discussion on the BOINC developers' forum.

I have an AMD Phenom II 965 with Linux. In /proc/cpuinfo I don't have SSE3 or SSSE3 flags, but I can see PNI and SSE4a:
# less /proc/cpuinfo | grep flags
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt hw_pstate npt lbrv svm_lock nrip_save
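For reference, the same flags line can be read directly on a Linux host. This is a small sketch assuming an x86 kernel, where /proc/cpuinfo repeats a "flags : ..." line once per logical CPU; the function names `cpuinfo_flags` and `read_local_flags` are hypothetical. On a host like the one above it shows "pni" and no literal "sse3".

```python
from pathlib import Path

def cpuinfo_flags(text: str) -> set[str]:
    """Flags from the first 'flags' line of /proc/cpuinfo-style text.

    On x86 Linux each logical CPU gets its own block with a line like
    'flags : fpu vme ... pni ...'; the first block is enough here.
    """
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()

def read_local_flags(path: str = "/proc/cpuinfo") -> set[str]:
    """Flags of the running machine, or an empty set if the file is absent."""
    p = Path(path)
    return cpuinfo_flags(p.read_text()) if p.exists() else set()

# Shortened from the Phenom II 965 flags line above.
sample = "processor\t: 0\nflags\t\t: fpu sse sse2 pni cx16 popcnt sse4a\n"
flags = cpuinfo_flags(sample)
print("sse3" in flags, "pni" in flags)  # False True
```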


Using BOINC 6.x, I get only a partial list of flags:
lun 09 lug 2012 11:33:54 CEST |  | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid pni monitor cx16 popcnt



But using BOINC 7.0.28 I can see all flags:
09-Jul-2012 11:25:46 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt hw_pstate npt lbrv svm_lock nrip_save
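Comparing the two event-log lines above makes the 6.x problem concrete. The sketch below (the helper name `missing_flags` is hypothetical) checks that the 6.x list is an exact prefix of the 7.0.28 list and counts what was dropped. Note that "pni" appears before the cut-off, so on this host the SSE3 alias is still reported even by the older client.

```python
# Flag lists copied from the BOINC 6.x and 7.0.28 event-log lines above.
BOINC6 = (
    "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 "
    "clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp "
    "lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid pni "
    "monitor cx16 popcnt"
)
BOINC7 = (
    "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 "
    "clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp "
    "lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid pni "
    "monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a "
    "misalignsse 3dnowprefetch osvw ibs skinit wdt hw_pstate npt lbrv "
    "svm_lock nrip_save"
)

def missing_flags(partial: str, full: str) -> list[str]:
    """Flags present in `full` but absent from `partial`, in `full`'s order."""
    have = set(partial.split())
    return [f for f in full.split() if f not in have]

old, new = BOINC6.split(), BOINC7.split()
print(new[:len(old)] == old)                # True: 6.x list is a prefix
print(len(missing_flags(BOINC6, BOINC7)))   # 18
```

On these two lines the 6.x list stops at "popcnt", and the 18 flags from "lahf_lm" onward (including "sse4a") are missing, which fits Mattia's description of a truncated list; the cause of the truncation is not shown here.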


So the problem of LHC not using SSE3 is not project-related; it's because Linux systems don't show an SSE3 flag... I think you should use PNI instead.

...however, my host doesn't download SSE2 apps either....

EDIT: see also this and this... all confirm that in the Linux kernel pni = SSE3.
Christoph

Joined: 25 Aug 05
Posts: 69
Credit: 306,627
RAC: 0
Message 24181 - Posted: 9 Jul 2012, 14:30:59 UTC

I guess both should be specified, unless it is certain that older BOINC versions also report PNI instead of SSE3.

Otherwise those hosts will be left out.
Christoph
Mattia Verga

Joined: 27 Sep 04
Posts: 20
Credit: 23,880
RAC: 0
Message 24183 - Posted: 9 Jul 2012, 15:13:42 UTC - in response to Message 24181.  

BOINC itself doesn't detect anything. It simply reports what the system (the kernel) detects, which is what you can see in /proc/cpuinfo.

BOINC versions prior to 7.x have a bug that prevents all the flags from being read (see my example above). But yes, PNI is also reported by older versions... SSE3 is never reported for AMD CPUs (only the latest CPUs, based on the Bulldozer architecture, have SSSE3 and SSE4.1, and even there SSE3 is reported as PNI).
Christoph

Joined: 25 Aug 05
Posts: 69
Credit: 306,627
RAC: 0
Message 24192 - Posted: 9 Jul 2012, 17:06:42 UTC - in response to Message 24183.  
Last modified: 9 Jul 2012, 17:06:56 UTC

It looks like I wasn't clear enough in my post.

I had the server side in mind.

Right now they have SSE3 specified there, and the hosts that report it get the right app. So, add PNI to the specification for that app on the server side, in addition to SSE3.
Christoph
angler

Joined: 25 Nov 06
Posts: 25
Credit: 3,107,516
RAC: 562
Message 24205 - Posted: 9 Jul 2012, 19:27:53 UTC
Last modified: 9 Jul 2012, 19:34:56 UTC

One suggestion: SSE4 and SSSE3 both imply the processor is capable of SSE3/pni,
so maybe it would be easier to check for an OR of those flags?

pni is definitely reported on old BOINC versions with Pentium D and Phenom processors on Windows.

<p_model>AMD Phenom(tm) FX-5000 Quad-Core Processor [Family 16 Model 4 Stepping 2]</p_model>
<p_features>fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 htt pni cx16 syscall nx lm svm sse4a osvw ibs skinit wdt page1gb rdtscp 3dnowext 3dnow</p_features>
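The OR suggestion above can be sketched as follows. Assumption, not something the project has stated: any of "sse3", "pni", "ssse3", "sse4_1", "sse4_2" (or the Mac spelling "sse4.1" seen later in the thread) is taken as evidence of SSE3, since x86 CPUs with SSSE3 or SSE4.1/4.2 also implement SSE3. The set and function names are hypothetical; the project's real server-side check is not shown in this thread, and this only illustrates the matching logic.

```python
# Hypothetical set of flags accepted as evidence of SSE3. AMD's "sse4a" is
# deliberately left out; the Phenom hosts in this thread report "pni" anyway.
SSE3_OR_LATER = {"sse3", "pni", "ssse3", "sse4_1", "sse4_2", "sse4.1"}

def supports_sse3(features: str) -> bool:
    """OR over the implying flags, compared case-insensitively."""
    flags = {f.lower() for f in features.split()}
    return not flags.isdisjoint(SSE3_OR_LATER)

# i7-950 (BOINC 7.0.28) and Phenom FX-5000 feature lists from this thread.
i7_950 = ("fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat "
          "pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 "
          "sse4_1 sse4_2 syscall lm vmx tm2 popcnt pbe")
phenom = ("fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat "
          "pse36 clflush mmx fxsr sse sse2 htt pni cx16 syscall nx lm svm "
          "sse4a osvw ibs skinit wdt page1gb rdtscp 3dnowext 3dnow")
print(supports_sse3(i7_950), supports_sse3(phenom))  # True True
```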
Uffe F

Joined: 9 Jan 08
Posts: 66
Credit: 727,923
RAC: 0
Message 24207 - Posted: 9 Jul 2012, 19:37:01 UTC - in response to Message 24205.  
Last modified: 9 Jul 2012, 19:40:03 UTC

> pni is definitely reported on old BOINC versions with Pentium D and Phenom processors on Windows.


As I wrote in http://lhcathomeclassic.cern.ch/sixtrack/forum_thread.php?id=3370, even an Intel i5 reports pni.
angler

Joined: 25 Nov 06
Posts: 25
Credit: 3,107,516
RAC: 562
Message 24211 - Posted: 9 Jul 2012, 22:42:04 UTC

7/8/2012 2:44:27 PM Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 htt pni cx16 syscall nx lm svm sse4a osvw ibs skinit wdt page1gb rdtscp 3dnowext 3dnow


Under BOINC 6.10.60 on Win7 x64: pni as well.
angler

Joined: 25 Nov 06
Posts: 25
Credit: 3,107,516
RAC: 562
Message 24237 - Posted: 10 Jul 2012, 18:52:17 UTC
Last modified: 10 Jul 2012, 19:29:04 UTC

From a Win2K8 R2 x64 server (Intel Quad Q6600) that did get SSE3 jobs: pni as well under 7.0.28, although ssse3 is also present.

6/30/2012 5:20:24 PM | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 syscall nx lm vmx tm2 pbe
MB Atlanos

Joined: 14 Jul 05
Posts: 11
Credit: 72,634
RAC: 0
Message 24240 - Posted: 10 Jul 2012, 21:51:51 UTC
Last modified: 10 Jul 2012, 21:53:42 UTC

Just to throw in some info from a Mac mini (Mid 2010):

Tue 10 Jul 23:00:11 2012 | | Starting BOINC client version 6.12.43 for x86_64-apple-darwin
Tue 10 Jul 23:00:11 2012 | | Processor: 2 GenuineIntel Intel(R) Core(TM)2 Duo CPU P8800 @ 2.66GHz [x86 Family 6 Model 23 Stepping 10]
Tue 10 Jul 23:00:11 2012 | | Processor features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 DTES64 MON DSCPL VMX SMX EST TM2 SSSE3 CX16 TPR PDCM SSE4.1 XSAVE
Tue 10 Jul 23:00:11 2012 | | OS: Mac OS X 10.6.8 (Darwin 10.8.0)
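The Mac client spells the same features differently: upper case, "SSE3" where Linux and Windows show "pni", "SSE4.1" where the i7-950 shows "sse4_1", "CLFSH" for "clflush". A hypothetical normalisation step could map the spellings seen in this thread onto one vocabulary before any comparison. The alias table covers only pairs visible in these logs and is an assumption, not a complete list.

```python
# Hypothetical alias table built only from spellings that appear in this
# thread: "pni" vs Mac "SSE3", "sse4_1" vs Mac "SSE4.1", "clflush" vs Mac
# "CLFSH", and Linux "ht" vs Windows/Mac "htt".
ALIASES = {"pni": "sse3", "sse4.1": "sse4_1", "clfsh": "clflush", "ht": "htt"}

def normalize(features: str) -> set[str]:
    """Lower-case each flag and map known aliases to one canonical name."""
    return {ALIASES.get(f.lower(), f.lower()) for f in features.split()}

mac = ("FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 "
       "CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 DTES64 MON DSCPL VMX "
       "SMX EST TM2 SSSE3 CX16 TPR PDCM SSE4.1 XSAVE")
i7_950 = ("fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat "
          "pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 "
          "sse4_1 sse4_2 syscall lm vmx tm2 popcnt pbe")
common = normalize(mac) & normalize(i7_950)
print(sorted(f for f in common if f.startswith("sse")))
# ['sse', 'sse2', 'sse3', 'sse4_1']
```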


©2020 CERN