Message boards : Number crunching : Request for SSE4 and/or OpenCL applications
D337z

Joined: 1 Mar 12
Posts: 3
Credit: 31,050
RAC: 0
Message 24421 - Posted: 23 Jul 2012, 20:44:39 UTC

Hey, I know that a lot of people don't see the importance of squeezing out every little bit of processor capability, but with Intel, using SSE4.1 can increase throughput of memory transfers by about 4x by combining multiple 16-byte reads together into a single 64-byte read.
There are also a few other additions, but this is the biggest one. It only applies to Intel though.

As for the OpenCL application: if the calculations are fairly repetitive, OpenCL is the best way to go, since GPUs can run such computations faster than any desktop processor. Not only that, but you can vectorize the computations and see speed-ups significantly greater than what you saw with SSE3.

But yes, I simply request that updates be made to keep up with the capabilities of today's processors in order to maximize the calculation output and further science at an "accelerated" rate.
Igor Zacharov
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 16 May 11
Posts: 79
Credit: 111,419
RAC: 0
Message 24422 - Posted: 23 Jul 2012, 21:54:37 UTC - in response to Message 24421.  

Yes, an executable compiled with the SSE4 option could be available the next time we update the system.

GPU-based computing will take longer. It is not sufficient to pass the code through a different compiler; profound changes are necessary at the algorithmic level. We have it on the roadmap.

Igor.
skype id: igor-zacharov
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 798
Credit: 644,699,023
RAC: 234,826
Message 24423 - Posted: 24 Jul 2012, 1:00:36 UTC

I'd give a thumbs up for SSE4.x

There are plenty of people on here with Phenom and Bulldozer CPUs from AMD, so they will see the improvements too.
Magic Quantum Mechanic

Joined: 24 Oct 04
Posts: 1114
Credit: 49,501,728
RAC: 4,157
Message 24424 - Posted: 24 Jul 2012, 8:05:32 UTC


Two thumbs up!!


Volunteer Mad Scientist For Life
Uffe F

Joined: 9 Jan 08
Posts: 66
Credit: 727,923
RAC: 0
Message 24425 - Posted: 24 Jul 2012, 8:50:48 UTC

Thumbs up for SSE4.1 here also.

My old HP laptop from 2008 with an Intel Core 2 Duo T9400 has SSE4.1, so there should be plenty around with that capability.
tullio

Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 24426 - Posted: 24 Jul 2012, 9:56:49 UTC - in response to Message 24425.  
Last modified: 24 Jul 2012, 9:58:06 UTC

My new HP 635 laptop from 2011 has only pni on its AMD E-450 APU, like my 2008 SUN WS with its Opteron 1210 CPU. But they crunch reliably. The only problem they have is related to their SuSE Linux (11.1 on the SUN and SLES 11 on the HP), which does not support BOINC versions later than 6.10.58. I've complained to SuSE, with little result. I've downloaded 7.0.28 on the HP but it does not even start. The SUN is running 7 BOINC projects, including Test4Theory@home.
Tullio
tullio

Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 24427 - Posted: 24 Jul 2012, 20:21:24 UTC
Last modified: 24 Jul 2012, 20:34:28 UTC

I've installed VirtualBox 4.1.18 and BOINC 6.10.58 on my laptop. I've started Test4Theory@home on it and it works nicely. But since I have little experience with laptops: do I have to shut it down at night? The SUN WS runs 24/7 without problems.
Tullio
Christoph

Joined: 25 Aug 05
Posts: 69
Credit: 306,627
RAC: 0
Message 24428 - Posted: 24 Jul 2012, 20:42:43 UTC - in response to Message 24427.  

Well, I ran on a laptop for a long time without knowing the temps. Now that one is only frequently on, but it's running with TThrottle.

The CPU throttle is set to 80°C, but that is high. When I regularly clean the cooler, 75°C would be enough.

Since the GPU in your case is an integrated one, you have to find out the right temp yourself.

Oh, my CPU did clock down at 82°C. And night time is better for running, since the ambient temp usually goes down.
Christoph
D337z

Joined: 1 Mar 12
Posts: 3
Credit: 31,050
RAC: 0
Message 24433 - Posted: 25 Jul 2012, 9:21:16 UTC
Last modified: 25 Jul 2012, 9:25:13 UTC

Christoph, I take it that you've never opened your laptop and replaced the thermal pad they use by default in the factories with some better thermal paste to reduce temperatures?

And as far as SSE4.1 goes, I can only really see Intel processors benefiting from this, as they include the MOVNTDQA instruction to read four 16-byte values in a single 64-byte read. There might be some other instructions that I'm not used to messing with, but this one will be of the greatest benefit as far as I can see.

In all honesty, I don't know why we have a generic 64-bit application when all 64-bit processors will have at least SSE2 unless there's some processor out there that I'm not aware of with lesser capabilities.

So, we can wipe at least 2 applications off of the list by this logic alone; possibly 4.

And as for you Tullio, have you attempted a custom compile of your Linux kernel to update it? The worst thing that could happen is it won't boot using that kernel and you'll have to revert back to the original. It'll be a full night's work though.
tullio

Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 24434 - Posted: 25 Jul 2012, 9:48:27 UTC - in response to Message 24433.  

D337z wrote: "And as for you Tullio, have you attempted a custom compile of your Linux kernel to update it? The worst thing that could happen is it won't boot using that kernel and you'll have to revert back to the original. It'll be a full night's work though."

I have compiled a lot of Linux kernels, starting with Slackware. But on this laptop I found SuSE Enterprise Desktop 11 SP1 installed and no source code. So I am just using it to see what it is worth. I have a copy of OpenSUSE 12.1 with its source code and I could install it. But for the moment I am just exploring the capabilities of the AMD APU, which seem good. It uses only 18 W.
Tullio
Uffe F

Joined: 9 Jan 08
Posts: 66
Credit: 727,923
RAC: 0
Message 24435 - Posted: 25 Jul 2012, 10:36:09 UTC - in response to Message 24433.  
Last modified: 25 Jul 2012, 10:38:22 UTC

D337z wrote: "In all honesty, I don't know why we have a generic 64-bit application when all 64-bit processors will have at least SSE2 unless there's some processor out there that I'm not aware of with lesser capabilities.

So, we can wipe at least 2 applications off of the list by this logic alone; possibly 4."


Yep, there should be no 64-bit processor out there without SSE2 support. The 64-bit processors all came out after the SSE2 extensions were released.

Windows 8 actually uses SSE2 in its code, so you can't install it without an SSE2-compatible CPU.

So yes, you should remove the generic version for Windows and Linux 64 bit, since they just cost more processing time = less science being done.
Christoph

Joined: 25 Aug 05
Posts: 69
Credit: 306,627
RAC: 0
Message 24436 - Posted: 25 Jul 2012, 13:15:43 UTC - in response to Message 24433.  

I actually did that a couple of weeks ago. How good the paste is, I don't know.
But after that cleaning and the change from pad to paste, it reaches 72°C max under BOINC load. I don't run anything on the GPU anymore.
Christoph
tullio

Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 24437 - Posted: 25 Jul 2012, 14:47:49 UTC
Last modified: 25 Jul 2012, 14:48:56 UTC

I tried a simpler solution: since the laptop has air intakes on its bottom, I raised it slightly with a couple of plastic rods which used to contain 16 kbit memory chips for my old Olivetti-AT&T UNIX PC (a.k.a. PC7300 or Safari). It seems to work.
Tullio
rbpeake

Joined: 17 Sep 04
Posts: 99
Credit: 30,618,118
RAC: 3,938
Message 24438 - Posted: 25 Jul 2012, 14:51:50 UTC - in response to Message 24437.  

tullio wrote: "I tried a simpler solution: since the laptop has air intakes on its bottom, I raised it slightly with a couple of plastic rods which used to contain 16 kbit memory chips for my old Olivetti-AT&T UNIX PC (a.k.a. PC7300 or Safari). It seems to work.
Tullio"

I have done the same. I raised the laptop up on a couple of books, while keeping the air vents clear. Works great, no overheating problems now!
Regards,
Bob P.
Christoph

Joined: 25 Aug 05
Posts: 69
Credit: 306,627
RAC: 0
Message 24441 - Posted: 25 Jul 2012, 17:58:54 UTC - in response to Message 24438.  

Oh yeah, that part I forgot to mention. I have a small piece of timber under it.
Christoph
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 24458 - Posted: 29 Jul 2012, 17:23:05 UTC - in response to Message 24435.  

Actually, all our executables are 32-bit but are renamed and run on 64-bit systems. I am planning 64-bit native executables, but that will need a new crlibm (the elementary function library, which exists) and a lot of testing of numerical compatibility, which SHOULD be OK. Can't jump to conclusions though, as some of the portability/compatibility issues are rather subtle. Still, they shouldn't be affected by the size of the address field! However, I also found that the Intel Fortran compiler converts data differently between Windows and Linux, for example, even on the same hardware! Eric.
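
A minimal sketch of why numerical compatibility between builds can be subtle, assuming nothing about the project's actual code: floating-point addition is not associative, so a compiler or platform that groups a sum differently can change the last bits of the result. The values are arbitrary illustrations; `math.fsum` is shown only as an example of a correctly rounded operation whose result does not depend on order.

```python
import math

# Same three numbers, two different groupings of the additions.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

def sums_differ():
    """Return True if the two groupings give different doubles."""
    return left_to_right != right_to_left

def fsum_is_order_independent():
    """A correctly rounded sum gives the same double in any order."""
    return math.fsum([0.1, 0.2, 0.3]) == math.fsum([0.3, 0.2, 0.1])

print(left_to_right, right_to_left, sums_differ())
print(fsum_is_order_independent())
```

This is the general reason a correctly rounded library is attractive when results must match bit for bit across platforms: the answer is fixed by the mathematics rather than by the order in which a particular build happens to evaluate things.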
D337z

Joined: 1 Mar 12
Posts: 3
Credit: 31,050
RAC: 0
Message 24463 - Posted: 2 Aug 2012, 6:13:50 UTC - in response to Message 24458.  

Well, I don't know how much help it'll be to you, but SSE4.1 includes a few float rounding options natively. And, since you're utilizing C, porting to OpenCL wouldn't be extremely difficult.
I take it that you're using GCC to (cross-)compile?
And yeah, Fortran is a pain like that. :/ It's probably because the compilers use different sets of optimizations on each platform. Keeping the optimizations similar would (in theory) result in similar output, but a slower kernel.

Out of curiosity, though, why are you telling it to round the floats? Are you needing to run some calculations that can't be run on float types by default?

Sorry if I seem a little bit on both the informed and uninformed side of programming at the same time, I usually just play with assembly after the fact to speed things up. If you want to know just how reckless I am with compiling, I use -O3 while compiling my Linux kernel. So, I might seem a little out there. ;-)

BTW, you can really start to open up your code to speed optimizations once you make use of the xmm (and later ymm) registers. At that point, you'll be able to run computations on multiple floats at once and then round them with a native processor instruction (see smmintrin.h to replace your crlibm library on those processors which support it).
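
As a hedged illustration of the rounding options mentioned above: the SSE4.1 rounding instructions round floating-point values to integral values using one of four modes (nearest-even, down, up, toward zero). The Python sketch below only models those four modes on a 4-element list standing in for one xmm register; `round_lanes` and `MODES` are names made up for this example. Note that this is rounding to an integral value, which is a different job from crlibm's correctly rounded elementary functions (exp, log, sin and so on).

```python
import math

# Model of the four rounding modes; each maps a float to an integral float.
MODES = {
    "nearest": lambda x: float(round(x)),       # Python's round() is half-to-even
    "down": lambda x: float(math.floor(x)),     # toward negative infinity
    "up": lambda x: float(math.ceil(x)),        # toward positive infinity
    "truncate": lambda x: float(math.trunc(x)), # toward zero
}

def round_lanes(lanes, mode):
    """Apply one rounding mode to every element, like one packed operation."""
    return [MODES[mode](x) for x in lanes]

lanes = [2.5, -2.5, 1.7, -1.2]  # four floats, as in one 128-bit register
for mode in MODES:
    print(mode, round_lanes(lanes, mode))
```

The halfway cases (2.5 and -2.5) are where the modes visibly disagree, which is why they are included in the sample lanes.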
cornel

Joined: 17 Oct 09
Posts: 1
Credit: 41,768
RAC: 0
Message 24594 - Posted: 13 Aug 2012, 17:50:44 UTC

Please, please send only pni (SSE3) workunits to qualifying AMD processors. It is sad to see WUs taking up to 35 hours on SSE2 when they could have been done in less time with the other executable.
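
A minimal sketch of the kind of selection being asked for, assuming a Linux host where /proc/cpuinfo lists flags such as `sse2`, `pni` (the Linux name for SSE3) and `sse4_1`. The variant names, the sample text and the `pick_variant` helper are hypothetical; this is not the project's actual scheduler logic.

```python
# Preference order, most capable first. Variant names are illustrative only.
VARIANTS = [("sse4_1", "sse4.1"), ("pni", "sse3"), ("sse2", "sse2")]

def parse_flags(cpuinfo_text):
    """Return the set of CPU flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()

def pick_variant(flags):
    """Pick the most capable executable variant the CPU supports."""
    for flag, variant in VARIANTS:
        if flag in flags:
            return variant
    return "generic"

# Made-up excerpt for illustration; on a real host read /proc/cpuinfo instead.
sample = "model name\t: Example CPU\nflags\t\t: fpu sse sse2 pni ssse3\n"
print(pick_variant(parse_flags(sample)))
```

With the sample above, the SSE3 variant is chosen because `pni` is present and `sse4_1` is not.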

©2024 CERN