Message boards : Number crunching : Linux vs. Windows app

Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Send message
Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 23987 - Posted: 18 Jun 2012, 18:21:30 UTC

I have finally found a workable solution for the AMD Athlon performance issue. The AMD now executes two or three times faster on both Linux and Windows. The price is that there are now two executables for Linux and two for Windows. I will produce a performance report when time permits but I consider the problem solved at least for the interim.

The four executables can be found in
boinc02:/data/boinc/executables
as usual.

[boinc02] /data/boinc/executables $ ls -ltr *440*
-rwxr-xr-x 1 mcintosh c3 4553205 Jun 18 16:12 SixTrack_4440_crlibm_bnl_ifort_boinc_api_O2
-rwxr-xr-x 1 mcintosh c3 4627438 Jun 18 16:12 SixTrack_4440_crlibm_bnl_ifort_boinc_api_SSE2_O2
-rwxr-xr-x 1 mcintosh c3 2568192 Jun 18 17:05 SixTrack_4440_crlibm_bnl_ifort_boinc_api_O2.exe
-rwxr-xr-x 1 mcintosh c3 3031040 Jun 18 17:05 SixTrack_4440_crlibm_bnl_ifort_boinc_api_SSE2_O2.exe
[boinc02] /data/boinc/executables $ file *440*
SixTrack_4440_crlibm_bnl_ifort_boinc_api_O2: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.6.9, statically linked, not stripped
SixTrack_4440_crlibm_bnl_ifort_boinc_api_O2.exe: PE32 executable for MS Windows (console) Intel 80386 32-bit
SixTrack_4440_crlibm_bnl_ifort_boinc_api_SSE2_O2: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.6.9, statically linked, not stripped
SixTrack_4440_crlibm_bnl_ifort_boinc_api_SSE2_O2.exe: PE32 executable for MS Windows (console) Intel 80386 32-bit

The executables:
SixTrack_4440_crlibm_bnl_ifort_boinc_api_SSE2_O2
SixTrack_4440_crlibm_bnl_ifort_boinc_api_SSE2_O2.exe

are for Linux and Windows respectively and require SSE2 (basically Pentium 4 or later).

The executables
SixTrack_4440_crlibm_bnl_ifort_boinc_api_O2
SixTrack_4440_crlibm_bnl_ifort_boinc_api_O2.exe
should run on any old PC but are slower.
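
To make the split concrete, here is a minimal sketch of picking one of the four files from a CPU feature string. The helper name `choose_executable` and the platform labels "linux" and "windows" are my own assumptions for illustration; this is not how the BOINC server actually assigns applications.

```python
# Hypothetical helper, not part of SixTrack or BOINC: maps a platform label and a
# space-separated CPU feature string to one of the four file names listed above.
BASE = "SixTrack_4440_crlibm_bnl_ifort_boinc_api"


def choose_executable(platform: str, p_features: str) -> str:
    """Return the SSE2 build when the CPU advertises sse2, else the generic build."""
    # Compare whole tokens so that "sse" or "ssse3" is never mistaken for "sse2".
    flags = set(p_features.lower().split())
    name = BASE + ("_SSE2_O2" if "sse2" in flags else "_O2")
    if platform == "windows":
        name += ".exe"
    elif platform != "linux":
        raise ValueError(f"no executable listed for platform {platform!r}")
    return name


if __name__ == "__main__":
    print(choose_executable("linux", "fpu mmx fxsr sse sse2"))
    print(choose_executable("windows", "fpu mmx fxsr sse"))
```

The generic build is the safe default: a CPU without SSE2 cannot run the SSE2 build at all, while the reverse only costs speed.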

I hope Igor can install them soonest as they use the version number 4437 in fort.10 for compatibility with the existing BOINC executables.

The other good news is that Laurent is progressing rapidly now with the generation of a MacOS executable and unsurprisingly there are no numerical differences.
(Performance is excellent as we are supporting only Intel based Macs.)

I hope this will finally let us get into full production mode.
I am still trying to verify the physics results for version 4440 but no surprises so far. I need to finish this urgently and then fix the run_post problem in SixDesk, finish the SixDesk documentation, and get started with Bernard on real production.

Eric.
Yours in haste, Euro 2012 is kicking off shortly :-)
ID: 23987 · Report as offensive     Reply Quote
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Send message
Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 24021 - Posted: 1 Jul 2012, 6:07:14 UTC

Here are some timings for SixTrack. (I am also looking at beam-beam
where ibtype=1 looks to be 20% faster than ibtype=0. Still
checking the physics though.) Eric.

As promised, here are some timing results for the latest totally numerically
portable SixTrack versions, v4437 and v4440. v4437 is supposed to use SSE2 if
available but clearly does not use it on an AMD system.
Intel does document this, even if it is rather nasty and a purely commercial
decision with no technical merit.
There are two v4440 sub-versions, one uses SSE2 anyway, and the other not.
Hope these v4440 versions can be in production next week.
If I can believe the 800MHz number the Opteron performance is rather good.

The machines tested are
Windows: w1  2.3GHz  AMD Phenom           abcp11974  at CERN
         w2  3.1GHz  Intel Core           PCBE13896  at CERN
Linux:   l1  0.8GHz  AMD Opteron 6164 HE  lxbsu2104  CERN lxbatch
         l2  2.3GHz  Intel Xeon L5640     lxbsq2401  CERN lxbatch

==================
Case lost
==================
w2: 4437 : For 53018 Turn(s) 6.88
w1: 4437 : For 53018 Turn(s) 15.9
w2: 4440 sse2: For 53018 Turn(s) 6.75
w1: 4440 sse2: For 53018 Turn(s) 11.8
l1: 4437 : For 53018 Turn(s) 56.2
l1: 4440 sse2: For 53018 Turn(s) 12.1
l2: 4440 NO sse2: For 53018 Turn(s) 35.4
l2: 4440 sse2: For 53018 Turn(s) 7.31
==================
Case s316
==================
w2: 4437 : For 123449 Turn(s) 19.3
w1: 4437 : For 123449 Turn(s) 42.4
w2: 4440 sse2: For 123449 Turn(s) 19.4
w1: 4440 sse2: For 123449 Turn(s) 32.6
l1: 4437 : For 123449 Turn(s) 176.
l1: 4440 sse2: For 123449 Turn(s) 41.6
l2: 4440 NO sse2: For 123449 Turn(s) 83.6
l2: 4440 sse2: For 123449 Turn(s) 27.0
==================
Case frs60
==================
w2: 4437 : For 6000000 Turn(s) 1120
w1: 4437 : For 6000000 Turn(s) 2410
w2: 4440 sse2: For 6000000 Turn(s) 1070
w1: 4440 sse2: For 6000000 Turn(s) 1800
l1: 4437 : For 6000000 Turn(s) 9270
l1: 4440 sse2: For 6000000 Turn(s) 2420
l2: 4440 NO sse2: For 6000000 Turn(s) 4380
l2: 4440 sse2: For 6000000 Turn(s) 1730
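
For anyone who wants the ratios rather than the raw numbers, here is a small sketch that computes the v4437 to v4440 sse2 speedup on the two AMD machines from the tables above. The dictionary layout and the `speedup` helper are just my own illustration, and the timings are copied as posted without assuming a unit.

```python
# Timings copied from the tables above, for the two AMD machines
# (l1 = Linux Opteron, w1 = Windows Phenom). Units are as posted.
timings = {
    "lost":  {"l1 4437": 56.2,   "l1 4440 sse2": 12.1,
              "w1 4437": 15.9,   "w1 4440 sse2": 11.8},
    "s316":  {"l1 4437": 176.0,  "l1 4440 sse2": 41.6,
              "w1 4437": 42.4,   "w1 4440 sse2": 32.6},
    "frs60": {"l1 4437": 9270.0, "l1 4440 sse2": 2420.0,
              "w1 4437": 2410.0, "w1 4440 sse2": 1800.0},
}


def speedup(case: str, old: str, new: str) -> float:
    """Ratio of old run time to new run time for one test case."""
    return timings[case][old] / timings[case][new]


for case in timings:
    amd_linux = speedup(case, "l1 4437", "l1 4440 sse2")
    amd_windows = speedup(case, "w1 4437", "w1 4440 sse2")
    print(f"{case:6s} AMD Linux x{amd_linux:.2f}   AMD Windows x{amd_windows:.2f}")
```

On these numbers the Linux Opteron gains a factor of roughly 3.8 to 4.6, while the Windows Phenom gains about 1.3.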

ID: 24021 · Report as offensive     Reply Quote
angler

Send message
Joined: 25 Nov 06
Posts: 25
Credit: 4,686,113
RAC: 0
Message 24029 - Posted: 2 Jul 2012, 14:09:52 UTC - in response to Message 24021.  
Last modified: 2 Jul 2012, 14:54:56 UTC

sounds promising, at least it'll allow me to return results within a reasonable time frame for my wingman.

My current Win32 x86 AMD box completes the latest results after ~20 hrs of computing (http://lhcathomeclassic.cern.ch/sixtrack/workunit.php?wuid=1750589). The faster box running 64-bit Linux is seldom able to complete in 48 hrs.
ID: 24029 · Report as offensive     Reply Quote
Profile Igor Zacharov
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Send message
Joined: 16 May 11
Posts: 79
Credit: 111,419
RAC: 0
Message 24034 - Posted: 3 Jul 2012, 19:44:15 UTC

All,

we have prepared some optimized sixtrack executables. Hopefully, the machines
will now be selected correctly based on the available feature set (sse3).
Will do more of these in the next days (sse2 is coming).

There is also a Mac executable (10.5 and up; let us know if you need others).

If you notice problems, please send us a note. I apologize in advance
if we have missed any machines or some platforms do not run. This should
be rectified if you send us the warnings.

Thank you.
skype id: igor-zacharov
ID: 24034 · Report as offensive     Reply Quote
Rayburner

Send message
Joined: 13 Jul 05
Posts: 9
Credit: 7,443,719
RAC: 0
Message 24035 - Posted: 3 Jul 2012, 19:57:37 UTC

Igor,

this host http://lhcathomeclassic.cern.ch/sixtrack/show_host_detail.php?hostid=9944267 has received the new executable as generic and as sse3.

Regards,
Rayburner
ID: 24035 · Report as offensive     Reply Quote
angler

Send message
Joined: 25 Nov 06
Posts: 25
Credit: 4,686,113
RAC: 0
Message 24076 - Posted: 5 Jul 2012, 3:09:31 UTC - in response to Message 24035.  

http://lhcathomeclassic.cern.ch/sixtrack/workunit.php?wuid=1798846

Got a linux_gen unit (444.01). It worked fine but took about 7x longer than my wingman's,
though he was on an Intel i7 with sse3. I expected to be 2-3x slower on my older AMD on a similar unit, so this sounds pretty reasonable.
ID: 24076 · Report as offensive     Reply Quote
Profile Conan
Avatar

Send message
Joined: 6 Jul 06
Posts: 108
Credit: 663,175
RAC: 0
Message 24083 - Posted: 5 Jul 2012, 13:06:12 UTC
Last modified: 5 Jul 2012, 13:31:50 UTC

Run time the same on my Windows 32 bit machines with the new 444.01 application against the old 443.07 application.

My Linux 64 bit machine was taking up to 62,868 seconds (17.46 hours) to complete a 443.07 WU.

Have not had a successful 444.01 WU yet as I got all the download failures yesterday.
But I do have one running that after 12.14 Hours is at 77.107% with an estimate of 15.74 Hours to completion.

Again not a lot of difference from the previous application, but a minor speed increase, nothing like the results that Eric was getting in testing.
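
The 15.74 hour figure is consistent with a plain linear extrapolation of elapsed time over the fraction done. Here is a minimal sketch of that arithmetic; the function name is mine, and whether the BOINC client computes its estimate exactly this way is an assumption.

```python
def estimate_total_hours(elapsed_hours: float, percent_done: float) -> float:
    """Linear extrapolation: total run time if progress continues at the same rate."""
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in (0, 100]")
    return elapsed_hours * 100.0 / percent_done


# Figures quoted above: 12.14 hours elapsed at 77.107% for 444.01,
# and 62,868 seconds for a 443.07 WU.
total_new = estimate_total_hours(12.14, 77.107)
total_old = 62868 / 3600

print(f"444.01 estimated total: {total_new:.2f} h")
print(f"443.07 measured total:  {total_old:.2f} h")
print(f"ratio old/new:          {total_old / total_new:.2f}")
```

That works out to a ratio of roughly 1.1, which matches the "minor speed increase" rather than the factor of two or more seen in testing.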

Computers are all AMD Phenom II 4 and 6 core.

Conan

EDIT:-- Well I think I know why the run times are nearly the same.
On checking out what BOINC picks up for each CPU it does not detect SSE3 on any of my AMD Phenom II CPUs, which from what I have read all have this feature. Windows or Linux makes no difference.

So I am running the non SSE3 version of the application due to BOINC not detecting all the attributes of my CPUs, which makes the project think I do not have this capability when in fact I do.
ID: 24083 · Report as offensive     Reply Quote
Toby Broom
Volunteer moderator

Send message
Joined: 27 Sep 08
Posts: 847
Credit: 691,195,816
RAC: 106,271
Message 24087 - Posted: 6 Jul 2012, 0:01:04 UTC

I think the lack of SSE3 detection is a bug in 6.10.x; someone else reported that on Windows with an Intel CPU.

Try getting 6.12.x or even rock on to the 7.0.28!
ID: 24087 · Report as offensive     Reply Quote
Profile Conan
Avatar

Send message
Joined: 6 Jul 06
Posts: 108
Credit: 663,175
RAC: 0
Message 24088 - Posted: 6 Jul 2012, 1:45:32 UTC - in response to Message 24087.  

I think the lack of SSE3 detection is a bug in 6.10.x; someone else reported that on Windows with an Intel CPU.

Try getting 6.12.x or even rock on to the 7.0.28!


I tried a while ago to update to 6.12.34 on my Linux computers but I could not get it to work due to the missing version data on one of the files, and no matter what I tried and searched for I could not get it working so gave up.

Same thing with 7.0.25 but worse as at least 3 files were missing version data that it claimed it needed.

I sent this information through to BOINC Alpha Testing but have heard nothing back.
Perhaps 7.0.28 is the response to my earlier problem? Perhaps not.

I believe that compiling my own version of BOINC on that particular computer will get everything working, but not being a programmer I am reluctant to try this.

I also read another user saying that once I go to 7.0.x it is hard to go back to an earlier version.

I am already running 6.12.34 on my Windows computers and it has the same problem with BOINC not detecting my CPU attributes.

So I will either continue running a lot slower than I am capable of doing or I will go elsewhere, I will think about it.

Thanks anyway for your help and suggestions.

Conan
ID: 24088 · Report as offensive     Reply Quote
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Send message
Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 24092 - Posted: 6 Jul 2012, 7:43:15 UTC

Thanks for all that Conan. Not using sse2/3 on AMD is an
Intel ifort "feature" :-(. I have made new executables
which will use sse2/3 or crash! along with a run
without sse2/3 for older systems. Igor is traveling but
I hope to post news on this early next week.
I shall post latest performance numbers shortly. Eric.
ID: 24092 · Report as offensive     Reply Quote
angler

Send message
Joined: 25 Nov 06
Posts: 25
Credit: 4,686,113
RAC: 0
Message 24095 - Posted: 6 Jul 2012, 12:25:21 UTC - in response to Message 24087.  
Last modified: 6 Jul 2012, 12:32:51 UTC

http://lhcathomeclassic.cern.ch/sixtrack/show_host_detail.php?hostid=9963851

haven't gotten SSE3 units either running i5, Win7 x64 and 7.0.28 so that's not true.

http://lhcathomeclassic.cern.ch/sixtrack/workunit.php?wuid=1798048
In this WU it appears that I'm not using the sse3 image, yet I kept pace with my wingman's i7 (hyperthreaded, I assume).
ID: 24095 · Report as offensive     Reply Quote
Desti

Send message
Joined: 16 Jul 05
Posts: 84
Credit: 1,875,851
RAC: 0
Message 24104 - Posted: 6 Jul 2012, 21:30:19 UTC
Last modified: 6 Jul 2012, 21:34:10 UTC

I also haven't received the sse3 version yet, even though BOINC is detecting all CPU flags correctly. Have you configured the server right? Note that the flag is called PNI and not SSE3 on linux.

http://lhcathomeclassic.cern.ch/sixtrack/workunit.php?wuid=1813515
Linux Users Everywhere @ BOINC (http://lhcathome.cern.ch/team_display.php?teamid=717)
ID: 24104 · Report as offensive     Reply Quote
angler

Send message
Joined: 25 Nov 06
Posts: 25
Credit: 4,686,113
RAC: 0
Message 24106 - Posted: 6 Jul 2012, 22:21:53 UTC
Last modified: 6 Jul 2012, 23:21:03 UTC

looks like only gen jobs are being sent now to my Intel Win7 machine
<p_model> Intel(R) Core(TM) i5-2500K CPU @ 3.30GHz [Family 6 Model 42 Stepping 7]</p_model>
<p_features>fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 syscall nx lm vmx tm2 popcnt aes pbe</p_features>

it's still getting jobs unlike my AMD boxes

noticed no jobs (gen or sse3 now) despite many requests on my AMD XP/Win7 machines and a bunch in queue, SSE2 is avail on both - the XP box has a Phenom X4
<p_model>AMD Phenom(tm) FX-5000 Quad-Core Processor [Family 16 Model 4 Stepping 2]</p_model>
<p_features>fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 htt pni cx16 syscall nx lm svm sse4a osvw ibs skinit wdt page1gb rdtscp 3dnowext 3dnow</p_features>

<p_model>Mobile AMD Athlon(tm) 64 Processor 3000+ [Family 15 Model 4 Stepping 10]</p_model>
<p_features>fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx lm 3dnowext 3dnow</p_features>
ID: 24106 · Report as offensive     Reply Quote
angler

Send message
Joined: 25 Nov 06
Posts: 25
Credit: 4,686,113
RAC: 0
Message 24109 - Posted: 7 Jul 2012, 0:53:16 UTC
Last modified: 7 Jul 2012, 1:01:37 UTC

http://lhcathomeclassic.cern.ch/sixtrack/workunit.php?wuid=1759458

Another instance of a Win32 AMD running considerably faster than a Linux wingman on a faster AMD. The Win32 AMD is a quad at ~2.3GHz and the faster x6 1090T should be ~3.4GHz, yet it took 40% longer on 443.07.

The situation is completely reversed with a wingman running an identical processor on
Win7 x64: http://lhcathomeclassic.cern.ch/sixtrack/workunit.php?wuid=1762881
ID: 24109 · Report as offensive     Reply Quote
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Send message
Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 24137 - Posted: 8 Jul 2012, 3:45:19 UTC

Now I have spotted "the flag is called PNI and not SSE3 on linux"
Could this be the problem? Eric.
The joys of portability, Windows, MacOS, and Unix flavours!
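
If the feature name is the culprit, a check along these lines would accept either spelling. This is only a sketch: `has_sse3` and `cpu_flags` are hypothetical names of mine, and I am not claiming the BOINC scheduler does its matching this way. "pni" (Prescott New Instructions) is the name Linux uses for SSE3, and the two sample strings are taken from the p_features lines posted earlier in this thread.

```python
def cpu_flags(p_features: str) -> set[str]:
    """Split a space-separated feature string into a set of lowercase flags."""
    return set(p_features.lower().split())


def has_sse3(p_features: str) -> bool:
    """True if the CPU reports SSE3 under either name, 'pni' or 'sse3'."""
    # Whole-token comparison, so 'ssse3' on its own does not count as 'sse3'.
    return bool(cpu_flags(p_features) & {"pni", "sse3"})


# Feature strings as posted earlier in this thread.
I5_2500K = ("fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat "
            "pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 "
            "sse4_1 sse4_2 syscall nx lm vmx tm2 popcnt aes pbe")
ATHLON64_3000 = ("fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov "
                 "pat pse36 clflush mmx fxsr sse sse2 syscall nx lm 3dnowext 3dnow")

if __name__ == "__main__":
    print("i5-2500K has SSE3:", has_sse3(I5_2500K))
    print("Athlon 64 3000+ has SSE3:", has_sse3(ATHLON64_3000))
```

A test that looks only for the literal string "sse3" would miss the i5 above, since its list contains "pni" and "ssse3" but no standalone "sse3" token.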

ID: 24137 · Report as offensive     Reply Quote
angler

Send message
Joined: 25 Nov 06
Posts: 25
Credit: 4,686,113
RAC: 0
Message 24160 - Posted: 8 Jul 2012, 16:45:45 UTC - in response to Message 24137.  
Last modified: 8 Jul 2012, 16:48:21 UTC

<p_model> Intel(R) Core(TM) i5-2500K CPU @ 3.30GHz [Family 6 Model 42 Stepping 7]</p_model>
<p_features>fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 syscall nx lm vmx tm2 popcnt aes pbe</p_features>

Win 7 x64 also reports it as PNI under 7.0.28

however a wingman running Win 7 Home x64
is working under 6.10, so it seems arbitrary?

http://lhcathomeclassic.cern.ch/sixtrack/show_host_detail.php?hostid=9944757
ID: 24160 · Report as offensive     Reply Quote
Uffe F

Send message
Joined: 9 Jan 08
Posts: 66
Credit: 727,923
RAC: 0
Message 24175 - Posted: 9 Jul 2012, 11:21:15 UTC
Last modified: 9 Jul 2012, 11:22:17 UTC

My Intel 64 bit Windows 7 computer also reports it as pni. So it seems either that 7.0.28 changed the reporting, or else it has been like this the whole time.

I copied the relevant lines from the startup log:

Starting BOINC client version 7.0.28 for windows_x86_64
Processor: 4 GenuineIntel Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz [Family 6 Model 42 Stepping 7]
Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 syscall nx lm vmx smx tm2 popcnt aes pbe
OS: Microsoft Windows 7: Home Premium x64 Edition, Service Pack 1, (06.01.7601.00)
ID: 24175 · Report as offensive     Reply Quote
Uffe F

Send message
Joined: 9 Jan 08
Posts: 66
Credit: 727,923
RAC: 0
Message 24185 - Posted: 9 Jul 2012, 15:32:49 UTC - in response to Message 24175.  

My Intel 64 bit Windows 7 computer also reports it as pni. So it seems either that 7.0.28 changed the reporting, or else it has been like this the whole time.


Just got home and checked my other pc with 7.0.25. It still reports it as pni.
ID: 24185 · Report as offensive     Reply Quote
angler

Send message
Joined: 25 Nov 06
Posts: 25
Credit: 4,686,113
RAC: 0
Message 24203 - Posted: 9 Jul 2012, 19:17:41 UTC - in response to Message 24185.  
Last modified: 9 Jul 2012, 19:19:03 UTC

reported as pni going back to 2010 on a Pentium D and AMD Phenom machine both running Windows x86 versions

<p_model>Intel(R) Pentium(R) D CPU 2.80GHz [x86 Family 15 Model 6 Stepping 4]</p_model>
<p_features>fpu tsc pae nx sse sse2 pni mmx</p_features>
ID: 24203 · Report as offensive     Reply Quote
angler

Send message
Joined: 25 Nov 06
Posts: 25
Credit: 4,686,113
RAC: 0
Message 24225 - Posted: 10 Jul 2012, 16:10:04 UTC

http://lhcathomeclassic.cern.ch/sixtrack/show_host_detail.php?hostid=9971122

Interesting: a Linux host on 6.10 is running sse3, so there seems to be no correlation between the BOINC version on the host and whether or not the sse3 image gets downloaded?
ID: 24225 · Report as offensive     Reply Quote


©2024 CERN