Message boards :
Number crunching :
Linux vs. Windows app
Author | Message |
---|---|
Send message Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0 |
I have finally found a workable solution for the AMD Athlon performance issue. The AMD now executes two or three times faster on both Linux and Windows. The price is that there are now two executables for Linux and two for Windows. I will produce a performance report when time permits but I consider the problem solved at least for the interim. The four executables can be found in boinc02:/data/boinc/executables as usual.

    [boinc02] /data/boinc/executables $ ls -ltr *440*
    -rwxr-xr-x 1 mcintosh c3 4553205 Jun 18 16:12 SixTrack_4440_crlibm_bnl_ifort_boinc_api_O2
    -rwxr-xr-x 1 mcintosh c3 4627438 Jun 18 16:12 SixTrack_4440_crlibm_bnl_ifort_boinc_api_SSE2_O2
    -rwxr-xr-x 1 mcintosh c3 2568192 Jun 18 17:05 SixTrack_4440_crlibm_bnl_ifort_boinc_api_O2.exe
    -rwxr-xr-x 1 mcintosh c3 3031040 Jun 18 17:05 SixTrack_4440_crlibm_bnl_ifort_boinc_api_SSE2_O2.exe
    [boinc02] /data/boinc/executables $ file *440*
    SixTrack_4440_crlibm_bnl_ifort_boinc_api_O2: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.6.9, statically linked, not stripped
    SixTrack_4440_crlibm_bnl_ifort_boinc_api_O2.exe: PE32 executable for MS Windows (console) Intel 80386 32-bit
    SixTrack_4440_crlibm_bnl_ifort_boinc_api_SSE2_O2: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.6.9, statically linked, not stripped
    SixTrack_4440_crlibm_bnl_ifort_boinc_api_SSE2_O2.exe: PE32 executable for MS Windows (console) Intel 80386 32-bit

The executables:

    SixTrack_4440_crlibm_bnl_ifort_boinc_api_SSE2_O2
    SixTrack_4440_crlibm_bnl_ifort_boinc_api_SSE2_O2.exe

are for Linux and Windows and require SSE2 (basically Pentium 4 or later). The executables

    SixTrack_4440_crlibm_bnl_ifort_boinc_api_O2
    SixTrack_4440_crlibm_bnl_ifort_boinc_api_O2.exe

should run on any old PC but are slower. I hope Igor can install them soonest as they use the version number 4437 in fort.10 for compatibility with the existing BOINC executables.
The other good news is that Laurent is now progressing rapidly with the generation of a MacOS executable and, unsurprisingly, there are no numerical differences. (Performance is excellent as we are supporting only Intel-based Macs.) I hope this will finally let us get into full production mode. I am still trying to verify the physics results for version 4440 but no surprises so far. I need to finish this urgently and then fix the run_post problem in SixDesk, finish the SixDesk documentation, and get started with Bernard on real production. Eric. Yours in haste, Euro 2012 is kicking off shortly :-) |
Send message Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0 |
Here are some timings for SixTrack. (I am also looking at beam-beam, where ibtype=1 looks to be 20% faster than ibtype=0. Still checking the physics though.) Eric.

As promised, here are some timing results for the latest totally numerically portable SixTrack versions, v4437 and v4440. v4437 is supposed to use SSE2 if available but clearly does not use it on an AMD system. Intel does document this, even if it is rather nasty and a purely commercial decision with no technical merit. There are two v4440 sub-versions: one uses SSE2 anyway, and the other does not. I hope these v4440 versions can be in production next week. If I can believe the 800MHz number, the Opteron performance is rather good.

The machines tested are:

Windows:
    w1  2.3GHz AMD Phenom            abcp11974  at CERN
    w2  3.1GHz Intel Core            PCBE13896  at CERN
Linux:
    l1  0.8GHz AMD Opteron 6164 HE   lxbsu2104  CERN lxbatch
    l2  2.3GHz Intel Xeon L5640      lxbsq2401  CERN lxbatch

================== Case lost ==================
    w2: 4437        : For 53018 Turn(s) 6.88
    w1: 4437        : For 53018 Turn(s) 15.9
    w2: 4440 sse2   : For 53018 Turn(s) 6.75
    w1: 4440 sse2   : For 53018 Turn(s) 11.8
    l1: 4437        : For 53018 Turn(s) 56.2
    l1: 4440 sse2   : For 53018 Turn(s) 12.1
    l2: 4440 NO sse2: For 53018 Turn(s) 35.4
    l2: 4440 sse2   : For 53018 Turn(s) 7.31

================== Case s316 ==================
    w2: 4437        : For 123449 Turn(s) 19.3
    w1: 4437        : For 123449 Turn(s) 42.4
    w2: 4440 sse2   : For 123449 Turn(s) 19.4
    w1: 4440 sse2   : For 123449 Turn(s) 32.6
    l1: 4437        : For 123449 Turn(s) 176.
    l1: 4440 sse2   : For 123449 Turn(s) 41.6
    l2: 4440 NO sse2: For 123449 Turn(s) 83.6
    l2: 4440 sse2   : For 123449 Turn(s) 27.0

================== Case frs60 ==================
    w2: 4437        : For 6000000 Turn(s) 1120
    w1: 4437        : For 6000000 Turn(s) 2410
    w2: 4440 sse2   : For 6000000 Turn(s) 1070
    w1: 4440 sse2   : For 6000000 Turn(s) 1800
    l1: 4437        : For 6000000 Turn(s) 9270
    l1: 4440 sse2   : For 6000000 Turn(s) 2420
    l2: 4440 NO sse2: For 6000000 Turn(s) 4380
    l2: 4440 sse2   : For 6000000 Turn(s) 1730 |
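To put the "Case lost" timings in perspective, the ratio of the v4437 time to the v4440 SSE2 time can be worked out per machine. The following is only a minimal sketch in Python: the dictionary layout and the function name `speedup` are illustrative assumptions, and only machines for which both versions were timed are included (no v4437 figure is listed for l2).

```python
# Timings copied from the "Case lost" block (53018 turns).
# Keys are (machine, version); this layout is an illustrative assumption.
CASE_LOST = {
    ("w2", "4437"): 6.88,
    ("w1", "4437"): 15.9,
    ("w2", "4440 sse2"): 6.75,
    ("w1", "4440 sse2"): 11.8,
    ("l1", "4437"): 56.2,
    ("l1", "4440 sse2"): 12.1,
}


def speedup(timings, machine, old="4437", new="4440 sse2"):
    """Return old_time / new_time for one machine (values > 1 mean faster)."""
    return timings[(machine, old)] / timings[(machine, new)]


if __name__ == "__main__":
    for machine in ("w1", "w2", "l1"):
        print(f"{machine}: {speedup(CASE_LOST, machine):.2f}x")
```

On these figures the two AMD machines (w1 and l1) improve noticeably, with the Linux Opteron gaining the most, while the Intel Windows machine (w2) is essentially unchanged.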
Send message Joined: 25 Nov 06 Posts: 25 Credit: 4,686,113 RAC: 0 |
sounds promising, at least it'll allow me to return results within a reasonable time frame for my wingman. my current Win32 x86 AMD is able to complete after ~20hrs http://lhcathomeclassic.cern.ch/sixtrack/workunit.php?wuid=1750589 of computing on latest results. the faster box running Linux 64 is seldom able to complete in 48 hrs. |
Send message Joined: 16 May 11 Posts: 79 Credit: 111,419 RAC: 0 |
All, we have prepared some optimized sixtrack executables. Hopefully, the machines will now be selected correctly based on the available feature set (sse3). Will do more of these in the next days (sse2 is coming). There is also a MAC executable (10.5 and up, let us know if you need others). If you notice problems, please send us a note. I apologize in advance if we have missed any machines or some platforms do not run. This should be rectified if you send us the warnings. Thank you. skype id: igor-zacharov |
Send message Joined: 13 Jul 05 Posts: 9 Credit: 7,443,719 RAC: 0 |
Igor, this host http://lhcathomeclassic.cern.ch/sixtrack/show_host_detail.php?hostid=9944267 has received the new executable as generic and as sse3. Regards, Rayburner |
Send message Joined: 25 Nov 06 Posts: 25 Credit: 4,686,113 RAC: 0 |
http://lhcathomeclassic.cern.ch/sixtrack/workunit.php?wuid=1798846 got a linux_gen unit 444.01, worked fine took about 7x longer than my wingman but he was Intel i7 sse3. I expected to be 2-3x slower on my older AMD on a similar unit so this sounds pretty reasonable. |
Send message Joined: 6 Jul 06 Posts: 108 Credit: 663,175 RAC: 0 |
Run time the same on my Windows 32 bit machines with the new 444.01 application against the old 443.07 application. My Linux 64 bit machine was taking up to 62,868 seconds (17.46 hours) to complete a 443.07 WU. Have not had a successful 444.01 WU yet as I got all the download failures yesterday. But I do have one running that after 12.14 Hours is at 77.107% with an estimate of 15.74 Hours to completion. Again not a lot of difference from the previous application, but a minor speed increase, nothing like the results that Eric was getting in testing. Computers are all AMD Phenom II 4 and 6 core. Conan EDIT:-- Well I think I know why the run times are nearly the same. On checking out what BOINC picks up for each CPU it does not detect SSE3 on any of my AMD Phenom II CPUs, which from what I have read all have this feature. Windows or Linux makes no difference. So I am running the non SSE3 version of the application due to BOINC not detecting all the attributes of my CPUs, which makes the project think I do not have this capability when in fact I do. |
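The 15.74 hour figure quoted above matches a simple linear extrapolation of total run time from the elapsed time and the fraction done. A minimal sketch follows; the function name is hypothetical, and this is not necessarily how the BOINC client computes its own estimate.

```python
def estimate_total_hours(elapsed_hours, percent_done):
    """Linearly extrapolate total run time from the progress made so far."""
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in (0, 100]")
    return elapsed_hours / (percent_done / 100.0)


if __name__ == "__main__":
    # Figures from the post: 12.14 hours elapsed at 77.107% done.
    total = estimate_total_hours(12.14, 77.107)
    print(f"444.01 estimated total: {total:.2f} h")
    # The longest 443.07 run quoted in the post, converted from seconds.
    print(f"443.07 longest run:     {62868 / 3600:.2f} h")
```

Comparing the two printed values gives a rough idea of the gain on this host, bearing in mind that different work units need not take the same time.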
Send message Joined: 27 Sep 08 Posts: 847 Credit: 691,750,093 RAC: 115,162 |
I think the lack of SSE3 detection is a bug in 6.10.x; someone else reported it on Windows with an Intel CPU. Try getting 6.12.x or even rock on to 7.0.28! |
Send message Joined: 6 Jul 06 Posts: 108 Credit: 663,175 RAC: 0 |
I think the lack of SSE3 detection is a bug in 6.10.x; someone else reported it on Windows with an Intel CPU. I tried a while ago to update to 6.12.34 on my Linux computers but I could not get it to work due to missing version data on one of the files, and no matter what I tried and searched for I could not get it working, so I gave up. Same thing with 7.0.25, but worse, as at least 3 files were missing version data that it claimed it needed. I sent this information through to BOINC Alpha Testing but have heard nothing back. Perhaps 7.0.28 is the response to my earlier problem? Perhaps not. I believe that compiling my own version of BOINC on that particular computer will get everything working, but not being a programmer I am reluctant to try this. I also read another user saying that once I go to 7.0.x it is hard to go back to an earlier version. I am already running 6.12.34 on my Windows computers and it has the same problem with BOINC not detecting my CPU attributes. So I will either continue running a lot slower than I am capable of doing or I will go elsewhere; I will think about it. Thanks anyway for your help and suggestions. Conan |
Send message Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0 |
Thanks for all that Conan. Not using sse2/3 on AMD is an Intel ifort "feature" :-(. I have made new executables which will use sse2/3 or crash! along with a run without sse2/3 for older systems. Igor is traveling but I hope to post news on this early next week. I shall post latest performance numbers shortly. Eric. |
Send message Joined: 25 Nov 06 Posts: 25 Credit: 4,686,113 RAC: 0 |
http://lhcathomeclassic.cern.ch/sixtrack/show_host_detail.php?hostid=9963851 haven't gotten SSE3 units either running i5, Win7 x64 and 7.0.28 so that's not true. http://lhcathomeclassic.cern.ch/sixtrack/workunit.php?wuid=1798048 this WU appears that I'm not using the sse3 image but kept pace with my wingman i7 (hyperthreaded I assume) |
Send message Joined: 16 Jul 05 Posts: 84 Credit: 1,875,851 RAC: 0 |
I also haven't received the sse3 version yet, even though BOINC is detecting all CPU flags correctly. Have you configured the server correctly? Note that the flag is called PNI and not SSE3 on linux. http://lhcathomeclassic.cern.ch/sixtrack/workunit.php?wuid=1813515 Linux Users Everywhere @ BOINC http://lhcathome.cern.ch/team_display.php?teamid=717 |
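If the PNI naming is indeed the cause, a selection step would need to treat `pni` as an alias for SSE3. The following is a minimal sketch in Python of that idea only. It is not the project's actual server configuration: the alias table, the function names and the `sse3`/`sse2`/`gen` labels are all assumptions made for illustration.

```python
# Hypothetical alias table: SSE3 is listed as "pni" (Prescott New
# Instructions) in Linux-style CPU feature strings.
FEATURE_ALIASES = {"pni": "sse3"}


def normalize_features(p_features):
    """Split a space-separated feature string and map aliases to canonical names."""
    return {FEATURE_ALIASES.get(flag, flag) for flag in p_features.lower().split()}


def pick_variant(p_features):
    """Return 'sse3', 'sse2' or 'gen' (illustrative labels, not server settings)."""
    features = normalize_features(p_features)
    if "sse3" in features:
        return "sse3"
    if "sse2" in features:
        return "sse2"
    return "gen"


if __name__ == "__main__":
    # Made-up short feature strings, for demonstration only.
    print(pick_variant("fpu mmx sse sse2 pni"))  # has pni
    print(pick_variant("fpu mmx sse sse2"))      # SSE2 only
    print(pick_variant("fpu mmx"))               # neither
```

Matching on whole flags rather than substrings matters here, because a plain substring search for "sse3" would also match the distinct `ssse3` flag.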
Send message Joined: 25 Nov 06 Posts: 25 Credit: 4,686,113 RAC: 0 |
looks like only gen jobs are being sent now to my Intel Win7 machine

<p_model> Intel(R) Core(TM) i5-2500K CPU @ 3.30GHz [Family 6 Model 42 Stepping 7]</p_model>
<p_features>fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 syscall nx lm vmx tm2 popcnt aes pbe</p_features>

it's still getting jobs, unlike my AMD boxes. noticed no jobs (gen or sse3 now) despite many requests on my AMD XP/Win7 machines and a bunch in queue. SSE2 is available on both; the XP box has a Phenom X4

<p_model>AMD Phenom(tm) FX-5000 Quad-Core Processor [Family 16 Model 4 Stepping 2]</p_model>
<p_features>fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 htt pni cx16 syscall nx lm svm sse4a osvw ibs skinit wdt page1gb rdtscp 3dnowext 3dnow</p_features>

<p_model>Mobile AMD Athlon(tm) 64 Processor 3000+ [Family 15 Model 4 Stepping 10]</p_model>
<p_features>fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx lm 3dnowext 3dnow</p_features> |
Send message Joined: 25 Nov 06 Posts: 25 Credit: 4,686,113 RAC: 0 |
http://lhcathomeclassic.cern.ch/sixtrack/workunit.php?wuid=1759458 another instance of Win32 AMD running considerably faster than Linux wingman on a faster AMD. the Win32 AMD is a quad ~2.3Ghz and the faster x6 1090T should be ~3.4Ghz yet took 40% longer on 443.07 and situation is completely reversed with wingman running identical processor on Win7 x64 http://lhcathomeclassic.cern.ch/sixtrack/workunit.php?wuid=1762881 |
Send message Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0 |
Now I have spotted "the flag is called PNI and not SSE3 on linux" Could this be the problem? Eric. The joys of portability, Windows, MacOS, and Unix flavours! |
Send message Joined: 25 Nov 06 Posts: 25 Credit: 4,686,113 RAC: 0 |
<p_model> Intel(R) Core(TM) i5-2500K CPU @ 3.30GHz [Family 6 Model 42 Stepping 7]</p_model> <p_features>fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 syscall nx lm vmx tm2 popcnt aes pbe</p_features> Win 7 x64 also reports it as PNI under 7.0.28; however, a wingman running Win 7 Home x64 is working under 6.10, so it seems arbitrary? http://lhcathomeclassic.cern.ch/sixtrack/show_host_detail.php?hostid=9944757 |
Send message Joined: 9 Jan 08 Posts: 66 Credit: 727,923 RAC: 0 |
My Intel 64 bit Windows 7 computer also reports it as pni. So it seems either that 7.0.28 changed the reporting, or else it has been like this the whole time. I copied the relevant lines from the startup log:

    Starting BOINC client version 7.0.28 for windows_x86_64
    Processor: 4 GenuineIntel Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz [Family 6 Model 42 Stepping 7]
    Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 syscall nx lm vmx smx tm2 popcnt aes pbe
    OS: Microsoft Windows 7: Home Premium x64 Edition, Service Pack 1, (06.01.7601.00) |
Send message Joined: 9 Jan 08 Posts: 66 Credit: 727,923 RAC: 0 |
My Intel 64 bit Windows 7 computer also reports it as pni. So it seems either that 7.0.28 changed the reporting, or else it has been like this the whole time. Just got home and checked my other pc with 7.0.25. It still reports it as pni. |
Send message Joined: 25 Nov 06 Posts: 25 Credit: 4,686,113 RAC: 0 |
reported as pni going back to 2010 on a Pentium D and AMD Phenom machine both running Windows x86 versions <p_model>Intel(R) Pentium(R) D CPU 2.80GHz [x86 Family 15 Model 6 Stepping 4]</p_model> <p_features>fpu tsc pae nx sse sse2 pni mmx</p_features> |
Send message Joined: 25 Nov 06 Posts: 25 Credit: 4,686,113 RAC: 0 |
http://lhcathomeclassic.cern.ch/sixtrack/show_host_detail.php?hostid=9971122 Interesting, linux 6.10 host running sse3 so there seems to be no correlation between boinc host and whether or not the sse3 image gets downloaded? |
©2024 CERN