Linux vs. Windows app
Author | Message |
---|---|
Joined: 17 Feb 09 Posts: 22 Credit: 311,184 RAC: 0 |
Is it just me, or does it seem like the Linux app is way less efficient than the Windows app? There are slower Windows computers finishing way faster than my Gentoo Linux systems. It's possible that there's overclocking involved, but I don't think that's the reason. Was the Linux 64-bit app compiled with SSE2 (or SSE3) support? |
Joined: 25 Jan 11 Posts: 179 Credit: 83,858 RAC: 0 |
Here is an example of where the Windows app appears to be more than twice as fast as the Linux app. Both results are from exactly the same CPU, both with HT turned on. It could be that the Linux machine was running Sixtrack on all 8 cores when this task was crunched, which can be detrimental to crunch time, whereas the Windows machine was not running multiple Sixtrack tasks simultaneously. |
Joined: 16 May 11 Posts: 79 Credit: 111,419 RAC: 0 |
Why the Linux version of Sixtrack in particular might be slower than the Windows version needs to be investigated. Before we do that, I have loaded an inherently faster version of Sixtrack (version 503.9), which uses SSE3, into the system. This has to be monitored for a while, since we have also changed the validator to accommodate results from different Sixtrack versions. The system should be stable. With version 503.9 we are also interested in a comparison between Linux and Windows. Previously, numerical stability was the major concern. Sixtrack is very sensitive to erroneous results: particles circulating around the accelerator structure are very close to chaotic behaviour, and single-bit differences somewhere lead to exponential deviations over 1 million turns. We have found machines out there which produce wrong results 50-100 times more frequently than the average. We would like to offer an explanation for this artifact and will suggest a way to investigate this further. skype id: igor-zacharov |
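The sensitivity described in this post can be illustrated with a toy example. The sketch below is not Sixtrack: it uses the logistic map, a standard textbook chaotic system, as a stand-in, and the function name `divergence` and the starting values are made up for illustration. Two trajectories that start about 1e-15 apart (roughly the size of a last-bit difference in double precision) become completely different within about a hundred iterations.

```python
def divergence(x0, delta=1e-15, steps=100):
    """Iterate the logistic map x -> 4x(1-x) from x0 and from x0 + delta.

    Returns the absolute difference between the two trajectories after
    each step. Toy stand-in for a chaotic system, not Sixtrack itself.
    """
    a, b = x0, x0 + delta
    diffs = []
    for _ in range(steps):
        a = 4.0 * a * (1.0 - a)
        b = 4.0 * b * (1.0 - b)
        diffs.append(abs(a - b))
    return diffs


if __name__ == "__main__":
    d = divergence(0.3)
    print("difference after 1 step:      ", d[0])
    print("largest difference in 100 steps:", max(d))
```

The first difference is still tiny, while the largest difference over the run is of the same order as the values themselves, which is the kind of growth that makes bit-identical results across platforms matter.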
Joined: 25 Jan 11 Posts: 179 Credit: 83,858 RAC: 0 |
"We have found machines out there which produce wrong results 50-100 times more frequently than the average. We would like to offer an explanation for this" The CPDN project had a problem with some machines crashing tasks continually for months. They stopped sending tasks to those machines and sent an email (not a PM) to each owner, asking them to post a request for help in a specific forum thread, set up just for banned machines, to get advice on correcting the problem. When the owner posts back to the forum that they have taken the prescribed corrective action, the admins allow more work for that machine. If it continues to crash tasks, it gets banned again until other corrective measures are taken. This procedure has worked very well for CPDN. It has increased their production and has reduced the number of WUs that get dropped because they have too many errors. Maybe the Sixtrack project could do something similar with "bad" machines. |
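As a rough illustration of how such "bad" machines might be picked out, the sketch below flags hosts whose error rate is many times the rate of all other hosts combined. Everything here is an assumption for illustration: the function name `flag_error_prone_hosts`, the thresholds, and the sample data are made up, and this is not how CPDN or the Sixtrack project actually decide.

```python
def flag_error_prone_hosts(results, factor=50.0, min_tasks=20):
    """Return hosts whose error rate is at least `factor` times the rest.

    `results` maps a host id to a tuple (error_count, task_count).
    Each host is compared against the pooled error rate of all *other*
    hosts, so one very bad host does not inflate its own baseline.
    """
    total_errors = sum(errors for errors, _ in results.values())
    total_tasks = sum(tasks for _, tasks in results.values())
    flagged = []
    for host, (errors, tasks) in results.items():
        other_tasks = total_tasks - tasks
        if tasks < min_tasks or other_tasks == 0:
            continue  # too little data to judge this host
        baseline = (total_errors - errors) / other_tasks
        if baseline > 0 and errors / tasks >= factor * baseline:
            flagged.append(host)
    return sorted(flagged)


if __name__ == "__main__":
    # Hypothetical counts, not real project data.
    sample = {
        "host_a": (1, 1000),
        "host_b": (1, 1000),
        "host_c": (1, 1000),
        "host_bad": (50, 100),
    }
    print(flag_error_prone_hosts(sample))
```

The `min_tasks` guard is there so that a host with one failure out of two tasks is not flagged on too little evidence.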
Joined: 17 Feb 09 Posts: 22 Credit: 311,184 RAC: 0 |
Well, I'm trying a few with the new app on my E2160. The one it's currently crunching is about halfway through at roughly 3 hours, not that that's any indication until compared to the wingman. http://lhcathomeclassic.cern.ch/sixtrack/show_host_detail.php?hostid=9931096 For reference, it's running at 3 GHz. |
Joined: 5 Oct 08 Posts: 12 Credit: 1,108,455 RAC: 0 |
I have tried out different projects with Linux (Kubuntu) and Windows (7 x64). The only project I found running tasks quicker on Linux was DNA@Home (Linux is more than twice as fast there). Maybe it's a basic problem with the compilers? Or more likely with the libraries? Alexander |
Joined: 17 Feb 09 Posts: 22 Credit: 311,184 RAC: 0 |
"I have tried out different projects with Linux (Kubuntu) and Windows (7 x64). The only project I found running tasks quicker on Linux was DNA@Home (Linux is more than twice as fast there). Maybe it's a basic problem with the compilers? Or more likely with the libraries?" 64-bit Kubuntu? I haven't really seen (or noticed) that with 64-bit Gentoo. Of course, I build everything with every feature each of my CPUs supports. That probably won't affect the speed of a static app, though. Waiting on my Windows wingmen to get a quasi-comparison. |
Joined: 5 Oct 08 Posts: 12 Credit: 1,108,455 RAC: 0 |
"I have tried out different projects with Linux (Kubuntu) and Windows (7 x64). The only project I found running tasks quicker on Linux was DNA@Home (Linux is more than twice as fast there). Maybe it's a basic problem with the compilers? Or more likely with the libraries?" No, Kubuntu is 32-bit, Windows is 64-bit. I have tried Einstein, Spinhenge, DNA and two or three more. I returned to Windows; it's ~20% faster on most projects. It's an AMD Phenom X4 overclocked to 3.7 GHz. |
Joined: 17 Feb 09 Posts: 22 Credit: 311,184 RAC: 0 |
Here's a somewhat close comparison between my Gentoo machine and a Windows machine with similar processors. http://lhcathomeclassic.cern.ch/sixtrack/workunit.php?wuid=248188 I don't know if this quad is overclocked, but it's probably a good comparison either way. My processor is at 3 GHz; the Windows 7 quad is 2.4 GHz stock. Mine is about 20% slower based on CPU time. The difference between my run time and CPU time may be because the 9800 crunching PPS Sieve with CUDA uses some of the run time. EDIT: I guess, looking at the claimed floating point speeds, my task looks like it took around the amount of time it should have. So, it does look like the new app version is much improved. |
Joined: 25 Jan 11 Posts: 179 Credit: 83,858 RAC: 0 |
I agree. The new app seems to be running faster than the previous version on Linux (or else it's running slower on Windows). WU 250073 was crunched on the same 2 processors, an i7-2600 @ 3.40 GHz, HT on, same RAM, very similar GPU. The Win7 machine was a bit slower and the reason may be that it has a smaller L3 cache. I've found other examples where the Linux machine was a bit slower than the Win machine, and those cases might be because of HT and/or cache. I certainly don't see Windows being 2X faster than Linux, as was the case with the previous version. Good job, Eric and Igor. |
Joined: 25 Nov 06 Posts: 25 Credit: 4,686,113 RAC: 0 |
Similar experience here: a single-core AMD Athlon 64 (3300 MHz) running Ubuntu 10.04 x64 takes several hours to run tasks, while similar tasks take about 20-40 minutes on a less powerful mobile 3000 MHz version running Win 7 x32. Not sure if it's the math libs or what? |
Joined: 6 Jul 06 Posts: 108 Credit: 663,175 RAC: 0 |
Same here, with 64-bit Linux (Fedora 16) being up to 3 times slower than 32-bit and 64-bit Windows of any version. Over 9,000 seconds on Linux and just over 3,000 seconds on Windows; odd to say the least, especially on similar or the same CPUs. Conan |
Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0 |
Thank you for your most valuable feedback. The current Windows and Linux executables are compiled with options to produce two sets of code: one set for machines with SSE2 (or better, I hope) and one set for older systems. The executables are statically linked and pretty much completely independent of libraries, but do make system calls for input/output. This seems to be working fine on Windows, as I have tested here at CERN on an AMD Athlon, but I am now going to check again. I do not have an Athlon with Linux... the Intel ifort documentation does say that these options are NOT supported on non-Intel hardware! So, I need more feedback, but I am already planning to produce a Linux executable that requires SSE2 or better, and I hope Igor can arrange for BOINC to pick this one, or the present one, depending on the type of processor. So, any more feedback is welcome. If I know the name of the case/workunit, the MHz/GHz of the processor and the type/make of PC, I can compare execution times for that case on different machines here, and verify whether I have really identified the problem. I can be contacted directly at eric.mcintosh@cern.ch. (I have been so busy with the numeric compatibility, the Intel ifort compiler, the run environment and documentation that I had postponed the study of performance until now. My priorities have always been Functionality, Reliability, Performance, in that order :-).) ...and all versions produce identical results (0 ULP difference), irrespective of SSE2 etc. or not. |
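One way to choose between the two builds is to look at the CPU feature flags rather than the vendor string. The sketch below is only an illustration: `cpu_flags`, `pick_executable` and both executable names are hypothetical, it is not how BOINC itself selects application versions, and it assumes Linux-style `/proc/cpuinfo` text in which a `flags` line lists features such as `sse2`. The flag list in the sample is shortened and made up.

```python
def cpu_flags(cpuinfo_text):
    """Collect the feature flags from 'flags' lines of /proc/cpuinfo text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def pick_executable(cpuinfo_text,
                    sse2_build="sixtrack_sse2",         # hypothetical name
                    generic_build="sixtrack_generic"):  # hypothetical name
    """Return the SSE2 build if the CPU advertises sse2, else the generic one.

    The decision deliberately ignores vendor_id, so an AMD CPU that
    reports sse2 gets the same build as an Intel one.
    """
    if "sse2" in cpu_flags(cpuinfo_text):
        return sse2_build
    return generic_build


if __name__ == "__main__":
    sample = (
        "vendor_id : AuthenticAMD\n"
        "model name : AMD Opteron(tm) Processor 6164 HE\n"
        "flags : fpu mmx sse sse2\n"  # shortened, illustrative flag list
    )
    print(pick_executable(sample))
```

On a real Linux host the text could be read with `open("/proc/cpuinfo").read()`; other operating systems would need a different source for the same information.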
Joined: 25 Nov 06 Posts: 25 Credit: 4,686,113 RAC: 0 |
For comparison:
http://lhcathomeclassic.cern.ch/sixtrack/show_host_detail.php?hostid=9954746 (Windows laptop, Athlon 64 mobile 3000 MHz, 32-bit): runs fairly well for an old machine
http://lhcathomeclassic.cern.ch/sixtrack/show_host_detail.php?hostid=9690657 (Linux desktop, Athlon 64 3300 MHz, 64-bit): runs considerably slower |
Joined: 25 Nov 06 Posts: 25 Credit: 4,686,113 RAC: 0 |
http://lhcathomeclassic.cern.ch/sixtrack/workunit.php?wuid=1591554 The latest WU took about 6x as long as my wingman's on a relatively equivalent CPU: C2D E4400 vs. Athlon 64 3300+ (P.S. the CPU should be SSE2/3 capable). |
Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0 |
Thanks for the feedback; it looks like it really is a problem with AMD Athlon on Linux, but is OK on Windows! I guess Intel don't particularly want to be optimal on non-Intel hardware, or their technique doesn't recognise SSE2, but it's OK for Windows... ah well. I have built a new alternative version which will use SSE2 and I hope Igor will use that, keeping the previous version for old systems. I am assuming Windows is OK. I shall be watching this closely. Keep me posted. Eric. |
Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0 |
From: Eric Mcintosh
Sent: Sunday, June 10, 2012 5:23 PM
To: Igor Zacharov
Cc: Eric Mcintosh; Massimo Giovannozzi; Laurent Deniau; Frank Schmidt; Riccardo De Maria; project-lhcathome-it (LHCathome Platform - IT et al)
Subject: Performance, SSE2 and Intel ifort

I ran a case 100 times on lxbatch. Answers speak for themselves:
...
lxbsu1306: model name : Intel(R) Xeon(R) CPU L5520 @ 2.27GHz
For 53018 Turn(s) 9.42 second(s)
lxbsu1312: model name : Intel(R) Xeon(R) CPU L5520 @ 2.27GHz
For 53018 Turn(s) 9.40 second(s)
lxbsu1317: model name : Intel(R) Xeon(R) CPU L5520 @ 2.27GHz
For 53018 Turn(s) 9.13 second(s)
lxbsu1535: model name : Intel(R) Xeon(R) CPU L5520 @ 2.27GHz
For 53018 Turn(s) 8.75 second(s)
lxbsu1537: model name : Intel(R) Xeon(R) CPU L5520 @ 2.27GHz
For 53018 Turn(s) 11.4 second(s)
lxbsu2008: model name : AMD Opteron(tm) Processor 6164 HE
For 53018 Turn(s) 56.2 second(s)
lxbsu2011: model name : AMD Opteron(tm) Processor 6164 HE
For 53018 Turn(s) 56.0 second(s)
lxbsu2013: model name : AMD Opteron(tm) Processor 6164 HE
For 53018 Turn(s) 56.3 second(s)
lxbsu2111: model name : AMD Opteron(tm) Processor 6164 HE
For 53018 Turn(s) 55.9 second(s)
...

Clearly Intel ifort does NOT use SSE2 on AMD Athlon. No comment, and it is documented. I re-checked Windows... The AMD is over two times slower... but the MHz are 2/3 of the Intel... I am now going to try and produce a FORCED SSE2 on Windows AMD.

PCBE13896 (also on Cygwin)
model name : Intel(R) Core(TM) i3-2100 CPU @ 3.10GHz
cpu MHz : 3093
[lxplus313] ~/sixdesk/try $ grep 'Turn(s)' lost/fort.6.SixTrack_4438_crlibm_bnl_ifort_boinc_api_O2_PCBE13896
For 53018 Turn(s) 6.74 second(s) of Computing Time was needed

abpc11974
CYGWIN_NT-5.1 abpc11974 1.7.9(0.237/5/3) 2011-03-29 10:10 i686 Cygwin
vendor_id : AuthenticAMD
model name : AMD Phenom(tm) 9600B Quad-Core Processor
cpu MHz : 2294
[lxplus313] ~/sixdesk/try $ grep 'Turn(s)' lost/fort.6.SixTrack_4438_crlibm_bnl_ifort_boinc_api_O2_abpc11974
For 53018 Turn(s) 15.8 second(s) of Computing Time was needed

Yours in haste, Spain Italy coming up. Eric.
P.S. Looks like I shall have to re-study the Windows situation too... I must have a 32-bit executable (for the time being). |
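To put a number on the lxbatch runs quoted in this post, the timings can be averaged per CPU model and compared. This is a small sketch using only the nine timings listed above; the dictionary keys and the helper name `slowdown` are made up for illustration.

```python
from statistics import mean

# Seconds for 53018 turns, copied from the lxbatch runs quoted in the post.
timings = {
    "Intel Xeon L5520": [9.42, 9.40, 9.13, 8.75, 11.4],
    "AMD Opteron 6164 HE": [56.2, 56.0, 56.3, 55.9],
}


def slowdown(data, slow, fast):
    """Ratio of the mean run time of `slow` to the mean run time of `fast`."""
    return mean(data[slow]) / mean(data[fast])


if __name__ == "__main__":
    for model, seconds in timings.items():
        print(f"{model}: mean {mean(seconds):.2f} s over {len(seconds)} runs")
    ratio = slowdown(timings, "AMD Opteron 6164 HE", "Intel Xeon L5520")
    print(f"AMD / Intel run-time ratio: {ratio:.1f}")
```

With these figures the AMD runs take roughly 5.8 times as long as the Intel runs. The ratio is not normalised for clock speed.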
Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0 |
Just to confirm that the new executable fixes the problem with non-Intel. Old and new results are:

lxbsu2403.cern.ch
processor : 0
vendor_id : AuthenticAMD
cpu family : 16
model : 9
model name : AMD Opteron(tm) Processor 6164 HE
stepping : 1
cpu MHz : 800.000
cache size : 512 KB
physical id : 1
siblings : 12
SixTrack_4437_crlibm_bnl_ifort_boinc_api_O2
SIXTRACR VECTOR VERSION 4.4.37 (with tilt) -- (last change: 27.05.2012)
SIXTRACR starts on: 10th of June 2012, 25 minutes after 16.
For 53018 Turn(s) 56.2 second(s) of Computing Time was needed
Total Time used: 60.0 second(s)
SIXTRACR stop

lxbsu2303.cern.ch
processor : 0
vendor_id : AuthenticAMD
cpu family : 16
model : 9
model name : AMD Opteron(tm) Processor 6164 HE
stepping : 1
cpu MHz : 800.000
cache size : 512 KB
physical id : 1
siblings : 12
SixTrack_4437_crlibm_bnl_ifort_boinc_api_SSE2
SIXTRACR VECTOR VERSION 4.4.37 (with tilt) -- (last change: 27.05.2012)
SIXTRACR starts on: 11th of June 2012, 23 minutes after 07.
For 53018 Turn(s) 12.5 second(s) of Computing Time was needed
Total Time used: 15.5 second(s)
SIXTRACR stop

So with the new executable the AMD is 1/2 Intel speed, but has only 1/3 MHz. Looks good; I shall now try and do the same for Windows. Sigh... Clearly we should use the new SSE2 executables (and the previous for "old" systems). Eric. |
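The old and new timings in this post can be pulled out of the program output automatically, much as the `grep 'Turn(s)'` command in the earlier e-mail does by hand. The sketch below assumes the output line keeps the form "For N Turn(s) X second(s)" shown above; the names `TURN_LINE`, `turn_times` and `speedup` are made up for illustration.

```python
import re

# Matches lines such as:
#   For 53018 Turn(s) 56.2 second(s) of Computing Time was needed
TURN_LINE = re.compile(r"For\s+(\d+)\s+Turn\(s\)\s+(\d+(?:\.\d+)?)\s+second\(s\)")


def turn_times(log_text):
    """Return a list of (turns, seconds) pairs found in the output text."""
    return [(int(t), float(s)) for t, s in TURN_LINE.findall(log_text)]


def speedup(old_seconds, new_seconds):
    """How many times faster the new run is than the old one."""
    return old_seconds / new_seconds


if __name__ == "__main__":
    old_log = "For 53018 Turn(s) 56.2 second(s) of Computing Time was needed"
    new_log = "For 53018 Turn(s) 12.5 second(s) of Computing Time was needed"
    old = turn_times(old_log)[0][1]
    new = turn_times(new_log)[0][1]
    print(f"SSE2 build is {speedup(old, new):.1f}x faster on this AMD host")
```

For the two runs quoted above this works out to roughly 4.5 times faster with the SSE2 executable on the same Opteron model.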
Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0 |
Well, the plot thickens... have a look at the Web page cited below (for example). There is lots more on the Web. Maybe we have to live with this for the time being. I am working full time on it. So far I have generated a Linux executable that is slow on AMD, and another which crashes immediately on AMD. Trying my best. Eric. http://www.swallowtail.org/naughty-intel.shtml |
Joined: 25 Nov 06 Posts: 25 Credit: 4,686,113 RAC: 0 |
Appreciate the update; hopefully a fix can be found. It would make more efficient use of all those cycles on AMD Linux boxes. BTW, one more WU: Linux on Intel appears 4.5x faster than on AMD, on processors with pretty close FP and integer ratings (some of this may be due to other processes and CPU throttling on my single-core AMD, but not most).
http://lhcathomeclassic.cern.ch/sixtrack/result.php?resultid=3359826 (AMD CPU)
http://lhcathomeclassic.cern.ch/sixtrack/result.php?resultid=3359827 (Intel CPU) |
©2024 CERN