Message boards : Number crunching : Linux vs. Windows app

trigggl
Joined: 17 Feb 09
Posts: 22
Credit: 311,184
RAC: 0
Message 23343 - Posted: 4 Oct 2011, 13:47:27 UTC
Last modified: 4 Oct 2011, 13:48:22 UTC

Is it just me, or does the Linux app seem way less efficient than the Windows app? Slower Windows computers are finishing way faster than my Gentoo Linux systems. Overclocking may be involved, but I don't think that's the reason. Was the Linux 64-bit app compiled with SSE2 (or SSE3) support?
jujube
Joined: 25 Jan 11
Posts: 179
Credit: 83,858
RAC: 0
Message 23344 - Posted: 4 Oct 2011, 16:39:15 UTC - in response to Message 23343.  

Here is an example where the Windows app appears to be more than twice as fast as the Linux app. Both results are from exactly the same CPU model, both with HT turned on. It could be that the Linux machine was running Sixtrack on all 8 cores when this task was crunched, which can lengthen crunch time, whereas the Windows machine was not running multiple Sixtrack tasks simultaneously.
Igor Zacharov
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 16 May 11
Posts: 79
Credit: 111,419
RAC: 0
Message 23345 - Posted: 4 Oct 2011, 17:40:11 UTC

Why the Linux version of SixTrack specifically might be slower than the Windows
version needs to be investigated. Before we do that, I have loaded an inherently
faster version of SixTrack (version 503.9), which uses SSE3, into the system.

This has to be monitored for a while, since we have also changed the validator
to accommodate results from different SixTrack versions. The system should be stable.

With version 503.9 we are also interested in a comparison between Linux and Windows.
Previously, numerical stability was the major concern. SixTrack is very sensitive
to numerical differences: particles circulating around the accelerator structure are
very close to chaotic behaviour, so single-bit differences anywhere lead to exponential
deviations over 1 million turns.
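To make the point about single-bit differences concrete, here is a minimal Python sketch. It does not model SixTrack or an accelerator; as an assumed stand-in it uses the logistic map x -> 4x(1 - x), a textbook chaotic system, computed in two algebraically identical ways whose floating-point rounding differs. The two trajectories start equal, pick up a last-bit difference from rounding, and that difference then grows until the results are unrelated.

```python
# Stand-in chaotic system (NOT SixTrack's tracking model): the logistic map
# x -> 4x(1 - x), evaluated in two algebraically identical ways that round
# differently in floating point.


def step_factored(x: float) -> float:
    """Logistic map with r = 4, written as 4*x*(1 - x)."""
    return 4.0 * x * (1.0 - x)


def step_expanded(x: float) -> float:
    """The same map expanded to 4*x - 4*x*x; only the rounding differs."""
    return 4.0 * x - 4.0 * x * x


def divergence(x0: float = 0.3, threshold: float = 0.1, max_steps: int = 10_000):
    """Iterate both forms from the same starting value.

    Returns (first step where the two values differ at all,
             first step where they differ by more than `threshold`);
    either entry is None if that never happens within max_steps.
    """
    a = b = x0
    first_difference = None
    for step in range(1, max_steps + 1):
        a, b = step_factored(a), step_expanded(b)
        if first_difference is None and a != b:
            first_difference = step
        if abs(a - b) > threshold:
            return first_difference, step
    return first_difference, None


if __name__ == "__main__":
    first, big = divergence()
    print(f"first rounding difference at step {first}, "
          f"difference > 0.1 at step {big}")
```

The exact step numbers depend on the starting value and are not the point; what matters is that a last-bit difference does not stay small in a chaotic system.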

We have found machines out there which produce wrong results 50-100 times more
frequently than the average. We would like to offer an explanation for this
artifact and will suggest a way to investigate it further.
skype id: igor-zacharov
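A minimal sketch of how such hosts might be singled out, assuming the project has per-host counts of valid and invalid results. The `HostStats` layout, the host names and the default factor of 50 are hypothetical; the factor is chosen only to echo the 50-100x figure above. Any host whose invalid-result rate is at least `factor` times the rate over all results gets flagged.

```python
from dataclasses import dataclass


@dataclass
class HostStats:
    host_id: str   # hypothetical host identifier
    valid: int     # results that passed validation
    invalid: int   # results rejected by the validator


def flag_outliers(hosts: list[HostStats], factor: float = 50.0) -> list[str]:
    """Return IDs of hosts whose invalid-result rate is at least `factor`
    times the pooled invalid-result rate over all hosts."""
    total_invalid = sum(h.invalid for h in hosts)
    total = sum(h.valid + h.invalid for h in hosts)
    if total == 0 or total_invalid == 0:
        return []
    overall_rate = total_invalid / total
    flagged = []
    for h in hosts:
        n = h.valid + h.invalid
        if n and h.invalid / n >= factor * overall_rate:
            flagged.append(h.host_id)
    return flagged


if __name__ == "__main__":
    # 99 ordinary hosts with 1 bad result in 1000, one host failing half its work.
    hosts = [HostStats(f"h{i}", 999, 1) for i in range(99)]
    hosts.append(HostStats("bad", 50, 50))
    print(flag_outliers(hosts))
```

The pooled baseline includes the outlier hosts themselves; with only a few hosts, a median per-host rate would be a more robust reference.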
jujube
Joined: 25 Jan 11
Posts: 179
Credit: 83,858
RAC: 0
Message 23346 - Posted: 4 Oct 2011, 20:00:41 UTC - in response to Message 23345.  

> We have found machines out there which produce wrong results 50-100 times more frequently than the average. We would like to offer an explanation for this artifact and will suggest a way to investigate it further.


The CPDN project had a problem with some machines crashing tasks continually for months. They stopped sending tasks to those machines and sent an email (not a PM) to each owner, asking them to post a request for help in a forum thread set up specifically for banned machines, where they could get advice on correcting the problem. When the owner posts back to the forum that they have taken the prescribed corrective action, the admins allow more work for that machine. If it continues to crash tasks, it gets banned again until other corrective measures are taken. This procedure has worked very well for CPDN: it has increased their production and reduced the number of WUs that get dropped because they have too many errors. Maybe the Sixtrack project could do something similar with "bad" machines.
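The CPDN procedure described above can be read as a small per-host state machine. The Python sketch below is hypothetical: the class, the state names and the consecutive-error threshold are illustrative assumptions, not CPDN's or BOINC's implementation. Work stops after repeated errors, the owner is notified, work resumes once the owner reports a fix, and a new ban follows if errors continue.

```python
from enum import Enum, auto


class HostState(Enum):
    ACTIVE = auto()   # host is sent work normally
    BANNED = auto()   # no new work; owner has been asked to seek help


class HostGate:
    """Hypothetical per-host gate modelled on the procedure described above."""

    def __init__(self, max_consecutive_errors: int = 5) -> None:
        self.state = HostState.ACTIVE
        self.max_consecutive_errors = max_consecutive_errors
        self.consecutive_errors = 0
        self.notifications = 0  # stand-in for emails sent to the owner

    def record_result(self, ok: bool) -> None:
        if self.state is not HostState.ACTIVE:
            return
        self.consecutive_errors = 0 if ok else self.consecutive_errors + 1
        if self.consecutive_errors >= self.max_consecutive_errors:
            self.state = HostState.BANNED
            self.notifications += 1

    def owner_reports_fix(self) -> None:
        """Owner posted in the forum thread; admins allow work again."""
        if self.state is HostState.BANNED:
            self.state = HostState.ACTIVE
            self.consecutive_errors = 0

    def can_receive_work(self) -> bool:
        return self.state is HostState.ACTIVE


if __name__ == "__main__":
    gate = HostGate(max_consecutive_errors=3)
    for _ in range(3):
        gate.record_result(ok=False)
    print(gate.can_receive_work(), gate.notifications)
    gate.owner_reports_fix()
    print(gate.can_receive_work())
```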
trigggl
Joined: 17 Feb 09
Posts: 22
Credit: 311,184
RAC: 0
Message 23349 - Posted: 4 Oct 2011, 21:39:40 UTC

Well, I'm trying a few with the new app on my E2160. The one it's currently crunching is about halfway through at roughly 3 hours, not that that's any indication until it's compared to the wingman's.

http://lhcathomeclassic.cern.ch/sixtrack/show_host_detail.php?hostid=9931096

For reference, it's running at 3 GHz.
Werkstatt
Joined: 5 Oct 08
Posts: 12
Credit: 1,108,455
RAC: 0
Message 23359 - Posted: 5 Oct 2011, 11:30:35 UTC - in response to Message 23343.  

I have tried out different projects with Linux (Kubuntu) and Windows (7 x64). The only project I found running tasks quicker on Linux was DNA@Home (here Linux is more than twice as fast). Maybe it's a basic problem with the compilers? Or, more likely, with the libraries?

Alexander
trigggl
Joined: 17 Feb 09
Posts: 22
Credit: 311,184
RAC: 0
Message 23364 - Posted: 5 Oct 2011, 16:29:49 UTC - in response to Message 23359.  

> I have tried out different projects with Linux (Kubuntu) and Windows (7 x64). The only project I found running tasks quicker on Linux was DNA@Home (here Linux is more than twice as fast). Maybe it's a basic problem with the compilers? Or, more likely, with the libraries?


64-bit Kubuntu? I haven't really seen (or noticed) that with 64-bit Gentoo. Of course, I build everything with every feature each of my CPUs supports. That probably won't affect the speed of a statically linked app, though.

Waiting on my Windows wingmen to get a quasi-comparison.
Werkstatt
Joined: 5 Oct 08
Posts: 12
Credit: 1,108,455
RAC: 0
Message 23367 - Posted: 5 Oct 2011, 20:57:21 UTC - in response to Message 23364.  

> 64-bit Kubuntu? I haven't really seen (or noticed) that with 64-bit Gentoo.

No, Kubuntu is 32-bit; Windows is 64-bit.
I have tried Einstein, Spinhenge, DNA and two or three more. I returned to Windows; it's ~20% faster on most projects.
It's an AMD Phenom X4 overclocked to 3.7 GHz.
trigggl
Joined: 17 Feb 09
Posts: 22
Credit: 311,184
RAC: 0
Message 23368 - Posted: 5 Oct 2011, 21:16:38 UTC
Last modified: 5 Oct 2011, 21:23:10 UTC

Here's a somewhat close comparison between my Gentoo machine and a Windows machine with similar processors.

http://lhcathomeclassic.cern.ch/sixtrack/workunit.php?wuid=248188

I don't know if this quad is overclocked, but it's probably a good comparison either way. My processor is at 3GHz; the Windows 7 quad is 2.4GHz stock. Mine is about 20% slower based on CPU time.

The difference between my run time and CPU time may be because the 9800, crunching PPS Sieve with CUDA, is using some of the run time.

EDIT: I guess, looking at the claimed floating point speeds, my task looks like it took around the amount of time it should have. So, it does look like the new app version is much improved.
jujube
Joined: 25 Jan 11
Posts: 179
Credit: 83,858
RAC: 0
Message 23370 - Posted: 6 Oct 2011, 0:12:43 UTC - in response to Message 23368.  

I agree. The new app seems to be running faster than the previous version on Linux (or else it's running slower on Windows).

WU 250073 was crunched on the same 2 processors: an i7-2600 @ 3.40 GHz, HT on, same RAM, very similar GPU. The Win7 machine was a bit slower, and the reason may be that it has a smaller L3 cache.

I've found other examples where the Linux machine was a bit slower than the Win machine and those cases might be because of HT and/or cache. I certainly don't see Windows being 2X faster than Linux as was the case with the previous version.

Good job, Eric and Igor.
angler
Joined: 25 Nov 06
Posts: 25
Credit: 4,686,113
RAC: 0
Message 23949 - Posted: 6 Jun 2012, 16:14:44 UTC

Similar experience here: a single-core AMD Athlon 64 running at 3300 MHz under Ubuntu 10.04 x64 takes several hours to run tasks, while similar tasks take about 20-40 minutes on a less powerful mobile 3000 MHz version running Win 7 x32. Not sure if it's the math libs or what?
Conan
Joined: 6 Jul 06
Posts: 108
Credit: 661,871
RAC: 196
Message 23954 - Posted: 7 Jun 2012, 0:59:44 UTC

Same here, with 64-bit Linux (Fedora 16) being up to 3 times slower than 32-bit Windows and 64-bit Windows of any version.
Over 9,000 seconds on Linux and just over 3,000 seconds on Windows; odd, to say the least, especially on similar or identical CPUs.

Conan
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 23959 - Posted: 8 Jun 2012, 14:01:47 UTC

Thank you for your most valuable feedback.
The current Windows and Linux executables are compiled with options to produce
two sets of code: one for machines with SSE2 (or better, I hope) and one for
older systems.
Executables are statically linked and pretty much completely independent of
libraries, but do make system calls for input/output.
This seems to be working fine on Windows, as I have tested here at CERN on an
AMD Athlon, but I am now going to check again.
I do not have an Athlon with Linux... and the Intel ifort documentation does say
that these options are NOT supported on non-Intel hardware!
So I need more feedback, but I am already planning to produce a Linux
executable that requires SSE2 or better, and I hope Igor can arrange for BOINC
to pick either this one or the present one, depending on the type of processor.
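The split planned above (an SSE2 executable where the CPU supports it, the existing build otherwise) can be sketched as a choice based on the CPU's feature flags rather than its vendor. This Python sketch is illustrative only: it parses text in the format of Linux `/proc/cpuinfo` passed in as a string, the executable names are hypothetical, and it is not the mechanism BOINC itself uses to pick an app version.

```python
def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags from the first 'flags' line of cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def pick_executable(cpuinfo_text: str) -> str:
    """Choose a (hypothetical) executable name from the CPU's feature flags.

    The test is on the 'sse2' flag alone, never on vendor_id, so an AMD
    CPU that reports SSE2 gets the SSE2 build as well.
    """
    if "sse2" in cpu_flags(cpuinfo_text):
        return "sixtrack_sse2"
    return "sixtrack_generic"


if __name__ == "__main__":
    sample = "vendor_id\t: AuthenticAMD\nflags\t\t: fpu mmx sse sse2 sse3\n"
    print(pick_executable(sample))
```

On a Linux host the same function could be given the real file contents, e.g. the result of reading `/proc/cpuinfo` with `open(...).read()`.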

So any more feedback is welcome. If I know the name of the case/workunit, the
MHz/GHz of the processor and the type/make of PC, I can compare execution
times for that case on different machines here and verify whether I have really
identified the problem. I can be contacted directly at eric.mcintosh@cern.ch.

(I have been so busy with the numeric compatibility, the Intel ifort compiler,
the run environment and documentation that I had postponed the study of
performance until now. My priorities have always been Functionality,
Reliability and Performance, in that order :-).

...and all versions produce identical results, with 0 ULP difference,
irrespective of whether SSE2 etc. is used or not.
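"0 ULP difference" means the results are identical down to the last bit of each double. A minimal Python sketch of measuring that distance (an illustrative helper, not the project's validator): each double is reinterpreted as a 64-bit integer, ordered so that neighbouring doubles map to neighbouring integers, and the two integers are subtracted.

```python
import math
import struct


def ordered_bits(x: float) -> int:
    """Reinterpret a double's bits as an integer, ordered like the doubles."""
    (bits,) = struct.unpack("<q", struct.pack("<d", x))
    # Negative doubles have the sign bit set (negative as a signed int64);
    # mirror them below zero so neighbouring doubles map to neighbouring ints.
    return bits if bits >= 0 else -(bits & 0x7FFF_FFFF_FFFF_FFFF)


def ulp_distance(a: float, b: float) -> int:
    """Number of steps between adjacent doubles separating a and b.

    0 means bit-identical results (with +0.0 and -0.0 treated as equal).
    """
    if math.isnan(a) or math.isnan(b):
        raise ValueError("ULP distance is undefined for NaN")
    return abs(ordered_bits(a) - ordered_bits(b))


if __name__ == "__main__":
    print(ulp_distance(0.1 + 0.2, 0.3))  # neighbouring doubles: prints 1
```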
angler
Joined: 25 Nov 06
Posts: 25
Credit: 4,686,113
RAC: 0
Message 23963 - Posted: 8 Jun 2012, 22:07:49 UTC - in response to Message 23959.  

For comparison:

http://lhcathomeclassic.cern.ch/sixtrack/show_host_detail.php?hostid=9954746
Windows laptop, Athlon 64 mobile 3000 MHz, 32-bit: runs fairly well for an old machine.

http://lhcathomeclassic.cern.ch/sixtrack/show_host_detail.php?hostid=9690657
Linux desktop, Athlon 64 3300 MHz, 64-bit: runs considerably slower.
angler
Joined: 25 Nov 06
Posts: 25
Credit: 4,686,113
RAC: 0
Message 23965 - Posted: 9 Jun 2012, 16:25:37 UTC - in response to Message 23959.  
Last modified: 9 Jun 2012, 16:26:41 UTC

http://lhcathomeclassic.cern.ch/sixtrack/workunit.php?wuid=1591554
The latest WU took about 6x as long as my wingman's, on a roughly equivalent CPU (C2D E4400 vs. Athlon 64 3300+). P.S. The CPU should be SSE2/SSE3 capable.
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 23966 - Posted: 10 Jun 2012, 6:43:35 UTC

Thanks for the feedback; it looks like it really is a
problem with AMD Athlon on Linux, but it is OK for Windows!
I guess Intel don't particularly want to be optimal
on non-Intel hardware, or their technique doesn't recognise
SSE2, but it is OK for Windows... ah well.

I have built a new alternative version which will use SSE2,
and I hope Igor will use that, with the previous version for
old systems. I am assuming Windows is OK.

I shall be watching this closely. Keep me posted. Eric.
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 23967 - Posted: 10 Jun 2012, 15:28:50 UTC

From: Eric Mcintosh
Sent: Sunday, June 10, 2012 5:23 PM
To: Igor Zacharov
Cc: Eric Mcintosh; Massimo Giovannozzi; Laurent Deniau; Frank Schmidt;
Riccardo De Maria; project-lhcathome-it (LHCathome Platform - IT et al)
Subject: Performance, SSE2 and Intel ifort


I ran a case 100 times on lxbatch.
The answers speak for themselves:

...
lxbsu1306:model name : Intel(R) Xeon(R) CPU L5520 @ 2.27GHz For 53018 Turn(s) 9.42 second(s)
lxbsu1312:model name : Intel(R) Xeon(R) CPU L5520 @ 2.27GHz For 53018 Turn(s) 9.40 second(s)
lxbsu1317:model name : Intel(R) Xeon(R) CPU L5520 @ 2.27GHz For 53018 Turn(s) 9.13 second(s)
lxbsu1535:model name : Intel(R) Xeon(R) CPU L5520 @ 2.27GHz For 53018 Turn(s) 8.75 second(s)
lxbsu1537:model name : Intel(R) Xeon(R) CPU L5520 @ 2.27GHz For 53018 Turn(s) 11.4 second(s)
lxbsu2008:model name : AMD Opteron(tm) Processor 6164 HE For 53018 Turn(s) 56.2 second(s)
lxbsu2011:model name : AMD Opteron(tm) Processor 6164 HE For 53018 Turn(s) 56.0 second(s)
lxbsu2013:model name : AMD Opteron(tm) Processor 6164 HE For 53018 Turn(s) 56.3 second(s)
lxbsu2111:model name : AMD Opteron(tm) Processor 6164 HE For 53018 Turn(s) 55.9 second(s)
...
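The timings above can be summarised per CPU model with a short Python sketch. The log lines are copied from above, one host per line; the regular expression is an assumption based only on that layout, not on any documented SixTrack output format.

```python
import re
from collections import defaultdict
from statistics import mean

# Lines copied from the lxbatch log above, one host per line.
LOG = """\
lxbsu1306:model name : Intel(R) Xeon(R) CPU L5520 @ 2.27GHz For 53018 Turn(s) 9.42 second(s)
lxbsu1312:model name : Intel(R) Xeon(R) CPU L5520 @ 2.27GHz For 53018 Turn(s) 9.40 second(s)
lxbsu1317:model name : Intel(R) Xeon(R) CPU L5520 @ 2.27GHz For 53018 Turn(s) 9.13 second(s)
lxbsu1535:model name : Intel(R) Xeon(R) CPU L5520 @ 2.27GHz For 53018 Turn(s) 8.75 second(s)
lxbsu1537:model name : Intel(R) Xeon(R) CPU L5520 @ 2.27GHz For 53018 Turn(s) 11.4 second(s)
lxbsu2008:model name : AMD Opteron(tm) Processor 6164 HE For 53018 Turn(s) 56.2 second(s)
lxbsu2011:model name : AMD Opteron(tm) Processor 6164 HE For 53018 Turn(s) 56.0 second(s)
lxbsu2013:model name : AMD Opteron(tm) Processor 6164 HE For 53018 Turn(s) 56.3 second(s)
lxbsu2111:model name : AMD Opteron(tm) Processor 6164 HE For 53018 Turn(s) 55.9 second(s)
"""

# host:model name : <model> For <turns> Turn(s) <seconds> second(s)
LINE = re.compile(
    r"^([^:\s]+):model name\s*:\s*(.+?)\s+For \d+ Turn\(s\) ([\d.]+) second\(s\)"
)


def mean_time_by_model(log: str) -> dict[str, float]:
    """Average the reported computing time for each CPU model."""
    times = defaultdict(list)
    for line in log.splitlines():
        match = LINE.match(line.strip())
        if match:
            times[match.group(2)].append(float(match.group(3)))
    return {model: mean(values) for model, values in times.items()}


if __name__ == "__main__":
    means = mean_time_by_model(LOG)
    for model, seconds in means.items():
        print(f"{seconds:6.2f} s  {model}")
    amd = means["AMD Opteron(tm) Processor 6164 HE"]
    intel = means["Intel(R) Xeon(R) CPU L5520 @ 2.27GHz"]
    print(f"AMD/Intel time ratio: {amd / intel:.1f}")
```

With these nine runs the AMD mean (56.1 s) is roughly 5.8 times the Intel mean (9.62 s).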

Clearly Intel ifort does NOT use SSE2 on AMD Athlon. No comment; it is
documented.
I re-checked Windows... The AMD is over two times slower,
but the MHz are 2/3 of the Intel...
I am now going to try to produce a FORCED SSE2 build on Windows AMD.
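The result described above is consistent with the widely reported behaviour of Intel's compiler runtime dispatcher, which selects its SSE2 and later code paths only on CPUs whose vendor string is "GenuineIntel". The Python sketch below is a simplified model contrasting that policy with a purely feature-based one; it is not Intel's dispatch code, and the path names are hypothetical.

```python
def vendor_gated_path(vendor_id: str, flags: set[str]) -> str:
    """Fast path only if the CPU reports SSE2 *and* identifies as Intel."""
    if vendor_id == "GenuineIntel" and "sse2" in flags:
        return "sse2"
    return "generic"


def feature_based_path(vendor_id: str, flags: set[str]) -> str:
    """Fast path for any CPU that reports SSE2, whatever the vendor."""
    return "sse2" if "sse2" in flags else "generic"


if __name__ == "__main__":
    same_flags = {"fpu", "sse", "sse2", "sse3"}
    for vendor in ("GenuineIntel", "AuthenticAMD"):
        print(vendor,
              vendor_gated_path(vendor, same_flags),
              feature_based_path(vendor, same_flags))
```

Under the vendor-gated policy an SSE2-capable AMD CPU falls back to the generic path, which would match the AMD timings in the lxbatch log.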

PCBE13896 (also on Cygwin)
model name : Intel(R) Core(TM) i3-2100 CPU @ 3.10GHz
cpu MHz : 3093
[lxplus313] ~/sixdesk/try $ grep 'Turn(s)' lost/fort.6.SixTrack_4438_crlibm_bnl_ifort_boinc_api_O2_PCBE13896
For 53018 Turn(s) 6.74 second(s) of Computing Time was needed

abpc11974
CYGWIN_NT-5.1 abpc11974 1.7.9(0.237/5/3) 2011-03-29 10:10 i686 Cygwin
vendor_id : AuthenticAMD
model name : AMD Phenom(tm) 9600B Quad-Core Processor
cpu MHz : 2294

[lxplus313] ~/sixdesk/try $ grep 'Turn(s)' lost/fort.6.SixTrack_4438_crlibm_bnl_ifort_boinc_api_O2_abpc11974
For 53018 Turn(s) 15.8 second(s) of Computing Time was needed
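The clock-speed remark can be made concrete by converting CPU time into an approximate cycle count (seconds times clock frequency). This Python sketch assumes each core ran at the reported `cpu MHz` throughout the run; a cpuinfo `cpu MHz` value is only a snapshot and can reflect frequency scaling, so treat the result as rough.

```python
def gigacycles(cpu_seconds: float, cpu_mhz: float) -> float:
    """Approximate cycles consumed, in units of 1e9, assuming a constant clock."""
    return cpu_seconds * cpu_mhz / 1000.0


if __name__ == "__main__":
    # CPU times for the same 53018-turn case, from the two Cygwin runs above.
    i3 = gigacycles(6.74, 3093)      # Intel Core i3-2100 @ 3093 MHz
    phenom = gigacycles(15.8, 2294)  # AMD Phenom 9600B @ 2294 MHz
    print(f"i3-2100      {i3:5.1f} Gcycles")
    print(f"Phenom 9600B {phenom:5.1f} Gcycles")
    print(f"time ratio {15.8 / 6.74:.2f}, cycle ratio {phenom / i3:.2f}")
```

With these two figures the Phenom takes about 2.3 times as long but only about 1.7 times as many cycles, so the lower clock accounts for part of the gap but not all of it.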

Yours in haste, Spain-Italy coming up. Eric.

P.S. Looks like I shall have to re-study the Windows situation too... I
must have a 32-bit executable (for the time being).

Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 23968 - Posted: 11 Jun 2012, 6:13:34 UTC

Just to confirm that the new executable fixes the problem with non-Intel processors.
Old and new results are:

lxbsu2403.cern.ch
processor : 0
vendor_id : AuthenticAMD
cpu family : 16
model : 9
model name : AMD Opteron(tm) Processor 6164 HE
stepping : 1
cpu MHz : 800.000
cache size : 512 KB
physical id : 1
siblings : 12
SixTrack_4437_crlibm_bnl_ifort_boinc_api_O2

SIXTRACR VECTOR VERSION 4.4.37 (with tilt) -- (last change: 27.05.2012)


SIXTRACR starts on: 10th of June 2012, 25 minutes after 16.

-----------------------------------------------------------------------------------------------------------------------------------

OOOOOOOOOOOOOOOOOOOOOO
OO OO

For 53018 Turn(s) 56.2 second(s) of Computing Time was needed

-----------------------------------------------------------------------------------------------------------------------------------


Total Time used: 60.0 second(s)

-----------------------------------------------------------------------------------------------------------------------------------
SIXTRACR stop

lxbsu2303.cern.ch
processor : 0
vendor_id : AuthenticAMD
cpu family : 16
model : 9
model name : AMD Opteron(tm) Processor 6164 HE
stepping : 1
cpu MHz : 800.000
cache size : 512 KB
physical id : 1
siblings : 12
SixTrack_4437_crlibm_bnl_ifort_boinc_api_SSE2

SIXTRACR VECTOR VERSION 4.4.37 (with tilt) -- (last change: 27.05.2012)


SIXTRACR starts on: 11th of June 2012, 23 minutes after 07.

-----------------------------------------------------------------------------------------------------------------------------------

OOOOOOOOOOOOOOOOOOOOOO
OO OO

For 53018 Turn(s) 12.5 second(s) of Computing Time was needed

-----------------------------------------------------------------------------------------------------------------------------------


Total Time used: 15.5 second(s)

-----------------------------------------------------------------------------------------------------------------------------------
SIXTRACR stop


So with the new executable the AMD is 1/2 Intel speed, but has only 1/3 of the MHz.
Looks good; I shall now try to do the same for Windows. Sigh...

Clearly we should use the new SSE2 executables (and the previous ones for "old" systems).

Eric.



Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 23970 - Posted: 11 Jun 2012, 12:50:03 UTC

Well, the plot thickens... have a look at the web page cited below
(for example).
There is lots more on the Web. Maybe we have to live with this
for the time being. I am working full time on it.
So far I have generated a Linux executable that is slow on AMD,
and another which crashes immediately on AMD.
Trying my best.
Eric.
http://www.swallowtail.org/naughty-intel.shtml
angler
Joined: 25 Nov 06
Posts: 25
Credit: 4,686,113
RAC: 0
Message 23972 - Posted: 11 Jun 2012, 22:05:46 UTC - in response to Message 23970.  
Last modified: 11 Jun 2012, 22:28:38 UTC

Appreciate the update; hopefully a fix can be found. It would make more efficient use of all those cycles on AMD Linux boxes.

BTW, one more WU:

On Linux, Intel appears 4.5x faster than AMD on processors with
pretty close FP and integer ratings (some of this may be due to other processes and CPU throttling on my single-core AMD, but not most).

http://lhcathomeclassic.cern.ch/sixtrack/result.php?resultid=3359826
AMD CPU

http://lhcathomeclassic.cern.ch/sixtrack/result.php?resultid=3359827
Intel CPU


©2024 CERN