Message boards : Number crunching : Tasks v530.09 crashing

Ageless
Joined: 18 Sep 04
Posts: 143
Credit: 27,645
RAC: 0
Message 23456 - Posted: 11 Oct 2011, 11:09:42 UTC - in response to Message 23455.  
Last modified: 11 Oct 2011, 11:10:05 UTC

Well, either way, if I'm validating against a Windows computer with 530.10, I'm at a great disadvantage.

How'd you know? You have aborted all 530.10 work on all your computers without even testing one...
Jord

BOINC FAQ Service
ID: 23456
trigggl
Joined: 17 Feb 09
Posts: 22
Credit: 311,184
RAC: 0
Message 23459 - Posted: 11 Oct 2011, 17:27:25 UTC - in response to Message 23456.  
Last modified: 11 Oct 2011, 17:34:56 UTC

Well, either way, if I'm validating against a Windows computer with 530.10, I'm at a great disadvantage.

How'd you know? You have aborted all 530.10 work on all your computers without even testing one...

Because I've run plenty of 530.08. I do have one 530.10 which is almost finished after a CPU time of 81,000 sec.

http://lhcathomeclassic.cern.ch/sixtrack/workunit.php?wuid=323191

My wingman was done at 36,000 s. Guess which OS. My CPU runs at 2.6 GHz; theirs shows 2.67 GHz.

Looks like a great disadvantage to me.
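A rough way to compare these two run times is to normalise for clock speed by multiplying CPU seconds by GHz to get approximate gigacycles. This is only a sketch under crude assumptions: it treats run time as scaling inversely with clock frequency, ignores architecture, cache and Hyper-Threading differences, and treats the unfinished 81,000 s task as a lower bound.

```python
def gigacycles(cpu_seconds: float, clock_ghz: float) -> float:
    """Approximate work done in gigacycles (assumes run time scales as 1/clock)."""
    return cpu_seconds * clock_ghz


# Figures quoted in the post above; the Linux 530.10 task was not yet finished,
# so its figure is a lower bound.
linux_530_10 = gigacycles(81_000, 2.6)
windows_wingman = gigacycles(36_000, 2.67)

ratio = linux_530_10 / windows_wingman
print(f"Linux/Windows clock-normalised ratio: at least {ratio:.2f}")
```

Even after allowing for the wingman's slightly faster clock, the Linux task needed more than twice as many cycles.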
ID: 23459
jujube
Joined: 25 Jan 11
Posts: 179
Credit: 83,858
RAC: 0
Message 23460 - Posted: 11 Oct 2011, 17:38:14 UTC - in response to Message 23456.  

Well, either way, if I'm validating against a Windows computer with 530.10, I'm at a great disadvantage.

How'd you know? You have aborted all 530.10 work on all your computers without even testing one...


WU 326840. Same CPU, both with HT on, the faster one even has less L3 cache and it runs Windows.

If I understand Eric/Igor correctly, 530.10 is just 530.08 with a new version number? I think it was firmly established about a week ago that 530.08 is faster on Windows than Linux by a factor of 2 or more. Not that I'm complaining. They'll get it fixed eventually and I'll continue to take tasks on my Linux box even now while Linux is at a disadvantage.
ID: 23460
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 23462 - Posted: 11 Oct 2011, 20:06:51 UTC - in response to Message 23460.  

Indeed, 530.8 and 530.10 are identical.
I do not believe there is any significant difference between
Linux and Windows, but I'll test again.
I'm always prepared to learn or admit a mistake.

Eric
ID: 23462
jujube
Joined: 25 Jan 11
Posts: 179
Credit: 83,858
RAC: 0
Message 23463 - Posted: 11 Oct 2011, 23:34:22 UTC - in response to Message 23462.  

Thanks Eric.

IMHO, 530.08 was definitely slower on Linux by a factor of 2 or more. With 530.09 Linux and Windows were very close. Now 530.10 is showing a big difference again.

I have a dual-boot system. I'll try to save a task and run it outside of BOINC on both Windows and Linux.
ID: 23463
Gary Roberts
Joined: 22 Jul 05
Posts: 72
Credit: 3,962,626
RAC: 0
Message 23467 - Posted: 12 Oct 2011, 5:14:49 UTC - in response to Message 23463.  

IMHO, 530.08 was definitely slower on Linux by a factor of 2 or more. With 530.09 Linux and Windows were very close. Now 530.10 is showing a big difference again.

Your tasks list for your Linux host seems to show that quite clearly, since you have a 2X disparity in maximum run time and a large enough sample of tasks for both versions to justify the conclusion. The 530.9 version is readily available in the download directory, and you could easily run it under AP (anonymous platform) if you wanted to regain your higher throughput. I've just started running classic on a couple of Windows hosts and I'm considering loading it onto some Linux hosts as well. If I do, I'll certainly use AP until the problem is solved.

@Eric, the Devs at Einstein@Home support different apps for different CPU capabilities quite transparently. A host joining up will be sent a version of the app that is matched to its CPU capabilities. If you ask them what they did, it may allow you to solve your problem.
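The idea in Gary's suggestion, sending each host the most optimised build its CPU supports, can be sketched roughly as below. This is not Einstein@Home's or BOINC's actual mechanism: the build names and their feature requirements are hypothetical, and the flags are read from Linux /proc/cpuinfo-style text (where SSE3 is reported as `pni`).

```python
# Hypothetical builds, ordered from most to least demanding CPU requirements.
BUILDS: list[tuple[str, frozenset[str]]] = [
    ("sixtrack_sse3", frozenset({"sse2", "pni"})),  # Linux reports SSE3 as "pni"
    ("sixtrack_sse2", frozenset({"sse2"})),
    ("sixtrack_generic", frozenset()),              # no special features needed
]


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


def pick_build(flags: set[str]) -> str:
    """Return the first (most optimised) build whose required features are all present."""
    for name, required in BUILDS:
        if required <= flags:
            return name
    return BUILDS[-1][0]  # only reached if the generic build ever gains requirements


sample = "processor\t: 0\nflags\t\t: fpu vme sse sse2 pni ssse3\n"
print(pick_build(cpu_flags(sample)))  # sixtrack_sse3
```

Ordering the list from most to least demanding means the first match is the best build the host can run, with a generic build as the catch-all.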

Cheers,
Gary.
ID: 23467
jujube
Joined: 25 Jan 11
Posts: 179
Credit: 83,858
RAC: 0
Message 23470 - Posted: 12 Oct 2011, 6:51:43 UTC - in response to Message 23467.  

Thanks for the suggestion, Gary. I'll probably do that if they don't get it sorted soon. I think they will though.
ID: 23470
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 23471 - Posted: 12 Oct 2011, 7:24:08 UTC - in response to Message 23470.  

Thanks for this helpful info. Yes, we will get it
sorted, but it may take a few days. Thanks. Eric.
ID: 23471
Gary Roberts
Joined: 22 Jul 05
Posts: 72
Credit: 3,962,626
RAC: 0
Message 23472 - Posted: 12 Oct 2011, 8:20:49 UTC
Last modified: 12 Oct 2011, 8:47:47 UTC

Out of interest, I've just grabbed the 530.9 Linux app from the download directory and created an app_info.xml file to run it under AP on a Linux laptop that is currently running E@H tasks. It's a 2.6GHz Core 2 Duo, so it should have a reasonable turnaround time. I've just launched it and it has downloaded two tasks. With 50/50 resource shares, one core should be doing LHC tasks all the time. I'll see how it goes over the next couple of days.

With a 2.6GHz clock and an older architecture, there's no way it should be able to keep up with your Sandy Bridge. So, if my max crunch time is significantly less than the 55K secs you are getting, the benefit of running 530.9 on CPUs with the appropriate capabilities will be proved.

EDIT: This is the tasks list if you want to follow its progress.
Cheers,
Gary.
ID: 23472
tullio
Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 23473 - Posted: 12 Oct 2011, 11:41:47 UTC
Last modified: 12 Oct 2011, 11:42:15 UTC

My Opteron 1210 at 1.8 GHz has just completed a task with 530.10 in 44k s CPU time. Another task with 530.08 was completed in 106k s. The next task has started again in high priority mode but now I know it will go normal after a few hours. All this on SuSE Linux 11.1.
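Since Tullio's two tasks ran on the same host, no clock normalisation is needed and the ratio of CPU times gives the speed-up directly. A minimal sketch, with the caveat that these are two different workunits, so part of the difference may simply be task length rather than app version.

```python
def speedup(old_cpu_seconds: float, new_cpu_seconds: float) -> float:
    """How many times faster the new run was than the old one on the same host."""
    return old_cpu_seconds / new_cpu_seconds


# CPU times from the post above (Opteron 1210, SuSE Linux 11.1).
s = speedup(106_000, 44_000)  # 530.08 task vs 530.10 task
print(f"530.10 task vs 530.08 task: {s:.1f}x")  # 530.10 task vs 530.08 task: 2.4x
```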
Tullio
ID: 23473
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 23475 - Posted: 12 Oct 2011, 12:37:45 UTC

Thanks for drawing my attention to this performance issue.
I have made a very quick check because I am under great
time pressure right now, and my priorities are functionality (does it work),
reliability (does it work all the time), and performance (does it do it more efficiently).
I think I got my priorities wrong in practice! More haste, less speed.

I have 18 test cases. On re-checking a couple of the beam-beam cases,
the most computationally demanding,
I find the Linux 530.9 is 4 times faster than 530.8.
HOWEVER, the Windows 530.9 is actually SLOWER or FASTER
by roughly 30%. I am supposed to have used the same IFORT
options on both Windows and Linux. I strongly suspect I have messed
up, as there are architectural dependencies on the machine used
for compilation. I will really sort this out as fast as I can, but please
be patient. The -arch and -Qx options are complicated, to say the least, and I need
to double-check everything with Igor.

Rest assured, all results are equally valuable and appear to be
bit-for-bit identical with these different versions.
I must check the physics, though, before we go too far down this path
and find I have to implement my own decimal-binary conversion.
Eric.
ID: 23475
jujube
Joined: 25 Jan 11
Posts: 179
Credit: 83,858
RAC: 0
Message 23484 - Posted: 12 Oct 2011, 22:12:00 UTC - in response to Message 23475.  

Take your time, Eric. When I said "fixed soon" I meant anytime before Christmas.
ID: 23484
Gary Roberts
Joined: 22 Jul 05
Posts: 72
Credit: 3,962,626
RAC: 0
Message 23488 - Posted: 13 Oct 2011, 6:34:03 UTC - in response to Message 23472.  

EDIT: This is the tasks list if you want to follow its progress.

That host has been crunching for close to a day now and has already returned several tasks, a couple of which have already validated. The two longest running tasks took just over 7 hours (~26Ksecs), and there are two more partly completed that, at the current rate of progress (e.g. 50% complete in 3.5 hrs and 15% complete in 1 hr), will also take about the same. It's way too early to reach final conclusions, but it does seem likely that 26Ksecs might turn out to be the time taken for 'full running' tasks on this host. If so, I'm sure glad that host is not running version 530.10 :-).
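The projection from partial progress is a simple linear extrapolation: total time ≈ elapsed time ÷ fraction done. A sketch, assuming progress advances at a constant rate, which depends on how the app reports its fraction done.

```python
def projected_total_hours(fraction_done: float, elapsed_hours: float) -> float:
    """Linear extrapolation of total run time; assumes a constant rate of progress."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_hours / fraction_done


# The two partly completed tasks mentioned above.
for fraction, hours in [(0.50, 3.5), (0.15, 1.0)]:
    total = projected_total_hours(fraction, hours)
    print(f"{fraction:.0%} after {hours} h -> about {total:.1f} h ({total * 3.6:.0f} Ksecs)")
```

Both projections come out at roughly 7 hours (about 24 to 25 Ksecs), in the same range as the completed tasks.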

There's one annoying downside to this which I should mention before people complain about it. You can understand the problem if you take a look at this particular quorum, which is the validated 26Ksecs task from my host and the task of the wingman, which just happens to be an i7-2600K (and perhaps overclocked as well, although it does have HT enabled). Notice that the wingman task took around 30 mins longer to crunch and the OS was Windows 7. I wouldn't think that my host should be able to compare this favourably with an i7-2600K (even with the HT impediment) unless the 530.10 app is not using the full CPU capabilities. Because the two finishing times are close enough, the claims for credit aren't too disparate. Imagine what would have happened if the wingman host had been running Linux and had taken twice as long. Its claim would have been doubled (close to 300) and, when averaged with my 122 claim, would have resulted in an award of around 200 or so. I'd be laughing but my wingman would be crying :-).
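The credit arithmetic in this hypothetical can be checked with a few lines. This assumes, as the post does, that the granted credit is the average of the two claims in the quorum; the validator's actual policy may differ, and 300 is the post's "close to 300" estimate, not a real claim.

```python
def granted_credit(claims: list[float]) -> float:
    """Average of the claimed credits (assumed award rule, per the post above)."""
    if not claims:
        raise ValueError("need at least one claim")
    return sum(claims) / len(claims)


my_claim = 122
wingman_claim_if_linux = 300  # hypothetical: roughly double a ~150 claim

award = granted_credit([my_claim, wingman_claim_if_linux])
print(award)  # 211.0
```

That 211 is the "around 200 or so" above: the faster host gains 89 credits over its claim and the slower one loses the same 89.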

So the problem is that all Linux hosts running the 530.10 app will be severely penalised credit-wise if they happen to be matched up with either a Windows wingman or one running the Linux 530.9 app (like my host). I'm sorry that Linux hosts will be affected by this but I think shorter running times are much more important so I'll continue (unless the Admins think otherwise) to use the 530.9 app for as long as it takes to fix the problem. Those participants with Linux hosts with the correct CPU capabilities (SSE2 and above, it would seem) could consider running the 530.9 app under AP, particularly if the Admins approve and perhaps publish the proper instructions. To assist with this, I've actually sent some details about this to Keith.

Cheers,
Gary.
ID: 23488


©2024 CERN