Tasks v530.09 crashing

Ageless · Joined: 18 Sep 04 · Posts: 143 · Credit: 27,645 · RAC: 0
Message 23456 - Posted: 11 Oct 2011, 11:09:42 UTC - in response to Message 23455. Last modified: 11 Oct 2011, 11:10:05 UTC
> Well, either way, if I'm validating against a Windows computer with 530.10, I'm at a great disadvantage.
How'd you know? You have aborted all 530.10 work on all your computers without even testing one...
Jord
BOINC FAQ Service

trigggl · Joined: 17 Feb 09 · Posts: 22 · Credit: 311,184 · RAC: 0
Message 23459 - Posted: 11 Oct 2011, 17:27:25 UTC - in response to Message 23456. Last modified: 11 Oct 2011, 17:34:56 UTC
>> Well, either way, if I'm validating against a Windows computer with 530.10, I'm at a great disadvantage.
> How'd you know? You have aborted all 530.10 work on all your computers without even testing one...
Because I've run plenty of 530.08. I do have one 530.10 which is almost finished after a CPU time of 81,000 sec. http://lhcathomeclassic.cern.ch/sixtrack/workunit.php?wuid=323191 My wingman is done at 36,000. Guess which OS. My CPU is at 2.6 GHz. Theirs shows 2.67. Looks like a great disadvantage to me.
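
The comparison in the post above can be put into numbers with a few lines of Python. This is only a rough sketch: the function name is made up for illustration, it assumes both hosts crunched the same workunit, and it treats CPU time multiplied by clock speed as a crude stand-in for work done, which ignores architectural differences between the two CPUs. The 81,000 sec figure is for a task that was not yet finished, so the real gap would be somewhat larger.

```python
def clock_adjusted_slowdown(cpu_secs_a, ghz_a, cpu_secs_b, ghz_b):
    """Return how many times more clock cycles host A spent than host B.

    CPU seconds times GHz gives a rough count of (billions of) cycles used.
    """
    cycles_a = cpu_secs_a * ghz_a
    cycles_b = cpu_secs_b * ghz_b
    return cycles_a / cycles_b


# Figures quoted in the post: Linux host (530.10) vs. its wingman.
raw_ratio = 81_000 / 36_000
adjusted = clock_adjusted_slowdown(81_000, 2.6, 36_000, 2.67)

print(f"raw: {raw_ratio:.2f}x, clock-adjusted: {adjusted:.2f}x")
# prints "raw: 2.25x, clock-adjusted: 2.19x"
```

Even after allowing for the slightly faster wingman clock, the Linux task needed a bit more than twice the cycles, which matches the "factor of 2 or more" reported elsewhere in the thread.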

jujube · Joined: 25 Jan 11 · Posts: 179 · Credit: 83,858 · RAC: 0
Message 23460 - Posted: 11 Oct 2011, 17:38:14 UTC - in response to Message 23456.
>> Well, either way, if I'm validating against a Windows computer with 530.10, I'm at a great disadvantage.
> How'd you know? You have aborted all 530.10 work on all your computers without even testing one...
WU 326840. Same CPU, both with HT on, the faster one even has less L3 cache and it runs Windows. If I understand Eric/Igor correctly, 530.10 is just 530.08 with a new version number? I think it was firmly established about a week ago that 530.08 is faster on Windows than Linux by a factor of 2 or more. Not that I'm complaining. They'll get it fixed eventually and I'll continue to take tasks on my Linux box even now while Linux is at a disadvantage.

Eric Mcintosh · Volunteer moderator, Project administrator, Project developer, Project tester, Project scientist · Joined: 12 Jul 11 · Posts: 857 · Credit: 1,619,050 · RAC: 0
Message 23462 - Posted: 11 Oct 2011, 20:06:51 UTC - in response to Message 23460.
Indeed, 530.8 and 530.10 are identical. I do not believe there is any significant difference between Linux and Windows, but I'll test again... I'm always prepared to learn or admit a mistake.
Eric

jujube · Joined: 25 Jan 11 · Posts: 179 · Credit: 83,858 · RAC: 0
Message 23463 - Posted: 11 Oct 2011, 23:34:22 UTC - in response to Message 23462.
Thanks Eric. IMHO, 530.08 was definitely slower on Linux by a factor of 2 or more. With 530.09 Linux and Windows were very close. Now 530.10 is showing a big difference again. I have a dual-boot system. I'll try to save a task and run it outside of BOINC on both Windows and Linux.

Gary Roberts · Joined: 22 Jul 05 · Posts: 72 · Credit: 3,962,626 · RAC: 0
Message 23467 - Posted: 12 Oct 2011, 5:14:49 UTC - in response to Message 23463.
> IMHO, 530.08 was definitely slower on Linux by a factor of 2 or more. With 530.09 Linux and Windows were very close. Now 530.10 is showing a big difference again.
Your tasks list for your Linux host seems to show that quite clearly, seeing as you have a 2X disparity in maximum run time and a large enough sample size of tasks for both versions to justify the conclusion. The 530.9 version is readily available in the download directory and you could easily run it under AP (anonymous platform) if you wanted to regain your higher throughput. I've just started running classic on a couple of Windows hosts and I'm considering loading it onto some Linux hosts as well. If I do, I'll certainly use AP until the problem is solved.
@Eric, the Devs at Einstein@Home support different apps for different CPU capabilities quite transparently. A host joining up will be sent a version of the app that is matched to its CPU capabilities. If you ask them what they did, it may allow you to solve your problem.
Cheers, Gary.

jujube · Joined: 25 Jan 11 · Posts: 179 · Credit: 83,858 · RAC: 0
Message 23470 - Posted: 12 Oct 2011, 6:51:43 UTC - in response to Message 23467.
Thanks for the suggestion, Gary. I'll probably do that if they don't get it sorted soon. I think they will though.

Eric Mcintosh · Volunteer moderator, Project administrator, Project developer, Project tester, Project scientist · Joined: 12 Jul 11 · Posts: 857 · Credit: 1,619,050 · RAC: 0
Message 23471 - Posted: 12 Oct 2011, 7:24:08 UTC - in response to Message 23470.
Thanks for this helpful info. Yes, we will get it sorted but it may take a few days. Thanks.
Eric.

Gary Roberts · Joined: 22 Jul 05 · Posts: 72 · Credit: 3,962,626 · RAC: 0
Message 23472 - Posted: 12 Oct 2011, 8:20:49 UTC Last modified: 12 Oct 2011, 8:47:47 UTC
Out of interest, I've just grabbed the 530.9 Linux app from the download directory and created an app_info.xml file to run it under AP on a Linux laptop that is running E@H tasks at the moment. It's a 2.6GHz Core 2 Duo so it should have a reasonable turnaround time. I've just launched it and it has downloaded two tasks. With 50/50 resource shares, one core should be doing LHC tasks all the time. I'll see how it goes over the next couple of days. At 2.6GHz and the older architecture, there's no way it should be able to keep up with your Sandy Bridge. So, if my max crunch time is significantly less than the 55K secs you are getting, the benefits of running 530.9 for the appropriate CPU capabilities will be proved.
EDIT: This is the tasks list if you want to follow its progress.
Cheers, Gary.

tullio · Joined: 19 Feb 08 · Posts: 708 · Credit: 4,336,250 · RAC: 0
Message 23473 - Posted: 12 Oct 2011, 11:41:47 UTC Last modified: 12 Oct 2011, 11:42:15 UTC
My Opteron 1210 at 1.8 GHz has just completed a task with 530.10 in 44k s CPU time. Another task with 530.08 was completed in 106k s. The next task has started again in high priority mode but now I know it will go normal after a few hours. All this on SuSE Linux 11.1.
Tullio

Eric Mcintosh · Volunteer moderator, Project administrator, Project developer, Project tester, Project scientist · Joined: 12 Jul 11 · Posts: 857 · Credit: 1,619,050 · RAC: 0
Message 23475 - Posted: 12 Oct 2011, 12:37:45 UTC
Thanks for drawing my attention to this performance issue. I have made a very quick check because I am under great time pressure right now and my priorities are functionality (does it work), reliability (work all the time), and performance (do it more efficiently). I think I got my priorities wrong in practice! More haste less speed.
I have 18 test cases. On re-checking a couple of beam-beam cases, the most computationally demanding, I find the Linux 530.9 is 4 times faster than 530.8. HOWEVER the Windows 530.9 is actually SLOWER or FASTER by roughly 30%. I am supposed to have used the same IFORT options on both Windows and Linux. I strongly suspect I have messed up, as there are architectural dependencies on the machine used for compilation. I will really sort this out as fast as I can but please be patient. The -arch -Qx options are complicated to say the least and I need to double check everything with Igor.
Rest assured all results are equally valuable and appear to be bit-for-bit identical with these different versions. I must check the Physics though before we go too far down this path and find I have to implement my own decimal-binary conversion.
Eric.

jujube · Joined: 25 Jan 11 · Posts: 179 · Credit: 83,858 · RAC: 0
Message 23484 - Posted: 12 Oct 2011, 22:12:00 UTC - in response to Message 23475.
Take your time, Eric. When I said "fixed soon" I meant anytime before Christmas.

Gary Roberts · Joined: 22 Jul 05 · Posts: 72 · Credit: 3,962,626 · RAC: 0
Message 23488 - Posted: 13 Oct 2011, 6:34:03 UTC - in response to Message 23472.
> EDIT: This is the tasks list if you want to follow its progress.
That host has been crunching for close to a day now and has already returned several tasks, a couple of which have already validated. The two longest running tasks took just over 7 hours (~26Ksecs) and there are two more partly completed that, at the current rate of progress (e.g. 50% complete in 3.5 hrs and 15% complete in 1 hr), will also take about the same. It's way too early to reach final conclusions but it does seem likely that 26Ksecs might turn out to be the time taken for 'full running' tasks on this host. If so, I'm sure glad that host is not running version 530.10 :-).
There's one annoying downside to this which I should mention before people complain about it. You can understand the problem if you take a look at this particular quorum, which is the validated 26Ksecs task from my host and the task of the wingman which just happens to be an I7-2600K (and perhaps overclocked as well, although it does have HT enabled). Notice that the wingman task took around 30 mins longer to crunch and the OS was Windows 7. I wouldn't think that my host should be able to compare this favourably with an I7-2600K (even with the HT impediment) unless the 530.10 app is not using the full CPU capabilities. Because the two finishing times are close enough, the claims for credit aren't too disparate. Imagine what would have happened if the wingman host had been running Linux and had taken twice as long. Its claim would have been doubled (close to 300) and, when averaged with my 122 claim, would have resulted in an award of around 200 or so. I'd be laughing but my wingman would be crying :-).
So the problem is that all Linux hosts running the 530.10 app will be severely penalised credit-wise if they happen to be matched up with either a Windows wingman or one running the Linux 530.9 app (like my host). I'm sorry that Linux hosts will be affected by this but I think shorter running times are much more important so I'll continue (unless the Admins think otherwise) to use the 530.9 app for as long as it takes to fix the problem. Those participants with Linux hosts with the correct CPU capabilities (SSE2 and above, it would seem) could consider running the 530.9 app under AP, particularly if the Admins approve and perhaps publish the proper instructions. To assist with this, I've actually sent some details about this to Keith.
Cheers, Gary.
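
The credit arithmetic in the post above can be worked through in a short Python sketch. It assumes, as the post describes, that the award for a two-host quorum is the plain average of the two claims; the project's actual validator policy is not stated in this thread, and the function name is hypothetical. The 300 figure is the post's own "close to 300" estimate for a wingman that took twice as long.

```python
def averaged_credit(claims):
    """Return the award for a quorum, assuming a simple mean of the claims."""
    return sum(claims) / len(claims)


fast_host_claim = 122   # claim from the host running the 530.9 app
slow_host_claim = 300   # hypothetical claim from a Linux 530.10 wingman

award = averaged_credit([fast_host_claim, slow_host_claim])

print(f"award to each host: {award:.0f}")
print(f"fast host gains {award - fast_host_claim:.0f}, "
      f"slow host loses {slow_host_claim - award:.0f}")
# prints "award to each host: 211"
# prints "fast host gains 89, slow host loses 89"
```

Under that assumption the slower host gives up roughly 30% of what it claimed, which is the penalty the post warns about.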

LHC@home