Thread 'How is the SSE3 thing coming along?'

Author	Message
Igor Zacharov Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 16 May 11 Posts: 79 Credit: 111,419 RAC: 0	Message 24347 - Posted: 14 Jul 2012, 0:46:04 UTC The uptake of the optimized executables (sseX, etc.) is understood and quite stable now. Unfortunately, we had to retract the Darwin executable, because it was crashing when linked with the latest BOINC zip library. The Apple problem will take more time to understand and correct. Concerning the timeout, there was apparently a much too optimistic factor for estimating the performance of the sseX executables, which led to an incorrect projection of the time needed to run the program. This has hopefully been corrected with the latest recompile of the BOINC scheduler at about 7 pm CET on Friday. We will monitor this, but at least the tests with our machines tonight show an improvement. Please report if you see errors at your end. The relative performance of sixtrack runs can be compared only with identical input, since simulated particles can hit the wall and terminate the program before they reach 1M turns. Igor. skype id: igor-zacharov

tullio Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0	Message 24357 - Posted: 14 Jul 2012, 11:11:40 UTC Last modified: 14 Jul 2012, 11:53:13 UTC I am running a 444.01 task on my Opteron 1210, Linux OS. For the first time I see it marked as sse2, although my CPU is pni enabled. It should be faster than the preceding unit, which was not marked sse2. Tullio
13 ks against 33 ks. I have now got a pni unit.

angler Joined: 25 Nov 06 Posts: 25 Credit: 4,686,113 RAC: 0	Message 24359 - Posted: 14 Jul 2012, 14:07:28 UTC - in response to Message 24344. PNI/SSE3 is working for my AMD processors. Your SSE2 tasks may have been short WUs to begin with; the trick is to compare with your wingman.

Cash Joined: 13 Jul 05 Posts: 21 Credit: 456,769 RAC: 0	Message 24360 - Posted: 14 Jul 2012, 14:23:34 UTC The pni and sse3 executables are binary identical according to fc (file compare) on Windows. However, my host supporting pni gets both sse2 and pni work units, with the majority being sse2 (up to 90%). I believe Igor should tweak the WU distribution. I have not had a single computation error with pni on AMD. this game has no name

angler Joined: 25 Nov 06 Posts: 25 Credit: 4,686,113 RAC: 0	Message 24362 - Posted: 14 Jul 2012, 17:25:41 UTC - in response to Message 24360. The computation errors seem to occur more with Intel hosts, probably because of the estimated cycles to complete. But those hosts probably benefited less from the latest compiles; in fact, my Intel hosts show little difference between pni and gen images in terms of execution times.

tullio Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0	Message 24364 - Posted: 14 Jul 2012, 18:57:37 UTC On my Linux box, Opteron 1210 at 1.8 GHz: gen 33 ks, sse2 13 ks, pni 6 ks Tullio
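For reference, the three run times quoted above work out to the ratios computed below. This is plain arithmetic on the figures from the post, not a benchmark: as noted earlier in the thread, run times are only comparable for identical input, and these were different tasks. The names `runtimes_ks` and `speedup` are illustrative only.

```python
# Run times quoted in the post above, in kiloseconds (Opteron 1210, Linux).
runtimes_ks = {"gen": 33.0, "sse2": 13.0, "pni": 6.0}


def speedup(baseline: str, optimized: str) -> float:
    """Return how many times shorter the `optimized` run was than `baseline`."""
    return runtimes_ks[baseline] / runtimes_ks[optimized]


for base, opt in [("gen", "sse2"), ("gen", "pni"), ("sse2", "pni")]:
    print(f"{opt} vs {base}: {speedup(base, opt):.2f}x")
```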

tullio Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0	Message 24374 - Posted: 16 Jul 2012, 10:21:16 UTC I am seeing an extreme variability of the credits/hour ratio. In time order: sse2 74.53, pni 34.08, pni 25.48, sse2 5.81. All of those are per hour of run time.

tullio Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0	Message 24376 - Posted: 16 Jul 2012, 14:28:44 UTC What does this mean? It happens every hour:
16-Jul-2012 16:02:24 [LHC@home 1.0] Sending scheduler request: Requested by project.
16-Jul-2012 16:02:24 [LHC@home 1.0] Not reporting or requesting tasks
16-Jul-2012 16:02:26 [LHC@home 1.0] Scheduler request completed

Desti Joined: 16 Jul 05 Posts: 84 Credit: 1,875,851 RAC: 0	Message 24380 - Posted: 16 Jul 2012, 17:27:14 UTC All my new workunits are valid so far, on AMD Linux. http://lhcathomeclassic.cern.ch/sixtrack/results.php?userid=7562 Linux Users Everywhere @ BOINC: http://lhcathome.cern.ch/team_display.php?teamid=717

Tex1954 Joined: 24 Apr 11 Posts: 37 Credit: 1,295,012 RAC: 0	Message 24383 - Posted: 16 Jul 2012, 18:18:17 UTC - in response to Message 24380. Last modified: 16 Jul 2012, 18:20:07 UTC I just got a load of those ZERO length tasks... and this time it crashed the system somehow. Just in case, I lowered the speed (1100T box) to run cooler. I also checked the data directory and it appears there are more ZERO length files in the wings. Last time this was a disk full error, I think. Anyway, they only run a couple of seconds, so no biggy. WOOPSY!!! :)

m Joined: 6 Sep 08 Posts: 119 Credit: 14,947,999 RAC: 2,344	Message 24384 - Posted: 16 Jul 2012, 20:54:29 UTC - in response to Message 24376. tullio.
What does this mean? It happens every hour:
16-Jul-2012 16:02:24 [LHC@home 1.0] Sending scheduler request: Requested by project.
16-Jul-2012 16:02:24 [LHC@home 1.0] Not reporting or requesting tasks
16-Jul-2012 16:02:26 [LHC@home 1.0] Scheduler request completed
If you look here, under "Client control", the first item. Might this be what's happening? John.

Richard Haselgrove Joined: 27 Oct 07 Posts: 186 Credit: 3,297,640 RAC: 0	Message 24385 - Posted: 16 Jul 2012, 21:07:15 UTC - in response to Message 24384.
What does this mean? It happens every hour:
16-Jul-2012 16:02:24 [LHC@home 1.0] Sending scheduler request: Requested by project.
16-Jul-2012 16:02:24 [LHC@home 1.0] Not reporting or requesting tasks
16-Jul-2012 16:02:26 [LHC@home 1.0] Scheduler request completed
If you look here, under "Client control", the first item. Might this be what's happening? John.
Yes, I see <next_rpc_delay>3600.000000</next_rpc_delay> in my sched_reply_lhcathomeclassic.cern.ch_sixtrack.xml file.
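A minimal sketch of how one might read that value out of a scheduler reply, for anyone who wants to check their own file. The XML below is a made-up, heavily trimmed stand-in: only the `<next_rpc_delay>` element and its value come from the post, and the `<scheduler_reply>` root element and the function name are assumptions. A real reply file contains much more and may not parse as strict XML, so treat this as an illustration rather than a tool.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical, heavily trimmed stand-in for sched_reply_<project>.xml.
# Only the <next_rpc_delay> element and its value come from the post above.
SAMPLE_REPLY = """
<scheduler_reply>
    <next_rpc_delay>3600.000000</next_rpc_delay>
</scheduler_reply>
"""


def next_rpc_delay_seconds(xml_text: str) -> Optional[float]:
    """Return the requested delay before the next scheduler contact, if present."""
    root = ET.fromstring(xml_text.strip())
    element = root.find("next_rpc_delay")
    return float(element.text) if element is not None else None


delay = next_rpc_delay_seconds(SAMPLE_REPLY)
if delay is not None:
    print(f"Project asks for a scheduler contact every {delay / 60:.0f} minutes")
```

A delay of 3600 seconds would match the hourly "Requested by project" entries in the log above.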

Eric Mcintosh Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0	Message 24386 - Posted: 17 Jul 2012, 6:06:44 UTC - in response to Message 24383. Sorry; finger trouble. I am about to post a news item. I will resubmit with a hopefully correct disk space estimate. The only good thing is that these tasks don't waste CPU! Eric.

Gary Roberts Joined: 22 Jul 05 Posts: 72 Credit: 3,962,626 RAC: 0	Message 24393 - Posted: 18 Jul 2012, 1:57:55 UTC - in response to Message 24299.
... I've attached 15 hosts to the project, all of which are sse3 capable. They all received the generic executable and all were crunching very slowly. As soon as the versions appeared back in the download area - about 24 hrs ago - I grabbed copies of the sse3 versions... .... The speedup using the sse3 version is quite impressive! Hopefully, you will be able to come up with a reliable mechanism for detecting the CPU capabilities and the manual effort of managing AP can be dispensed with.
I wrote the above earlier in this thread. Yesterday, I attached a new host - mainly to see if there had been any change in the ability of the project to detect host capabilities. I have done this a few times but it was a couple of days since I last tried it. When I added the new host, I didn't bother using the 'attach to project' mechanism in BOINC Manager - although that would have been by far the easiest method :-).
I've been keen to support this project for quite a few years and in earlier days before the 'big drought', I had quite a few hosts (with low HostIDs) attached. When the project had very long periods of no work, these hosts were allocated solely to E@H and removed from LHC. The entries for them dropped out of sight in the LHC hosts list but are all still there. In the intervening period, a lot of these hosts were decommissioned and/or upgraded - Coppermine and Tualatin PIIIs became Core 2 quads and Athlon XPs became Phenom II quads. Their principal task is to crunch for E@H but I'd also like to allocate some time to LHC.
It is relatively easy, if you understand the state file (client_state.xml) and you know how to edit it safely, to attach a new host to the project and have it adopt the identity of a much loved but long departed former host, rather than being allocated a new ID at the top of an already overly bloated range. So yesterday, I chose one of my old HostIDs which was last active back in 2007. I chose a Q8400 quad which was attached only to E@H and created an LHC project directory populated with the various versions of the current app. I placed an LHC account file in the BOINC directory and stopped BOINC and made the appropriate adjustments to the state file so that BOINC could use the old HostID. On restarting, I was expecting BOINC to select the generic version of the app (and that I would then have to switch to AP) but I was delighted to see it go for the SSE3 version and promptly download a number of new tasks.
I am grateful to the Admins for fixing the CPU capability detection mechanism. Thanks for the good job done! I can now transition all my other hosts away from AP. For anyone interested, 45149 is the resurrected HostID of my latest addition to this project.
For any other people running hosts under AP because of previous improper CPU capability detection, it now seems that you can drop AP. Be careful how you do this. It's very easy to trash all tasks currently in your cache. If you don't want to do extensive editing of client_state.xml, the safest method is to set NNT and allow all tasks to be completed and returned. Then delete app_info.xml, stop and restart BOINC and reset the project, which will remove unwanted stuff from the state file. Then allow new tasks, and the correct app and new tasks (if available) should be downloaded. Cheers, Gary.

Harri Liljeroos Joined: 28 Sep 04 Posts: 804 Credit: 65,813,929 RAC: 26,349	Message 24394 - Posted: 18 Jul 2012, 9:39:44 UTC The CPU capabilities recognition has been improved but is not fixed yet. On my four hosts (all sse3) I still get sse3, pni and sse2 applications (sent yesterday and today); only a few sse2, but some nonetheless. Also, 64 bit environments have been getting both 32 bit and 64 bit versions, at least some time ago. All versions can be seen on the host's application details page. There you can also check each application's average processing rate to see which are the fastest applications. Note that, because of the variation among WUs, you need at least 20 (maybe even more) completed WUs on each application to even out the differences between WUs.

Richard Haselgrove Joined: 27 Oct 07 Posts: 186 Credit: 3,297,640 RAC: 0	Message 24396 - Posted: 18 Jul 2012, 15:17:24 UTC Indeed. Now we've got the basic framework set up and working, there's probably some fine tuning needed.
I checked the code this morning, and it seems that hostinfo_unix.cpp returns sse3; hostinfo_win.cpp returns pni - so that's how the respective clients describe themselves. The plan_class test on the server just searches for "sse3" as a substring of that great long processor feature list: so, for the Windows platform, it's pure luck that the majority of modern Intel processors also report "ssse3" and match the substring test. The original P4 "Prescott" stepping CPUs, and all AMD processors, won't be accepted as sse3-capable under Windows.
As Harri notes, LHCclassic is now running the version of BOINC server code generically described as 'CreditNew' - although it's not all that 'new' any longer (been around for over two years), and it affects a lot of other things, like runtime estimates, not just credit. Runtime estimations are based on the APR for each plan class, and this can cause major problems for a project like LHC. Although our runtimes are capped by the number of accelerator traverses being modelled, they are non-deterministic up to that limit, and there is no way the project can know in advance how many fpops a given workunit will use. Some WUs finish early (if they encounter a tunnel wall collision because of the parameter set), and if a newly-attached host happens to get a high proportion of these tunnel-colliding short WUs in the early allocations, APR can be set unrealistically high at the server, and runtimes estimated low. This can be a significant cause of the -177 errors (or, with the latest clients, 197 EXIT_TIME_LIMIT_EXCEEDED errors) when full-length WUs are encountered later.
If you haven't done so already, I strongly urge you to use the "runtime outlier" mechanism described in changeset 24225 - that needs to be done in the validator. At the very least, any task running for less than 10% of the expected number of orbits must be marked as an outlier - allow a bit of a safety margin above that as well. If outliers are excluded, it takes even longer for realistic runtime estimates to become effective, as they require 10 validated, non-outlier tasks for each plan class.
It would be helpful if PNI and SSE3 could be amalgamated with an 'or' clause in the string test - since they both specify the same application, there will be no difference in the 'real' APR/runtime (any observed difference will be because of the variable runtimes of the randomly-assigned WUs). IIRC, the server will still occasionally 'probe' clients with an alternative application (hence SSE2 appearing), just to check which app is fastest on a particular host. Provided the outlier WUs are excluded from APR, that shouldn't cause any long-term problems.
Apart from the variability of runtime estimation, the distortions in APR also contribute to an even greater degree of credit variability. The project's current applications meet the criteria for credit for nonuniform jobs, and you might consider that option.
I think those are the main, applicable, tweaks to CreditNew that we've persuaded the devs to add since deployment. Many more are needed, especially in the area of boundary conditions (initial estimates) for new GPU apps, GPU app_versions, and GPU-equipped hosts. You will be luckier than most "new project" admins, in having access to people in close contact with the BOINC developers through Test4Theory. I happen to know that some other project admins - notably Eric Korpela at SETI, and Bernd Machenschalk at Einstein - are finding themselves having to re-examine this area of code with a close and critical eye.
Perhaps while your (re-)learning curve is still fresh in your mind, you could supply some feedback to BOINC Central? It would be nice if there could be some project pressure to reinstate some of the smooth-running protective safeguards that used to exist in earlier versions of BOINC.
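The substring pitfall described in this post can be reproduced in a few lines. This is only a sketch in Python, not the actual BOINC server code; the feature strings are shortened, invented examples of what a client might report, and `has_sse3_naive` and `has_sse3_token` are hypothetical names. The second function illustrates the suggested 'or' clause, under the assumption that pni and sse3 name the same capability.

```python
# Shortened, made-up feature lists of the kind a client might report.
INTEL_WIN = "fpu mmx sse sse2 pni ssse3 sse4_1"  # reports pni and ssse3
AMD_WIN = "fpu mmx sse sse2 pni 3dnow"           # reports pni only
UNIX_HOST = "fpu mmx sse sse2 sse3"              # reports sse3


def has_sse3_naive(features: str) -> bool:
    """Substring test as described in the post: 'ssse3' matches by accident."""
    return "sse3" in features


def has_sse3_token(features: str) -> bool:
    """Whole-token test that accepts either name for the same capability."""
    tokens = set(features.split())
    return "sse3" in tokens or "pni" in tokens


for name, feats in [("Intel/Win", INTEL_WIN), ("AMD/Win", AMD_WIN), ("Unix", UNIX_HOST)]:
    print(f"{name}: naive={has_sse3_naive(feats)} token={has_sse3_token(feats)}")
```

With these sample strings, the naive test accepts the Intel host only because of "ssse3" and rejects the AMD host, while the token test accepts all three.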

Igor Zacharov Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 16 May 11 Posts: 79 Credit: 111,419 RAC: 0	Message 24418 - Posted: 23 Jul 2012, 16:15:52 UTC - in response to Message 24396. Richard, thank you for the suggestions in your posting. We have now implemented the outliers and the credit based on runtime. I have also slightly increased max_wus_to_send and max_wus_in_progress. For the executables, the pni and the sse3 binaries are exactly the same, while we believe the difference in runtime compared with sse2 is small. Therefore, we have left it as is for now. The next change to the executables will be a recompilation due to the introduction of new physics and the arrival of the Mac version in a few weeks' time. With that, we will analyze the working of the system. Thanks to all the experts who helped to make the LHC project better. Igor. skype id: igor-zacharov
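To illustrate why excluding outliers matters for runtime estimates, here is a toy model. It is not the CreditNew formula: the processing rate is simplified to the mean of (nominal fpops / elapsed time), and the fpops figure, the task list and all names are invented for the example. Only the 1M-turn run length and the 10% threshold are taken from the thread.

```python
from statistics import mean

NOMINAL_FPOPS = 3.0e13      # assumed per-task estimate; arbitrary value
EXPECTED_TURNS = 1_000_000  # full-length run, as mentioned in the thread
OUTLIER_FRACTION = 0.10     # threshold suggested in the thread

# (turns completed, elapsed seconds) for one hypothetical host.
tasks = [(1_000_000, 6000.0), (20_000, 120.0), (50_000, 300.0), (1_000_000, 6000.0)]


def is_outlier(turns: int) -> bool:
    """Flag tasks that stopped before 10% of the expected turns."""
    return turns < OUTLIER_FRACTION * EXPECTED_TURNS


def apparent_rate(samples) -> float:
    """Toy stand-in for APR: mean of nominal fpops divided by elapsed time."""
    return mean(NOMINAL_FPOPS / elapsed for _, elapsed in samples)


def estimated_runtime(samples) -> float:
    """Runtime the toy model would predict for the next full-length task."""
    return NOMINAL_FPOPS / apparent_rate(samples)


kept = [t for t in tasks if not is_outlier(t[0])]
print(f"all tasks:         {estimated_runtime(tasks):.0f} s")
print(f"outliers excluded: {estimated_runtime(kept):.0f} s")
```

In this model the two short tasks make the host look far faster than it is, so a full-length task would be expected to finish in a few hundred seconds and could hit the time limit; with the short tasks excluded, the estimate returns to the 6000 s that full-length tasks actually took.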

tullio Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0	Message 24446 - Posted: 26 Jul 2012, 16:28:03 UTC I see that my new AMD E-450 is seen as sse3. Let us see what it does. Tullio