Wrong applications sent to my computer?
Author | Message |
---|---|
Send message Joined: 12 Feb 14 Posts: 72 Credit: 4,639,155 RAC: 0 |
I am using a Core i7-980X running 64-bit Windows 7. I recently noticed that my computer started to crunch SSE2 work units when it supports instruction sets up to SSE3 (a.k.a. PNI), and that it used to exclusively use the PNI application. I then let them finish and drain out. I then reset the project to try to get sent the SSE3 or PNI applications to get back to full speed. I knew that SSE3 was a minor improvement over SSE2, but I wanted maximum throughput. I was then given the no optimization 64-bit application, which the task manager shows is running in 32-bit mode by showing a *32 suffix to the process name. This application is much slower because the x87 FPU (the standard FPU for 32-bit x86 processors) is very slow and is the reason AMD made SSE2 the standard FPU for 64-bit mode. Could someone please fix the scheduler to send the correct applications to the correct computers? |
Send message Joined: 9 Oct 10 Posts: 77 Credit: 3,671,357 RAC: 0 |
I'm seeing the same behaviour on my machines ... |
Send message Joined: 27 Oct 07 Posts: 186 Credit: 3,297,640 RAC: 0 |
I'm getting tasks mostly allocated to the sse2 application too, but I think the scheduler - within its own limitations - is operating as it was designed to. The BOINC server keeps track of the apparent efficiency of each application, on each computer. You can see the values on the Application details link for each host - that link is my i7. The efficiency is expressed as the 'Average processing rate' (APR) in GFLOPS. At the time of writing, my i7 is showing: 32-bit apps: SSE2 11.72 GFLOPS, PNI 10.12 GFLOPS; 64-bit apps: SSE2 13.51 GFLOPS, PNI 10.51 GFLOPS. (Make sure you look at the current 451.07 version, at the bottom of the list, when checking your own values.) So, on the information available, the scheduler is correct in picking SSE2 for that host (and a few others I've checked). So, why does the SSE2 app appear to be faster, when we all know it isn't in reality? It's all to do with the averaging - the server doesn't measure the speed of each application directly, but works it out from the size of the jobs being sent out, and the time they take to run. For this particular project, there are two problems with this approach. 1) 'The size of the job'. This is declared by the scientist submitting the job, and known as <rsc_fpops_est> - it's also used to calculate the estimated runtime shown in the task list in BOINC Manager. I haven't been keeping detailed records, but I have a suspicion that not every job has had the appropriate <rsc_fpops_est> recently: if the estimate is too low, and the job runs longer than BOINC expects, then the speed appears low and the application less efficient. 2) 'The time they take to run'. As we know, LHC is looking for collider design parameters which result in stable orbits - and they are particularly interested in finding and eliminating instabilities which result in particles colliding with the tunnel wall or magnets. Better that virtual particles hit virtual walls in our computers, than in the real thing. But a task whose particles hit the wall finishes early, so run times vary enormously for the same declared job size. 
Maybe there was a batch of long-running tasks which drove down the APRs for SSE3/PNI, followed by a batch of tunnel-hitters just after we'd switched to SSE2? I can't be certain, but it's possible. Within the limits of the current BOINC runtime-estimation tools, I'm not sure what the project can do about this. One thing would be to stress to the scientists the importance of checking and adjusting the <rsc_fpops_est> for the jobs they're submitting. Another possibility is marking the tunnel-hitters as 'Runtime outliers' (via the validator), so that a task which finishes early doesn't get taken to mean a super-fast processor or application. |
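The effect of a few early-finishing tasks on the average can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses a plain arithmetic mean (the real server's averaging may differ), the job size and run times are made-up round numbers, and `apr_gflops` and its `outlier_below` cutoff are hypothetical names, not BOINC functions.

```python
from statistics import mean

def apr_gflops(fpops_est, runtimes, outlier_below=None):
    """Hypothetical helper: average apparent speed in GFLOPS.

    Each task's apparent speed is declared size / elapsed seconds.
    Tasks shorter than `outlier_below` seconds are ignored, which
    mimics (very roughly) flagging them as runtime outliers.
    """
    kept = [t for t in runtimes if outlier_below is None or t >= outlier_below]
    return mean(fpops_est / t for t in kept) / 1e9

# Illustrative values only: same declared size, one early exit at 100 s.
FPOPS_EST = 1e14
runtimes = [4000.0, 5000.0, 100.0]

with_outlier = apr_gflops(FPOPS_EST, runtimes)
without_outlier = apr_gflops(FPOPS_EST, runtimes, outlier_below=1000.0)
print(f"all tasks: {with_outlier:.1f} GFLOPS")
print(f"early exit excluded: {without_outlier:.1f} GFLOPS")
```

A single short task inflates the average many times over, which is why excluding such runs from the APR would make the comparison between application versions more meaningful.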
Send message Joined: 12 Feb 14 Posts: 72 Credit: 4,639,155 RAC: 0 |
I think that the estimation of gigaflops must be off because SSE2 and SSE3 allow four single precision floating point operations per instruction or two double precision floating point operations per instruction, while the x87 FPU only allows one extended precision floating point operation per instruction. |
Send message Joined: 27 Oct 07 Posts: 186 Credit: 3,297,640 RAC: 0 |
No. For the purpose of this discussion, forget that the figures have any meaning at the individual instruction level, or inside the real silicon of a real CPU chip. These are just BOINC estimates and averages used for scheduling, nothing more. Taking some figures from my i5 laptop, which has a measured floating point benchmark speed of 2857.28 million ops/sec (now that has some claim to be a real value): I had seven tasks in progress just now, including two which had completed but not yet been reported. Every task had an estimated 'size' of <rsc_fpops_est> 180000000000000 (1.8 x 10^14 floating point operations). One of them finished in 4468.23 seconds. BOINC calculates a speed of 40.284 GFLOPS. Another finished in 168.23 seconds. BOINC calculates a speed of 1.069 TFLOPS. Another is still running after 2 hours 35 minutes, and hasn't reached 20% yet. I reckon it's on target for 13.4 hours, or 48184 seconds. That would be a 'BOINC speed' of 3.735 GFLOPS. BOINC will average all of those speeds (and many more runs besides) to come up with an APR for these tasks being run by this application version (the x86 SSE2, as it happens - current working estimate 7.14 GFLOPS). That will be used as a comparison with the other applications, measured empirically in the same way, purely to decide which application to send next time. |
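The arithmetic in the post above can be reproduced with a short Python sketch. It assumes, as the post describes, that the apparent speed is simply the declared `<rsc_fpops_est>` divided by the elapsed time; the helper name `apparent_gflops` is made up for this example and is not part of BOINC.

```python
# Declared job size quoted in the post: 1.8 x 10^14 floating point operations.
RSC_FPOPS_EST = 180_000_000_000_000

def apparent_gflops(fpops_est, elapsed_seconds):
    """Hypothetical helper: declared size / elapsed time, in GFLOPS."""
    return fpops_est / elapsed_seconds / 1e9

# Run times quoted in the post (the last one is a projection, not a result).
runtimes = {
    "normal run": 4468.23,
    "early exit": 168.23,
    "projected long run": 48184.0,
}

speeds = {label: apparent_gflops(RSC_FPOPS_EST, t) for label, t in runtimes.items()}
for label, gflops in speeds.items():
    print(f"{label}: {gflops:.3f} GFLOPS")
```

The same declared size gives apparent speeds that differ by a factor of several hundred, while the benchmark speed of the laptop itself has not changed at all.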
Send message Joined: 12 Feb 14 Posts: 72 Credit: 4,639,155 RAC: 0 |
Using that logic, the estimated gigaflops amount can be thrown off by tasks which terminate early (e.g. the simulated particles immediately crashed into the simulated walls of the LHC so there is no more need to track them). The first two runs that were returned from the no optimization application exited early. The third one that my computer returned went the full way and did not exit early. I guess the only solution without uninstalling and reinstalling BOINC to get a new computer number is to let the slow tasks process until they drag the estimated gigaflops average down. |
Send message Joined: 27 Oct 07 Posts: 186 Credit: 3,297,640 RAC: 0 |
It depends how many of the 'wrong' application tasks you have completed. The BOINC server doesn't actually take any notice of the calculated values until you have 'completed' 11 tasks (strictly, "more than 10"). 'Completed', in this context, means that you have returned them with a 'success' outcome, and that they have been validated by your wingmate. Naturally, the short-running tasks tend to validate first, which gives an unfair boost to APR in situations like this: as longer-running tasks are validated from pending, APR will tend to fall. If you go fishing for a new HostID number, remember that you will be abandoning the current APR calculated for x64_PNI, and all the others, too. The server will try your patience with trial runs of each of the application versions for the 'new' host too, before eventually settling down on the one it thinks your computer prefers. I wouldn't bother - the workflow at this project is too variable and unpredictable. If there is a significant difference in speed between the application versions, the server will work it out in the end: if the difference is insignificant - well, then it doesn't matter after all. |
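As a rough picture of the selection rule described above, here is a simplified Python sketch. It is an assumption-laden toy, not the scheduler's actual logic: `pick_version` is a hypothetical function, the validated-task counts are invented, and only the APR figures (13.51 and 10.51 GFLOPS) are taken from earlier in this thread.

```python
# The post says APR is only trusted after "more than 10" validated tasks.
MIN_COMPLETED = 10

def pick_version(stats):
    """Hypothetical helper, not BOINC code.

    stats maps version name -> (validated_task_count, apr_gflops).
    Versions without enough validated tasks get trial runs first;
    once every version has enough, the highest APR wins.
    """
    untested = [name for name, (count, _) in stats.items() if count <= MIN_COMPLETED]
    if untested:
        return untested[0]
    return max(stats, key=lambda name: stats[name][1])

# Counts are invented for illustration; APRs are the 64-bit figures quoted earlier.
settled = {"x64_sse2": (20, 13.51), "x64_pni": (20, 10.51)}
new_host = {"x64_sse2": (20, 13.51), "x64_pni": (3, 10.51)}

print(pick_version(settled))   # both versions measured: highest APR is chosen
print(pick_version(new_host))  # PNI still needs trial runs
```

This also shows why a new HostID resets everything: all counts drop back to zero, so every version goes through its trial runs again.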
Send message Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0 |
I am aware of this problem. I have asked for short runs to be treated as outliers and not to be used for computation of the APR. It is a problem for me too in that I am scared to use short runs for functional tests of new releases. I am waiting. Eric. (In addition, beam-beam runs are "slower" than others. sse2/sse3/pni should be about the same but all faster than generic. I need to implement AVX. Everything runs in 32-bit mode.) |
Send message Joined: 9 Oct 10 Posts: 77 Credit: 3,671,357 RAC: 0 |
My new laptop (i7-4710HQ) is only getting the "gen" application ... the scheduler doesn't even try to test the other applications. Estimated rate for this application is 18.71 GFLOPS. |
Send message Joined: 4 Jul 06 Posts: 7 Credit: 339,475 RAC: 0 |
Hi, I'm a developer on PrimeGrid. My sieve application is a single binary that can run with generic code, MMX, or SSE2. (I haven't implemented AVX either.) I do this by compiling each version of the critical code to an object file, compiling the rest as generic code, then linking them together. Could you do this with your application? I hope your application is C or C++ - this might be harder with Fortran. |
Send message Joined: 26 Jul 05 Posts: 17 Credit: 1,147,143 RAC: 0 |
* double-post * |
Send message Joined: 26 Jul 05 Posts: 17 Credit: 1,147,143 RAC: 0 |
Eric -- I am rejoining the project, after a hiatus of a number of months. After entering (BOINC Manager, Tools, Add Project) lhcathomeclassiccern.ch_sixtrack, the following message is displayed: Project temporarily unavailable, please try again. Present computer: DELL Latitude E7240 (quad), with Windows 7. Am I off by a dot? |
Send message Joined: 21 Jun 10 Posts: 40 Credit: 11,085,612 RAC: 2,773 |
Jim Martin, If you open BOINC Manager, then "Tools", then "Add Project" it should bring up another window where you can select "Add Project" and click "Next". Then a list of projects you can pick from should pop up where you can choose "LHC@Home". Or if you want to type the name in yourself, use "http://lhcathomeclassic.cern.ch/sixtrack". Hope that helps. |
Send message Joined: 26 Jul 05 Posts: 17 Credit: 1,147,143 RAC: 0 |
captainjack -- Have tried both ways. The first one, LHC@home yielded a message that said that the project had been added. Nothing downloaded, thus far. The other http route resulted in the afore-listed message. Also, under BOINC Manager, "Projects", LHC@Home has not been added. |
Send message Joined: 21 Jun 10 Posts: 40 Credit: 11,085,612 RAC: 2,773 |
Jim Martin, I just removed the LHC@Home project and re-added it using both methods and they both worked fine for me. I added it first using the "LHC@Home" listing in the pick-list under tools -> add project. As soon as it was added, it said that the master file was downloaded. I looked in the BOINC\projects\lhcathomeclassic.cern.ch_sixtrack folder and didn't see anything. I went ahead and removed it again and added it back in using the http://lhcathomeclassic.cern.ch/sixtrack address and it was re-attached. It said the master file was downloaded but I didn't see anything in the BOINC\projects\lhcathomeclassic.cern.ch_sixtrack folder. Then it downloaded a task, the task appeared in the project folder and now the task is running. Please try to add it again and if it still doesn't show up, post the related messages from your event log. Maybe that will help give us some clues as to why it is not adding. As a general question, are you able to add any other projects? |
Send message Joined: 28 Nov 09 Posts: 17 Credit: 3,974,186 RAC: 0 |
Also, under BOINC Manager, "Projects", LHC@Home has not been added. Sounds like a carbon copy of what I just went through with a new workstation. The 'solution', kindly provided by Harri Liljeroos, was to join Einstein@Home first. Whatever that process did allowed me to join LHC@Home successfully afterward. |
Send message Joined: 26 Jul 05 Posts: 17 Credit: 1,147,143 RAC: 0 |
Gentlemen -- I've been on LHC@Home for quite some time, although I "Removed" myself, for my own reasons. My account is still with LHC@Home, and can go back and see some of the successful past WU's. As for other projects, Einstein@Home, Rosetta@Home, Seti@Home, cpdn, etc., have given me no problems. Am currently running CernVM_WebAPI, with no problems (It's non-BOINC, of course.). Will try to re-add this project, then record the Event Log outputs. * The last attempt to add LHC@home did not seem to attach the project, either. Below, is the single-line from the Event Log: 6/8/2015 7:40:07 PM | | Fetching configuration file from http://lhcathomeclassic.cern.ch/sixtrack/get_project_config.php |
Send message Joined: 26 Jul 05 Posts: 17 Credit: 1,147,143 RAC: 0 |
LHC@home successfully added -- after VLHC@home re-added. Whether a coincidence, or not, am now back in business. I thank you folks for offering suggestions. |
©2024 CERN