Message boards : Number crunching : Wrong applications sent to my computer?
Jesse Viviano

Send message
Joined: 12 Feb 14
Posts: 72
Credit: 4,639,155
RAC: 0
Message 27154 - Posted: 14 Feb 2015, 2:01:27 UTC

I am using a Core i7-980X running 64-bit Windows 7. I recently noticed that my computer started to crunch SSE2 work units even though it supports instruction sets up to SSE3 (a.k.a. PNI), and that it used to exclusively run the PNI application. I let those tasks finish and drain out, then reset the project to try to get the SSE3/PNI application and get back to full speed. I knew that SSE3 is only a minor improvement over SSE2, but I wanted maximum throughput. Instead I was given the no-optimization 64-bit application, which Task Manager reports is running in 32-bit mode (a *32 suffix on the process name). This application is much slower because the x87 FPU (the standard FPU for 32-bit x86 processors) is very slow - which is the reason AMD made SSE2 the standard FPU for 64-bit mode. Could someone please fix the scheduler to send the correct applications to the correct computers?
[AF>FAH-Addict.net]toTOW

Send message
Joined: 9 Oct 10
Posts: 77
Credit: 3,671,357
RAC: 0
Message 27158 - Posted: 14 Feb 2015, 10:16:06 UTC

I'm seeing the same behaviour on my machines ...
Richard Haselgrove

Send message
Joined: 27 Oct 07
Posts: 186
Credit: 3,297,640
RAC: 0
Message 27159 - Posted: 14 Feb 2015, 13:43:27 UTC

I'm getting tasks mostly allocated to the sse2 application too, but I think the scheduler - within its own limitations - is operating as it was designed to.

The BOINC server keeps track of the apparent efficiency of each application, on each computer. You can see the values on the Application details link for each host - that link is my i7.

The efficiency is expressed as the 'Average processing rate' (APR) in GFLOPS. At the time of writing, my i7 is showing:

32-bit apps
SSE2: 11.72 GFLOPS
PNI: 10.12 GFLOPS

64-bit apps
SSE2: 13.51 GFLOPS
PNI: 10.51 GFLOPS

(make sure you look at the current 451.07 version, at the bottom of the list, when checking your own values)

So, on the information available, the scheduler is correct in picking SSE2 for that host (and a few others I've checked).

So, why does the SSE2 app appear to be faster, when we all know it isn't in reality? It's all to do with the averaging - the server doesn't measure the speed of each application directly, but works it out from the size of the jobs being sent out, and the time they take to run.

For this particular project, there are two problems with this approach.

1) 'The size of the job'. This is declared by the scientist submitting the job, and known as <rsc_fpops_est> - it's also used to calculate the estimated runtime shown in the task list in BOINC Manager. I haven't been keeping detailed records, but I have a suspicion that not every job has had the appropriate <rsc_fpops_est> recently: if the estimate is too low, and the job runs longer than BOINC expects, then the speed appears low and the application less efficient.

2) 'The time they take to run'. As we know, LHC is looking for collider design parameters which result in stable orbits - and they are particularly interested in finding and eliminating instabilities which result in particles colliding with the tunnel wall or magnets. Better that virtual particles hit virtual walls in our computers, than in the real thing. The catch is that a simulation ends as soon as its particles are lost, so two jobs with the same declared size can have wildly different runtimes.

Maybe there were a batch of long-running tasks which drove down the APRs for SSE3/PNI, followed by a batch of tunnel-hitters just after we'd switched to SSE2? I can't be certain, but it's possible.

Within the limits of the current BOINC runtime-estimation tools, I'm not sure what the project can do about this. One thing would be to stress on the scientists the importance of checking and adjusting the <rsc_fpops_est> for the jobs they're submitting. Another possibility is marking the tunnel-hitters as 'Runtime outliers' (via the validator), so that a task which finishes early doesn't get taken to mean a super-fast processor or application.
Jesse Viviano

Send message
Joined: 12 Feb 14
Posts: 72
Credit: 4,639,155
RAC: 0
Message 27160 - Posted: 14 Feb 2015, 21:21:54 UTC - in response to Message 27159.  

I think that the estimation of gigaflops must be off, because SSE2 and SSE3 allow four single-precision floating-point operations per instruction, or two double-precision operations, while the x87 FPU only allows one extended-precision floating-point operation per instruction.
Richard Haselgrove

Send message
Joined: 27 Oct 07
Posts: 186
Credit: 3,297,640
RAC: 0
Message 27161 - Posted: 14 Feb 2015, 22:13:25 UTC - in response to Message 27160.  

No. For the purpose of this discussion, forget that the figures have any meaning at the individual instruction level, or inside the real silicon of a real CPU chip. These are just BOINC estimates and averages used for scheduling, nothing more.

Taking some figures from my i5 laptop, which has a measured floating point benchmark speed of 2857.28 million ops/sec (now that has some claim to be a real value):

I had seven tasks in progress just now, including two which had completed but not yet been reported.

Every task had an estimated 'size' of <rsc_fpops_est> 180000000000000 (1.8 x 10^14 floating point operations)

One of them finished in 4468.23 seconds. BOINC calculates a speed of 40.284 GFLOPS.

Another finished in 168.23 seconds. BOINC calculates a speed of 1.069 TFLOPS.

Another is still running after 2 hours 35 minutes, and hasn't reached 20% yet. I reckon it's on target for 13.4 hours, or 48184 seconds. That would be a 'BOINC speed' of 3.735 GFLOPS.

BOINC will average all of those speeds (and many more runs besides) to come up with an APR for these tasks being run by this application version (the x86 SSE2, as it happens - current working estimate 7.14 GFLOPS). That will be used as a comparison with the other applications, measured empirically in the same way, purely to decide which application to send next time.
Jesse Viviano

Send message
Joined: 12 Feb 14
Posts: 72
Credit: 4,639,155
RAC: 0
Message 27162 - Posted: 14 Feb 2015, 22:21:03 UTC - in response to Message 27161.  

Using that logic, the estimated gigaflops figure can be thrown off by tasks which terminate early (e.g. the simulated particles immediately crashed into the simulated walls of the LHC, so there is no more need to track them). The first two runs returned by the no-optimization application exited early. The third one that my computer returned went the full way and did not exit early. I guess the only solution, short of uninstalling and reinstalling BOINC to get a new computer number, is to let the slow tasks run until they drag the estimated gigaflops average down.
Richard Haselgrove

Send message
Joined: 27 Oct 07
Posts: 186
Credit: 3,297,640
RAC: 0
Message 27163 - Posted: 14 Feb 2015, 22:34:13 UTC - in response to Message 27162.  

It depends how many of the 'wrong' application tasks you have completed. The BOINC server doesn't actually take any notice of the calculated values until you have 'completed' 11 tasks (strictly, "more than 10"). 'Completed', in this context, means that you have returned them with a 'success' outcome, and that they have been validated by your wingmate. Naturally, the short-running tasks tend to validate first, which gives an unfair boost to APR in situations like this: as longer-running tasks are validated from pending, APR will tend to fall.

If you go fishing for a new HostID number, remember that you will be abandoning the current APR calculated for x64_PNI, and all the others, too. The server will try your patience with trial runs of each of the application versions for the 'new' host too, before eventually settling down on the one it thinks your computer prefers.

I wouldn't bother - the workflow at this project is too variable and unpredictable. If there is a significant difference in speed between the application versions, the server will work it out in the end: if the difference is insignificant - well, then it doesn't matter after all.
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Send message
Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 27164 - Posted: 15 Feb 2015, 8:02:45 UTC

I am aware of this problem. I have asked for short runs to be treated as outliers and not to be used for computation of the APR. It is a problem for me too, in that I am scared to use short runs for functional tests of new releases. I am waiting. Eric.

(In addition, beam-beam runs are "slower" than others. sse2/sse3/pni should be about the same, but all faster than generic. I need to implement AVX. Everything runs in 32-bit mode.)
[AF>FAH-Addict.net]toTOW

Send message
Joined: 9 Oct 10
Posts: 77
Credit: 3,671,357
RAC: 0
Message 27174 - Posted: 19 Feb 2015, 22:19:05 UTC

My new laptop (i7-4710HQ) is only getting the "gen" application ... the scheduler doesn't even try to test the other applications.

Estimated rate for this application is 18.71 GFLOPS.
Ken_g6

Send message
Joined: 4 Jul 06
Posts: 7
Credit: 337,883
RAC: 340
Message 27454 - Posted: 16 May 2015, 18:26:29 UTC - in response to Message 27164.  

Hi, I'm a developer on PrimeGrid. My sieve application is a single binary that can run with generic code, MMX, or SSE2. (I haven't implemented AVX either.) I do this by compiling each version of the critical code to an object file, compiling the rest as generic code, then linking them together. Could you do this with your application?

I hope your application is C or C++ - this might be harder with Fortran.
Jim Martin

Send message
Joined: 26 Jul 05
Posts: 17
Credit: 1,142,450
RAC: 2
Message 27504 - Posted: 7 Jun 2015, 0:19:41 UTC
Last modified: 7 Jun 2015, 0:21:44 UTC

* double-post *
Jim Martin

Send message
Joined: 26 Jul 05
Posts: 17
Credit: 1,142,450
RAC: 2
Message 27505 - Posted: 7 Jun 2015, 0:20:42 UTC

Eric --

I am rejoining the project, after a number of months' hiatus. After entering (BOINC Manager, Tools, Add Project) lhcathomeclassiccern.ch_sixtrack, the following message is output: Project temporarily unavailable, please try again.

Present computer: DELL Latitude E7240 (quad), with Windows 7.

Am I off by a dot?
captainjack

Send message
Joined: 21 Jun 10
Posts: 40
Credit: 10,591,046
RAC: 9,075
Message 27506 - Posted: 7 Jun 2015, 1:54:51 UTC

Jim Martin,

If you open BOINC Manager, then "Tools", then "Add Project" it should bring up another window where you can select "Add Project" and click "Next". Then a list of projects you can pick from should pop up where you can choose "LHC@Home".

Or if you want to type the name in yourself, use "http://lhcathomeclassic.cern.ch/sixtrack".

Hope that helps.
Jim Martin

Send message
Joined: 26 Jul 05
Posts: 17
Credit: 1,142,450
RAC: 2
Message 27507 - Posted: 8 Jun 2015, 2:04:40 UTC - in response to Message 27506.  
Last modified: 8 Jun 2015, 2:50:22 UTC

captainjack -- Have tried both ways. The first one, LHC@home, yielded a message that said that the project had been added. Nothing downloaded, thus far. The other (http) route resulted in the afore-listed message.

Also, under BOINC Manager, "Projects", LHC@Home has not been added.
captainjack

Send message
Joined: 21 Jun 10
Posts: 40
Credit: 10,591,046
RAC: 9,075
Message 27508 - Posted: 8 Jun 2015, 3:03:50 UTC

Jim Martin,

I just removed the LHC@Home project and re-added it using both methods and they both worked fine for me.

I added it first using the "LHC@Home" listing in the pick-list under Tools -> Add Project. As soon as it was added, it said that the master file was downloaded, but I looked in the BOINC\projects\lhcathomeclassic.cern.ch_sixtrack folder and didn't see anything.

I went ahead and removed it again and added it back in using the http://lhcathomeclassic.cern.ch/sixtrack address, and it was re-attached. It said the master file was downloaded, but again I didn't see anything in the BOINC\projects\lhcathomeclassic.cern.ch_sixtrack folder. Then it downloaded a task, the task appeared in the project folder, and now the task is running.

Please try to add it again and if it still doesn't show up, post the related messages from your event log. Maybe that will help give us some clues as to why it is not adding.

As a general question, are you able to add any other projects?
Brian Priebe

Send message
Joined: 28 Nov 09
Posts: 17
Credit: 3,974,186
RAC: 0
Message 27509 - Posted: 8 Jun 2015, 11:57:39 UTC - in response to Message 27507.  

Also, under BOINC Manager, "Projects", LHC@Home has not been added.

Sounds like a carbon copy of what I just went through with a new workstation. The 'solution', kindly provided by Harri Liljeroos, was to join Einstein@Home first. Whatever that process did allowed me to join LHC@Home successfully afterward.
Jim Martin

Send message
Joined: 26 Jul 05
Posts: 17
Credit: 1,142,450
RAC: 2
Message 27510 - Posted: 8 Jun 2015, 23:39:53 UTC - in response to Message 27506.  
Last modified: 8 Jun 2015, 23:44:32 UTC

Gentlemen -- I've been on LHC@Home for quite some time, although I "Removed" myself, for my own reasons. My account is still with LHC@Home, and I can go back and see some of the successful past WUs.

As for other projects, Einstein@Home, Rosetta@Home, Seti@Home, cpdn, etc.,
have given me no problems.

Am currently running CernVM_WebAPI, with no problems (it's non-BOINC, of course).

Will try to re-add this project, then record the Event Log outputs.

*

The last attempt to add LHC@home did not seem to attach the project, either.
Below is the single line from the Event Log:

6/8/2015 7:40:07 PM | | Fetching configuration file from http://lhcathomeclassic.cern.ch/sixtrack/get_project_config.php
Jim Martin

Send message
Joined: 26 Jul 05
Posts: 17
Credit: 1,142,450
RAC: 2
Message 27544 - Posted: 16 Jun 2015, 14:19:56 UTC - in response to Message 27510.  
Last modified: 16 Jun 2015, 14:21:55 UTC

LHC@home successfully added -- after vLHC@home was re-added. Whether a coincidence or not, I am now back in business. I thank you folks for offering suggestions.



©2024 CERN