Message boards : Number crunching : SixTrack and LHC@home status

Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 12 Jul 11
Posts: 852
Credit: 1,619,050
RAC: 0
Message 23198 - Posted: 22 Sep 2011, 19:15:55 UTC

Well I find myself having to apologise again.
Perhaps I should concentrate more on how well
it is going really, in spite of a few glitches
and bugs.

First let me say I am an Honorary Group member of
the Accelerator Beam Physics group in the
BE department at CERN. That means I am a volunteer
too. CERN gives me an office and a couple of computers
and that's it. I am completely unpaid of course
so I cannot travel except at my own expense and I
actually use Skype and Nonoh for convenience for
my phone calls. I guess I am too old to be comfortable
with the WWW even if it was invented at CERN. I believe
my efforts are appreciated
by my colleagues and I really enjoy working and
trying to bridge the gap between accelerator
physicists and the computer systems, and helping
the new younger fellows and visitors. I have a great
deal of freedom but I also feel responsible for
support.

So, what follows is under my own responsibility
entirely and does NOT represent CERN, or the BE
or IT departments or the ABP group in any way. I am
worried that we might lose our communications
via the Web temporarily but if there is a hitch
I will do my best to respond to personal contacts
if there aren't too many! I leave it as an exercise
to the student to find me. I am in various phonebooks
as Frederick McIntosh (but I am called Eric).

Recent feedback from you has been extremely valuable and
we are trying to respond as best we can. I myself have
created a problem with recent studies which cover amplitudes
from 8 to 68 Sigma (we are looking for a Dynamic Aperture DA
of at least 12). I am NOT an accelerator physicist so
take this with a pinch of salt. At amplitudes greater
than 14 the particles may get lost very quickly and if
they fail to make a minimum of 1000 turns I get back a
valid, but empty, result file. This is very difficult for me
to handle but I do run some of these locally just to make
sure they are genuine. (This explains why you might get very
little credit because the job takes only a few seconds.
We are thinking about this.) These cases are also very
useful for checking the numerical portability, and I pride myself
on being one of the few people, if not the only person, to get identical results
(0 ULP, Units in the Last Place) on any PC. This issue
was further compounded by my mistakes and by File system
failures at CERN which created wrong input files
or prevented the correct download of results and database
update.
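
As a rough illustration of what a "0 ULP" comparison means, the sketch below counts how many representable double-precision values lie between two results. It is not SixTrack's own checking code; the function names `ordered_bits` and `ulp_distance` are hypothetical and chosen only for this example.

```python
import struct

def ordered_bits(x: float) -> int:
    """Map a double to an integer so that adjacent doubles map to adjacent integers."""
    (bits,) = struct.unpack("<q", struct.pack("<d", x))
    # Doubles are stored sign-magnitude: fold negative values onto the
    # negative integers so that -0.0 and +0.0 both map to 0.
    return bits if bits >= 0 else -(bits & 0x7FFFFFFFFFFFFFFF)

def ulp_distance(a: float, b: float) -> int:
    """Number of steps from a to b along the ordered list of doubles.
    0 means bit-identical (apart from the sign of zero). NaN is not handled."""
    return abs(ordered_bits(a) - ordered_bits(b))

print(ulp_distance(0.3, 0.3))        # 0
print(ulp_distance(0.1 + 0.2, 0.3))  # 1
```

Two runs agree to 0 ULP only if every compared value gives a distance of 0; the second line shows how an ordinary rounding difference already costs 1 ULP.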

Still we are progressing really well now and I hope we shall have
a stable situation very soon. We are also working hard on
the Web pages. We would like to have multiple executables optimised
for different hardware and use the BOINC deadline scheduling.
I would also like to thank all my collaborators at CERN, EPFL
and ENS Lyon as well as all of you for your computer time and
valuable comments.

For the future I am buying myself a MAC (yes a MAC for McIntosh)
and hope to have a compatible executable in a few weeks for
that system. Apart from continued support and verification
of the physics results, I may also have to replace the
Fortran Formatted Input/Output for decimal-binary conversion
by a C routine, using strtod() for the initiated, to
maintain 0 ULP difference and precision.
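
To show why the decimal-binary conversion matters for 0 ULP, here is a minimal sketch in Python rather than Fortran or C. It assumes the conversions are correctly rounded in both directions, which is the case for CPython's `float()` and `format()` on common platforms. The helper `roundtrip` is hypothetical. With 17 significant digits an IEEE 754 double survives a write and read unchanged; with 15 digits it may not.

```python
def roundtrip(x: float, digits: int) -> float:
    """Write x as decimal text with `digits` significant digits and read it back."""
    return float(format(x, f".{digits}g"))

x = 0.1 + 0.2
print(format(x, ".15g"), roundtrip(x, 15) == x)  # 0.3 False
print(format(x, ".17g"), roundtrip(x, 17) == x)  # 0.30000000000000004 True
```

If either the writer or the reader rounds differently from the other platform, the value read back can be off by 1 ULP even when enough digits are printed.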

I dream that I might be able to publish this 0 ULP
project before the end of the year, but I have a huge backlog
of documentation to provide. Next year I would like to try the
GPUs, but of course Intel and Nvidia are in competition.
If 0 ULP works out I should also be able to use Sparc, IBM
RISC, any IEEE 754 compliant hardware. Then at last I might
be able to really work on a Floating-Point Error Analysis
of SixTrack to determine if we can improve the accuracy
of the simulations.

Well I hope my concerns are unfounded but I feel really guilty about
the lack of communication and information in the past.

Thanks to all and here's to you, BOINC, and LHC@home.

Eric.
ID: 23198
Rapture
Joined: 21 Oct 07
Posts: 21
Credit: 39,100
RAC: 0
Message 23200 - Posted: 22 Sep 2011, 21:11:10 UTC - in response to Message 23198.  

Thanks for the update! Your work is well appreciated. I love this project and am glad to see it is back on track! :)
ID: 23200
zombie67 [MM]
Joined: 24 Nov 06
Posts: 76
Credit: 6,720,840
RAC: 0
Message 23202 - Posted: 22 Sep 2011, 21:59:36 UTC - in response to Message 23200.  
Last modified: 22 Sep 2011, 22:00:00 UTC

For the future I am buying myself a MAC (yes a MAC for McIntosh) and hope to have a compatible executable in a few weeks for that system.

Great news!!
Dublin, California
Team: SETI.USA

ID: 23202
jujube
Joined: 25 Jan 11
Posts: 179
Credit: 83,858
RAC: 0
Message 23204 - Posted: 22 Sep 2011, 22:28:58 UTC - in response to Message 23198.  
Last modified: 22 Sep 2011, 22:29:20 UTC

These cases are also very
useful for checking the numerical portability, and I pride myself
on being one of the few people, if not the only person, to get identical results
(0 ULP, Units in the Last Place) on any PC.


Your work on 0 ULP is phenomenal. Be very proud of that! I hope you can eventually get it all documented and published as I am sure others will make good use of it.
ID: 23204
T.J.
Joined: 17 Feb 07
Posts: 86
Credit: 968,855
RAC: 0
Message 23221 - Posted: 23 Sep 2011, 12:57:53 UTC

Thanks for this explanation, Mr McIntosh.
I like this project and the accelerator physics, and am happy that it is running again with a lot of WUs.
A lot has been done by you and your staff, and I find the support and the information we get useful and fast. I also see that ideas from us crunchers are used and acted upon. A lot of projects can learn from you.

Well done so far and enjoy.

I will crunch for LHC as long as I have electrical power.
Greetings from,
TJ
ID: 23221
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 12 Jul 11
Posts: 852
Credit: 1,619,050
RAC: 0
Message 23440 - Posted: 10 Oct 2011, 14:42:57 UTC

Just copied from the Tasks v530.9 crashing thread.

MEA CULPA. Having concentrated so much on the floating-point model options for the ifort compiler I rather forgot the basic PC architecture. I was anxious to get a two to four times faster version into production so as to maximise the use of your systems. I just removed the arch IA32 flags to allow use of SSE2 (which is floating-point compatible for me) and thus generated an executable for the very modern Linux and Windows PCs in my office. NOT a good idea as we see from your messages. So Igor has put us back to my Version 4308 or BOINC 530.8/530.10.
We shall get this sorted out as soon as possible and have different executables for different platforms so as to optimise resource utilisation. (We have also increased, probably by too much, the fpops, disk space, and elapsed time estimates.) This will hopefully give us some breathing space as it is now absolutely vital that I check the physics results of all these recent studies.

The problem is basically that I had to switch to the Intel IFORT compiler with the new BOINC version. With the appropriate fp-model flags this worked really well, until I found a small number of 1 ULP differences on the formatted input of the accelerator description.
This difference appears between the Linux and Windows executables, apparently even on the same hardware. The problem of formatted input is well understood and was largely solved by David M. Gay some twenty years ago, and is I believe handled correctly by C99. As a (temporary) solution I now read the data as Single Precision. The recent studies, including those very short runs, not even one turn, will allow me to evaluate the physics impact of this change. If the effect is too large I shall have to replace the Fortran formatted input IFORT runtime routine with a correct C strtod... sigh. This would be useful in the longer term as it would hopefully allow the use of other compilers while still producing identical results. Again all this should be a non-issue when compilers with the new Fortran 2003 Formatted I/O ROUND options become available.
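
The trade-off of the single-precision workaround can be sketched as follows. This is an illustration only, not the SixTrack input routine: the input text "0.1" is a made-up value, and `to_single` and `abs_error` are hypothetical helpers. The sketch shows that two doubles 1 ULP apart collapse to the same single-precision value, presumably the reason the workaround hides the runtime difference, and that the price is a much larger conversion error.

```python
import struct
from fractions import Fraction

def to_single(x: float) -> float:
    """Round a double to IEEE 754 single precision, then widen it back to double."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def abs_error(value: float, text: str) -> Fraction:
    """Exact distance between a binary value and the decimal text it came from."""
    return abs(Fraction(value) - Fraction(text))

text = "0.1"
a = float(text)    # correctly rounded double
b = a + 2.0**-56   # neighbouring double, 1 ULP above a

print(a == b)                        # False
print(to_single(a) == to_single(b))  # True
print(to_single(a))                  # 0.10000000149011612
print(abs_error(to_single(a), text) > abs_error(a, text))  # True
```

Note that `to_single` rounds twice (decimal to double, then double to single), which in rare cases can differ from a direct decimal-to-single read.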

Thanks for your understanding. Eric.

ID: 23440
[AF>Libristes] Dudumomo
Joined: 23 Jan 07
Posts: 3
Credit: 1,338,492
RAC: 0
Message 23447 - Posted: 11 Oct 2011, 1:51:40 UTC - in response to Message 23440.  

Thank you Eric for your news!
We really appreciate your communication!

Keep up the good work!
ID: 23447
trigggl
Joined: 17 Feb 09
Posts: 22
Credit: 311,184
RAC: 0
Message 23448 - Posted: 11 Oct 2011, 4:14:02 UTC - in response to Message 23447.  

Thank you Eric for your news!
We really appreciate your communication!

Keep up the good work!


+1

You get an 'A' for communication.
ID: 23448
tullio
Joined: 19 Feb 08
Posts: 607
Credit: 3,792,141
RAC: 831
Message 23449 - Posted: 11 Oct 2011, 4:24:07 UTC

The problem is that now LHC 1.0 runs at high priority on my Linux box with an Opteron 1210, which is SSE3 capable and has a 64 bit CPU, although I am using a 32 bit SuSE Linux 11.1. Since I am running 5 other BOINC projects, including Test4Theory@home and also a Solaris Virtual Machine running SETI@home, having a program running in high priority is a problem.
Tullio
ID: 23449
jujube
Joined: 25 Jan 11
Posts: 179
Credit: 83,858
RAC: 0
Message 23451 - Posted: 11 Oct 2011, 5:29:20 UTC - in response to Message 23449.  

Why is high priority a problem?
ID: 23451
tullio
Joined: 19 Feb 08
Posts: 607
Credit: 3,792,141
RAC: 831
Message 23452 - Posted: 11 Oct 2011, 7:39:19 UTC - in response to Message 23451.  
Last modified: 11 Oct 2011, 8:02:00 UTC

Because it is occupying one of my two cores, leaving 5 BOINC programs to share the other, not to mention the Solaris Virtual Machine running SETI@home.
Tullio
Anyway, after reaching 18.4% after 08:49 hours it is back to normal.
ID: 23452
jujube
Joined: 25 Jan 11
Posts: 179
Credit: 83,858
RAC: 0
Message 23458 - Posted: 11 Oct 2011, 17:14:41 UTC - in response to Message 23452.  

Tasks in high priority can occupy the processor longer than a normal task would but when that task is finished your other projects will receive above normal CPU time to balance the extra CPU time the high priority task received. Over the long term your resource shares will be honored. If all tasks return before the deadline and project shares are honored then there is no problem.
ID: 23458
Tom95134
Joined: 4 May 07
Posts: 250
Credit: 826,541
RAC: 0
Message 23465 - Posted: 12 Oct 2011, 3:36:14 UTC - in response to Message 23458.  
Last modified: 12 Oct 2011, 3:36:42 UTC

Tasks in high priority can occupy the processor longer than a normal task would but when that task is finished your other projects will receive above normal CPU time to balance the extra CPU time the high priority task received. Over the long term your resource shares will be honored. If all tasks return before the deadline and project shares are honored then there is no problem.

----------
Are you sure about this?
Tasks in high priority can occupy the processor longer than a normal task would but when that task is finished your other projects will receive above normal CPU time to balance the extra CPU time the high priority task received.

The impression I get is that Tasks run high priority (get 100% and are not switched out) to meet the Deadline Date and once BOINC determines that a Task has had enough High Priority time it will revert to the normal Task sharing cycle. From observation I see no instance when Tasks that received less time while another Task was running at High Priority are given more crunching time. I run all my projects with the same Resource Share setting. The only thing unusual is that I also run Test4Theory which gets 100% (actually 80% since I don't allow 100% usage) of one core.

This is based strictly on observation of the way the Tasks are queued and the indicated Status of the various Tasks. Since I run more than LHC (1.0) it can be difficult to see the exact effects of other Projects.
ID: 23465
Gary Roberts
Joined: 22 Jul 05
Posts: 72
Credit: 3,962,626
RAC: 0
Message 23466 - Posted: 12 Oct 2011, 4:52:02 UTC - in response to Message 23465.  

Are you sure about this?
Tasks in high priority can occupy the processor longer than a normal task would but when that task is finished your other projects will receive above normal CPU time to balance the extra CPU time the high priority task received.

Yes, as long as you take a long term view, say around a month, rather than just a few days or so. BOINC keeps track of just how much time each project has had and will try (if not interfered with and when the reason for high priority (HP) has passed) to pay back the CPU time needed to honour the resource shares you have chosen. It may not always be successful at this. You should not underestimate the ability of computer owners to choose an "impossible" combination of things like mix of projects, the variability of project deadlines, resource shares, work cache size, time on fraction, work availability, etc, :-). If you see BOINC using HP mode frequently, it's a pretty good indication that you are using 'difficult' settings. Often simply reducing the work cache size, particularly if you support quite a few projects, will lower or remove the use of HP mode.

The ability to use HP mode isn't really a 'problem' - it's really a very good feature to allow BOINC to better manage things. People often expect every project in their 'mix' to always have tasks on board and will sometimes increase the work cache size to try to 'force' this. They will also increase cache size in an attempt to outlast project outages. That's fine if you run a single or very small number of projects, but a recipe for problems if you want to run several. If you run quite a few projects you should minimise work cache size and allow BOINC to download new tasks just when they are really needed. That way you lessen the risk of having lots of 'stale' tasks that end up either missing deadlines or having to be processed in panic mode. BOINC will be much better able to honour your desired resource shares that way.
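
The pay-back idea described above can be pictured with a deliberately simplified model. This is a toy single-core scheduler written for illustration only. It is not BOINC's actual algorithm, and the project names, the `simulate` function and its parameters are all hypothetical. Each project earns "debt" in proportion to its resource share, and the project with the largest debt runs next unless one project is forced to run (standing in for high priority).

```python
def simulate(shares, hours, forced=None, forced_hours=0):
    """Toy model: return the CPU hours each project received on one core."""
    total = sum(shares.values())
    debt = {p: 0.0 for p in shares}  # CPU time owed to each project
    used = {p: 0 for p in shares}    # CPU hours actually received
    for hour in range(hours):
        # Every project earns its fair fraction of this hour.
        for p in shares:
            debt[p] += shares[p] / total
        if forced is not None and hour < forced_hours:
            running = forced                   # "high priority" phase
        else:
            running = max(debt, key=debt.get)  # pay back the largest debt
        debt[running] -= 1.0
        used[running] += 1
    return used

shares = {"LHC": 1, "A": 1, "B": 1}
print(simulate(shares, 30, forced="LHC", forced_hours=30))
print(simulate(shares, 300, forced="LHC", forced_hours=30))
```

In this model the forced project takes the whole core for the first 30 hours, but over 300 hours the three equal-share projects end up with close to 100 hours each, which is the long-term balancing the posts above refer to.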

Cheers,
Gary.
ID: 23466
jujube
Joined: 25 Jan 11
Posts: 179
Credit: 83,858
RAC: 0
Message 23469 - Posted: 12 Oct 2011, 6:42:36 UTC - in response to Message 23465.  

From observation I see no instance when Tasks that received less time while another Task was running at High Priority are given more crunching time.


How well do you observe? Do you watch day and night for days on end? Do you record on paper (or maybe a spreadsheet) the scheduling events during that time to see which project gets what and for how long? BOINC does precisely that kind of meticulous observation and metes out your resources according to the project shares you set, over the long term.
ID: 23469
Tom95134
Joined: 4 May 07
Posts: 250
Credit: 826,541
RAC: 0
Message 23481 - Posted: 12 Oct 2011, 19:57:42 UTC - in response to Message 23469.  

From observation I see no instance when Tasks that received less time while another Task was running at High Priority are given more crunching time.


How well do you observe? Do you watch day and night for days on end? Do you record on paper (or maybe a spreadsheet) the scheduling events during that time to see which project gets what and for how long? BOINC does precisely that kind of meticulous observation and metes out your resources according to the project shares you set, over the long term.

--------------
Do you watch day and night for days on end?

NO, I don't watch day and night. What kind of dumb question is that or are you just trying to be argumentative? It is a casual observation and comment. However, even if you only look at the Tasks list in BOINC Manager it's fairly easy to see what is happening and then think back to what you saw a couple of hours ago.

A previous reply to my observation was much more informative in that it pointed out that BOINC Manager does balancing over a longer period of time.
ID: 23481
jujube
Joined: 25 Jan 11
Posts: 179
Credit: 83,858
RAC: 0
Message 23483 - Posted: 12 Oct 2011, 22:02:26 UTC - in response to Message 23481.  

NO, I don't watch day and night. What kind of dumb question is that or are you just trying to be argumentative?


That was a rhetorical question intended to increase your awareness. I was not trying to be argumentative. Sorry if I offended.
ID: 23483
Ver Greeneyes
Joined: 10 Sep 08
Posts: 29
Credit: 34,924
RAC: 0
Message 23511 - Posted: 15 Oct 2011, 1:08:42 UTC - in response to Message 23198.  
Last modified: 15 Oct 2011, 1:09:17 UTC

I may also have to replace the Fortran Formatted Input/Output for decimal binary conversion by a C routine, using strtod() for the initiated, to maintain 0 ULP difference and precision.

Note that Printing floating-point numbers quickly and accurately is hard. But maybe you'll find some inspiration in that paper :)

Then at last I might be able to really work on a Floating-Point Error Analysis of SixTrack to determine if we can improve the accuracy of the simulations.

Have you considered / are you using Kahan-Babuška-type summation to minimize the accumulation of errors? I recommend this paper for a good understanding of the error reduction / performance trade-off.
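
For readers unfamiliar with the technique, here is a minimal sketch of Neumaier's variant of Kahan-Babuška summation. It is a generic illustration and says nothing about how SixTrack actually sums its values; the function name `neumaier_sum` is chosen for this example.

```python
def neumaier_sum(values):
    """Compensated summation: carry the rounding error of each addition separately."""
    total = 0.0
    compensation = 0.0  # running sum of the low-order bits lost so far
    for x in values:
        t = total + x
        if abs(total) >= abs(x):
            compensation += (total - t) + x  # low-order bits of x were lost
        else:
            compensation += (x - t) + total  # low-order bits of total were lost
        total = t
    return total + compensation

data = [1.0, 1e100, 1.0, -1e100]
print(sum(data))           # 0.0
print(neumaier_sum(data))  # 2.0
```

The cost is a few extra floating-point operations and a branch per element, which is the error-reduction versus performance trade-off mentioned above.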
ID: 23511
zombie67 [MM]
Joined: 24 Nov 06
Posts: 76
Credit: 6,720,840
RAC: 0
Message 23514 - Posted: 15 Oct 2011, 4:56:07 UTC

The important question is: When's the OSX app gonna be ready? :)
Dublin, California
Team: SETI.USA

ID: 23514
Kestouf
Joined: 6 Apr 10
Posts: 1
Credit: 18,440
RAC: 0
Message 23515 - Posted: 15 Oct 2011, 6:12:12 UTC
Last modified: 15 Oct 2011, 6:23:21 UTC

Hello,
Do you have an estimated date for GPU support for the LHC project? The gain in computation time will be phenomenal.

Thx
ID: 23515


©2020 CERN