Message boards : Number crunching : boinc - enhancing research workloads for the benefit of mankind & humanity - Computer Optimisation - CPU , GPU & RAM - PC, Mac & ARM development
Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1115
Credit: 49,720,823
RAC: 14,362
Message 30580 - Posted: 1 Jun 2017, 4:01:19 UTC - in response to Message 30574.  

https://www.youtube.com/watch?v=mLQGXlxemlg - Optimizing HPC Service Delivery by a life time super computing tec



Makes sense... us computer geeks here in the Great NW breathe the air of Boeing and Microsoft (actually longer with Boeing for me), since I lived in Redmond when it was just trees, way before it became
Microsoft, 1 Microsoft Way, Redmond, WA
(years ago my neighbor was Grandma Boeing)

Has to be why I have been here at SixTrack almost 13 years 24/7
Volunteer Mad Scientist For Life
ID: 30580
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 30626 - Posted: 4 Jun 2017, 12:08:38 UTC - in response to Message 30621.  

Well we are far from trying to optimise GPU code.
First let me explain that we have a tracking loop over turns
(up to 1,000,000 hoping for 10,000,000 soon) which contains
a large number of inner loops over particles, currently up to 64.
Luckily these loops over particles can be parallelised as each
particle is totally independent. In addition the original author F. Schmidt
pre-calculated everything possible before entering the tracking loop.
Each turn involves some 10,000 steps over a varying number of inner loops,
e.g. straight section, quadrupole, beam-beam interaction, power supply ripple, etc.,
of which there are about 50 different possibilities. A straight section is really just
a multiply and add, whereas beam-beam involves hundreds or more FLOPs.
The first idea would be to use a much larger number of particles to best
utilise the GPU. This, however, would produce a large amount of I/O and
use a lot of disk space, though that may not be insurmountable. However, all the code is
Fortran, the outer loop calls subroutines (could inline), and has many tests/branches.
It would be great if the main loop fitted entirely into the GPU and we would have
rare Host access for I/O or BOINC checkpoint and progress calls or when
one or more particles are lost.
My colleague Riccardo is actively looking at redoing it in C, which would also allow
much more portability and allow it to run in parallel on multi-core systems.
For the moment we just run tasks in parallel, which works rather well (apart
from some current infrastructure problems). I hope to come up with
some numbers next week on GPU testing.
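
As a rough illustration of the loop structure described above (an outer loop over turns, a sequence of element steps per turn, and an inner loop over independent particles), here is a minimal Python sketch. The element functions, the four-step lattice, and the aperture check are hypothetical stand-ins chosen for brevity; they are not SixTrack's physics, element set, or data layout.

```python
# Illustrative sketch only: element names, maps and the toy lattice are
# hypothetical, not SixTrack's actual physics or data layout.

def drift(x, xp, length):
    """Straight section: one multiply and one add per particle."""
    return x + length * xp, xp

def thin_quad(x, xp, strength):
    """Thin quadrupole kick: changes the angle in proportion to the offset."""
    return x, xp - strength * x

# A toy "lattice"; the post above mentions some 10,000 steps per turn.
LATTICE = [
    (drift, 1.0),
    (thin_quad, 0.5),
    (drift, 1.0),
    (thin_quad, -0.5),
]

def track(particles, n_turns, aperture=1.0):
    """Track (x, xp) pairs for n_turns; return (survivors, lost_count)."""
    alive = list(particles)
    lost = 0
    for _turn in range(n_turns):            # outer loop over turns
        for element, param in LATTICE:      # steps within one turn
            # Inner loop over particles: each particle is independent, so
            # this is the loop that could be vectorised or spread over
            # GPU threads.
            alive = [element(x, xp, param) for x, xp in alive]
        # Particles outside the (hypothetical) aperture are dropped.
        survivors = [(x, xp) for x, xp in alive if abs(x) < aperture]
        lost += len(alive) - len(survivors)
        alive = survivors
    return alive, lost
```

Calling `track([(0.001, 0.0), (2.0, 0.0)], 10)` tracks one small-amplitude and one large-amplitude particle. The branch on the element type and the bookkeeping for lost particles are the parts that, as noted above, make it harder to keep the whole loop on a GPU.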

The code itself has been regularly measured and optimised; for example we
re-ordered array indices to optimise memory access and rewrote the Error Function
of a Complex Number to be faster but with adequate precision.

Portability does come at a price but ensures accuracy of results. I shall publish
measurements in an upcoming paper. I am sure we gain much more from being portable
and being able to use almost any IEEE 754 compliant processor.

On the issue of SixTrack and/or experiments this will shortly be under discussion at
CERN I am sure. Currently SixTrack has many more Hosts/volunteers, is simple to install,
and has been around for 13 years. Not everyone loves VMbox. Not a big deal at
present as we rarely have enough SixTrack work to keep all volunteers busy.

I hope to re-address all this in some weeks, after current BOINC infrastructure issues
are resolved and we have the new "super" SixTrack with much broader application,
e.g. collimation studies, and we support a much wider range of platforms (MacOS, ARM)
and use features such as AVX.

Eric.
ID: 30626
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1005
Credit: 6,269,607
RAC: 387
Message 30729 - Posted: 10 Jun 2017, 20:55:24 UTC - in response to Message 30723.  

Thank you for the reply! In reference to the use of VirtualBox, there is a new product by Berkeley, http://singularity.lbl.gov/ , called Singularity that handles repeatable condition containers... and has low overhead for virtualisation data-set.

Some CMS users have reported problems when their jobs land at sites running Singularity, to the point that they blacklist sites they know to run the product. I have not yet heard whether the problem has been identified or solved.
ID: 30729
Laurence
Project administrator
Project developer

Joined: 20 Jun 14
Posts: 373
Credit: 238,712
RAC: 0
Message 30745 - Posted: 12 Jun 2017, 7:45:56 UTC - in response to Message 30723.  
Last modified: 12 Jun 2017, 7:46:23 UTC

Thank you for the reply! In reference to the use of VirtualBox, there is a new product by Berkeley, http://singularity.lbl.gov/ , called Singularity that handles repeatable condition containers... and has low overhead for virtualisation data-set.


Containers are not virtualization. Our challenge is that 85% of the volunteers have Windows and the HEP applications only run on Linux. This project is constrained by the production code that the experiments are using. You may be interested to follow the work of the HEP Software Foundation.
ID: 30745
Laurence
Project administrator
Project developer

Joined: 20 Jun 14
Posts: 373
Credit: 238,712
RAC: 0
Message 30746 - Posted: 12 Jun 2017, 7:50:42 UTC - in response to Message 30733.  

and: QEMU would obviously be of use on many projects because of machine emulation and virtualisation.
Comes in flavours including Windows, Mac and Linux.

http://www.qemu.org/



Why would this be better than VirtualBox?
ID: 30746
Laurence
Project administrator
Project developer

Joined: 20 Jun 14
Posts: 373
Credit: 238,712
RAC: 0
Message 30752 - Posted: 12 Jun 2017, 9:35:24 UTC - in response to Message 30751.  

QEMU operates within the virtualisation component of windows ....
has multiple machines to emulate .... & is reliable..


Is it easier to install than VirtualBox? Does it require any BIOS changes?
ID: 30752
[VENETO] boboviz
Joined: 7 May 08
Posts: 193
Credit: 1,504,161
RAC: 227
Message 30816 - Posted: 17 Jun 2017, 21:47:18 UTC - in response to Message 30626.  


My colleague Riccardo is actively looking at redoing in C which would also allow
much more portability and also allow to be parallel on multi-core systems.
For the moment we just run tasks in parallel, which works rather well (apart
from some current infrastructure problems). I hope to come up with
some numbers next week on GPU testing.


VERY interesting!
ID: 30816
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 30818 - Posted: 18 Jun 2017, 1:43:24 UTC - in response to Message 30817.  

Have not got the numbers (yet), but anecdotally, both Boeing and CERN were
members of the Cray User Advisory committee many many years ago at the
beginning of the end of the mainframe era. (I am 76 years old so I am afraid I
may be a bit slow to adapt :-) My priorities are RFP,
Reliability, no use if it fails
Functionality, needs to do what you want
Performance, as fast as possible.
Eric.
ID: 30818
Erich56

Joined: 18 Dec 15
Posts: 1687
Credit: 103,019,248
RAC: 126,273
Message 30819 - Posted: 18 Jun 2017, 6:32:33 UTC - in response to Message 30818.  

My priorities are RFP,
Reliability, no use if it fails
Functionality, needs to do what you want
Performance, as fast as possible.
Eric.

Fully d'accord :-) :-) :-)
ID: 30819
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 30821 - Posted: 18 Jun 2017, 11:36:18 UTC - in response to Message 30820.  

Thanks for all that; very relevant. Right now I am porting
to IBM. (I have used openMP, MPI, PVM in the past,
rather successfully. You can find my primitive CV at
http://mcintosh.web.cern.ch/mcintosh/ . . . )
Sadly I have NOT found a fully compliant Fortran 2003 compiler (yet)
even commercial products. I shall try again when other issues are
resolved, but maybe we shall have a SixTrack C version sooner.
ID: 30821
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 30822 - Posted: 18 Jun 2017, 11:57:36 UTC

.....and sorry Quantum I forgot a couple of things (old age!)

I am basically SixTrack and not really involved with VMbox etc etc
I seem to remember a comment about 1 minute for Science.
I just found a problem with WU id 70999433 resultid 146847878
w-c3_n4_lhc2016_40_MD-135-16-476-2.5-1.1806__11__s__64.31_59.32__15_16__6__39_1_sixvf_boinc6457_1
It seems to have been stuck running for over 7 hours, with only 1 minute or so of CPU time,
before you killed it!!! This is VERY strange. I shall look at this when I can.
Must be a Windows problem.......

AND many thanks for all your support for SixTrack. Our overall error rate is
around 2% and I want to improve it. Still need to run everything twice though.
Memory errors, overclocked machines, random OS errors, etc etc
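
The "run everything twice" check mentioned above amounts to accepting a result only when two independent runs agree. A minimal Python sketch of that idea follows, assuming each run produces a result file whose bytes can be compared directly; the function names are hypothetical and this is not the actual BOINC or SixTrack validator.

```python
import hashlib

def digest(result_bytes):
    """Return a SHA-256 hex digest of one result file's contents."""
    return hashlib.sha256(result_bytes).hexdigest()

def validate_pair(result_a, result_b):
    """Accept a pair of results only if both are non-empty and byte-identical.

    An empty result is treated as invalid here; that choice is an
    assumption of this sketch, not a documented project rule.
    """
    if not result_a or not result_b:
        return False
    return digest(result_a) == digest(result_b)
```

A byte-for-byte comparison like this only makes sense when the application produces identical output on every supported platform; otherwise a tolerance-based comparison would be needed.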

Right now I must make sure the problem of work not being taken is solved.
This frustrates and loses us volunteers. Sometimes we have not had enough work,
and it is disastrous when we do have work, and it is not distributed. Eric.
ID: 30822
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1005
Credit: 6,269,607
RAC: 387
Message 30832 - Posted: 18 Jun 2017, 21:53:07 UTC - in response to Message 30821.  

Sadly I have NOT found a fully compliant Fortran 2003 compiler (yet)
even commercial products. I shall try again when other issues are
resolved, but maybe we shall have a SixTrack C version sooner.

Eric, I'm a bit surprised to hear that. I thought that Intel and PGI, at least, were very up-to-date. That said, coarrays appear to be the Next Big Thing. Do you follow the comp.lang.fortran Usenet group?
ID: 30832
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 30835 - Posted: 19 Jun 2017, 4:27:38 UTC - in response to Message 30832.  

Well, I have to be careful what I say here. I have used the PGI compiler suite, including OpenMP, for several years, again with great success. I had hoped to use it for GPU as well, as they have worked hard on that. Sadly, we no longer have an up-to-date compiler nor a licence :-(. I had to fight to keep our nagfor, which is my choice for testing. Our current production code is/was generated by ifort, but we are moving to gfortran. However, as you know, I specialise in producing identical results, 0 ULP difference, even when studying chaos. This works fine across Windows, Linux, and MacOS. It works on Intel and Intel-compatible chips, with ARM on the way, and I am trying to test IBM Power 8 right now. The results produced by ifort on Linux and Windows were very, very different (and when I followed an ifort webinar on portable libraries, the speaker immediately dismissed any thought of Windows/Linux compatibility for results). The approach also works fine with 5 tested compilers, apart from the occasional bug, at different levels of optimisation. The compilers are Lahey Fujitsu lf95, pgf90, nagfor, ifort and gfortran. I started with lf95 because it had great compatibility between Linux and Windows, although the company did not claim to really support this. A correct implementation of formatted I/O with the different rounding modes as specified in Fortran 2003 would be an enormous help and would make much of my work redundant. It has been a couple of years since I last tested all this, but I shall have to do so before I publish. I have to confess that I have not been following comp.lang.fortran, but
I will need to look when I can (Michael Metcalf, an important contributor to the various
Fortran standards worked at CERN for many years). All this is complicated by the lack of any
budget, CERN's decision to use C++ for LHC software development, a general lack of
interest in my desired level of portability, and especially my overwhelming desire to make
LHC@home as reliable as possible. CERN has recently recognised LHC@home as an official
project, and I am getting a lot of help from some great young talented people.
Hence all the effort to support the LHC experiments.
CERN and other institutes have been using the GRID which I suppose will become a CLOUD.
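
To make the "0 ULP difference" criterion mentioned above concrete: two IEEE 754 doubles are 0 ULP apart only when they are the same representable value, and 1 ULP apart when they are adjacent representable values. The Python sketch below measures that distance by reinterpreting the bit patterns; the function names are hypothetical and this is not the tool used for SixTrack.

```python
import struct

def _ordered_key(x):
    """Map a double to an integer that increases with the float's value."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    if bits & 0x8000000000000000:           # sign bit set: negative value
        return -(bits & 0x7FFFFFFFFFFFFFFF)
    return bits

def ulp_distance(a, b):
    """Number of representable doubles between a and b (0 if identical)."""
    if a != a or b != b:                    # NaN compares unequal to itself
        raise ValueError("ULP distance is undefined for NaN")
    return abs(_ordered_key(a) - _ordered_key(b))

print(ulp_distance(1.0, 1.0))        # same value
print(ulp_distance(0.1 + 0.2, 0.3))  # classic rounding difference
```

A result set passes a 0 ULP check only if `ulp_distance` is zero for every compared number, which is a much stricter requirement than agreement to a fixed number of decimal places.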

On the SixTrack side I/we must sort
out the valid/invalid null/empty result file, the "outlier" problem leading to real time limit
being exceeded thus wasting up to ten hours of volunteer computing time, and the
frustrating "Work not being distributed" issues. I am really really trying to get all this
documented and published, but by character and experience I am really devoted to running
a service and trouble shooting rather than development. All this should really be lots of fun, and thanks to all
the volunteers, we have come a long way over the last ten years, when I had a bit of a crazy
idea to try PlayStations. As you also know we had up to 400,000 Tasks in progress
during the recent Pentathlon, and I believe we are delivering about the same computing
capacity as the entire CERN Computer Centre, but not the GRID, and all this largely "free"
thanks to all your volunteer contributions. All that said, we are also getting ready for a
major upgrade of SixTrack with lots of additional "Physics", and while we shall surely have
a few hiccups (we are struggling to do adequate testing), this will be an enormous leap
forward for both you and us. Eric.
ID: 30835
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1273
Credit: 8,480,147
RAC: 2,155
Message 30845 - Posted: 19 Jun 2017, 8:58:34 UTC - in response to Message 30835.  

On the SixTrack side I/we must sort
out the valid/invalid null/empty result file, the "outlier" problem leading to real time limit
being exceeded thus wasting up to ten hours of volunteer computing time.....

Eric, with all the issues being sorted out, have you still given thought to the SixTrack issue on Windows machines,
where all tasks show very low CPU usage for several minutes up to an hour, not doing real scientific work, while only Windows conhost.exe and csrss.exe are using a bit of CPU?
Reminder
ID: 30845
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 30854 - Posted: 19 Jun 2017, 10:43:31 UTC - in response to Message 30845.  

Ahhh, thanks for the reminder. Old age is my excuse :-) and many other problems.
Still, I have put this to the back of the queue, and I shouldn't because 90%
of our volunteers are Windows. This will certainly be part of my/our testing of
the new SixTrack executables. The current executables were built on Cygwin
and are not at all native Windows. I am not a Windows expert, in fact a bit of a Jack
of all trades, but I did have a look at
https://answers.microsoft.com/en-us/protect/forum/protect_other-protect_scanning/what-is-conhostexe/38a69fb8-ded2-4f35-85c5-4d69cb8d016b
If we exclude a virus, then I don't see how this should be SixTrack except that I suppose
it is using Network Services and needs to be started as well.....Are you Windows 7?
as at CERN?
Anyway I really appreciate the timely reminder and I shall try and get some help from
Windows experts if and when necessary. I have never seen this problem on my own
Windows 7 and Windows 10 machines. For a start I shall look you up if you give me a
HostId or something or just the name of your system. Eric.
ID: 30854
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 30870 - Posted: 19 Jun 2017, 14:05:35 UTC - in response to Message 30854.  

I have had a look at my Windows 10 system
With Task Manager I do indeed see a csrss.exe
and a conhost.exe. Now I am out of my depth. I thought
one or more were obsolete with Windows 10 BUT maybe "we"
are supplying it/them. At least this issue is on the list
now. Eric.
ID: 30870
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1273
Credit: 8,480,147
RAC: 2,155
Message 30884 - Posted: 19 Jun 2017, 17:11:15 UTC - in response to Message 30854.  

it is using Network Services and needs to be started as well.....Are you Windows 7?
as at CERN?
Anyway I really appreciate the timely reminder and I shall try and get some help from
Windows experts if and when necessary. I have never seen this problem on my own
Windows 7 and Windows 10 machines. For a start I shall look you up if you give me a
HostId or something or just the name of your system. Eric.

I ran a few actual SixTrack tasks on 3 Windows machines (Win7 and a Win10).
All show this loss of CPU time. The first is extreme.
The second machine is a 30-core Win7 VM. When I run only 2 SixTrack tasks there, the lost time is about 12 minutes, no matter how long a task runs. When all 30 cores start a job, the low-CPU period is over 1 hour, so that machine will not run SixTrack until that issue is solved.
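
One way to put a number on the "loss of CPU time" reported above is the ratio of CPU time to elapsed (wall-clock) time for a task. The Python sketch below is a minimal illustration; the function names and the 0.5 threshold are assumptions made for the example, not values used by BOINC or LHC@home.

```python
def cpu_efficiency(cpu_seconds, elapsed_seconds):
    """Fraction of the elapsed time during which the task was on a CPU."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return cpu_seconds / elapsed_seconds

def looks_stalled(cpu_seconds, elapsed_seconds, threshold=0.5):
    """Flag a task whose CPU efficiency is below a (hypothetical) threshold."""
    return cpu_efficiency(cpu_seconds, elapsed_seconds) < threshold

# The task described earlier in this thread: about 1 minute of CPU time
# over more than 7 hours of running.
print(looks_stalled(60, 7 * 3600))
```

For a short task, a fixed 12-minute idle period at startup lowers this ratio far more than it does for a long one, which is why the same delay can look harmless on some work units and severe on others.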

https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10360630
https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10362384
https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10416365
ID: 30884
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 31047 - Posted: 25 Jun 2017, 15:29:42 UTC - in response to Message 31040.  

I UNDERSTAND PERFECTLY. I am sorry you quit, but it is just an added
incentive to me to fix it. I shall wait until we have native SixTrack 64-bit
executables (as well as 32) i.e. until we have built and tested the new SixTrack
on Windows 10. (CERN has abolished Windows XP and is currently
supporting Windows 7.) Support for Windows doesn't know about BOINC,
and BOINC/Linux support doesn't know about Windows. I myself have
to solve other very serious problems. This problem is of course SERIOUS too,
but I am not a big Windows expert. (I did have a look and reported to
Crystal Pellet who first reported this problem.) In the end, if necessary, I
shall find a solution. SixTrack tries to open more than 60 files at startup;
I guess the problem lies there. (We could maybe stagger these, but not
by much.) IMHO, Windows scheduling and I/O are not great, to say the least, but
it has to cope with a huge variety of applications, interactive, batch,
real time, databases, etc etc. CERN scientific apps are mainly Linux.
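
The guess above, that opening more than 60 files at startup is where the time goes, is something that can be measured in isolation. The Python sketch below times the creation and opening of a batch of files in a scratch directory; the `fort.N` file names and the default count of 60 are illustrative assumptions, and a fast result here would not by itself rule out a problem inside the real executable.

```python
import os
import tempfile
import time

def time_file_opens(directory, count=60):
    """Create and open `count` files in `directory`; return elapsed seconds."""
    handles = []
    start = time.perf_counter()
    try:
        for i in range(count):
            # Hypothetical names, loosely modelled on Fortran unit files.
            path = os.path.join(directory, f"fort.{i}")
            handles.append(open(path, "w"))
        elapsed = time.perf_counter() - start
    finally:
        for handle in handles:
            handle.close()
    return elapsed

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(f"opened 60 files in {time_file_opens(tmp):.4f} s")
```

Running this on an affected Windows host, ideally with several copies started at once to mimic many tasks starting together, would give a first indication of whether file opens alone can account for minutes of delay.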

I hate apologising when I do not think it is my fault. I have NO authority on these matters.
Hope to see you again soon. Eric.
ID: 31047
[VENETO] boboviz
Joined: 7 May 08
Posts: 193
Credit: 1,504,161
RAC: 227
Message 31375 - Posted: 13 Jul 2017, 14:23:42 UTC - in response to Message 30626.  


My colleague Riccardo is actively looking at redoing in C which would also allow
much more portability and also allow to be parallel on multi-core systems.
For the moment we just run tasks in parallel, which works rather well (apart
from some current infrastructure problems). I hope to come up with
some numbers next week on GPU testing.


I think you are speaking of this project
ID: 31375
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1005
Credit: 6,269,607
RAC: 387
Message 31883 - Posted: 7 Aug 2017, 20:22:36 UTC - in response to Message 31847.  

what is all this about the Cern VM (seems to be installable on many VM systems >

https://cernvm.cern.ch

https://cernvm.cern.ch/portal/vbinstallation

could we use this with boinc ?

or this > https://cernvm.cern.ch/portal/launch for virtual box app on boinc

We actually do use the CERN VM for CMS, and I believe for all the other sub-projects that use VMs. I don't think that there's anything particularly special about it, except that it's been tailored (capabilities, default software, etc.) towards the needs of the LHC community, in much the same way as Scientific Linux CERN 4, 5, and 6 were customised on top of Scientific Linux (itself based on Red Hat Enterprise Linux), and now SLC7 is customised on top of CentOS 7.

This means that we can run our Linux LHC software in the CERNVM on Linux, Windows and MacOS with a minimum of changes to be made to the underlying VM.

There is some interest in using containers instead of VMs, but some difficulties have arisen in the mainstream apps (i.e. the ones we run in our Data Centres) with aberrant behavior from some container implementations, so I don't think LHCatHome will be moving to containers in the next year or two.
ID: 31883