boinc - enhancing research workloads for the benefit of mankind & humanity - Computer Optimisation - CPU , GPU & RAM

Author	Message
Magic Quantum Mechanic Send message Joined: 24 Oct 04 Posts: 1127 Credit: 49,751,712 RAC: 8,724	Message 30580 - Posted: 1 Jun 2017, 4:01:19 UTC - in response to Message 30574. https://www.youtube.com/watch?v=mLQGXlxemlg - Optimizing HPC Service Delivery by a life time super computing tec Makes sense.....us computer geeks here in the Great NW breath the air of Boeing and Microsoft (actually longer with Boeing for me) since I lived in Redmond when it was just trees way before it became Microsoft 1 Microsoft Way, Redmond, WA (years ago my neighbor was Grandma Boeing) Has to be why I have been here at SixTrack almost 13 years 24/7 Volunteer Mad Scientist For Life ID: 30580 · Reply Quote

Eric Mcintosh Volunteer moderator Project administrator Project developer Project tester Project scientist Send message Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0	Message 30626 - Posted: 4 Jun 2017, 12:08:38 UTC - in response to Message 30621. Well we are far from trying to optimise GPU code. First let me explain that we have a tracking loop over turns (up to 1,000,000 hoping for 10,000,000 soon) which contains a large number of inner loops over particles, currently up to 64. Luckily these loops over particles can be parallelised as each particle is totally independent. In addition the original author F. Schmidt pre-calculated everything possible before entering the tracking loop. Each turn involves some 10,000 steps over a varying number of inner loops, e.g. straight section, quadrupole, beam-beam interaction, power supply ripple, etc etc of which there are about 50 different possibilities. A straight section is really just a multiply and add, whereas beam beam involves hundreds or more FLOPs. The first idea would be to use a much larger number of particles to best utilise the GPU. This however would produce a large amount of I/O and use a lot of disk space, but maybe not insurmountable. However all the code is Fortran, the outer loop calls subroutines (could inline), and has many tests/branches. It would be great if the main loop fitted entirely into the GPU and we would have rare Host access for I/O or BOINC checkpoint and progress calls or when one or more particles are lost. My colleague Riccardo is actively looking at redoing in C which would also allow much more portability and also allow to be parallel on multi-core systems. For the moment we just run tasks in parallel, which works rather well (apart from some current infrastructure problems). I hope to come up with some numbers next week on GPU testing. The code itself has been regularly measured and optimised; for example we re-ordered array indices to optimise memory access and rewrote the Error Function of a Complex Number to be faster but with adequate precision. Portability does come at a price but ensures accuracy of results. I shall publish measurements in an upcoming paper. I am sure we gain much more from being portable and being able to use almost any IEEE 754 compliant processor. On the issue of SixTrack and/or experiments this will shortly be under discussion at CERN I am sure. Currently SixTrack has many more Hosts/volunteers, is simple to install, and has been around for 13 years. Not everyone loves VMbox. Not a big deal at present as we rarely have enough SixTrack work to keep all volunteers busy. I hope to re-address all this in some weeks after current BOINC infrastructure issues are resolved and we have the new "super" sixtrack with much broader appliaction e.g.collimation studies and we support a much wider range of platforms MacOS ARM and use features such as AVX. Eric. ID: 30626 · Reply Quote

ivan Volunteer moderator Project tester Volunteer developer Volunteer tester Project scientist Send message Joined: 29 Aug 05 Posts: 1006 Credit: 6,272,230 RAC: 352	Message 30729 - Posted: 10 Jun 2017, 20:55:24 UTC - in response to Message 30723. thank you for the reply ! In reference to the use of virtual box there is a new product by berkley > http://singularity.lbl.gov/ called singularity that handles repeatable condition containers... and has low overhead for virtualisation data-set. Some CMS users have reported problems when their jobs land at sites running Singularity -- to the point that they blacklist sites they know to run the product. I have not heard yet whether the problem has been identified, nor solved. ID: 30729 · Reply Quote

Laurence Project administrator Project developer Send message Joined: 20 Jun 14 Posts: 374 Credit: 238,712 RAC: 0	Message 30745 - Posted: 12 Jun 2017, 7:45:56 UTC - in response to Message 30723. Last modified: 12 Jun 2017, 7:46:23 UTC thank you for the reply ! In reference to the use of virtual box there is a new product by berkley > http://singularity.lbl.gov/ called singularity that handles repeatable condition containers... and has low overhead for virtualisation data-set. Containers are not visualization. Our challenge is that 85% of the volunteers have Windows and the HEP applications only run on Linux. This project is constrained by the production code that the experiments are using. You may be interested to follow the work of the HEP software foundataion. ID: 30745 · Reply Quote

Laurence Project administrator Project developer Send message Joined: 20 Jun 14 Posts: 374 Credit: 238,712 RAC: 0	Message 30746 - Posted: 12 Jun 2017, 7:50:42 UTC - in response to Message 30733. and : QEMU is obviously be of use on many projects because of machine emulation and virtualisation.. Comes in flavours including Windows, Mac and Linux. http://www.qemu.org/ Why would this be better than VirtualBox? ID: 30746 · Reply Quote

Laurence Project administrator Project developer Send message Joined: 20 Jun 14 Posts: 374 Credit: 238,712 RAC: 0	Message 30752 - Posted: 12 Jun 2017, 9:35:24 UTC - in response to Message 30751. QEMU operates within the virtualisation component of windows .... has multiple machines to emulate .... & is reliable.. Is it easier to install that VirtualBox? Does it require any BIOS changes? ID: 30752 · Reply Quote

[VENETO] boboviz Send message Joined: 7 May 08 Posts: 195 Credit: 1,504,161 RAC: 103	Message 30816 - Posted: 17 Jun 2017, 21:47:18 UTC - in response to Message 30626. My colleague Riccardo is actively looking at redoing in C which would also allow much more portability and also allow to be parallel on multi-core systems. For the moment we just run tasks in parallel, which works rather well (apart from some current infrastructure problems). I hope to come up with some numbers next week on GPU testing. VERY interesting! ID: 30816 · Reply Quote

Eric Mcintosh Volunteer moderator Project administrator Project developer Project tester Project scientist Send message Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0	Message 30818 - Posted: 18 Jun 2017, 1:43:24 UTC - in response to Message 30817. Have not got the numbers (yet)., but anecdotally, both Boeing and CERN were members of the Cray User Advisory committee many many years ago at the beginning of the end of the mainframe era. (I am 76 years old so I am afraid I may be a bit slow to adapt :-) My priorities are RFP, Reliability, no use if it fails Functionality, needs to do what you want Performance, as fast as possible. Eric. ID: 30818 · Reply Quote

Erich56 Send message Joined: 18 Dec 15 Posts: 1689 Credit: 103,886,994 RAC: 121,589	Message 30819 - Posted: 18 Jun 2017, 6:32:33 UTC - in response to Message 30818. My priorities are RFP, Reliability, no use if it fails Functionality, needs to do what you want Performance, as fast as possible. Eric. Fully d'accord :-) :-) :-) ID: 30819 · Reply Quote

Eric Mcintosh Volunteer moderator Project administrator Project developer Project tester Project scientist Send message Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0	Message 30821 - Posted: 18 Jun 2017, 11:36:18 UTC - in response to Message 30820. Thanks for all that; very relevant. Right now I am porting to IBM. (I have used openMP, MPI, PVM in the past, rather successfully. You can find my primitive CV at http://mcintosh.web.cern.ch/mcintosh/ . . . ) Sadly I have NOT found a fully compliant Fortran 2003 compiler (yet) even commercial products. I shall try again when other issues are resolved, but maybe we shall have a SixTrack C version sooner. ID: 30821 · Reply Quote

Eric Mcintosh Volunteer moderator Project administrator Project developer Project tester Project scientist Send message Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0	Message 30822 - Posted: 18 Jun 2017, 11:57:36 UTC .....and sorry Quantum I forgot a couple of things (old age!) I am basically SixTrack and not really involved with VMbox etc etc I seem to remember a comment about 1 minute for Science. I just found a problem with WU id 70999433 resultid 146847878 w-c3_n4_lhc2016_40_MD-135-16-476-2.5-1.1806__11__s__64.31_59.32__15_16__6__39_1_sixvf_boinc6457_1 It seems to have been stuck running for over 7 hours for 1 minute or so CPU before you killed it!!! This is VERY strange. I shall look at this when I can. Must be a Windows problem....... AND many thanks for all your support for SixTrack. Our overall error rate is around 2% and I want to improve it. Still need to run everything twice though. Memory errors, overclocked machines, random OS errors, etc etc Right now I must make sure the problem of work not being taken is solved. This frustrates and loses us volunteers. Sometimes we have not had enough work, and it is disastrous when we do have work, and it is not distributed. Eric. ID: 30822 · Reply Quote

ivan Volunteer moderator Project tester Volunteer developer Volunteer tester Project scientist Send message Joined: 29 Aug 05 Posts: 1006 Credit: 6,272,230 RAC: 352	Message 30832 - Posted: 18 Jun 2017, 21:53:07 UTC - in response to Message 30821. Sadly I have NOT found a fully compliant Fortran 2003 compiler (yet) even commercial products. I shall try again when other issues are resolved, but maybe we shall have a SixTrack C version sooner. Eric, I'm a bit surprised to hear that. I thought that Intel and PGI, at least, were very up-to-date. That said, coarrays appear to be the Next Big Thing. Do you follow the comp.lang.fortran Usenet group? ID: 30832 · Reply Quote

Eric Mcintosh Volunteer moderator Project administrator Project developer Project tester Project scientist Send message Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0	Message 30835 - Posted: 19 Jun 2017, 4:27:38 UTC - in response to Message 30832. Well I have to be careful what I say here. I have used the PGI compiler suite including openMP for several years, again with great success. I had hoped to use it for GPU as well, as they have worked hard on that. Sadly, we no longer have an up to date compiler nor a licence :-(. I had to fight to keep our nagfor, which is my choice for testing. Our current production code is/was generated by ifort but we are moving to gfortran. However, as you know I specialise in producing identical results, 0 ULP difference, even when studying chaos. This works fine across Windows, Linux, and MacOS. It works on Intel, Intel compatible chips,. ARM on the way, and I am trying to test IBM Power 8 right now. The results produced by ifort on Linux and Windows were very vey different (and when I followed an ifort Webinar on portable libraries, the speaker immediately dismissed any thought of Windows/Linux compatibility for results). It also works fine with 5 tested compilers, apart from the occasional bug, at different levels of optimisation. The compilers are Lahey Fujitsu lf95, pgf90, nagfor, ifort and gfortran. I started with lf95 because it had great compatibility between Linux and Windows although the company did not claim to really support this. A correct implementation of formatted I/O with the different rounding modes as specified in Fortran 2003 would be an enormous help and would make much of my work redundant. It has been a couple of years since I last tested all this, but I shall have to do so before I publish. I have to confess that I have not been following comp.lang.fortran, but I will need to look when I can (Michael Metcalf, an important contributor to the various Fortran standards worked at CERN for many years). All this is complicated by the lack of any budget, CERN's decision to use C++ for LHC software development, a general lack of interest in my desired level of portability, and especially my overwhelming desire to make LHC@home as reliable as possible. CERN has recently recognised LHC@home as an official project, and I am getting a lot of help from some great young talented people. Hence all the effort to support the LHC experiments. CERN and other institutes have been using the GRID which I suppose will become a CLOUD. On the SixTrack side I/we must sort out the valid/invalid null/empty result file, the "outlier" problem leading to real time limit being exceeded thus wasting up to ten hours of volunteer computing time, and the frustrating "Work not being distributed" issues. I am really really trying to get all this documented and published, but by character and experience I am really devoted to running a service and trouble shooting rather than development. All this should really be lots of fun, and thanks to all the volunteers, we have come a long way over the last ten years, when I had a bit of a crazy idea to try PlayStations. As you also know we had up to 400,000 Tasks in progress during the recent Pentathlon, and I believe we are delivering about the same computing capacity as the entire CERN Computer Centre, but not the GRID, and all this largely "free" thanks to all your volunteer contributions. All that said, we are also getting ready for a major upgrade of SixTrack with lots of additional "Physics" and while we shall surely have a few hiccups, we are struggling to do adequate testing, this will be an enormous leap forward for both you and us. Eric. ID: 30835 · Reply Quote

Crystal Pellet Volunteer moderator Volunteer tester Send message Joined: 14 Jan 10 Posts: 1280 Credit: 8,496,391 RAC: 2,363	Message 30845 - Posted: 19 Jun 2017, 8:58:34 UTC - in response to Message 30835. On the SixTrack side I/we must sort out the valid/invalid null/empty result file, the "outlier" problem leading to real time limit being exceeded thus wasting up to ten hours of volunteer computing time..... Eric, with all the sorting out issues, do you still have thought about the SixTrack issue on Windows machines, where all tasks use very low cpu usage for several minutes up to an hour, not doing real scientific work, but only Windows conhost.exe and csrss.exe are using a bit cpu. Reminder ID: 30845 · Reply Quote

Eric Mcintosh Volunteer moderator Project administrator Project developer Project tester Project scientist Send message Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0	Message 30854 - Posted: 19 Jun 2017, 10:43:31 UTC - in response to Message 30845. Ahhh, thanks for the reminder. Old age is my excuse :-) and many other problems. Still, I have put this to the back of the queue, and I shouldn't because 90% of our volunteers are Windows. This will certainly be part of my/our testing of the new SixTrack executables. The current executables were built on Cygwin and are not at all native Windows. I am not a Windows expert, in fact a bit of a Jack of all trades, but I did have a look at https://answers.microsoft.com/en-us/protect/forum/protect_other-protect_scanning/what-is-conhostexe/38a69fb8-ded2-4f35-85c5-4d69cb8d016b If we exclude a virus, then I don't see how this should be SixTrack except that I suppose it is using Network Services and needs to be started as well.....Are you Windows 7? as at CERN? Anyway I really appreciate the timely reminder and I shall try and get some help from Windows experts if and when necessary. I have never seen this problem on my own Windows 7 and Windows 10 machines. For a start I shall look you up if you give me a HostId or something or just the name of your system. Eric. ID: 30854 · Reply Quote

Eric Mcintosh Volunteer moderator Project administrator Project developer Project tester Project scientist Send message Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0	Message 30870 - Posted: 19 Jun 2017, 14:05:35 UTC - in response to Message 30854. I have had a look at my Windows 10 system With Task Manager I do indeed see a csrss.exe and a conhost.exe. Now I am out my depth. I thought one or more were obsoelte with Windows 10 BUT maybe "we" are supplying it/them. At least this issue is on the list now. Eric. ID: 30870 · Reply Quote

Crystal Pellet Volunteer moderator Volunteer tester Send message Joined: 14 Jan 10 Posts: 1280 Credit: 8,496,391 RAC: 2,363	Message 30884 - Posted: 19 Jun 2017, 17:11:15 UTC - in response to Message 30854. it is using Network Services and needs to be started as well.....Are you Windows 7? as at CERN? Anyway I really appreciate the timely reminder and I shall try and get some help from Windows experts if and when necessary. I have never seen this problem on my own Windows 7 and Windows 10 machines. For a start I shall look you up if you give me a HostId or something or just the name of your system. Eric. I did a few actual SixTracks on 3 Windows machines. (Win7 and a Win10) All have these loss of cpu time. The first is extreme. The second machine is a 30 cores Win7 VM. When I run only 2 Sixtracks there the loss time is about 12 minutes. Doesn't matter how long a task is running. When all 30 cores start a job the low cpu time is over 1 hour, so that machine will not run SixTrack without solving that issue. https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10360630 https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10362384 https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10416365 ID: 30884 · Reply Quote

Eric Mcintosh Volunteer moderator Project administrator Project developer Project tester Project scientist Send message Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0	Message 31047 - Posted: 25 Jun 2017, 15:29:42 UTC - in response to Message 31040. I UNDERSTAND PERFECTLY. I am sorry you quit, but it is just an added incentive to me to fix it. I shall wait until we have native SixTrack 64-bit executables (as well as 32) i.e. until we have built and tested the new SixTrack on Windows 10. (CERN has abolished Windows XP and is currently supporting Window 7.) Support for Windows doesn't know about BOINC, and BOINC/Linux support doesn't know about Windows. I myself have to solve other very serious problems. This problem is of course SERIOUS too, but I am not a big Windows expert. (I did have a look and reported to Crystal Pellet who first reported this problem.) In the end, if necessary, I shall find a solution. SixTrack tries to open more than 60 files in the startup I guess the problem lies there. ( We could maybe stagger these, but not by much.) IMHO, Windows scheduling and I/O are not great to say the least, but it has to cope with a huge variety of applications, interactive, batch, real time, databases, etc etc. CERN scientific apps are mainly Linux. I hate apologising when I do not think it is my fault. I have NO authority on these matters. Hope to see you again soon. Eric. ID: 31047 · Reply Quote

[VENETO] boboviz Send message Joined: 7 May 08 Posts: 195 Credit: 1,504,161 RAC: 103	Message 31375 - Posted: 13 Jul 2017, 14:23:42 UTC - in response to Message 30626. My colleague Riccardo is actively looking at redoing in C which would also allow much more portability and also allow to be parallel on multi-core systems. For the moment we just run tasks in parallel, which works rather well (apart from some current infrastructure problems). I hope to come up with some numbers next week on GPU testing. I think you are speaking of this project ID: 31375 · Reply Quote

ivan Volunteer moderator Project tester Volunteer developer Volunteer tester Project scientist Send message Joined: 29 Aug 05 Posts: 1006 Credit: 6,272,230 RAC: 352	Message 31883 - Posted: 7 Aug 2017, 20:22:36 UTC - in response to Message 31847. what is all this about the Cern VM (seems to be installable on many VM systems > https://cernvm.cern.ch https://cernvm.cern.ch/portal/vbinstallation could we use this with boinc ? or this > https://cernvm.cern.ch/portal/launch for virtual box app on boinc We actually do use the CERN VM for CMS, and I believe for all the other sub-projects that use VMs. I don't think that there's anything particularly special about it, except that it's been tailored (capabilities, default software, etc) towards the needs of the LHC community, in much the same way as Scientific Linux Cern 4, 5, and 6 were customised on top of Scientific Linux (itself based on Red Hat Enterprise Linux), and now SLC7 is customised on top of Centos 7. This means that we can run our Linux LHC software in the CERNVM on Linux, Windows and MacOS with a minimum of changes to be made to the underlying VM. There is some interest in using containers instead of VMs, but some difficulties have arisen in the mainstream apps (i.e. the ones we run in our Data Centres) with aberrent behavior from some container implementations, so I don't think LHCatHome will be moving to containers in the next year or two. ID: 31883 · Reply Quote

LHC@home