CMS@Home -- ongoing problems
Sorry that the CMS@Home HTCondor server is still playing up. Again over the weekend it refused to serve jobs even though plenty were available. Together with Federica we've decided not to inject another workflow this week, to let it "fail hard" again so that she can investigate which ClassAd preferences are not being met.
So, you will probably see the number of running jobs falling, and the number of errors increasing, in the next few days. Please feel free to set No New Tasks in that case. I won't, so that there is still some pressure for jobs on the server. I've also asked Laurence if I can run the CMS@Home VM outside of BOINC, to get around the quota back-off problem.
19 Feb 2020, 14:07:10 UTC · Discuss
CMS@Home up again
OK, jobs are available again. Sorry for the long delay. Remember, I'm only the front-man for a larger crew, so any downstream delays percolate up to my response. Hopefully this will remain good for some time, but I still don't understand why the condor server occasionally refuses to send out jobs in a timely manner.
12 Feb 2020, 11:14:38 UTC · Discuss
CMS@Home accidentally shut down -- Please set No New Tasks
We need to upgrade the CMS@Home WMAgent before Thursday, so I tried to set the workflows to drain down. Unfortunately, I misunderstood the batch states and killed off most of them instead. :-(. There's one still left with about 200 jobs, so that won't last long.
Please set your CMS projects to No New Tasks to avoid getting lots of computation errors. I'll let you know when the upgrade is done and jobs are flowing again.
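For volunteers who manage hosts from the command line rather than the BOINC Manager, the following is a minimal sketch of how the "No New Tasks" request could be scripted. It assumes the standard `boinccmd` tool is on your PATH and that your host is attached under the project URL shown; the helper name `boinccmd_project_op` is purely illustrative.

```python
from typing import List

# Assumption: the host is attached under this URL; adjust it to match
# what your BOINC client reports for the project.
PROJECT_URL = "https://lhcathome.cern.ch/lhcathome"

# Project operations accepted by `boinccmd --project URL <op>` that are
# relevant here; "nomorework" corresponds to No New Tasks.
ALLOWED_OPS = {"nomorework", "allowmorework", "suspend", "resume", "update"}


def boinccmd_project_op(op: str, project_url: str = PROJECT_URL) -> List[str]:
    """Build (but do not run) the boinccmd argument list for a project operation."""
    if op not in ALLOWED_OPS:
        raise ValueError(f"unsupported project operation: {op}")
    return ["boinccmd", "--project", project_url, op]


# Stop fetching new tasks now, and allow them again once jobs are flowing.
stop_cmd = boinccmd_project_op("nomorework")
resume_cmd = boinccmd_project_op("allowmorework")

print(" ".join(stop_cmd))
print(" ".join(resume_cmd))
```

The sketch only builds the command lists, so nothing changes on your host until you pass one of them to `subprocess.run(cmd, check=True)` or type the printed command yourself.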
10 Feb 2020, 15:33:39 UTC · Discuss
Server outage - uploads failing
Due to a network problem in the CERN computer centre early Thursday morning, our BOINC servers have lost access to a storage cluster. Hence uploads are failing, as is access to web pages. Hopefully this should be fixed soon.
23 Jan 2020, 7:31:25 UTC · Discuss
CMS@Home disruption this week
It appears that a database intervention at CERN went badly, leaving our data tables empty and us unable to submit new CMS@Home jobs. Advice is that it will take several days to recover -- and on top of that, some of the major players are in the USA, which has holidays for the rest of this week. I'll keep an eye on it, but I'm doubtful we'll be running again this week. Sorry 'bout that!
27 Nov 2019, 8:21:06 UTC · Discuss
Database intervention Monday morning
LHC@home and associated BOINC services will be unavailable for about 1 hour on Monday 25th of November due to a database storage intervention.
Thanks for your understanding and happy crunching.
22 Nov 2019, 13:38:09 UTC · Discuss
CMS job shortage Wednesday 13th November
CMS IT will be installing a new version of WMAgent on Wednesday. This will impact job availability for the duration of the intervention. We might be able to eliminate the little gremlin that's been plaguing us for the last few weeks, too.
So, please set your CMS processors to No New Tasks sometime tomorrow, Tuesday 12th, so that current tasks will stop requesting new jobs before the queues get cut. I'll let you know when jobs are available again.
11 Nov 2019, 15:49:27 UTC · Discuss
Following a couple of weeks of tests in the LHC@home development project, we are upgrading our production server cluster to BOINC server release 1.2 this afternoon. During the update we will be running with slightly lower server capacity than usual.
30 Sep 2019, 11:56:49 UTC · Discuss
The SixTrack team at the LHC@Home desk for the CERN open days
thanks to those who have filled in the doodle we circulated last week:
We decided to deliver a presentation every day in the most populated time slots out of the doodle poll, i.e. on Sat. 14th Sep, between 03:00 and 04:00 PM, and on Sun 15th Sep, between 02:00 and 03:00 PM.
The meeting point will be the LHC@Home desk in R2 (building 504), at the beginning of the time slot. We will have to walk a few minutes to a meeting room where the presentations will take place. We will be back at the meeting point by the end of the time slot at the latest.
Looking forward to shaking hands and meeting you,
Alessio and Massimo, for the SixTrack team
12 Sep 2019, 11:53:15 UTC · Discuss
The SixTrack team welcomes the LHC@Home volunteers at the CERN open days
following Nils's post on the MBs:
the SixTrack team is looking into welcoming you at CERN and thanking you for the CPU time you make available to us. To do so in the best way, we would like to know when you are most likely to pass by the IT stand, so that we can concentrate our efforts on the time when most of you can be there. Hence, please find below a doodle that we will use to target the optimal time window:
Thanks a lot in advance, and happy crunching!
Alessio and Massimo, for the SixTrack team
2 Sep 2019, 8:33:56 UTC · Discuss
CERN Open Days in 2 weeks!
During the CERN Open Days 2019, we will have a small LHC@home stand as part of the IT activities in building 504 near the Data Centre.
LHC@home will also be present at the ATLAS experiment site, in the ATLAS Computing Corner.
We hope that many of you will be able to visit CERN during the Open Days and would be happy to see you here!
Please refer to: Plan your visit and the list of activities during the Open days for more information about all the visit points on the CERN sites.
30 Aug 2019, 9:07:17 UTC · Discuss
Many queued tasks - server status page erratic
Due to the very high number of queued Sixtrack tasks, we have enabled 4 load-balanced scheduler/feeder servers to handle the demand. (Our bottleneck is the database, but several schedulers can cache more tasks to be dispatched.)
Our server status page does not currently show in real time the daemon status on remote servers. Hence the server status page may indicate a varying number of processes, depending on which web server is active.
Please also be patient if you are not getting tasks for your preferred application quickly enough. After a few retries, there will be some tasks. Thanks for your understanding and happy crunching!
21 Aug 2019, 11:41:17 UTC · Discuss
CMS@Home disruption, Monday 22nd July
I've had the following notice from CERN/CMS IT:
>> following the hypervisor reboot campaign, as announced by CERN IT here: https://cern.service-now.com/service-portal/view-outage.do?n=OTG0051185
>> the following VMs - under the CMS Production openstack project - will be rebooted on Monday July 22 (starting at 8:30am CERN time):
>> | vocms0267 | cern-geneva-b | cms-home
to which I replied:
> Thanks, Alan. vocms0267 runs the CMS@Home campaign. Should I warn the volunteers of the disruption, or will it be mainly transparent?
and received this reply:
Running jobs will fail because they won't be able to connect to the schedd condor_shadow process. So this will be the visible impact on the users. There will be also a short time window (until I get the agent restarted) where there will be no jobs pending in the condor pool.
So it might be worth it giving the users a heads up.
So, my recommendation is that you set "No New Tasks" for CMS@Home sometime Sunday afternoon, to let tasks complete before the 08:30 CERN-time restart. I'll let you know as soon as Alan informs me that vocms0267 is up and running again.
17 Jul 2019, 13:14:12 UTC · Discuss
Native ATLAS and Theory applications require a CVMFS configuration update
Volunteers running ATLAS native and/or Theory native are kindly asked to update their local CVMFS configuration. Please see the following post for the details.
5 Jul 2019, 8:06:20 UTC · Discuss
Our BOINC servers were unavailable from 13:45 to 15:30 CET this afternoon due to a problem with a shared storage cluster. This explains possible download/upload errors from your clients.
Sorry for the trouble and happy crunching.
26 Jun 2019, 13:53:13 UTC · Discuss
killing extremely long SixTrack tasks
we had to kill ~10k WUs named:
Using a local proxy to reduce network traffic for CMS
Thanks to computezrmle, with additional work from Laurence and a couple of CMS experts (and my adding one line to the site-local-config file), there is now a way to set up a local caching proxy to greatly reduce your network traffic. Each job instance that runs within a CMS BOINC task must retrieve a lot of set-up data from our database. This data doesn't change very often, so if you keep a local copy the job can access that rather than going over the network every time.
Instructions on how to do this are available at https://lhcathomedev.cern.ch/lhcathome-dev/forum_thread.php?id=475&postid=6396 or https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5052&postid=39072
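To illustrate why a local copy helps, here is a toy sketch of the caching idea only. It is not the proxy set-up from the linked threads: the class `LocalCache`, the stand-in function `fake_remote_fetch`, and the key `"conditions/run-setup"` are all hypothetical names invented for this example.

```python
from typing import Callable, Dict


class LocalCache:
    """Keep a local copy of each item so repeated requests skip the network."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch                  # function that really goes to the network
        self._store: Dict[str, bytes] = {}   # local copies, keyed by request
        self.network_requests = 0            # how often we had to go upstream

    def get(self, key: str) -> bytes:
        if key not in self._store:
            # First request for this item: fetch it once and keep a copy.
            self.network_requests += 1
            self._store[key] = self._fetch(key)
        return self._store[key]


def fake_remote_fetch(key: str) -> bytes:
    """Hypothetical stand-in for retrieving set-up data from the central database."""
    return f"setup data for {key}".encode()


cache = LocalCache(fake_remote_fetch)

# Ten job instances asking for the same set-up data in turn.
for _ in range(10):
    data = cache.get("conditions/run-setup")

print(cache.network_requests)
```

A real caching proxy also has to decide when a stored copy is too old to reuse; that is left out here because the set-up data described above changes only rarely.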
7 Jun 2019, 14:24:45 UTC · Discuss
new exes for SixTrack 5.02.05
we are pleased to announce the release to production (SixTrack app) of new exes for the current pro version (v5.02.05). We have new exes for FreeBSD (avx/sse2), an exe for XP hosts (32bits), an aarch64 executable for Linux, and one for Android. Many thanks to James, Kyrre and Veronica for finding the time to produce them.
Distributing an exe compatible with XP hosts is not a way to encourage people to stay with unsupported OSs, but rather a trial to have a smooth transition to more recent OSs. In this way, people with XP hosts do not miss the possibility to contribute to the present wave of SixTrack tasks (expected to be quite long) while considering options for upgrading their hosts. At the same time, we are looking into preparing 32bits Linux exes. It should be noted that all Win exes are distributed without targeting specific kernel versions - hence, XP hosts may receive tasks with regular Windows exes immediately failing, but the BOINC server should quickly learn that the XP-compatible exe is the appropriate one.
We are also very happy to start involving FreeBSD and Android users in our production chain. For the latter platform, the present exe won't run on Android versions >=8 - James is still looking into this. Since the Android version filtering needs a fix on the scheduler side:
we labelled the Android exe as beta. Hence, SixTrack beta users with Android 8 and later should either not request tasks for that host or untick the test applications flag in their LHC@home project preferences.
We are also pursuing the generation of MacOS exes, and we should test them soon on sixtracktest.
Thanks for your continuous support and help,
Alessio, for the SixTrack team
4 Jun 2019, 10:17:02 UTC · Discuss
2019 BOINC Pentathlon is over - a big thank you from the SixTrack team!
the 2019 pentathlon is over, and we would like to thank all the participants for having crunched our tasks! We saw the BOINC CPU capacity almost double, boosting our calculations, even though it was only for a few days. We are very grateful for that!
The SixTrack team would also like to thank all you volunteers who regularly support us with your CPUs. You give us the possibility to deepen our understanding of the dynamic aperture, a quantity of paramount importance for the stability of particle beams in big research accelerators like superconducting colliders - last but not least, a very recent paper in the most important journal in the field of accelerator physics, comparing simulations and measurements:
where simulation results have been obtained thanks to you and BOINC!
A lot has already been done with your help, but a lot more is still to come in the near future. We count on your support!
Keep up the good work,
Alessio and Massimo, for the SixTrack team
20 May 2019, 8:00:15 UTC · Discuss
BOINC Pentathlon - Sixtrack sprint
We are very grateful to have been chosen for the BOINC Pentathlon of SETI Germany over the next few days. For this, the Sixtrack team has submitted a huge backlog of jobs, and our servers will primarily distribute Sixtrack tasks over the next few days. There will only be a drip-feed of other applications for now until our backlog is reduced. For fans of other applications, stay tuned or run Sixtrack for a few days.
15 May 2019, 14:30:26 UTC · Discuss
CMS -- Please set "no new tasks"
Hi to all CMS-ers. We need to drain the job queue so that a new version of the WMAgent can be installed.
Can you please set No New Tasks so that your current tasks can run out and no new jobs start? If you have any tasks waiting to run, please suspend or abort them.
Thanks, I'll let you know as soon as the change is done.
14 May 2019, 14:40:22 UTC · Discuss
We are having database problems and have to schedule an intervention at 3:30pm UTC. The LHC@home servers are back again. We may have some irregular dispatching of some applications over the next hours.
13 May 2019, 13:01:57 UTC · Discuss
Native Theory Application (TheoryN) Released
The Native Theory Application for Linux has moved out of Beta status and is now generally available. It is similar to the Native ATLAS application in that it requires CVMFS to be installed locally, but it does not require Singularity as it uses Linux Containers (runc). To set up your machine for this application please follow the instructions. Even if the Native ATLAS tasks are running successfully, follow the instructions to ensure that CVMFS is configured correctly for both and that Linux Containers are enabled. This is a new application (TheoryN) rather than an alternative version of the Theory application, as they have different resource requirements. If there are any issues, please post them to the Theory message board.
13 May 2019, 8:07:13 UTC · Discuss
new SixTrack version 5.02.05 released on BOINC for production
after a long period of development and testing, we are pleased to announce that we have on BOINC a new major release of SixTrack. The development team did an impressive job of refactoring the code: porting arrays to dynamic memory allocation, splitting the source code (previously gathered in a few huge source files) into Fortran 90 modules, making maintenance easier, and deleting a lot of duplicated code and massive arrays - not to mention countless bug fixes, documentation updates, re-written input parsing, and an improved build system and test suite.
We have also implemented plenty of new features. Most of them are still available only on the batch system at CERN (e.g. linking to Geant4 or Pythia, running coupled to FLUKA or other external codes, support for ROOT and HDF5), but many of them can already be deployed by BOINC jobs, like on-line aperture checking, electron lenses, generalised RF-multipoles, quadrupole fringe fields, and hashing of files for checks. All these new features will allow us to study new machine configurations and refine results, and we count on your help!
Thanks again for your support, and keep up the good work!
Alessio, for the SixTrack Team
3 May 2019, 16:01:25 UTC · Discuss
Problem writing CMS job results; please avoid CMS tasks until we find the reason
Since some time last night CMS jobs appear to have problems writing results to CERN storage (DataBridge). It's not affecting BOINC tasks as far as I can see, they keep running and credit is given. However, Dashboard does see the jobs as failing, hence the large red areas on the job plots.
Until we find out where the problem lies, it's best to set No New Tasks or otherwise avoid CMS jobs. I'll let you know when things are back to normal again.
18 Apr 2019, 15:44:45 UTC · Discuss
The batch I submitted last night is now showing on the monitor, so you can resume tasks at will.
23 Mar 2019, 17:56:25 UTC · Discuss
Warning: possible shortage of CMS jobs - set No New Tasks as a precaution
There was an intervention (i.e. upgrade) yesterday afternoon on the cmsweb-testbed system we use to submit CMS workflows that left things a bit confused. One problem was fixed, and the monitor shows all good. However, we are running out of CMS jobs -- maybe 10 hours left -- but the new batch I submitted yesterday isn't showing up on the testbed monitor. I submitted another last night but still neither are being shown this morning, so I submitted yet another batch.
At the moment I don't know whether the submission has failed or whether the monitor hasn't picked up the new batches. As a precaution, set No New Tasks on your CMS project(s) to avoid tasks crashing due to lack of jobs. I'll let you know as soon as I'm sure jobs are available again.
 How many times do I have to tell people not to touch critical systems on a Friday -- especially Friday afternoon!?
23 Mar 2019, 11:31:44 UTC · Discuss
BOINC Open Source Project Looking for Experienced Macintosh Developers
The Berkeley Open Infrastructure for Network Computing (BOINC) system is the software infrastructure used by LHC@home and many other volunteer distributed computing projects. The BOINC Open Source Project is looking for volunteers to develop and maintain the BOINC client on Macintosh. The BOINC Client and Manager are C++ cross-platform code supporting MS Windows, Mac, Linux, and several other operating systems. We currently have a number of volunteer developers supporting Windows and Linux, but our main Mac developer is winding down his involvement after many years. He is prepared to help a few new Mac developers get up to speed.
If you have Mac development experience and are interested in volunteering time to help support and maintain the BOINC Mac client please have a look at the more detailed description here: https://boinc.berkeley.edu/trac/wiki/MacDeveloper
If you are not a Mac developer, but have other skills and are interested in contributing to BOINC, the link above also has more general information.
14 Feb 2019, 7:54:10 UTC · Discuss
Consent required to export statistics
Following the implementation of GDPR compliance with BOINC, user consent is now required to export BOINC statistics from LHC@home to BOINC statistics sites, such as BOINC stats.
To grant your consent, please login to the LHC@home site and update your project preferences. Once logged on to the LHC@home site, please navigate to the Project Preferences page.
Click on "Edit preferences" and then tick the box on the line:
"Do you consent to exporting your data to BOINC statistics aggregation Web sites?"
This will enable continued export of statistics from LHC@home for your BOINC user account. If you leave the box unchecked, statistics should no longer be exported.
Thanks for your contributions to LHC@home!
9 Jan 2019, 16:10:34 UTC · Discuss
Seasons greetings from LHC@home
We in the LHC@home team wish you all a Merry Christmas and Happy New Year!
Our warm thanks to all of you for your contributions to LHC@home!
21 Dec 2018, 14:21:41 UTC · Discuss
The LHC@home BOINC servers will be upgraded to the latest BOINC server release Tuesday morning at 8AM GMT. BOINC services like upload/download and task validation and assimilation will be paused for about 1 hour during the intervention to update our servers.
10 Dec 2018, 15:43:39 UTC · Discuss
Pausing submission of LHCb Applications
Dear BOINC Volunteers,
LHCb has been very grateful to the BOINC community in the past years for their support and provisioning of computing resources to run LHCb simulation jobs. Since the start of the service for LHCb you have provided computing resources that allowed us to execute a fantastic amount of 3.1 Million successful jobs which simulated 142'740'087 events. This work considerably contributed to the work of the experiment. Many thanks to you all !!!
Despite this success we have also observed that the work in connection to BOINC operations has grown in the past within the LHCb computing project and after internal discussions we have decided to pause the operations of the service and therefore not to run LHCb applications via BOINC for the time being with the possibility to re-open the service in the future.
Please note that the possibility to contribute computing resources to other BOINC projects stays untouched by this decision and we would like to encourage you to continue supporting also the other projects represented via the LHC@home BOINC service.
For now I would like to re-state my thanks to you, the BOINC community, for your support.
Dr. Stefan Roiser
LHCb Computing Project Leader
19 Nov 2018, 15:26:59 UTC · Discuss
In spite of the break and lack of simulation work things are moving behind the scenes! Most of the trackers have been busy with the preparation of and attendance to the HiLumi annual collaboration meeting. For instance:
* new scanning parameters for DA studies, to shed some light on open points concerning the different behavior of the two beams in the LHC:
* an update of DA results to the latest developments on HL-LHC optics:
The collaboration meeting is the most important event of the large collaboration, led by CERN, that is designing and building the High-Luminosity upgrade of the LHC. This is not only a forum to present and discuss recent results, but also an event that inspires new ideas and studies. Therefore, we would like to announce that in a few weeks we will be back to you, counting on your usual fantastic and essential support, to launch new simulation campaigns!
Alessio and Massimo, for the SixTrack team
19 Oct 2018, 8:17:24 UTC · Discuss
Unexpected server downtime
Due to a failure in part of our computing infrastructure that also prevented our fail-over mechanism from working, the LHC@home web server was unavailable until this morning. Sorry for this, and thanks for your contributions to our project.
4 Oct 2018, 8:09:13 UTC · Discuss
test of SixTrack 5.00.00
we are in the process of testing a new sixtrack version, i.e. 5.00.00. This is a true upgrade of the code, which has been re-factored deeply - including dynamic memory allocation. Moreover, it provides fixes to the physics already implemented, e.g. solenoidal fields and online aperture checking, and brand new implementations, e.g. electron lenses and ion tracking. We are finalising the implementations, hence the version running as sixtracktest is a quick test of the main functionalities and code re-factoring.
More to come in the next days / weeks.
Thanks a lot for your precious help!
Keep up the good work, and happy crunching!
Alessio, for the SixTrack team
27 Jun 2018, 7:13:28 UTC · Discuss
CMS production pause
We have run into a problem with the CMS project -- the merged result files processed at CERN are failing to be written to central storage. Consequently I have decided not to submit any more jobs until the experts have clarified what the problem is. The CMS jobs queue is about to start draining and I expect it to be empty of volunteer jobs within a few hours (there may still be post-production jobs, but these run at CERN, not on your machines). I suggest you set No New Tasks or transfer to another project until the situation is resolved.
23 Apr 2018, 15:50:52 UTC · Discuss
Server upgrade - file uploads paused
We will change the storage back-end on our BOINC servers today, and the file servers will be disabled during the operation.
Hence your BOINC clients will not be able to upload or download files from LHC@home today for a few hours. Once our maintenance operation is finished, BOINC clients will be able to upload again.
Thanks for your understanding and happy crunching!
26 Mar 2018, 6:23:52 UTC · Discuss
Theory application reaches 4 TRILLION events today !!
LHC@home's Theory application today passed the milestone of 4 TRILLION simulated events. This project, under its earlier name "Test4Theory", began production in 2011 and was the first BOINC project to use Virtual Machine technology (based on CERN's CernVM system).
We will be publishing some more details for you on the LHC@home and CERN websites over the coming days. Here is a first release:
Many thanks to all our volunteers for enabling this achievement !
14 Mar 2018, 5:37:23 UTC · Discuss
CMS Job queue draining
Due to a problem with the WMAgent submission task, a new batch of CMS jobs is not being put in the Condor queue. So, the queue is now draining and there will be no more jobs available in a couple of hours. Best to set your BOINC instance to No New Tasks if you can, to avoid spurious compute error terminations.
22 Feb 2018, 22:10:56 UTC · Discuss
Task creation delayed - database maintenance
Due to a database issue last week, task generation is delayed and we need to clean up stuck workunits. The project daemons will be on and off this morning while we try to debug a problem with the BOINC transitioner.
5 Feb 2018, 8:16:44 UTC · Discuss
Thanks for supporting SixTrack at LHC@Home and updates
All members of the SixTrack team would like to thank each of you for supporting our project at LHC@Home. The last weeks saw a significant increase in work load, and your constant help did not pause even during the Christmas holidays, which is something that we really appreciate!
As you know, we are interested in simulating the dynamics of the beam in ultra-relativistic storage rings, like the LHC. As in other fields of physics, the dynamics is complex, and it can be decomposed into a linear and a non-linear part. The former determines whether the expected performance of the machine is within reach, whereas the latter might dramatically affect the stability of the circulating beam. While the former can be analysed with the computing power of a laptop, the latter requires BOINC, and hence you! In fact, we perform very large scans of parameter spaces to see how non-linearities affect the motion of beam particles in different regions of the beam phase space and for different values of key machine parameters. Our main observable is the dynamic aperture (DA), i.e. the boundary between stable, i.e. bounded, and unstable, i.e. unbounded, motion of particles.
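For readers who like to see the idea in code, here is a toy sketch of what "boundary between bounded and unbounded motion" means. It uses the textbook 2D Hénon map (a linear rotation plus a single quadratic kick) as a stand-in for a real machine; it is not SixTrack and not an LHC model, and the tune, turn count, loss threshold and scan step are arbitrary values chosen for illustration.

```python
import math


def survives(x0: float, tune: float = 0.205, turns: int = 1000,
             limit: float = 1.0e3) -> bool:
    """Track one particle through the Henon map; True if it stays bounded."""
    mu = 2.0 * math.pi * tune            # phase advance per turn
    c, s = math.cos(mu), math.sin(mu)
    x, p = x0, 0.0                       # start on the x axis with zero momentum
    for _ in range(turns):
        kicked = p + x * x               # quadratic (sextupole-like) kick
        x, p = c * x + s * kicked, -s * x + c * kicked   # linear rotation
        if abs(x) > limit or abs(p) > limit:
            return False                 # amplitude blew up: particle is lost
    return True


def dynamic_aperture(step: float = 0.01, max_amplitude: float = 2.0) -> float:
    """Scan outwards and return the last amplitude before the first loss."""
    last_stable = 0.0
    n_steps = int(round(max_amplitude / step))
    for i in range(1, n_steps + 1):
        amplitude = i * step
        if not survives(amplitude):
            break
        last_stable = amplitude
    return last_stable


da_estimate = dynamic_aperture()
print(f"toy dynamic aperture along x: {da_estimate:.2f}")
```

The real studies follow the same logic, but with a full model of the machine, many more turns, and scans over angles in phase space and over machine parameters, which is where the computing demand comes from.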
The studies mainly target the LHC and its upgrade in luminosity, the so-called HL-LHC. Thanks to this new accelerator, by ~2035, the LHC will be able to deliver to experiments x10 more data than what is foreseen in the first 10/15y of operation of LHC in a comparable time. We are in full swing in designing the upgraded machine, and the present operation of the LHC is a unique occasion to benchmark our models and simulation results. The deep knowledge of the DA of the LHC is essential to properly tune the working point of the HL-LHC.
If you have crunched simulations named "workspace1_hl13_collision_scan_*" (Frederik), then you have helped us in mapping the effects of unavoidable magnetic errors expected from the new hardware of the HL-LHC on dynamic aperture, and identify the best working point of the machine and correction strategies. Tasks named like "w2_hllhc10_sqz700_Qinj_chr20_w2*" (Yuri) focus the attention onto the magnets responsible for squeezing the beams before colliding them; due to their prominent role, these magnets, very few in number, have such a big impact on the non-linear dynamics that the knobs controlling the linear part of the machine can offer relevant remedial strategies.
Many recent tasks are aimed at relating the beam lifetime to the dynamic aperture. The beam lifetime is a measured quantity that tells us how long the beams are going to stay in the machine, based on the current rate of losses. A theoretical model relating beam lifetime and dynamic aperture was developed; a large simulation campaign has started, to benchmark the model against plenty of measurements taken with the LHC in the past three years. One set of studies, named "w16_ats2017_b2_qp_0_ats2017_b2_QP_0_IOCT_0" (Pascal), considers as main source of non-linearities the unavoidable multipolar errors of the magnets, whereas tasks named as "LHC_2015*" (Javier) take into account the parasitic encounters nearby the collision points, i.e. the so called "long-range beam-beam effects".
One of our users (Ewen) is carrying out two studies thanks to your help. In 2017 DA was directly measured for the first time in the LHC at top energy, and nonlinear magnets on either side of ATLAS and CMS experiments were used to vary the DA. He wants to see how well the simulated DA compares to these measurements. The second study seeks to look systematically at how the time dependence of DA in simulation depends on the strength of linear transverse coupling, and the way it is generated in the machine. In fact, some previous simulations and measurements at injection energy have indicated that linear coupling between the horizontal and vertical planes can have a large impact on how the dynamic aperture evolves over time.
In all this, your help is fundamental, since you let us carry out the simulations and studies we are interested in, running the tasks we submit to BOINC. Hence, the warmest "thank you" to you all!
Happy crunching to everyone, and stay tuned!
Alessio and Massimo, for the LHC SixTrack team.
23 Jan 2018, 17:08:14 UTC · Discuss
LHC@home down-time due to system updates
Tomorrow Wednesday 24/1, the LHC@home servers will be unavailable for a short period while our storage backend is taken down for a system update.
Today, Tuesday 23/1, some of the Condor servers that handle CMS, LHCb and Theory tasks will be down for a while. Regarding the on-going issues with upload of files, please refer to this thread.
Thanks for your understanding and happy crunching!
23 Jan 2018, 9:19:32 UTC · Discuss
Short interruptions Tuesday
There will be a couple of short server outages while our BOINC services pass to fail-over nodes today, Tuesday 16th of January. Similar interruptions will happen next week, as we carry out security updates on our computing infrastructure.
16 Jan 2018, 7:05:48 UTC · Discuss
File upload issues
Our NFS storage backend got saturated and hence uploads are failing intermittently.
The underlying cause is an issue with file deletion, which we are trying to resolve.
Sorry for the trouble and thanks for your patience with transfers to LHC@home.
8 Jan 2018, 9:25:05 UTC · Discuss
Increased file server capacity
Since Tuesday evening, we have had intermittent issues with upload failures due to a combination of a large number of new hosts running BOINC that co-incidentally joined at the same time as larger ATLAS tasks had been introduced. Our file server capacity has been increased and backlog tasks waiting for upload should upload again soon. (Please refer to the ATLAS application and Number crunching forums for more details.)
14 Dec 2017, 10:05:32 UTC · Discuss
Due to an overly aggressive spam-cleaning campaign (our fault), we accidentally deleted some valid accounts on Monday. We have restored a backup copy of the BOINC database, and will recover the missing account data.
If you get a message "missing account key" from your BOINC client, you may be affected. We expect that we can fix this later today, once we have verified the data sets. Hence there is no need to register again.
My apologies for this mishap.
13 Dec 2017, 8:46:45 UTC · Discuss
BOINC server update Thursday
We will upgrade the LHC@home web servers to the new BOINC server code with the "Bootstrap" theme on Thursday 7 December. The new style and layout can already be seen on the LHC@home development project.
During the intervention, from 09 UTC on Thursday, there may be intermittent availability of the LHC@home servers, so BOINC clients may back off and try to upload data later.
5 Dec 2017, 9:07:45 UTC · Discuss
Phaseout of legacy site: lhcathomeclassic.cern.ch/sixtrack
LHC@home has been consolidated and uses SSL for communication as mentioned in this thread last year.
Some BOINC clients are still connecting to the old lhcathomeclassic.cern.ch/sixtrack address, which will be phased out soon.
If this is the case for you, please re-attach the project to the current LHC@home URL presented in the BOINC manager. (http://lhcathome.cern.ch will redirect your BOINC client to https://lhcathome.cern.ch/lhcathome )
For those who are still running an old BOINC 6 client, please upgrade to BOINC 7.2 or later. (The current BOINC client releases are 7.6.33 or 7.8.)
Many thanks for your contributions to LHC@home!
29 Sep 2017, 13:50:32 UTC · Discuss
CMS jobs unavailable Weds 27th September
An upgrade to the CMS@Home workflow management system (WMAgent) is planned for tomorrow (Wed Sep 27th). This needs the current batch of jobs to be stopped so that the queue is empty. I plan to do this about 0700-0800 UTC on Wednesday.
To avoid "error while computing" task failures and the resulting back-off of your daily quotas, we suggest you set all your CMS machines to No New Tasks at least 12 hours beforehand to allow current tasks to time out in the normal way. You can stop BOINC once all your tasks are finished, if you wish.
Exactly how long the intervention will take is unclear, and there will be a delay of up to an hour to get a new batch of jobs queued afterwards. I will post here when jobs are available again, hopefully before the end of the day European time.
26 Sep 2017, 12:23:21 UTC · Discuss
Possible systems failures
We seem to be in the early stages of a system failure for several sub-projects. The proxy server has flatlined and my running jobs monitor is dipping alarmingly. Please check if you are getting tasks flagged as computing failures, and set No New Tasks if so.
[Edit] On closer inspection, it may just be the CMS app. [/Edit]
Obviously I'll apologise if this is a false alarm, but it's the wrong time of day to expect a prompt response from the CERN admins.
18 Sep 2017, 22:00:28 UTC · Discuss
New SixTrack exes
After testing them as the sixtracktest app, we have just pushed out executables to the sixtrack app. For the moment, we have exes only targeted for the main OSs, i.e. Windows, Linux, and the brand new one for MacOS. We are still finalising the definition of the plan classes with the sixtracktest app for targeting Android, freeBSD, and Arm CPUs - e.g. see
Thanks a lot for your contribution and ... happy crunching!
Alessio, for the sixtrack team
5 Sep 2017, 16:40:44 UTC · Discuss
Deadline change for ATLAS jobs
Due to the tight deadline of the ATLAS tasks, we are changing the deadline of ATLAS jobs from 2 weeks to 1 week. An ATLAS job takes about 3-4 CPU hours to finish on a moderate CPU (2.5 GFLOPS).
16 Aug 2017, 8:42:03 UTC · Discuss
New ATLAS app version released for Linux hosts
We released a new version of the ATLAS app today, 2.41 for the x86_64-pc-linux-gnu platform.
The new features of this version include:
1. It requires the host OS to be either Scientific Linux 6 or CentOS 7.
2. It requires CVMFS and Singularity instead of VirtualBox to run the ATLAS jobs.
3. It is more efficient, as it avoids using VirtualBox.
Currently, this version is set as a beta version.
For people who want to try it out, we provide a script to install everything, including CVMFS and Singularity, here,
Try it if you are interested!
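Before trying the native version, it may help to check whether the two tools named in the requirements above are present on the host. The following is only a minimal sketch, not part of the install script mentioned in the post: the function name is hypothetical, it assumes CVMFS uses its conventional /cvmfs mount point and that the Singularity executable is called singularity, and it does not check the OS release. A /cvmfs directory can exist without any repository being mounted, so a positive result is a hint rather than a guarantee.

```python
import os
import shutil


def atlas_native_prerequisites(cvmfs_root="/cvmfs", singularity_cmd="singularity"):
    """Report whether the CVMFS directory and the Singularity command are found.

    Both defaults are assumptions (conventional mount point and command name);
    pass other values if your installation differs.
    """
    return {
        # True if the CVMFS root directory exists on this host.
        "cvmfs_dir_present": os.path.isdir(cvmfs_root),
        # True if an executable with this name is found on the PATH.
        "singularity_on_path": shutil.which(singularity_cmd) is not None,
    }
```

Calling atlas_native_prerequisites() with no arguments checks the default locations and returns a small dictionary that can be printed or logged.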
10 Aug 2017, 11:04:25 UTC · Discuss
CMS Weekend problem
Warning: The WMAgent which controls CMS jobs appears to have a failed component very recently. Queue seems to be exhausted. Please set No New Tasks or change to a backup app while I try to raise someone at CERN to fix it. This could be a problem given that this is expected to be the heaviest weekend of the year for holiday travel in Europe...
5 Aug 2017, 4:31:20 UTC · Discuss
Optimising distribution of SixTrack tasks
We are trying to improve the distribution of SixTrack tasks. If your host could process more tasks but during the project update you don't receive any, can you let us know and send us your client logging report? Please continue the thread "SixTrack Tasks NOT being distributed" opened by Eric here:
so that we can collect all the issues in only one place. In this way, we could try to better tune parameters controlling the distribution of tasks on the server side.
At the same time, we apologize for the loss of credits following the accidental deletion of lines in the main DB - please see message:
As you can see, task distribution has been progressing regularly since the beginning of the week.
Thanks in advance for your precious cooperation,
Alessio and Riccardo, for the SixTrack team
27 Jul 2017, 12:44:17 UTC · Discuss
Aborted Work Units
After deleting many really old results from 2013 until March 2017 (was meant to
be December 2016) it seems many Tasks have been aborted. A full analysis
and report will be posted. No action required by volunteers. Eric.
24 Jul 2017, 12:26:54 UTC · Discuss
CMS Jobs working again
It's been a few hours now since the Data Bridge appears to have been fixed and jobs are staging out normally. You can resume running CMS tasks at your will.
18 Jul 2017, 17:16:35 UTC · Discuss
CMS@Home -- please set No New Tasks and perhaps temporarily run another project
There is a problem staging-out CMS@Home jobs to the Data Bridge. Until we find the cause, please set your CMS crunchers to No New Tasks, or temporarily move them to another app or project.
Sorry for the trouble, unfortunately it's beyond my capability to resolve.
16 Jul 2017, 21:37:26 UTC · Discuss
No RESULTS accepted from Linux Kernel 4.8.*
As an emergency measure and over the weekend, I have set
max_results_day to -1 for all hosts running Linux (Ubuntu?)
Kernel 4.8.*. SixTrack is consistently crashing with an IFORT run
time formatted I/O error. This will avoid wasting your valuable
14 Jul 2017, 13:50:37 UTC · Discuss
IMPORTANT, pull back on SixTrack Inconclusive Results
Please see Message 31102 on SixTrack Application,
Inconclusive Results, keyword IMPORTANT. Eric.
26 Jun 2017, 18:08:03 UTC · Discuss
CMS application job queue is being run down.
We want to update the WMAgent job controller, so I've stopped the next batch (I hope). We should run out of jobs in 10-12 hours, so set any machine running CMS tasks to No New Tasks as soon as practicable. Should be up again tomorrow.
26 Jun 2017, 15:59:30 UTC · Discuss
SixTrack Inconclusive Results
Please see the SixTrack Application threads for an important update,
Message 31064, Keyword BANNED
26 Jun 2017, 6:24:06 UTC · Discuss
There will be a (very) short interruption while I
install a new sixtrack_validator. Should fix null/empty
fort.10 and the nasty "outlier" problem.
See SixTrack Application, sixtrack_validator for more news and details.
24 Jun 2017, 9:21:04 UTC · Discuss
SixTrack Tasks distribution issues
Please see Message boards:SixTrack application, thread
"SixTrack Tasks NOT being distributed". This is to have one place for
all relevant messages. This thread is for SixTrack only.
My first post reports my personal status.
20 Jun 2017, 12:43:01 UTC · Discuss
Network and server problems Sunday night
We had a network problem in the computer centre at CERN last night, leading to a number of issues for our servers. BOINC servers should be back in business now.
Normally tasks should be correctly uploaded again on the next attempt. If you see any issues, please try an update or reset of the project.
Sorry for the trouble, and happy crunching!
19 Jun 2017, 7:19:47 UTC · Discuss
SixTrack News - May 2017
The SixTrack team would like to thank all the teams who took part in the 2017 pentathlon hosted by SETI.Germany:
where LHC@Home was chosen for the swimming discipline. The pentathlon gave us the possibility of carrying out a vast simulation campaign, with lots of new results generated that we are now analysing. While the LHC experiments send volunteers tasks where data collected by the LHC detectors has to be analysed or Monte Carlo codes for data generation, SixTrack work units probe the dynamics of LHC beams; hence, your computers are running a live model of the LHC in order to explore its potential without actually using real LHC machine time, precious to physics.
Your contribution to our analyses is essential. For instance, we reached ~2.5 MWUs processed in total, with a peak slightly above 400kWUs processed at the same time, and >50TFLOPs, during the entire two weeks of the pentathlon. The pentathlon was also the occasion to verify recent improvements to our software infrastructure. After this valuable experience, we are now concentrating our energies on updating the executables with brand new functionality, extending the range of studies and of supported systems. This implies an even greater dependence on your valuable support.
Thanks a lot to all people involved! We count on your help and commitment to science and to LHC@home to pursue the new challenges of beam dynamics which lie ahead.
26 May 2017, 14:43:25 UTC · Discuss
LHCb application is in production
We are very happy to announce that the LHCb application is out of beta and is now in production mode on LHC@home. Thank you all for your precious contribution.
We are grateful to have you all as part of our project.
Please, refer to the LHCb application forum for any problem or feedback.
Thanks a lot
27 Apr 2017, 7:50:51 UTC · Discuss
New file server
We have added a new file server for download/upload to scale better with higher load. If there should be errors with download or upload of tasks, please report on the MBs.
Thanks for contributing to LHC@home!
26 Apr 2017, 14:00:48 UTC · Discuss
ATLAS application now in production
The ATLAS application is now in production here on LHC@home, after a period of testing. This marks another milestone for the LHC@home consolidation, and we would like to warmly thank all of you who have contributed to help and tests for the migration!
Please refer to Yeti's checklist for the ATLAS application and the ATLAS application forum if you need help.
22 Mar 2017, 15:45:58 UTC · Discuss
Network interruptions 15th of March
Due to a network upgrade in the CERN computer centre, connections to LHC@home servers will intermittently time out tomorrow Wednesday morning between 4 and 7am UTC.
BOINC clients will retry later as usual, so this should be mostly transparent.
14 Mar 2017, 9:58:26 UTC · Discuss
VLHCathome project fully migrated
The former vLHCathome project has now been migrated here and the old vLHCathome project site has been redirected.
The credit has also been migrated as discussed in this thread.
If your BOINC client complains about a wrong project URL, please re-attach to this project, LHC@home.
Thanks again to all who contributed to vLHCathome and to those who contribute here!
-- The team
2 Mar 2017, 8:35:49 UTC · Discuss
Draining the CMS job queue
Because of an upgrade to the WMAgent server, we need to drain the CMS job queue. So, I'm not submitting any more batches at present and we should start running out over the weekend. If you see that you are not getting any CMS jobs (not tasks...) please set No New Jobs or stop BOINC.
I expect that the intervention will take place Monday morning, and hopefully we'll have new jobs again later that day.
17 Feb 2017, 10:57:18 UTC · Discuss
Good news for the CMS@Home application
This afternoon we demonstrated the final link in the chain of producing Monte Carlo data for CMS using this project (and the -dev project too, of course), namely the transfer of result files from the temporary Data Bridge storage to a CMS Tier 2 site's storage element (SE). To summarise, the steps are:
o Creating a configuration script defining the process(es) to be simulated
o Submitting a batch of jobs of duration and result-file size suitable for running by volunteers
o Having those jobs picked up by volunteers running BOINC and the CMS@Home application, and the result files returned to the Data Bridge
o Running "merge" jobs on a small cluster at CERN to collect the smaller files into larger files (~2.2 GB) -- this step has to be done at CERN as most volunteers will not have the bandwidth (or data plan!) to handle the data volumes required. This step also serves to a large extent as the verification step required to satisfy CMS of the result files' integrity.
o Transferring the merged files into the Grid environment where they are then readily available to CMS researchers around the world
Thanks, everybody. From here on it gets more political, but we've been garnering support as the project progressed. We now need to move into a more "production" environment and convince central powers-that-be to take over the responsibility of submitting suitable workflows and collecting the results. You will still see some changes in the future, especially as we bring some of the more-advanced features across here from the -dev project.
27 Jan 2017, 20:59:36 UTC · Discuss
MacOS executable OSX 10.10.5 Yosemite
Well I have finally got some work on my Mac with our new MacOS executable
built on OS X 10.10.5 Yosemite .
Please report to me email@example.com,
or to the Topic Sixtrack Application, MacOS executable thread,
if you get some work and there are problems. Eric.
19 Jan 2017, 10:41:43 UTC · Discuss
VM applications broken by the Windows 10 update KB3206632
The Windows 10 update KB3206632 introduces an issue that affects virtualization-based security (VBS) and hence may break VM applications. The issue is fixed in the update KB3213522. If you are running Windows 10, please ensure that you have applied the KB3213522 update.
Thanks to everyone who contributed to the threads on this issue.
Missing heartbeat file errors
Microsoft KB3206632 from 16/12/15
8 Jan 2017, 20:51:44 UTC · Discuss
A very Merry Christmas and a Happy New Year to all the LHC@home supporters.
(I shall send some news about our plans for 2017 in the next few days.)
25 Dec 2016, 8:33:19 UTC · Discuss
Following the Theory simulations added 1 week ago, we have now also deployed the CMS and LHCb applications from the Virtual LHC@home project here on the consolidated, original LHC@home.
Please note that in order to run VM applications in addition to the classic BOINC application Sixtrack, you need to have a 64bit machine with VirtualBox installed and virtualisation extensions (VT-x) enabled. The details are explained on the join us and faq pages on the LHC@home web site.
By default, only the Sixtrack application is enabled in your BOINC project preferences. If you have VirtualBox installed and wish to try VM applications as well, you need to enable other applications in your LHC@home project preferences.
Please note that if you run an older PC with Windows XP or similar, it is recommended to stay with the default; Sixtrack only.
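For Linux hosts, one way to check the virtualisation requirement mentioned above is to look at the CPU flags in /proc/cpuinfo, where vmx indicates Intel VT-x and svm indicates AMD-V. The sketch below is illustrative only and the function name is hypothetical. It reports whether the CPU advertises the feature, which does not prove that it is enabled in the BIOS/UEFI settings or that VirtualBox can use it.

```python
def has_hw_virtualisation(cpuinfo_text):
    """Return True if any 'flags' line lists vmx (Intel VT-x) or svm (AMD-V).

    cpuinfo_text is the content of /proc/cpuinfo, passed in as a string so
    the function can also be tried on hosts without that file.
    """
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags = set(value.split())
            if "vmx" in flags or "svm" in flags:
                return True
    return False
```

On a Linux machine it could be used as has_hw_virtualisation(open("/proc/cpuinfo").read()). Windows and macOS need a different check, for example the tools described on the project's FAQ pages.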
Thanks for your contributions to LHC@home!
21 Nov 2016, 10:05:13 UTC · Discuss
As part of the consolidation of LHC@home, we have set up a new server web front end using SSL for this project. The new URL is:
Please feel free to connect to the new site at your convenience. (BOINC clients 7.2 and later support SSL.)
The old LHC@home classic site will continue operation as long as required. Currently there are no new Sixtrack tasks in the queue, but soon more applications and work will be available from this project.
6 Oct 2016, 10:56:02 UTC · Discuss
LHC@Home - SixTrack Project News
The members of the SixTrack project from LHC@Home would like to thank all the volunteers who made their CPUs available to us! Your contribution is precious, as in our studies we need to scan a rather large parameter space in order to find the best working points for our machines, and this would be hard to do without the computing power you all offer to us!
Since 2012 we have started performing measurements with beam dedicated to probing what we call the “dynamic aperture” (DA). This is the region in phase space where particles can move without experiencing a large increase of the amplitude of their motion. For large machines like the LHC this is an essential parameter for granting beam stability and allowing long data taking at the giant LHC detectors. The measurements will be benchmarked against numerical simulations, and this is the point where you play an important role! Currently we are finalising a first simulation campaign and we are in the process of writing up the results in a final document. As a next step we are going to analyse the second half of the measured data, for which a new tracking campaign will be needed. …so, stay tuned!
Magnets are the main components of an accelerator, and non-linearities in their fields have a direct impact on the beam dynamics. The studies we are carrying out with your help are focussed not only on the current operation of the LHC but also on its upgrade, i.e. the High Luminosity LHC (HL-LHC). The design of the new components of the machine is in its final stages, and it is essential to make sure that the quality of the magnetic fields of the newly built components allows the highly demanding goals of the project to be reached. Two aspects are mostly relevant:
The studies involve accelerator physicists from both CERN and SLAC.
Long story made short, the tracking simulations we perform require significant computer resources, and BOINC is very helpful in carrying out the studies. Thanks a lot for your help!
The SixTrack team
R. de Maria, M. Giovannozzi, E. McIntosh (CERN), Y. Cai, Y. Nosochkov, M-H. Wang (SLAC), DYNAMIC APERTURE STUDIES FOR THE LHC HIGH LUMINOSITY LATTICE, Presented at IPAC 2015.
Y. Nosochkov, Y. Cai, M-H. Wang (SLAC), S. Fartoukh, M. Giovannozzi, R. de Maria, E. McIntosh (CERN), SPECIFICATION OF FIELD QUALITY IN THE INTERACTION REGION MAGNETS OF THE HIGH LUMINOSITY LHC BASED ON DYNAMIC APERTURE, Presented at IPAC 2014
Y. Nosochkov, Dynamic Aperture and Field Quality, DOE review of LARP, FNAL, USA, July 2016
Y. Nosochkov , Field Quality and Dynamic Aperture Optimization, LARP HiLumi LHC collaboration meeting, SLAC, USA, May 2016
M. Giovannozzi, Field quality update and recent tracking results, HiLumi LHC LARP annual meeting, CERN, October 2015
Y. Nosochkov, Dynamic Aperture for the Operational Scenario Before Collision, LARP HiLumi LHC collaboration meeting, FNAL, USA, May 2015
26 Jul 2016, 8:37:55 UTC · Discuss
Disk Space Exceeded
I am sorry we have submitted some "bad" WUs.
They are using too much disk space.
Please delete any WUs with names like
16 Mar 2016, 6:15:12 UTC · Discuss
Server daemons temporarily stopped
Due to a problem with an underlying disk server, the BOINC daemons are temporarily shut down until the disk volume is back.
27 Feb 2016, 12:36:01 UTC · Discuss
Short server interruption 9-Feb.
Our LHC@home servers will be down for a short while from 8UTC 9-Feb. due to a disk server intervention. (Intervention postponed 1 week.)
2 Feb 2016, 8:27:29 UTC · Discuss
BOINC Server up
The server is back, for the moment at least.
Clearing backlog of results. Eric.
7 Dec 2015, 7:55:32 UTC · Discuss
The BOINC server has been stopped temporarily because of
file system problems at CERN. Hopefully to be restarted tomorrow
6 Dec 2015, 9:56:18 UTC · Discuss
Work/result buffering problem at CERN
We have had a BOINC CERN side buffer problem over the weekend.
It is being investigated and hopefully soon corrected. Eric.
16 Nov 2015, 9:49:17 UTC · Discuss
Another short service interruption
The LHC@home servers will be down for a short while from 6:30 UTC Tuesday 10th November for a database update.
9 Nov 2015, 7:51:18 UTC · Discuss
Service interruption tomorrow morning
LHC@home servers will be down for about 1 hour tomorrow morning from 6am UTC, due to an intervention on the database server.
8 Sep 2015, 9:07:23 UTC · Discuss
Server interruption 12 UTC
The BOINC server will be down for maintenance for about 30 minutes from 12:00 UTC today.
BOINC clients will back off and return results later once the server is up as usual.
Many thanks for your contributions to LHC@home!
24 Aug 2015, 6:36:43 UTC · Discuss
Brief Interruption, Thursday 18th June,2015
There will be a hopefully brief interruption to the service tomorrow
Thursday at 10:30 CST to provide separate NFS servers for SixTrack
and Atlas. The WWW pages should still be accessible and a further
message will be posted when the operation is complete. Eric and Nils.
17 Jun 2015, 16:22:48 UTC · Discuss
Project down due to a server issue
Due to a problem with an NFS server backend at CERN, the Sixtrack and ATLAS BOINC projects are down. A fix is underway.
11 Jun 2015, 9:42:59 UTC · Discuss
HostID 10137504 user aqvario
HostID 10137504 owner aqvario.
I set the max_results_day to -1; locking the stable door
after the horse has bolted. For some reason I cannot read the
messages I read this morning on this topic. Thanks for the
help and the Google translation. Eric.
6 Jun 2015, 14:12:35 UTC · Discuss
Quorum of 5, wzero and Pentathlon
I am currently running a set of very important tests to try and
find the cause of a few numerical differences between different platforms
and executables. I could/would not do this usually but because of your efforts
during the Pentathlon I have a unique opportunity. Also keeps up the
workload and gives you all an opportunity to get credits.
These tests are wzero with a quorum of 5. Thanks. Eric.
17 May 2015, 14:32:09 UTC · Discuss
DISK LIMIT EXCEEDED
Please note that this may occur if you are also subscribed
to the LHC experiment projects ATLAS or CMS using vLHCathome.
A workaround is to delete the remaining files yourself.
16 May 2015, 19:03:40 UTC · Discuss
New news on the BOINC Pentathlon
Please look at the NEWS 15th May, 2015 for latest update
involving the BOINC Pentathlon. Eric.
15 May 2015, 20:36:56 UTC · Discuss
News 15th May, 2015
As many of you know LHC@home has been selected to host
the Sprint event of the BOINC Pentathlon organised by
Seti.Germany. Information can be found at
The event starts at midnight and will last for three days.
This is rather exciting for us and will be a real test of
our BOINC server setup at CERN. Although this is the weekend
following Ascension my colleagues are making a big effort to
submit lots of work, and I am seeing a new record number of active WUs
every time I look. The latest number was over 270,000 and the Sprint
has not yet officially started.
We have done our best to be ready without making any last minute changes
and while this should be fun I must confess to being rather worried
about our infrastructure. We shall see.
We still have our problems, for a year now.
I am having great difficulties building new executables since Windows XP
was deprecated and I am now trying to switch to gfortran on Cygwin.
It would seem to be appropriate to use the free compiler on our
We are seeing too many null/empty result files. While an empty result can
be valid if the initial conditions for tracking are invalid, I am hoping
to treat these results as invalid. These errors are making it extremely
difficult for me to track down the few real validated but wrong results.
I have seen at least one case where a segment violation occurred, a clear
error, but an empty result was returned. The problem does not seem to
be OS or hardware or case dependent.
I am also working on cleaning the database of ancient WUs. We had not
properly deprecated old versions of executables until very recently.
I am currently using boinctest/sixtracktest to try a SixTrack which will return the full results giving more functionality and also allowing a case to be automatically handled as a series of subcases.
Then we must finally get back MacOS executables, AVX support, etc
Still an enormous amount of production is being carried out successfully
thanks to your support.
I shall say no more until we see how it goes for the next three days. Eric.
15 May 2015, 20:34:20 UTC · Discuss
Short stoppage for a disk intervention
The Sixtrack server will be down for a while this afternoon for a disk intervention. Clients will be able to upload results again soon.
30 Apr 2015, 12:27:31 UTC · Discuss
Upgrade of the look and feel of the SixTrack website
The http://lhcathomeclassic.cern.ch/sixtrack/ website has been brought up to date with a new look and feel, which is consistent with the other LHC@Home projects. It maintains all the links and the functionality of the previous one.
23 Apr 2015, 12:20:31 UTC · Discuss
Status Result Differences 29th March, 2015
Please have a look at my latest post to:
Number Crunching/Host messing up tons of results. Eric.
29 Mar 2015, 16:30:47 UTC · Discuss
Server Intervention 10-Feb-2015
There will be a short server interruption on Tuesday 10-Feb-2015 from 14:00-15:00 CET for a hardware upgrade.
Update: The upgrade finished at 15:00 and the service is back up.
9 Feb 2015, 10:17:46 UTC · Discuss
Apologies; disk full problem. Cleaning up and hoping to
return to normal shortly. Thanks for all the messages. Eric.
29 Jan 2015, 16:27:52 UTC · Discuss
News, December, 2014.
Well not much news really. The project is ticking over
and we have processed a tremendous amount of work in 2014.
Right now we are trying to move the project to a new CERN IT
infrastructure so there may be a few hiccups in January
(CERN is closed for two weeks, but systems are up and running).
We are still using executables from May and I still don't have
a valid MacOS executable :-( , no heartbeat so something is really
wrong. Haven't found an explanation for the "no permission/cannot access"
problems on Windows but the overall error rate is about 1.5% which
seems to be "normal". We have also had problems with the w- WUs
which produced a lot of output, now under control. However running
with a smaller number of pairs to reduce volume of output seems
to give problems with validation. Working on this.
A New Year, so I shall try and make a big effort to get moving forward
as we have been pretty well stuck for 9 months; after ten years I am
a bit disappointed at the lack of progress. However, as usual, we must
maintain the service as top priority.
I have also noted increased interest from the experiments in using volunteer
computing and this may impact lhcathomeclassic......
Anyway, LHC is heading steadily to restart in the Spring, and we shall
continue studying the High Luminosity upgrade. Many thanks for your
patience and understanding and continued valued support.
A Very Happy New Year. Eric.
31 Dec 2014, 11:03:50 UTC · Discuss
I wish you a very Merry Christmas and
a Happ[y|ier] New Year. Thanks for all
your support (news to follow). Eric.
24 Dec 2014, 15:11:54 UTC · Discuss
Heavy I/O on Windows WUs
It seems WUs with names beginning w-.... are creating a bit
much I/O for Windows. Under investigation, but the results
are good and are required. Thanks. Eric.
31 Oct 2014, 19:58:01 UTC · Discuss
17:00 CET, 15th October, Service back to "normal".
I believe we have finally resolved various issues as
of about 16:00 today. Apologies for the downtime. Eric.
14 Oct 2014, 15:47:00 UTC · Discuss
CERN AFS problems
We seem to be having intermittent? problems with our local
file system. Server running but.....will fix soonest.
10 Oct 2014, 15:54:03 UTC · Discuss
Service back; 5th October
I think we are back in business. Lots of work coming, I hope,
once we sort out the disk space issue. Sorry for all the hassle
and thank you for your continued support.
5 Oct 2014, 13:01:11 UTC · Discuss
I have painfully cancelled all w-b3 WUs. According to doc they
stay in the database but are marked as "not needed".
I have also disabled further WUs of this type until we sort it out.
Hope to have saved some 65,000 valid WUs. We shall see tomorrow.
Please post to this thread if further problems (I have restarted as root...).
It will probably take some time to get back to normal.
Report will follow in due course.
4 Oct 2014, 17:58:49 UTC · Discuss
I have managed to stem the flood and disable the service.
Apologies and will inform as soon as we are started again.
4 Oct 2014, 8:42:20 UTC · Discuss
Disk Limit increased
I am unable to stop submission.
I have upped the limit on disk space to 500MB.
I can't do anything about active WUs but I hope the new limit
will suffice for new WUs. More news tomorrow.
4 Oct 2014, 0:02:30 UTC · Discuss
Disk Limit exceeded w-b3
Drastic action being taken to delete the download WUs.
This may crash the server....
Apologies for the wasted CPU.
3 Oct 2014, 22:35:08 UTC · Discuss
Power Supply Ripple
As requested and for your information Miriam has described her
recent studies as follows:
A principal component of the planned upgrade to a high luminosity LHC (HL-LHC) is the replacement of the high field quadrupole magnets - the so called "inner triplet".
The long term beam stability can be significantly reduced by magnetic field errors, misalignment of the magnets and by irregularities in the power supply (ripple). The recent batch of fifteen or so studies, involving over one and a half million cases or Work Units each of one million turns (for a stable beam), are aimed at determining the maximum allowable tolerances for the power supply ripple assuming the known field and alignment errors.
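For a rough sense of scale, the figures quoted above can be multiplied out. This is only a back-of-the-envelope sketch: it takes "fifteen or so" as exactly 15 studies and "over one and a half million" as exactly 1.5 million Work Units, and it assumes every case runs the full one million turns, which holds only for a stable beam.

```python
# Figures quoted in the post, rounded (assumptions: exactly 15 studies,
# exactly 1.5 million Work Units, every WU completing all of its turns).
studies = 15
work_units = 1_500_000
turns_per_wu = 1_000_000

wus_per_study = work_units // studies      # average cases per study
total_turns = work_units * turns_per_wu    # upper bound on tracked turns

print(f"about {wus_per_study:,} WUs per study")
print(f"up to {total_turns:.1e} tracked turns in total")
```

Under these assumptions each study averages about 100,000 Work Units, and the batch as a whole covers up to 1.5 x 10^12 turns.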
22 Jul 2014, 15:40:43 UTC · Discuss
More on DOWNLOAD
After running through the w- WUs I am now running
a few test jobs as I think the WUs may have been OK.
I cannot reproduce the problem (of course!) at CERN on my
Windows 7 system. Eric.
22 Jul 2014, 15:38:21 UTC · Discuss
Download Errors located.
ERR_DOWNLOAD problem located and there should be no more once this
batch of dud WUs has been cleared. May be Monday before
I can do anything else. Eric.
19 Jul 2014, 8:32:33 UTC · Discuss
Just noticed error rate has doubled to about 6% in
last 24 hours. Seem to be ERR_RESULT_DOWNLOAD which I
have confirmed by checking MBs right now. Any help/detailed
info welcome while I notify CERN support.
(Another Friday afternoon problem!) Eric.
18 Jul 2014, 14:24:19 UTC · Discuss
Three Problems, 22nd May.
Settling down a bit; I am seeing around 2% WU failures.
Problem 1: EXIT_TIME_LIMIT_EXCEEDED. Tried to minimise this
and will hopefully implement "outliers" to avoid it in future.
Problem 2: Can't Create Process and I will look for help on this.
Probably connected with our build but we shall see.
Problem 3: Found 545 invalid results involving 124 hosts.
One invalid result was duplicated! but I am not going to run
everything 3 times. Can live with this. The top 12 culprits gave
77 45 26 25 22 21 19 16 14 11 10 9 invalid results each.
(I thought we stopped using hosts with this many errors......)
Seems to be hardware, overclocking, cosmic rays?????
Getting a lot of production done successfully. Eric.
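The numbers given under Problem 3 can be added up to see how concentrated the invalid results are. The short calculation below uses only the figures in the post (545 invalid results, 124 hosts, and the twelve per-host counts); the variable names are illustrative.

```python
# Invalid-result counts for the top 12 hosts, as listed in the post.
top_culprits = [77, 45, 26, 25, 22, 21, 19, 16, 14, 11, 10, 9]
total_invalid = 545   # all invalid results found
total_hosts = 124     # all hosts that returned at least one invalid result

top_sum = sum(top_culprits)
result_share = top_sum / total_invalid
host_share = len(top_culprits) / total_hosts

print(f"top {len(top_culprits)} hosts: {top_sum} invalid results")
print(f"{host_share:.1%} of the hosts produced {result_share:.1%} of the invalid results")
```

So roughly a tenth of the affected hosts account for a little over half of the invalid results, which is consistent with the suspicion that the cause is host-specific.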
22 May 2014, 16:15:39 UTC · Discuss
Status, 19th May, 2014
Getting a lot of work done, but out of 400,000 WUs over the last seven days
still have about 8000 errors (2% and decreasing I think). The main problem
is EXIT_TIME_LIMIT_EXCEEDED but also "Can't create process". A side effect is
a mess up with credits. I have increased the fpops bound to help, I hope, and
today "reset credit statistics". Please be patient about credits and I shall see
what happens and if we can compensate somehow.
Unfortunately today I discovered a result difference, only one, but I need to
do more checking. I see no invalid results so the former Linux/Windows
discrepancy is largely resolved. My priority is the integrity of the results
and I may have to spend some days pinning down the result difference,
checking various ifort versions, and doing more checks and tests.
We have a macOS executable under test.
Thank you for your patience, understanding and support. Eric.
(P.S. Getting correct identical results on any PC from a Pentium 3
to the latest, with a multitude of versions of Linux, Windows and macOS
is not easy! I can publish only when the LHC@home service is > 99%.
Afterwards GPU, Android, and 10 million turns)
19 May 2014, 20:07:05 UTC · Discuss
I am seeing about 1% CreateProcess problems mainly on Windows 7.
Most often Access Denied (in various languages :-).
Also some Access violation, page out of date or similar.
Found some BOINC mails about this. Under investigation.
Seems to be host dependent.
(More work coming soon.) Eric.
16 May 2014, 10:20:54 UTC · Discuss
LHC@home is back
The service was restarted today and WUs should start
coming in, building up gradually. Thanks to all. Eric.
14 May 2014, 14:56:03 UTC · Discuss
First production tests, 11th May, 2014
Trying 590 WUs tonight. If all OK will restart full
production tomorrow 12th May. Eric.
11 May 2014, 19:10:14 UTC · Discuss
Status, 10th May
Please see MBs, Number Crunching, Status 10th May, Version 451.07
10 May 2014, 9:26:51 UTC · Discuss
WU Submission SUSPENDED 19th April, 2014
In order to avoid any further errors and waste of your valuable
resources I have temporarily stopped WU submission. There are only
a few thousand WUs active and when they are cleared I hope we will have
new Windows executables. Sadly the Windows executables are now giving
wrong results in many cases. I looked at using Homogeneous Redundancy
but I would still get wrong results. I thought of removing the Windows
executables but they are over 80% of our capacity. In this way I hope in
a few days after users and support return from vacation we can safely
introduce new Windows executables after tests using the BOINC test
facility. Sorry about that but I would rather get it fixed properly as we
have lots of new work coming.
Thank you for your patience and support. Eric.
19 Apr 2014, 11:05:24 UTC · Discuss
Status, March 2014
First, in reply to a recent query about 2014 workload, thanks to Massimo:
"The majority of the 2014 studies will be devoted to LHC upgrade and the rest to understand the nominal
machine. I do not expect any increase in workload when approaching the LHC re-start in 2015, on the
other hand, we will all be locked up in the control room and the resources for performing the
simulations will be reduced."
Second, we have been experiencing major problems with our
Windows executables for several months now.
There are "small" result differences between Windows and Linux.
After extensive testing I believe they are due to the Windows
ifort compiler. This will be verified and fixed as soon as I
return to CERN next week. In addition new builds of SixTrack
for Windows, which now include a call boinc_unzip, are failing
on Windows in at least two ways; there is a problem parsing the
hardware description (/proc/cpuinfo on Linux) and secondly we
get "cannot Create Process" errors. So, we shall first try and
build without the hopefully responsible call, and fix the result
differences. We can then resume development of the case splitting
to smaller WUs and the return of all results.
It is great that your support continues and, when required, we have
lots of capacity. Saw a new record of over 140,000 WUs in
process a couple of weeks ago. Eric.
16 Mar 2014, 8:43:45 UTC · Discuss
Status, 24th January, 2014
Hope this will answer some of your messages.
We still have some 34,000 WUs NOT being taken. We have apparently
almost 6000 in progress.
We introduced SixTrack Version 4.5.03 on Wednesday 22nd
January after extensive testing on boinctest and at CERN.
Unluckily Yuri flooded us with work at the same time
and AFS blew up leading to a huge backlog of over 16,000
results to be downloaded.
1. Results Validation; seems to be OK. I summarise that,
counting from 0-59 we do NOT CHECK Words 51, 59? and 60
The validator log shows many many "cannot open" supposedly
existing results for comparison. They were probably lost
2. Assimilation; the log shows
"Herror too many total results" !!!
There are about 2000 (1979) unique messages and cases/WUs.
I suspect we may need to clean the database and remove results
(with clients losing credit I am afraid, but they will probably never
get credit for these anyway).
I could delete them from upload but that would probably be worse.
3. Scheduler log: there are about 2.4 million messages of which
there are 1.64M unrecognised messages, multiple messages per WU.
This is perhaps significant!
previously these messages existed only for Macs as far as I can see.
here is one case:
2014-01-22 17:24:41.1073 [PID=51877] HOST::parse(): unrecognized: opencl_cpu_prop
2014-01-22 17:24:41.1075 [PID=51877] HOST::parse(): unrecognized: platform_vendor
2014-01-22 17:24:41.1075 [PID=51877] HOST::parse(): unrecognized: Advanced Micro Devices, Inc.
2014-01-22 17:24:41.1075 [PID=51877] HOST::parse(): unrecognized: /platform_vendor
2014-01-22 17:24:41.1075 [PID=51877] HOST::parse(): unrecognized: opencl_cpu_info
2014-01-22 17:24:41.1075 [PID=51877] HOST::parse(): unrecognized: name
2014-01-22 17:24:41.1075 [PID=51877] HOST::parse(): unrecognized: Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz
2014-01-22 17:24:41.1075 [PID=51877] HOST::parse(): unrecognized: /name
2014-01-22 17:24:41.1076 [PID=51877] HOST::parse(): unrecognized: vendor
2014-01-22 17:24:41.1076 [PID=51877] HOST::parse(): unrecognized: GenuineIntel
2014-01-22 17:24:41.1076 [PID=51877] HOST::parse(): unrecognized: /vendor
2014-01-22 17:24:41.1076 [PID=51877] HOST::parse(): unrecognized: vendor_id
2014-01-22 17:24:41.1076 [PID=51877] HOST::parse(): unrecognized: 4098
2014-01-22 17:24:41.1076 [PID=51877] HOST::parse(): unrecognized: /vendor_id
2014-01-22 17:24:41.1076 [PID=51877] HOST::parse(): unrecognized: available
2014-01-22 17:24:41.1076 [PID=51877] HOST::parse(): unrecognized: 1
2014-01-22 17:24:41.1076 [PID=51877] HOST::parse(): unrecognized: /available
2014-01-22 17:24:41.1076 [PID=51877] HOST::parse(): unrecognized: half_fp_config
2014-01-22 17:24:41.1076 [PID=51877] HOST::parse(): unrecognized: 0
2014-01-22 17:24:41.1076 [PID=51877] HOST::parse(): unrecognized: /half_fp_config
2014-01-22 17:24:41.1077 [PID=51877] HOST::parse(): unrecognized: single_fp_config
2014-01-22 17:24:41.1077 [PID=51877] HOST::parse(): unrecognized: 191
2014-01-22 17:24:41.1077 [PID=51877] HOST::parse(): unrecognized: /single_fp_config
2014-01-22 17:24:41.1077 [PID=51877] HOST::parse(): unrecognized: double_fp_config
2014-01-22 17:24:41.1077 [PID=51877] HOST::parse(): unrecognized: 63
2014-01-22 17:24:41.1077 [PID=51877] HOST::parse(): unrecognized: /double_fp_config
2014-01-22 17:24:41.1077 [PID=51877] HOST::parse(): unrecognized: endian_little
2014-01-22 17:24:41.1077 [PID=51877] HOST::parse(): unrecognized: 1
2014-01-22 17:24:41.1077 [PID=51877] HOST::parse(): unrecognized: /endian_little
2014-01-22 17:24:41.1077 [PID=51877] HOST::parse(): unrecognized: execution_capabilities
2014-01-22 17:24:41.1078 [PID=51877] HOST::parse(): unrecognized: 3
2014-01-22 17:24:41.1078 [PID=51877] HOST::parse(): unrecognized: /execution_capabilities
2014-01-22 17:24:41.1078 [PID=51877] HOST::parse(): unrecognized: extensions
2014-01-22 17:24:41.1078 [PID=51877] HOST::parse(): unrecognized: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_kh
2014-01-22 17:24:41.1078 [PID=51877] HOST::parse(): unrecognized: /extensions
2014-01-22 17:24:41.1153 [PID=51877] HOST::parse(): unrecognized: global_mem_size
2014-01-22 17:24:41.1153 [PID=51877] HOST::parse(): unrecognized: 17029206016
2014-01-22 17:24:41.1153 [PID=51877] HOST::parse(): unrecognized: /global_mem_size
2014-01-22 17:24:41.1153 [PID=51877] HOST::parse(): unrecognized: local_mem_size
2014-01-22 17:24:41.1153 [PID=51877] HOST::parse(): unrecognized: 32768
2014-01-22 17:24:41.1153 [PID=51877] HOST::parse(): unrecognized: /local_mem_size
2014-01-22 17:24:41.1153 [PID=51877] HOST::parse(): unrecognized: max_clock_frequency
2014-01-22 17:24:41.1154 [PID=51877] HOST::parse(): unrecognized: 3500
2014-01-22 17:24:41.1154 [PID=51877] HOST::parse(): unrecognized: /max_clock_frequency
2014-01-22 17:24:41.1154 [PID=51877] HOST::parse(): unrecognized: max_compute_units
2014-01-22 17:24:41.1154 [PID=51877] HOST::parse(): unrecognized: 8
2014-01-22 17:24:41.1154 [PID=51877] HOST::parse(): unrecognized: /max_compute_units
2014-01-22 17:24:41.1154 [PID=51877] HOST::parse(): unrecognized: opencl_platform_version
2014-01-22 17:24:41.1155 [PID=51877] HOST::parse(): unrecognized: OpenCL 1.2 AMD-APP (1348.5)
2014-01-22 17:24:41.1155 [PID=51877] HOST::parse(): unrecognized: /opencl_platform_version
2014-01-22 17:24:41.1155 [PID=51877] HOST::parse(): unrecognized: opencl_device_version
2014-01-22 17:24:41.1155 [PID=51877] HOST::parse(): unrecognized: OpenCL 1.2 AMD-APP (1348.5)
2014-01-22 17:24:41.1155 [PID=51877] HOST::parse(): unrecognized: /opencl_device_version
2014-01-22 17:24:41.1155 [PID=51877] HOST::parse(): unrecognized: opencl_driver_version
2014-01-22 17:24:41.1155 [PID=51877] HOST::parse(): unrecognized: 1348.5 (sse2,avx)
2014-01-22 17:24:41.1155 [PID=51877] HOST::parse(): unrecognized: /opencl_driver_version
2014-01-22 17:24:41.1155 [PID=51877] HOST::parse(): unrecognized: /opencl_cpu_info
2014-01-22 17:24:41.1156 [PID=51877] HOST::parse(): unrecognized: /opencl_cpu_prop
2014-01-22 17:24:41.3583 [PID=51877] Request: [USER#221474] [HOST#10137513] [IP 220.127.116.11] client 7.2.33
2014-01-22 17:24:41.3880 [PID=51877] Sending reply to [HOST#10137513]: 0 results, delay req 6.00
2014-01-22 17:24:41.3880 [PID=51877] Scheduler ran 0.035 seconds
I am not an expert but it seems to me it might explain work not being taken.......
(but never saw this with boinctest!).
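To see how much of a scheduler log is taken up by these messages, the lines can be counted by type. The following is only a rough sketch: it assumes every line has the "date time [PID=n] message" layout of the excerpt above, and the function name summarise_log is made up for illustration.

```python
import re
from collections import Counter

# Layout assumed from the excerpt above: <date> <time> [PID=<n>] <message>
LINE_RE = re.compile(r"^\S+ \S+ \[PID=(\d+)\] (.*)$")
UNRECOGNIZED_PREFIX = "HOST::parse(): unrecognized: "


def summarise_log(lines):
    """Return (lines parsed, unrecognized count, unrecognized count per PID)."""
    total = 0
    per_pid = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # not a scheduler log line in the assumed layout
        total += 1
        pid, message = match.groups()
        if message.startswith(UNRECOGNIZED_PREFIX):
            per_pid[int(pid)] += 1
    return total, sum(per_pid.values()), per_pid


# Three lines copied from the excerpt above.
sample = [
    "2014-01-22 17:24:41.1073 [PID=51877] HOST::parse(): unrecognized: opencl_cpu_prop",
    "2014-01-22 17:24:41.1075 [PID=51877] HOST::parse(): unrecognized: platform_vendor",
    "2014-01-22 17:24:41.3880 [PID=51877] Sending reply to [HOST#10137513]: 0 results, delay req 6.00",
]
total, unrecognized, per_pid = summarise_log(sample)
print(f"{unrecognized} of {total} lines are 'unrecognized' messages")
```

Grouping by PID gives a count per scheduler process, which shows whether the messages are spread over many processes or concentrated in a few.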
Other issue; one client reports "Cannot Create Process" on Windows 7.
May or may not be significant.
Are executables "signed" OK?
So all a bit complicated but hope to sort it (very) soon.
24 Jan 2014, 12:34:09 UTC · Discuss
Hiccup, today 23rd January
Apologies for an interruption to service.
Working on it. More news when corrected.
23 Jan 2014, 8:45:25 UTC · Discuss
The WWW page
has been updated by Massimo with new recent publications concerning LHC@home.
19 Nov 2013, 15:52:20 UTC · Discuss
News Status and Plans 19th November, 2013
Please see the MB Number Crunching for an update. Eric.
19 Nov 2013, 8:09:05 UTC · Discuss
Problem October 23rd Fixed
The permissions on the directory for the logs was wrong.
Corrected and results being uploaded. A fuller report and
a new Status and Plans will be issued soonest.
24 Oct 2013, 8:16:11 UTC · Discuss
Problems 23rd October, 2013
Sorry for the upload problems. Hope somebody here will
fix this soon. (I thought we had a new record number
of WUs in progress! :-) Eric.
23 Oct 2013, 17:24:26 UTC · Discuss
Status, 13th September, 2013
Still fighting to produce a good set of Linux executables.
Lots of work for Windows systems!
Created some notes on Numerical reproducibility
[url=http://cern.ch/mcintosh]CV and Notes on Floating-Point[/url].
13 Sep 2013, 6:16:34 UTC · Discuss
Status 6th September
New thread as feedback is in several others.
I have resolved server out of space for the short term and
we will implement a proper fix soonest.
Issue remains with Linux executables I think. I have checked and
informed my colleagues. The ".exe" suffix is confusing but the pni
executables look OK (crash on my test machine without pni of
course, but OK on my modern one). We do not have a MAC executable
Now things have settled down we pursue an analysis of the problem(s).
I do not want to go back because we urgently need the new physics in
Thanks for your patience and understanding. Getting lots of results
6 Sep 2013, 12:18:44 UTC · Discuss
SixTrack CERN Version 4463 is now in production.
4 Sep 2013, 7:39:01 UTC · Discuss
Just running "last" tests. Hope to have new SixTrack tomorrow.
2 Sep 2013, 19:03:12 UTC · Discuss
Short Failing Work Units
We are trying to use the test option of BOINC SixTrack project.
The very short WUs are failing. We have a fix and shall try again
soon. More production to follow. Thanks for your patience.
1 Sep 2013, 6:00:46 UTC · Discuss
Status and Plans, 30th August, 2013
Please see Message Boards: Number Crunching: Status and Plans 20th August, 2013
(Sorry about date!). Eric.
30 Aug 2013, 12:56:30 UTC · Discuss
May, 2013 update.
Server down (temporarily I hope). Trying to fix the "unzip" problem.
See my recent posts to Number Crunching: Status and Plans May 25th,
and Results Discrepancies for more info. Eric.
25 May 2013, 11:04:18 UTC · Discuss
More work coming now.
We have introduced a new SixTrack Version 4446 and I am resuming
production on an intensity scan as well as running more tests; usual
mixture of short/long run times. We are also trying to return more
results files to help identify problems. Thanks for your help as usual.
8 May 2013, 17:56:27 UTC · Discuss
Dynamic Aperture Tune Scan
After a few technical problems in the last few days, we are now ready to submit a first Tune Scan for the Dynamic Aperture study we are performing at CERN.
These simulations will give us a first hint of how the High Luminosity upgrade for the LHC will work; in particular, the effect of the Beam-Beam interaction will be analysed.
This will be only the first bunch of simulations, because various scenarios are possible for this upgrade, and we need to investigate each of them in depth to decide which one best fits our requirements... so keep your machines ready to crunch!!
15 Mar 2013, 9:35:53 UTC · Discuss
Interruption for server update
There will be a short server interruption today for a software update. New jobs should come later once we have checked the software chain.
The update is now done. Thanks for your contributions and have a nice day!
10 Mar 2013, 9:07:24 UTC · Discuss
Due to spam activity, all forums apart from Questions & Answers: Getting Started now require some BOINC credit to allow posting. If you are a complete newcomer, please check existing Questions & Answers first.
15 Feb 2013, 12:58:25 UTC · Discuss
There will be a pause for a week or two.
See the News (no ) "More work" thread for more info.
8 Feb 2013, 15:10:03 UTC · Discuss
Can't keep up but more work coming now.
3 Feb 2013, 4:33:59 UTC · Discuss
Great; as you will have seen running flat out on intensity scans, one million turns max.
Over 100,000 tasks running! CERN side infrastructure is creaking at the seams.
Will run down in a week or two to introduce a new SixTrack version (with suitable
31 Jan 2013, 11:07:55 UTC · Discuss
First tests 2013
Trying to run a few thousand cases from Scientific Linux 6 (SLC6)
here at CERN. Eric.
11 Jan 2013, 12:09:25 UTC · Discuss
A Happy New Year
Thanks for all the support in 2012 (and before). Further delay due to a Power Cut,
a broken PC and the CERN annual closure for two weeks. Once again more detailed
information when I have recovered. So a Happy New Year and I am hoping for
an even better 2013.
6 Jan 2013, 11:39:19 UTC · Discuss
Problems/Status 28th November, 2012 and PAUSE
Discovered some problems with result replication! and run out of
disk space at CERN. There will be a pause, for a few days at least,
while I investigate and resolve. (Will post details soonest to the
MB Number Crunching.) Eric
28 Nov 2012, 17:17:55 UTC · Discuss
Status, Thursday 15th November
Hiccup; mea culpa. On vacation and travelling since Tuesday
and ran out of disk space in BOINC buffer at CERN :-(
I think all is OK again now after corrective actions and more work
is on the way. Sorry about that. Eric.
15 Nov 2012, 7:44:40 UTC · Discuss
Status and Plans, Sunday 4th November
First service continues to run well; the first intensity scan is nearing completion with well over a million results in 15 studies successfully returned. Just a couple of hundred thousand more!
(Sadly no one study is complete but a couple are very close and I shall start post-processing and analysis soon. I am still reflecting on the thread "Number crunching; WU not being sent to another user".
This is not easy, trying to get studies complete, but keeping the system busy. I am the "feeder" and since in the end I need all the studies I am rather prioritising keeping WUs available.)
Just checked and we have over 80,000, yes eighty thousand WUs active and this is a new (recent) record.
Draft documentation of the User side is now available thanks to my colleague R. Demaria. If you are interested
and I hope you can access it (otherwise I shall put a copy to LHC@home).
Right now I hope to try new executables with new physics on our test server and I might shortly appeal for some volunteers to help (and also to run a few more 10 million turn jobs). I do NOT want to risk the production service while it is running so smoothly.
Otherwise (At Last!) I shall start writing my paper on how to get identical results on ANY IEEE 754 hardware with ANY standard compiler
at ANY level of Optimisation. Thanks to all. Eric.
4 Nov 2012, 15:08:44 UTC · Discuss
Status and Plans, Saturday 29th September, 2012
All running very smoothly indeed. Just a problem with deadline scheduling which I hope we can discuss and resolve on Monday, especially with some feedback from the BOINC meeting in London.
Also some hiccups on the CERN AFS infrastructure.
I am now hoping to prioritise the writing of my paper on numeric results reproducibility but I am continuing to run work for the next weeks as described in my new thread "Work Unit Description"
in the Message Board "Number Crunching".
I am also pondering how to best handle "very long"
jobs bearing in mind your feedback.
And of course I shall try and keep you informed.
Thank you for your continued support. Eric.
29 Sep 2012, 11:20:02 UTC · Discuss
Status, Sunday 9th September, 2012.
All running well still. One user reports "Maximum Elapsed Time Exceeded" though
on several, all? of his, WUs.
Still checking for MacOS results but no
further complaints at the moment.
I present some basic info.
There have been several changes to URLs and Servers outwith my control. The correct site is http://lhcathomeclassic.cern.ch/sixtrack/
This can indeed be found easily from LHC@home and then The Sixtrack Project (rather than Test4Theory). The current server is firstname.lastname@example.org.
I define "normal" WUs as 10**5/100,000 turns but remember all particles may be lost after an arbitrary number of turns, sometimes, even just a few turns at large amplitudes.
Long WUs are 10**6 or one million turns and very Long WUs
10**7 or 10 million turns, and who knows maybe one day 10**8 turns.
That depends on how the floating-point error accumulates and at which point the loss/increase of energy and loss of symplecticity invalidate the results. It will be exciting to find out.
For Functionality, Reliability and Performance.
While waiting for the LXTRACK user node and the second server for test and backup (I assume they will finally get approved!):
Functionality; adequate for the moment. It would be good to have a priority system, three levels.
1. Run first, after other Level 1.
2. Normal; queue after Level 1 and before Level 3.
3. Run only if No Level 1/2 tasks queued.
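The three levels above could be prototyped with an ordinary heap. This is only a sketch of the idea, not anything that exists in BOINC or SixDesk; the class name LevelQueue and the task names are made up.

```python
import heapq
import itertools


class LevelQueue:
    """Three-level queue: lower level number runs first, first-in first-out within a level."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker that preserves submission order

    def submit(self, name, level=2):
        if level not in (1, 2, 3):
            raise ValueError("level must be 1, 2 or 3")
        heapq.heappush(self._heap, (level, next(self._order), name))

    def next_task(self):
        """Pop the next task name, or return None when nothing is queued."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


queue = LevelQueue()
queue.submit("background-scan", level=3)
queue.submit("study-A")                    # Level 2 is the default
queue.submit("urgent-check", level=1)
queue.submit("study-B")
order = [queue.next_task() for _ in range(4)]
print(order)
```

A Level 3 task is handed out only once no Level 1 or Level 2 task remains, which matches rule 3 above.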
I am thinking in terms of running 10**7 jobs as a series of 10**6 jobs. This requires returning and submitting more data, the fort.6 output and the checkpoint/restart files as a minimum. This would be very good additional functionality in itself.
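As a rough illustration of the bookkeeping involved, a 10**7 turn job can be cut into consecutive 10**6 turn segments, each of which would restart from the checkpoint written by the previous one. The function name split_turns is hypothetical, and the checkpoint handling itself is not shown.

```python
def split_turns(total_turns, segment_turns):
    """Return (first_turn, last_turn) pairs that cover turns 1..total_turns."""
    if total_turns <= 0 or segment_turns <= 0:
        raise ValueError("turn counts must be positive")
    segments = []
    first = 1
    while first <= total_turns:
        last = min(first + segment_turns - 1, total_turns)
        segments.append((first, last))
        first = last + 1
    return segments


segments = split_turns(10**7, 10**6)
print(len(segments), segments[0], segments[-1])
```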
Reliability; pretty good but needs the backup server, LXTRACK, and less reliance on CERN AFS.
Should provide a quick test (1 or 2 minutes) to verify the node produces correct results without running the whole WU. This would not obviate result validation but would avoid wasting resources.
I could also provide a longer test on the WWW with canonical results that any volunteer could run if he suspects he has over-clocked or is getting results rejected.
Performance; pretty good now with SSE2, SSSE3, PNI or whatever.
Should implement GPU option. Should measure the cost of the numeric portability.
(Incidentally Intel are hosting a Webinar on this topic on Wednesday, but I guess it will address only Intel H/W.)
9 Sep 2012, 15:57:05 UTC · Discuss
Status, 2nd September, 2012
Well all seems to be running rather well as seen from the
CERN side. So I present the topics for review on Tuesday.
1. IT report on LXTRACK proposal (to greatly improve facilities for the
physicists including more disk space and much improved reliability).
2. Proposal for a second "test" server (to test very long jobs, to try returning
the full results, without affecting the current service).
3. Project Status and open issues from the MBs:
a) More buffered work (user request).
b) Access to boinc01! Apparently some attempts to contact this obsolete service.
Could be WWW pointers or what.
c) HTTP problems, one user? (I need to send byte count and MD5 checksum.)
d) MacOS executable. Open issue; works for some people.
e) Deadline scheduling Seems that work is deleted because volunteers fear their
contribution will be wasted. But is this true? I have 99.999% results OK but how many
WUs were not credited............
f) GPU enabled SixTrack
4. A.O.B. including Date and time for a small party and the invitation list
to celebrate recent progress and the many helpful comments and suggestions.
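On item 3c, a byte count and MD5 checksum sent along with a file lets the receiver detect a truncated or corrupted transfer. A minimal sketch with Python's standard hashlib follows; the function names are made up, and how the two values would actually travel with a Work Unit is not specified here.

```python
import hashlib
import io


def fingerprint(stream, chunk_size=65536):
    """Read a binary stream to the end and return (byte count, MD5 hex digest)."""
    digest = hashlib.md5()
    size = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        size += len(chunk)
        digest.update(chunk)
    return size, digest.hexdigest()


def matches(stream, expected_size, expected_md5):
    """True when the stream has the expected length and checksum."""
    return fingerprint(stream) == (expected_size, expected_md5.lower())


# Stand-in for a file; in practice the stream would come from open(path, "rb").
payload = b"example result file contents\n"
size, md5_hex = fingerprint(io.BytesIO(payload))
print(size, md5_hex)
```

MD5 is enough to catch transfer errors, but it is not a defence against deliberate tampering.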
2 Sep 2012, 15:34:14 UTC · Discuss
Status, 26th August, 2012
MacOS executable is working, for some at least.
I have queued 500,000 jobs, intensity scan,
while I clear the decks. Many thanks for all the
suggestions and comments on (very) long jobs.
26 Aug 2012, 13:47:17 UTC · Discuss
Very long jobs
I am now going to submit just a few hundred very
long 10**7 turn jobs to complete two studies.
I think this will be OK now; we shall see.
22 Aug 2012, 15:48:40 UTC · Discuss
Please see the Message Board Number Crunching, Thread Credits for some
hopefully good news from Igor.
20 Aug 2012, 19:15:57 UTC · Discuss
Status, 19th August, 2012.
All is running rather well; over 100,000 tasks queued, and over 56,000 running. I have a bit more work prepared, but badly need to do some analysis. After some flak, we have been receiving many messages of support and also a lot of help in identifying the problem with the MAC executable.
Igor has identified and corrected the problem with Credits and is still cleaning up and trying to repair.
(This was my fault; trying to run 10**7 turn jobs taking 80 hours.
However I can report that 99% of them have completed successfully,
and others are still active.)
The Mac executable issue may even be solved, but we need to watch for the next days still.
There may be a problem with Deadlines....we shall see.
I am waiting for PC support to install my NVIDIA TESLA, memory and upgraded power supply, and Linux. I am ready to install the software next and try Tomography. There is some interest in ABP especially for existing MPI applications. We shall see.
I have STILL NOT finished the SixDesk doc or prepared the tutorial.
I take this opportunity to outline the LXTRACK system: I hope IT support could fill in the details and do it.
The justification is that AFS limitations and problems have made life very difficult.
I have used my desk side pcslux99 (thanks to Frank who donated it) as a prototype to run several hundred thousand jobs over the last few weeks.
Sadly I do not have the LSF commands like bjobs and bsub, as it is an old 32-bit machine, and I am NOT wanting to become a sysadmin again. It has almost 200GB of disk space of which I am using only 12% but increasing. Under this setup I have virtually no problems and do everything with the SixDesk scripts called from master scripts in acrontab entries.
LXTRACK should be a "standard" lxplus Virtual machine i.e. with LSF and CASTOR and SVN and AFS etc etc. BUT with at least a Terabyte of disk space NON AFS, /data, say. Only users in the AFS PTS Group boinc_users should be allowed to login.
(We could even create the /data/$LOGNAME directory for them.) How can we manage this space? Given the small number of cooperative users a script to monitor is probably adequate.
Processes should NOT be killed for exceeding CPU or real time limits.
Later, ideally, we could possibly create non_AFS buffers for communication with BOINC.
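The monitoring script mentioned above could be as simple as the following sketch. It assumes one directory per user directly under /data, as suggested; the function names and the 100 GB limit in the usage note are made up for illustration.

```python
import os


def directory_size(path):
    """Total size in bytes of the files under path (symbolic links are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.islink(full):
                continue
            try:
                total += os.path.getsize(full)
            except OSError:
                pass  # file disappeared or is unreadable; ignore it
    return total


def usage_by_user(data_root):
    """Map each per-user directory name under data_root to its size in bytes."""
    usage = {}
    with os.scandir(data_root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                usage[entry.name] = directory_size(entry.path)
    return usage


def over_limit(usage, limit_bytes):
    """Return the sorted names of users whose usage exceeds limit_bytes."""
    return sorted(user for user, used in usage.items() if used > limit_bytes)
```

Run from an acrontab entry, something like over_limit(usage_by_user("/data"), 100 * 10**9) would list the users to contact.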
19 Aug 2012, 14:25:12 UTC · Discuss
(Re-)activated MacOS executable built on MacBook PRO.
Will be watching closely for errors. Eric and Igor.
16 Aug 2012, 15:13:42 UTC · Discuss
Status 12th August
All is running rather well from CERN side and I have initiated an intensity scan to run while I work a bit on the GPU. I have a real time deadline and I
must try this over the next two weeks. In spite of a couple of issues
with the CERN infrastructure I have still managed to queue over 90,000 Work Units as part of an Intensity scan (different bunch sizes and charge).
We are getting flak about credits or points. One obscene message I tried to hide, but the user said he got only
200 points for 80 hours when he expected at least 1000, and another user 62.70 points for 110 hours. So we lost a couple of volunteers, but we are also getting support with over 40,000 active Work Units.
There is also an issue with the real time deadline for my 10 million turn jobs.
I hope to fix the MAC executable next week with my colleague.
12 Aug 2012, 11:51:47 UTC · Discuss
Status, 12th August
Please see the NEWS Message Board.
12 Aug 2012, 11:43:55 UTC · Discuss
Status/Plans, 7th August 2012
First, many thanks for your continued support. From my/CERN side all has been running rather well and I am submerged by results.
I now need to take some time to analyse them. In particular to decide between the two methods of computing the beam - beam effect.
Then I shall probably submit several studies to do an intensity scan where I study the beam - beam effect depending on the size, and hence charge, of the accelerated bunch of particles.
At the same time, I must finish the documentation of the "user"
infrastructure so that my colleagues may easily use BOINC as they return from vacation. In addition I want to set up a dedicated "user" system "lxtrack" in order to provide disk space here and to try and keep up with the results as they are returned.
I have to look at the Deadline problem for 10**7 turn jobs.
I set a bound of 30 days for any WU....need to discuss with Igor if that is NOT what you see at home. Of course we really want a low bound to get results back quickly, but I also want to use older slower systems. We shall have to work out some sort of compromise. My attempt at 10**7 turns was probably a bit over the top, but I was keen to try it.
We hope/expect to produce a valid MAC executable this week. I also need to add some new "physics", new elements, to Sixtrack as provided by a colleague. (Also need to add modifications for "Collimation" but they are not relevant to BOINC.)
The next version should also support SSE4.1.
I was very pleasantly surprised to win an NVIDIA TESLA C2075.
The catch is that I have to use it and program it with OpenACC. There will doubtless be some hiccups installing the board and the necessary
(PGI) software. I shall in fact try my "Tomography" application which already runs in parallel using HPF or openMP. If that works I shall seriously consider a multi-threaded Sixtrack (using GPUs or not) by tracking many more particles in each Work Unit. Non-trivial but rather exciting. I am just at the ideas stage here, but.....it would of course use multiple threads on a multiple core PC as well. A dream?
Finally, I have to take time to publish my work on floating-point portability and reproducibility. I believe I might be the only person who gets identical bit for bit 0 ULP different results after many Gigaflops with 5 different Fortran compilers at different levels of optimisation.
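For readers wondering what "0 ULP different" means in practice: two doubles are identical when their 64-bit patterns are equal, and the ULP distance counts how many representable doubles lie between them. The sketch below is only an illustration in Python, not the method used for SixTrack, and the function names are made up. It also shows the usual culprit, floating-point addition not being associative, so a compiler that reorders a sum can change the last bit.

```python
import struct


def ordered_bits(x):
    """Map a double to an integer whose ordering matches the numeric ordering."""
    (bits,) = struct.unpack("<q", struct.pack("<d", x))
    # Negative doubles are stored as sign plus magnitude; mirror them below zero.
    return bits if bits >= 0 else -(bits & 0x7FFFFFFFFFFFFFFF)


def ulp_distance(a, b):
    """Number of representable doubles between a and b (0 means bit-identical,
    apart from +0.0 and -0.0 which also give 0). Not meaningful for NaN."""
    return abs(ordered_bits(a) - ordered_bits(b))


left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)
print(left_to_right == right_to_left, ulp_distance(left_to_right, right_to_left))
```

Over millions of turns a last-bit difference like this can grow, which is why the validator needs results that agree exactly.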
7 Aug 2012, 15:18:31 UTC · Discuss
STOP PRESS: Trying a new prototype executable for MACs.
Built with ifort defaults on a macBook Pro (using sse3 I guess).
Eric and Igor.
16 Jul 2012, 14:29:09 UTC · Discuss
My colleague has cleaned database and I think that is the end of http errors etc etc.
I have submitted new work and I am always getting results anyway. There is still a whole
bag of worms around sse2 sse3 ssse3 pni and whatever, not helped by Intel's ifort
refusal to run optimised code on non-Intel hardware.
Igor has much improved version distribution and some people are getting "PNI"
versions. The important thing is that SSE2 upwards is much faster than the generic
version. Don't want to waste resources. All versions are completely numerically
portable (I hope so) but when panic is over I shall be looking at all rejected results
as I believe they are due to hardware failures (over-clocked?).
If all goes well I shall try and issue an update to whatever happened to lhc@home
In the meantime someone has changed the WWW pages, or whatever and I don't even know if you
can read this. All my bookmarks failed and usual start page NOT available.
Eric (from his new super MAC notebook pro, bought at great personal expense,
but have never had the time to set up. I am going to try and install BOINC now.)
11 Jul 2012, 17:58:09 UTC · Discuss
An exciting day; a new particle and maybe even the Higgs boson itself.
We have been busy preparing new executables for BOINC, including a MAC
Sadly we have run out of disk space and there are likely to be some hiccups
for the next few hours, hopefully not longer. We have three new executables for
both Windows and Linux: run anywhere, use SSE2, use SSE3. The run anywhere is
slow but every little helps. The executable for MAC requires at least SSE3 I
believe and the exact requirements are not well understood as I write.
I am currently running tests on as many types of hardware as I can.
The disk full situation can cause havoc and certainly explains why you have
not been able to get more work for the last hours.
More news as soon as we make some progress.
Thanks for your continued support which will help to make an even better
LHC for 2015. Eric.
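On Linux, the choice between the three executables (run anywhere, SSE2, SSE3) can be made from the "flags" line of /proc/cpuinfo, where SSE3 is reported as "pni". The sketch below is only an illustration, not the logic used by the BOINC server; the function names and the variant labels "generic", "sse2" and "sse3" are made up.

```python
def parse_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def pick_variant(flags):
    """Pick the fastest executable variant the CPU supports."""
    if "pni" in flags or "sse3" in flags:  # Linux reports SSE3 as "pni"
        return "sse3"
    if "sse2" in flags:
        return "sse2"
    return "generic"  # the slow run-anywhere build


example = "model name\t: Example CPU\nflags\t\t: fpu sse sse2 pni ssse3\n"
print(pick_variant(parse_flags(example)))
```

On a real Linux host the text would come from reading /proc/cpuinfo; Windows and macOS need a different source for the same information.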
4 Jul 2012, 12:05:36 UTC · Discuss
Sixtrack server migration today
The Sixtrack BOINC project has been migrated to a new server today. If you should encounter any difficulties with the setup, please detach from the project and attach again.
BOINC and Sixtrack should be fully operational again from 2PM CET. (12:00 UTC)
Best regards, the BOINC service team.
5 Jun 2012, 10:55:57 UTC · Discuss