Posts by Richard Haselgrove

1) Questions and Answers : Unix/Linux : Setting up cvmfs (necessary for Cern experiments)(Linux)(mac also) Also if you have a boinc Ubuntu VM or docker VM (Have at least 8gb of dynamic ram available & swap if possible Set in the VM) (Message 47460) Posted 1 Nov 2022 by Richard Haselgrove Post: Preparing to test with ATLAS native, I followed the instructions on https://cvmfs.readthedocs.io/en/stable/cpt-quickstart.html Linux Mint 20.3 'Mate' succeeded, but Linux Mint 21 'Mate' failed. Not entirely clear why, but the verification checks couldn't find cvmfs. I had a similar problem trying to load BOINC v7.20.2 on Mint 21 using the LocutusOfBorg PPA. On that occasion, it reported that the security key validation/storage mechanisms had been tightened, and the key in the PPA could no longer be used - so I fell back to repo BOINC v7.18.1, which is running fine. Just reporting for info.
2) Message boards : Number crunching : Peer certificate cannot be authenticated with given CA certificates (Message 42772) Posted 2 Jun 2020 by Richard Haselgrove Post: Does anyone here still have an Android device running v7.4.53 attached to the system? Could they please confirm whether or not Nils Høimyr's fix has solved the problem for that specific version? We have a user at BOINC for whom the Rosetta fix appears not to have worked in the case of that specific (older) version.
3) Message boards : Number crunching : Peer certificate cannot be authenticated with given CA certificates (Message 42728) Posted 1 Jun 2020 by Richard Haselgrove Post: If any Windows user, 64-bit only, is still affected by this, there is a hotfix v7.16.7 of BOINC available from https://boinc.berkeley.edu/download.php
4) Message boards : Number crunching : Peer certificate cannot be authenticated with given CA certificates (Message 42685) Posted 30 May 2020 by Richard Haselgrove Post: Add NumberFields@home as another project affected. Unfortunately, opening ca-bundle.crt in Windows only shows the details for the first of the 133 certificates in the bundle. I've been through them all, and - although a few of them have expired - none expired this morning. Although the COMODO certificate authenticating this website, and the InCommon certificate authenticating the NumberFields and Rosetta websites, all seem to be in order, I've seen a suggestion on the web that certificates may be rejected as expired in some cases when a newer certificate is issued (even if the old one appears still to have time left to run before expiry).
5) Message boards : Number crunching : A sudden huge increase in computation errors (Message 27712) Posted 15 Mar 2016 by Richard Haselgrove Post: Your BOINC client might have run out of available disk space. If your PC has space on disk, you can allocate more to BOINC as shown here: https://boinc.berkeley.edu/wiki/Local_preferences No, I don't think that's it. exceeded disk limit: 651.56MB > 572.20MB That particular machine is nearly brand new, with a 1 TB data disk - it's reporting 928.88 GB free, and BOINC is allowed to use 100.00 GB of that. Other tasks from both LHC and other projects continued running normally. It's more likely that the wjt-15 tasks were given a workunit <rsc_disk_bound> of 600,000,000 - and exceeded even that. I hope it wasn't a result file the experimenter wanted uploading...
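A quick arithmetic check of the guess in post 5, as a sketch only: it assumes the "MB" in the client's log message means 1024 × 1024 bytes, which the post does not state. Under that assumption, a bound of 600,000,000 bytes comes out at the 572.20MB limit quoted in the error message.

```python
# Assumption (not confirmed in the post): the log's "MB" is 1024 * 1024 bytes.
BYTES_PER_MB = 1024 * 1024


def bytes_to_mb(n_bytes: int) -> float:
    """Convert a byte count to megabytes under the assumption above."""
    return n_bytes / BYTES_PER_MB


# The <rsc_disk_bound> value guessed in the post.
guessed_bound = 600_000_000

limit_mb = round(bytes_to_mb(guessed_bound), 2)
print(f"{limit_mb:.2f}MB")  # compare with the limit shown in the log line
```

If the figures agree, that supports the reading that the task hit its own per-workunit bound rather than the host's overall disk allowance.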
6) Message boards : Number crunching : A sudden huge increase in computation errors (Message 27702) Posted 14 Mar 2016 by Richard Haselgrove Post: Are these errors all from the same workunit sequence as wjt-15-L1-trc_jt-hl1TR-bb-L1__3__s__62.31_60.32__4_6__6__58.5_1_sixvf_boinc260 14/03/2016 21:33:29 | LHC@home 1.0 | Aborting task wjt-15-L1-trc_jt-hl1TR-bb-L1__3__s__62.31_60.32__4_6__6__58.5_1_sixvf_boinc260_3: exceeded disk limit: 651.56MB > 572.20MB
7) Message boards : Number crunching : sorry if this has been already covered...GPU? (Message 27572) Posted 19 Jul 2015 by Richard Haselgrove Post: I process SETI@Home which makes both GPU and CPU Tasks available. The SETI GPU Tasks have been "sliced" into something that runs about 15 CPU Minutes on my GPU while their CPU Tasks run much longer (hours). I believe they have done this because BOINC wasn't able to manage the GPU and do Tasks switching. (Maybe this is still the case.) So, if you got a GPU Tasks that ran hours (Enigma@Home?) it would capture the GPU and hog the processor. Somewhat of a misunderstanding there. Exactly the same SETI tasks are sent to CPUs and GPUs (a few special cases round the fringes) - there is certainly no "differential slicing". If tasks take 15 minutes on GPU, but hours on CPU - then your GPU is faster. That much faster. You can confirm this by checking validated tasks. It's possible to see workunits validated with one task computed by a GPU, its 'wingmate' computed by a CPU. There's also no problem with task switching with a GPU. The only weakness is that "leave applications in memory while suspended" can't be implemented on GPUs: there is no mechanism for temporarily paging GPU VRAM to hard disk. So GPU applications only make sense if the science applications have a viable checkpointing mechanism which can still be implemented for the parallel-processing GPU app. Provided that's in place - no problem. But task switching does create more of an overhead if the entire app, dataset, and checkpoint has to be reloaded at every switch. I lengthen the task switch interval on machines which might suffer from that - but in general (the exception being GPUGrid), GPU apps are so fast that they never even reach the default 1 hour TSI.
8) Message boards : Cafe LHC : Build your own Lego LHC (Message 27555) Posted 3 Jul 2015 by Richard Haselgrove Post: Make sure you include all the latest design features (scanned from tomorrow's edition of the UK magazine "New Scientist". Work that one out if you can!)
9) Message boards : News : Project down due to a server issue (Message 27539) Posted 13 Jun 2015 by Richard Haselgrove Post: All seems to be working normally again.
10) Message boards : News : DISK LIMIT EXCEEDED (Message 27465) Posted 19 May 2015 by Richard Haselgrove Post: Sadly, in this case that wouldn't have worked - CMS-dev was the first project to exceed 4 GB file sizes. You wouldn't have found one from any other BOINC project. Sure, you could test BOINC by systematically throwing every possible eventuality in its direction: I suspect they would have run out of time and money before exhausting the list. CMS-dev isn't actually 'in the wild' as yet - not fully, at least. It's a pre-Alpha project (they say), accessible by invitation only. By my estimation, there are between 30 and 50 active Windows users testing it at the moment (the uncertainty is because the 20 anonymous test hosts might, or might not, all be owned by the same person). They are all now being urged, with some strength, to update their BOINC client.
11) Message boards : News : DISK LIMIT EXCEEDED (Message 27463) Posted 19 May 2015 by Richard Haselgrove Post: Until just one of the files (CMS-dev) grew above 4 GB, it was not known that third-party software (the BOINC framework) had a millennium-style bug: too little space allocated to hold an intermediate value, in this case for file sizes. I don't think this one would have been caught by internal testing: it needed to be tested under the final infrastructure.
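To illustrate the kind of bug described in post 11, here is a minimal sketch. It assumes the undersized intermediate value was an unsigned 32-bit field, which the 4 GB threshold suggests but the post does not confirm; the function name is hypothetical and this is not BOINC's actual code.

```python
# Assumption: the intermediate value is an unsigned 32-bit integer,
# so it can hold at most 2**32 - 1 (just under 4 GiB).
UINT32_MODULUS = 2 ** 32


def stored_in_32_bits(size_bytes: int) -> int:
    """Return what survives when a file size is squeezed into 32 bits."""
    return size_bytes % UINT32_MODULUS


four_gib = 4 * 1024 ** 3

just_under = four_gib - 1    # still fits: stored value equals the real size
just_over = four_gib + 512   # wraps around: stored value is tiny and wrong

print(stored_in_32_bits(just_under) == just_under)
print(stored_in_32_bits(just_over))
```

A file just below the threshold round-trips intact, while one just above it is recorded with a wildly wrong size. That is consistent with the problem staying hidden until a project actually shipped a file larger than 4 GB.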
12) Message boards : News : DISK LIMIT EXCEEDED (Message 27456) Posted 16 May 2015 by Richard Haselgrove Post: To amplify Eric's warning: We believe that this only happens on the Windows platform - users of other operating systems should not experience this problem. The problem is caused by any file larger than 4 GB in a 'slot' directory (the working folder that BOINC uses to hold working files for active tasks). These 'slot' directories can be found within the BOINC data directory structure. Large files in other places, such as project folders, don't cause this error message. The files which led to the discovery of this problem are called 'vm_image.vdi', and files large enough to cause problems have only been seen 'in the wild' on machines running CERN's experimental CMS-dev project - though similar problems might also crop up with ATLAS. But once the file or files exist on your computer, tasks from any/every other BOINC project may fail with this error until the file is cleared. The problem is caused by the BOINC client failing to delete these very large files as it should, and every BOINC client version to date is affected. A corrected version of BOINC is being tested, but so far only as a hotfix to the already experimental BOINC v7.5.0 development line. I'm awaiting news about whether the current BOINC v7.4.xx line will be updated with a fix for this problem (and one or two other problems which came to light while we were tracking it down). In the meantime, I can supply links to the hotfix version if anyone needs them.
13) Message boards : News : News 15th May, 2015 (Message 27452) Posted 15 May 2015 by Richard Haselgrove Post: Since I'm not participating in the Pentathlon, and I have no interest in its outcome, I'll set NNT until it ends to reduce the load on the server.
14) Message boards : Number crunching : workunits made to fail? (Message 27436) Posted 9 May 2015 by Richard Haselgrove Post: Thanks a lot; this seems very wrong. I am trying to get the old test executables removed and hope that will help. Eric. I believe the better procedure is to 'deprecate' the app_version, and deploy the new executables as a completely new app_version. That's a database operation, rather than a file exchange.
15) Message boards : Number crunching : workunits made to fail? (Message 27432) Posted 8 May 2015 by Richard Haselgrove Post: That's right; trying to figure out why you got the "wrong" executable. eric. I've just aborted another two, both with multiple failures for other Windows wingmates. v451.07 is listed as the current Windows test app on the applications page, too.
16) Message boards : Number crunching : workunits made to fail? (Message 27430) Posted 8 May 2015 by Richard Haselgrove Post: My laptop host 9924593 got a batch yesterday evening and errored them all. This isn't the runtime exceeded error: it looks like Linux version 452.02 processed them OK, but Windows version 451.07 failed to create an expected output file.
17) Message boards : Number crunching : workunits made to fail? (Message 27426) Posted 7 May 2015 by Richard Haselgrove Post: Not exactly 'designed to fail', but do note that they all have the application name "sixtracktest" - they are test workunits. As with all testing, nobody knows for certain whether they will work or not - if we knew that, the test would be over! So, it's not certain in advance whether they will fail or not, and it helps the scientists if you run them anyway, to find out.
18) Message boards : Number crunching : Time limit errors???? (Message 27424) Posted 7 May 2015 by Richard Haselgrove Post: OK, it hadn't even started, so I've aborted it - no time wasted (the machine needed a reboot for an AV update anyway).
19) Message boards : Number crunching : Time limit errors???? (Message 27422) Posted 7 May 2015 by Richard Haselgrove Post: I have 'inoculated' task 67569958 by increasing <rsc_fpops_bound> by several orders of magnitude - so you should get at least one result from the first set. On the other hand, it might be quicker to abort it and run a 10K turn task instead, or find the million-turn parameter and turn it down a bit. Up to you.
20) Message boards : Number crunching : Host messing up tons of results (Message 27367) Posted 10 Apr 2015 by Richard Haselgrove Post: You'll find dozens of references to host 9996388 in this thread.

LHC@home