21)
Message boards :
Theory Application :
Cannot run native theory after debian upgrade
(Message 44079)
Posted 9 Jan 2021 by zepingouin Post: Hello, HugeTLB is not supported by the Debian Buster kernel; the current release is 4.19.0-13-amd64. I had to get the kernel source package, enable HugeTLB and compile it. First, install the kernel source package if not already done:

    sudo apt install linux-source

Here are my notes for the process:

    # Build the kernel after updating the source package
    # ADAPT the kernel VERSION NUMBERS to the version currently in use
    tar xaf /usr/src/linux-source-4.19.tar.xz -C KERNEL
    xzcat /usr/src/linux-config-4.19/config.amd64_none_amd64.xz > /tmp/config.amd64_none_amd64
    cd KERNEL/linux-source-4.19
    # make ARCH=x86 defconfig (better to copy the config instead, see below)
    #
    # Alternatively, you can use the configuration from a Debian-built kernel that
    # you already have installed by copying the /boot/config-* file to .config and
    # then running make oldconfig to only answer new questions.
    # If you do this, ensure that you modify the configuration to set:
    # CONFIG_SYSTEM_TRUSTED_KEYS = ""
    cp -p /boot/config-4.19.0-10-amd64 .config
    diff -u .config /tmp/config.amd64_none_amd64
    make oldconfig
    make menuconfig
    #   General setup --> Control Group support ---> [*] HugeTLB controller
    make -j 14 bindeb-pkg   # N-2 CPUs, then be patient ...
    cd ..
    sudo dpkg -i linux-headers-4.19.132_4.19.132-1_amd64.deb
    sudo dpkg -i linux-image-4.19.132_4.19.132-1_amd64.deb
    rm -fR linux-source-4.19
    # Rebuild the vboxdrv module for VirtualBox
    sudo /sbin/vboxconfig

Verify that the /etc/systemd/system/multi-user.target.wants/boinc-client.service file has not been modified by the update process and looks like this:

    [Unit]
    Description=Berkeley Open Infrastructure Network Computing Client
    Documentation=man:boinc(1)
    After=network-online.target

    [Service]
    ProtectHome=true
    Type=simple
    Nice=10
    User=boinc
    PermissionsStartOnly=true
    WorkingDirectory=/var/lib/boinc
    ExecStartPre=/usr/bin/touch /var/log/boinc.log /var/log/boincerr.log
    ExecStartPre=/bin/chown boinc:boinc /var/log/boinc.log /var/log/boincerr.log
    ExecStartPre=/bin/sh -c "/bin/chmod +x /sbin/create-boinc-cgroup && /sbin/create-boinc-cgroup"
    ExecStart=/bin/sh -c '/usr/bin/boinc --dir /var/lib/boinc-client >/var/log/boinc.log 2>/var/log/boincerr.log'
    ExecStop=/usr/bin/boinccmd --quit
    ExecReload=/usr/bin/boinccmd --read_cc_config
    ExecStopPost=/bin/rm -f lockfile
    IOSchedulingClass=idle

    [Install]
    WantedBy=multi-user.target

Unfortunately, I never succeeded in running native Theory tasks with Debian Buster; they always end with errors ... It was working with Debian Stretch, so something may have changed in the kernel.
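Before rebuilding, it may be worth checking whether the running kernel already has the controller that the menuconfig step enables. The sketch below assumes that the "HugeTLB controller" menu entry corresponds to the config symbol CONFIG_CGROUP_HUGETLB and that the running kernel's config sits in /boot/config-$(uname -r), as on Debian; the function name has_hugetlb_cgroup is only illustrative.

```shell
#!/bin/sh
# Return success if the given kernel config file enables the HugeTLB
# cgroup controller (assumed symbol: CONFIG_CGROUP_HUGETLB).
has_hugetlb_cgroup() {
    grep -q '^CONFIG_CGROUP_HUGETLB=y' "$1"
}

# Assumption: the running kernel's config is stored under /boot.
cfg="/boot/config-$(uname -r)"
if [ -r "$cfg" ] && has_hugetlb_cgroup "$cfg"; then
    echo "HugeTLB controller enabled in $cfg"
else
    echo "HugeTLB controller not enabled (or $cfg not readable)"
fi
```

The same function can be pointed at the .config in the source tree after make menuconfig to confirm the option was actually saved.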
22)
Message boards :
Number crunching :
Peer certificate cannot be authenticated with given CA certificates
(Message 42740)
Posted 1 Jun 2020 by zepingouin Post: You can either wait until a recent 7.16 BOINC package is available or download the recent ca-bundle.crt from the 7.16 branch: Now it works for wget but not for BOINC. I guess an upgrade of BOINC is also necessary.
23)
Message boards :
Number crunching :
Peer certificate cannot be authenticated with given CA certificates
(Message 42732)
Posted 1 Jun 2020 by zepingouin Post: I confirm there is no problem with Ubuntu 18.04 but there is also the same certificate problem with Debian Stretch. The following command in Debian indicates an expired certificate:

    wget -v https://lhcathome.cern.ch/lhcathome

I copied /etc/ssl/certs/ca-certificates.crt (which is the file linked to ca-bundle.crt in /var/lib/boinc) from Ubuntu to Debian with no success.
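For a more direct look than wget gives, the expiry date of the served certificate can be read with openssl. This is a rough sketch, not a diagnosis of the problem above: cert_enddate and is_expired are made-up helper names, is_expired relies on GNU date (-d), and only the server (leaf) certificate is inspected, so an expired intermediate or root certificate in the chain would not show up here.

```shell
#!/bin/sh
# Print the notAfter date of the certificate served by a host.
# Needs network access; only the leaf certificate is examined.
cert_enddate() {
    host="$1"
    openssl s_client -connect "${host}:443" -servername "${host}" \
        </dev/null 2>/dev/null \
        | openssl x509 -noout -enddate | cut -d= -f2
}

# Return success if the given date string lies in the past.
# Assumption: GNU date, which understands "May 30 10:48:38 2020 GMT".
is_expired() {
    end_epoch=$(date -d "$1" +%s) || return 2
    [ "$end_epoch" -lt "$(date +%s)" ]
}
```

Possible usage: is_expired "$(cert_enddate lhcathome.cern.ch)" && echo "certificate expired".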
24)
Message boards :
Theory Application :
(Native) Theory - Sherpa looooooong runners
(Message 41721)
Posted 25 Feb 2020 by zepingouin Post: Crystal Pellet, from your list, I have one "ppbar jets 1960 37 - sherpa 2.2.6 default" that succeeded.
25)
Message boards :
Theory Application :
New Version v300.05
(Message 41322)
Posted 21 Jan 2020 by zepingouin Post: Eventually, task 259592915 ended with success after 54 hours.
26)
Message boards :
Theory Application :
New Version v300.05
(Message 41316)
Posted 20 Jan 2020 by zepingouin Post: Looking into runRivet.log, I got these last two lines:

    59400 events processed
    Event 59500 ( 1d 7h 44m 17s elapsed / 21h 36m 12s left ) -> ETA: Tue Jan 21 17:43

Seems to progress flawlessly.
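The elapsed and remaining times in that log line can be turned into a rough estimate of the total number of events, assuming the event rate stays constant (an assumption, not something the log states). The helper names to_seconds and est_total are invented for this sketch.

```shell
#!/bin/sh
# Convert a duration such as "1d 7h 44m 17s" to seconds.
# Assumes plain decimal numbers without leading zeros.
to_seconds() {
    total=0
    for part in $1; do
        n=${part%[dhms]}
        case "$part" in
            *d) total=$((total + n * 86400)) ;;
            *h) total=$((total + n * 3600)) ;;
            *m) total=$((total + n * 60)) ;;
            *s) total=$((total + n)) ;;
        esac
    done
    echo "$total"
}

# Estimate total events from events done, seconds elapsed, seconds left,
# assuming a constant event rate.
est_total() {
    awk -v n="$1" -v e="$2" -v l="$3" 'BEGIN { printf "%.0f\n", n * (e + l) / e }'
}

elapsed=$(to_seconds "1d 7h 44m 17s")
left=$(to_seconds "21h 36m 12s")
echo "estimated total events: $(est_total 59500 "$elapsed" "$left")"
```

With the figures from the log this comes out at about 100000 events, so the task was roughly 60% done at that point.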
27)
Message boards :
Number crunching :
GPU advertised for LHC, but they don't do it?
(Message 41141)
Posted 2 Jan 2020 by zepingouin Post: Interesting. I tend to buy up 2nd hand cards from back when they had a better ratio. Still, mine is 1:4, so 25%. So I wonder why Milkyway uses double precision? According to what you said, they'd get more calculations out of even my card.

It's all about the precision needed to do the maths: no rounding errors, no overflow, etc ... Some reading for a few clues: Floating Point's not Real
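A tiny illustration of the rounding issue, using awk, which does its arithmetic in double precision (as POSIX specifies). Even doubles cannot represent 0.1 + 0.2 exactly; single precision carries fewer significant bits (24 against 53), so such errors appear earlier and accumulate faster. The helper name sum_equals is made up for this sketch.

```shell
#!/bin/sh
# Show the stored value of 0.1 + 0.2 with 17 significant digits.
awk 'BEGIN { printf "%.17g\n", 0.1 + 0.2 }'

# Return success if a + b compares equal to c in double precision.
sum_equals() {
    awk -v a="$1" -v b="$2" -v c="$3" 'BEGIN { exit !((a + b) == c) }'
}

if sum_equals 0.1 0.2 0.3; then
    echo "0.1 + 0.2 == 0.3"
else
    echo "0.1 + 0.2 != 0.3 (rounding error)"
fi
```

Values that are exact binary fractions, such as 0.5 + 0.25, do compare equal, which is why the problem is easy to miss in small tests.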
28)
Message boards :
Number crunching :
GPU advertised for LHC, but they don't do it?
(Message 41137)
Posted 1 Jan 2020 by zepingouin Post: Is it true that you can do a 64bit calculation using two or more 32bit floating point units on the card? What's the performance hit if you did?

Typically reported performance is ~40% of FP32 performance, which is an order of magnitude better than the 1/24 (~4%) FP64:FP32 performance available in almost all consumer-grade GPUs. Source

HNY 2020 !
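The percentages quoted above can be checked with a couple of lines of arithmetic. This sketch only reworks the numbers from the posts (1:24, 1:4 and ~40%); the helper names are illustrative.

```shell
#!/bin/sh
# FP64 throughput as a percentage of FP32 for a 1:N ratio.
fp64_percent() {
    awk -v n="$1" 'BEGIN { printf "%.1f\n", 100 / n }'
}

# How many times faster an emulated FP64 rate (in percent of FP32)
# is than native FP64 at a 1:N ratio.
emulation_gain() {
    awk -v pct="$1" -v n="$2" 'BEGIN { printf "%.1f\n", pct / (100 / n) }'
}

echo "1:24 ratio -> $(fp64_percent 24)% of FP32"
echo "1:4 ratio  -> $(fp64_percent 4)% of FP32"
echo "40% emulated vs 1:24 native -> $(emulation_gain 40 24)x"
```

So 40% against roughly 4.2% is a factor of about 9.6, which is where "an order of magnitude" comes from; against a 1:4 card (25%) the gain would only be 1.6x.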
29)
Message boards :
ATLAS application :
ATLAS vbox version 2.00
(Message 40891)
Posted 10 Dec 2019 by zepingouin Post: First two tasks downloaded with Boinc 7.10.2 on host 10331191 have 36.87 GFLOPS, which seems a good starting point. Wait and see for the next tasks, keep fingers crossed :-)
30)
Message boards :
ATLAS application :
ATLAS vbox version 2.00
(Message 40888)
Posted 10 Dec 2019 by zepingouin Post: Preferences are the same and I have not changed settings for a while. However, I noticed that the Boinc client version is 7.6.33 on the host with multiple GFLOPS values and that the two other hosts are using version 7.9.3. Boinc 7.6.33 is the release for Debian Stretch 9.11. Boinc 7.9.3 is the release for Ubuntu Bionic Beaver 18.04.3 LTS. I will try the Stretch backport, which is at version 7.10.2, and see if the behaviour is still the same.
31)
Message boards :
ATLAS application :
ATLAS vbox version 2.00
(Message 40881)
Posted 9 Dec 2019 by zepingouin Post: computezrmle wrote: BOINC uses 2 main factors to calculate estimated runtimes (as well as credits):

I have 2 hosts for which that is true, but the 3rd has a variable GFLOPS figure:

    Host #1 has 23.08 GFLOPS for every task
    Host #2 has 55.01 GFLOPS for every task
    Host #3 has 35.77, 12.06, 9.04, 6.03 and 3.01 GFLOPS

What could be the reason for the third host to have different figures ?
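One quick thing to try on the Host #3 figures is dividing each value by the smallest one. The sketch below does just that; the helper name ratio is made up, and reading anything into the result (for example a link to the number of threads per task) is an assumption the post does not confirm.

```shell
#!/bin/sh
# Print value / base with two decimals.
ratio() {
    awk -v v="$1" -v b="$2" 'BEGIN { printf "%.2f\n", v / b }'
}

# GFLOPS values reported for Host #3, compared with the smallest (3.01).
for g in 35.77 12.06 9.04 6.03 3.01; do
    echo "$g / 3.01 = $(ratio "$g" 3.01)"
done
```

The ratios come out close to 12, 4, 3, 2 and 1, so the values look like near-integer multiples of a single base figure rather than unrelated numbers.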
32)
Message boards :
ATLAS application :
Credit V2.0 vs, V1.01
(Message 40799)
Posted 5 Dec 2019 by zepingouin Post: I use 3 hosts running ATLAS Simulation v2.73 (native_mt). It seems the longer the running time, the lower the credit amount. See:

    Host #1 The Good
    Host #2 The Bad
    Host #3 The Ugly

Although, The Ugly has two ranges of credit, a minor and a major. Is it some incentive to upgrade for better performance per Watt ? ;-)
33)
Message boards :
ATLAS application :
ATLAS native version 2.73
(Message 40387)
Posted 10 Nov 2019 by zepingouin Post: On my Debian GNU/Linux 9.11 (stretch), I needed to install this missing package: squashfs-tools
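A small check along these lines can tell whether the package is already there before a task fails. It looks for unsquashfs, one of the tools shipped in squashfs-tools; whether the native ATLAS application needs that particular binary is an assumption, and have_cmd is just an illustrative helper name.

```shell
#!/bin/sh
# Return success if the named command is found in PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd unsquashfs; then
    echo "squashfs-tools appears to be installed"
else
    echo "unsquashfs not found; try: sudo apt install squashfs-tools"
fi
```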
34)
Message boards :
Theory Application :
New Native Theory Version 1.1
(Message 39103)
Posted 11 Jun 2019 by zepingouin Post: On a never-ending TheoryN task, I got this in runRivet.log, with "Y out of bounds !" repeated over and over:

    +----------------------------------+
    |                                  |
    |      CCC  OOO  M   M I X   X     |
    |     C    O   O MM MM I  X X      |
    |     C    O   O M M M I   X       |
    |     C    O   O M   M I  X X      |
    |      CCC  OOO  M   M I X   X     |
    |                                  |
    +==================================+
    |  Color dressed  Matrix Elements  |
    |     http://comix.freacafe.de     |
    |   please cite  JHEP12(2008)039   |
    +----------------------------------+
    Matrix_Element_Handler::BuildProcesses(): Looking for processes ............................................................................................ done ( 229284 kB, 11s ).
    Matrix_Element_Handler::InitializeProcesses(): Performing tests .................................................................................... done ( 229680 kB, 0s ).
    Initialized the Matrix_Element_Handler for the hard processes.
    Initialized the Soft_Photon_Handler.
    Hadron_Decay_Map::Read: Initializing HadronDecays.dat. This may take some time.
    Initialized the Hadron_Decay_Handler, Decay model = Hadrons
    Process_Group::CalculateTotalXSec(): Calculate xs for '2_2__j__j__e-__nu_eb' (Internal)
    Starting the calculation. Lean back and enjoy ... .
    Updating display...
    Display update finished (0 histograms, 0 events).
    ISR_Handler::MakeISR(..): s' out of bounds.
      s'_{min}, s'_{max 1,2} vs. s': 0.0049, 4.9e+07, 4.9e+07 vs. 0.0049
    Channel_Elements::GenerateYBackward(2.1181500151722e-10,{-8.98847e+307,0,-8.98847e+307,0,0,^H},{-10,10,-5.10672,^H}): Y out of bounds !
      ymin, ymax vs. y : -10 10 vs. 10
      Setting y to upper bound ymax=10
    Updating display...
    Display update finished (0 histograms, 0 events).
    Channel_Elements::GenerateYForward(0.65305151839935,{-8.98847e+307,0,-8.98847e+307,0,0,^H},{-10,10,7.31876,^H}): Y out of bounds !
      ymin, ymax vs. y : -0.21305 0.21305 vs. -0.21305
      Setting y to lower bound ymin=-0.21305
    Channel_Elements::GenerateYForward(0.0374227,{-8.98847e+307,0,-8.98847e+307,0,0,^H},{-10,10,-0.581824,^H}): Y out of bounds !
    etc ...
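To get a feel for how often the message repeats, the matching lines in the log can simply be counted. This is a minimal sketch; count_lines is a made-up helper, and the path to runRivet.log depends on the slot directory the task runs in, so it is left as an argument.

```shell
#!/bin/sh
# Count the lines of a file that contain a fixed string.
# grep -c exits non-zero when nothing matches, hence the "|| true".
count_lines() {
    grep -cF -- "$1" "$2" || true
}

# Usage (hypothetical path): sh count.sh /path/to/runRivet.log
if [ -n "${1:-}" ] && [ -r "$1" ]; then
    echo "Y out of bounds: $(count_lines 'Y out of bounds !' "$1") line(s)"
fi
```

Running it twice a few minutes apart shows whether the count is still growing while the event counter stays at zero.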
35)
Questions and Answers :
Getting started :
[SOLVED] Unable to agree Terms of Use
(Message 37671)
Posted 21 Dec 2018 by zepingouin Post: You will find the answer by following this link. Happy crunching !
36)
Questions and Answers :
Getting started :
Unable to agree Terms of Use
(Message 37667)
Posted 20 Dec 2018 by zepingouin Post: Finally, I found the workaround ... You need to check the "Stay logged in" box on the login page !
37)
Message boards :
ATLAS application :
native_mt : running time computation question
(Message 36540)
Posted 23 Aug 2018 by zepingouin Post: Hello, Looking at this task, it looks like the math is incorrect for the running time compared to the CPU time. Am I wrong ? task # 205157863
38)
Message boards :
ATLAS application :
pilotErrorDiag": "Payload failed: Interrupt failure code: 1201
(Message 36439)
Posted 15 Aug 2018 by zepingouin Post: Hello, I confirm I get the same kind of error after restarting the WU, even when keeping jobs in memory. In this case I switched "on the fly" from 1 thread to 2 threads because the task was taking a very long time to complete.

This 1-thread task completed in 21 hours: https://lhcathome.cern.ch/lhcathome/result.php?resultid=204126530
This 2-thread task completed in 18 hours (after switching from 1 thread): https://lhcathome.cern.ch/lhcathome/result.php?resultid=204577991

BTW, this 2-thread task is still in the running state according to its PandaID: https://bigpanda.cern.ch/job/4023144304/
39)
Message boards :
ATLAS application :
One Year native-Linux (SL69 and CentOS)
(Message 36177)
Posted 1 Aug 2018 by zepingouin Post: Do your tasks really suspend?

You are right: monitoring with htop, I see the task python -tt etc ... still running. The big difference is that the iteration number doesn't restart from zero after a while.
40)
Message boards :
ATLAS application :
One Year native-Linux (SL69 and CentOS)
(Message 36162)
Posted 1 Aug 2018 by zepingouin Post:
Hello, In the Boinc Manager, I checked the box "Leave non-GPU tasks in memory while suspended" in the menu Options / Computing preferences..., tab "Disk and memory", and it seems to work.
©2024 CERN