21) Message boards : Theory Application : Cannot run native theory after debian upgrade (Message 44079)
Posted 9 Jan 2021 by Profile zepingouin
Post:
Hello,

The HugeTLB cgroup controller is not enabled in the Debian Buster kernel (current release: 4.19.0-13-amd64).
I had to get the kernel source package, enable HugeTLB and compile the kernel myself.
First, install the kernel source package if not already done: sudo apt install linux-source.
Here are my notes for the process:
# Rebuild the kernel after the source package has been updated
# ADAPT the kernel VERSION NUMBERS to the version currently in use
mkdir -p KERNEL
tar xaf /usr/src/linux-source-4.19.tar.xz -C KERNEL
xzcat /usr/src/linux-config-4.19/config.amd64_none_amd64.xz > /tmp/config.amd64_none_amd64
cd KERNEL/linux-source-4.19
# make ARCH=x86 defconfig (better to copy the existing config instead, see below)
#
# Alternatively, you can use the configuration from a Debian-built kernel that
# you already have installed by copying the /boot/config-* file to .config and
# then running make oldconfig to only answer new questions.
# If you do this, ensure that you modify the configuration to set:
# CONFIG_SYSTEM_TRUSTED_KEYS=""
cp -p /boot/config-4.19.0-10-amd64 .config
diff -u .config /tmp/config.amd64_none_amd64
make oldconfig
make menuconfig
# In menuconfig, enable: General setup ---> Control Group support ---> [*] HugeTLB controller
make -j 14 bindeb-pkg # use N-2 CPUs, then be patient ...
cd ..
sudo dpkg -i linux-headers-4.19.132_4.19.132-1_amd64.deb
sudo dpkg -i linux-image-4.19.132_4.19.132-1_amd64.deb
rm -fR linux-source-4.19
# Rebuild the vboxdrv module for VirtualBox
sudo /sbin/vboxconfig
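
To confirm the option actually made it into the new kernel, the config file can be checked after installation. This is only a minimal sketch: the function name hugetlb_enabled is made up for illustration, and it assumes the "HugeTLB controller" menu entry corresponds to the CONFIG_CGROUP_HUGETLB symbol.

```shell
#!/bin/sh
# Hypothetical helper: report whether a kernel config file enables the
# HugeTLB cgroup controller (assumed symbol: CONFIG_CGROUP_HUGETLB).
hugetlb_enabled() {
    config_file="$1"
    if grep -q '^CONFIG_CGROUP_HUGETLB=y$' "$config_file"; then
        echo "enabled"
    else
        echo "disabled"
    fi
}

# Example call for the running kernel:
#   hugetlb_enabled "/boot/config-$(uname -r)"
```

On a running system, grep hugetlb /proc/cgroups should also list the controller once the new kernel is booted. Inside the source tree, scripts/config --enable CGROUP_HUGETLB may serve as a non-interactive alternative to menuconfig.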


Verify that the /etc/systemd/system/multi-user.target.wants/boinc-client.service file has not been modified by the update process and still looks like this:
[Unit]
Description=Berkeley Open Infrastructure Network Computing Client
Documentation=man:boinc(1)
After=network-online.target

[Service]
ProtectHome=true
Type=simple
Nice=10
User=boinc
PermissionsStartOnly=true
WorkingDirectory=/var/lib/boinc
ExecStartPre=/usr/bin/touch /var/log/boinc.log /var/log/boincerr.log
ExecStartPre=/bin/chown boinc:boinc /var/log/boinc.log /var/log/boincerr.log
ExecStartPre=/bin/sh -c "/bin/chmod +x /sbin/create-boinc-cgroup && /sbin/create-boinc-cgroup"
ExecStart=/bin/sh -c '/usr/bin/boinc --dir /var/lib/boinc-client >/var/log/boinc.log 2>/var/log/boincerr.log'
ExecStop=/usr/bin/boinccmd --quit
ExecReload=/usr/bin/boinccmd --read_cc_config
ExecStopPost=/bin/rm -f lockfile
IOSchedulingClass=idle

[Install]
WantedBy=multi-user.target
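
A quick way to spot an overwritten unit file is to look for the cgroup line, since that is the part a package update would most likely drop. This is a rough sketch; check_boinc_unit is a hypothetical name and it only tests for the create-boinc-cgroup ExecStartPre entry, not the whole file.

```shell
#!/bin/sh
# Hypothetical helper: check that a boinc-client unit file still contains
# the ExecStartPre line that runs create-boinc-cgroup.
check_boinc_unit() {
    unit_file="$1"
    if grep -q '^ExecStartPre=.*create-boinc-cgroup' "$unit_file"; then
        echo "cgroup hook present"
    else
        echo "cgroup hook missing"
    fi
}

# Example call:
#   check_boinc_unit /etc/systemd/system/multi-user.target.wants/boinc-client.service
```

If the hook is missing, restore the line and run sudo systemctl daemon-reload before restarting the service.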


Unfortunately, I never succeeded in running native Theory tasks with Debian Buster; they always end with errors ...
It was working with Debian Stretch, so something may have changed in the kernel.
22) Message boards : Number crunching : Peer certificate cannot be authenticated with given CA certificates (Message 42740)
Posted 1 Jun 2020 by Profile zepingouin
Post:
You can either wait until a recent BOINC 7.16 package is available or download the current ca-bundle.crt from the 7.16 branch:
https://github.com/BOINC/boinc/blob/client_release/7/7.16/curl/ca-bundle.crt

Now it works for wget but not for BOINC.
I guess an upgrade for BOINC is also necessary.
23) Message boards : Number crunching : Peer certificate cannot be authenticated with given CA certificates (Message 42732)
Posted 1 Jun 2020 by Profile zepingouin
Post:
I confirm there is no problem with Ubuntu 18.04 but there is also the same certificate problem with Debian Stretch.
The following command in Debian indicates an expired certificate:
wget -v https://lhcathome.cern.ch/lhcathome

I copied /etc/ssl/certs/ca-certificates.crt (which is the file linked to ca-bundle.crt in /var/lib/boinc) from Ubuntu to Debian with no success.
24) Message boards : Theory Application : (Native) Theory - Sherpa looooooong runners (Message 41721)
Posted 25 Feb 2020 by Profile zepingouin
Post:
Crystal Pellet, from your list, I have one "ppbar jets 1960 37 - sherpa 2.2.6 default" that succeeded.
25) Message boards : Theory Application : New Version v300.05 (Message 41322)
Posted 21 Jan 2020 by Profile zepingouin
Post:
Eventually, task 259592915 ended with success after 54 hours.
26) Message boards : Theory Application : New Version v300.05 (Message 41316)
Posted 20 Jan 2020 by Profile zepingouin
Post:
Looking into runRivet.log, I got these last two lines:
59400 events processed
  Event 59500 ( 1d 7h 44m 17s elapsed / 21h 36m 12s left ) -> ETA: Tue Jan 21 17:43

Seems to progress flawlessly.
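
As a sanity check on that ETA, the elapsed and remaining times can be turned into an estimate of the total event count. This is just a sketch assuming a constant time per event; the function names are made up for illustration.

```shell
#!/bin/sh
# Convert days, hours, minutes, seconds to seconds.
to_seconds() {
    echo $(( $1 * 86400 + $2 * 3600 + $3 * 60 + $4 ))
}

# Estimate the total number of events from the events done so far,
# the elapsed seconds and the remaining seconds (constant rate assumed).
# The result is rounded to the nearest integer.
estimate_total_events() {
    events_done="$1"
    elapsed="$2"
    left="$3"
    echo $(( ($events_done * ($elapsed + $left) + $elapsed / 2) / $elapsed ))
}

elapsed=$(to_seconds 1 7 44 17)    # 114257 s elapsed
left=$(to_seconds 0 21 36 12)      # 77772 s left
estimate_total_events 59500 "$elapsed" "$left"   # prints 100000
```

So the log is consistent with a run of about 100000 events that was 59.5% done at that point.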
27) Message boards : Number crunching : GPU advertised for LHC, but they don't do it? (Message 41141)
Posted 2 Jan 2020 by Profile zepingouin
Post:
Interesting. I tend to buy up 2nd hand cards from back when they had a better ratio. Still, mine is 1:4, so 25%. So I wonder why Milkyway uses double precision? According to what you said, they'd get more calculations out of even my card.

It's all about the precision needed to do the maths: no rounding errors, no overflow, etc ...
Some reading for background: Floating Point's not Real
28) Message boards : Number crunching : GPU advertised for LHC, but they don't do it? (Message 41137)
Posted 1 Jan 2020 by Profile zepingouin
Post:
Is it true that you can do a 64bit calculation using two or more 32bit floating point units on the card? What's the performance hit if you did?


Typically reported performance is ~40% of FP32 performance, which is an order of magnitude better than the 1/24 (~4%) FP64:FP32 performance available in almost all consumer-grade GPUs.

Source

HNY 2020 !
29) Message boards : ATLAS application : ATLAS vbox version 2.00 (Message 40891)
Posted 10 Dec 2019 by Profile zepingouin
Post:
First two tasks downloaded with Boinc 7.10.2 on host 10331191 have 36.87 GFLOPS, which seems a good starting point.

Wait and see for the next tasks, keep fingers crossed :-)
30) Message boards : ATLAS application : ATLAS vbox version 2.00 (Message 40888)
Posted 10 Dec 2019 by Profile zepingouin
Post:
Preferences are the same and I have not changed settings for a while.
However, I noticed that the BOINC client version is 7.6.33 on the host with multiple GFLOPS values and that the two other hosts are using version 7.9.3.
BOINC 7.6.33 is the release for Debian Stretch 9.11.
BOINC 7.9.3 is the release for Ubuntu Bionic Beaver 18.04.3 LTS.
I will try the Stretch backport, which is at version 7.10.2, and see whether the behaviour is still the same.
31) Message boards : ATLAS application : ATLAS vbox version 2.00 (Message 40881)
Posted 9 Dec 2019 by Profile zepingouin
Post:
computezrmle wrote:
BOINC uses 2 main factors to calculate estimated runtimes (as well as credits):
- estimated GFLOPS for a task
- peak GFLOPs of a computer

Although it is a BOINC recommendation to estimate the task's GFLOPS as accurate as possible before the server sends it to a client, ATLAS always uses a fixed value.


I have 2 hosts for which this is true, but the 3rd has variable GFLOPS:
Host #1 has 23.08 GFLOPS for every task
Host #2 has 55.01 GFLOPS for every task
Host #3 has 35.77, 12.06, 9.04, 6.03 and 3.01 GFLOPS

What could be the reason for the third host to have different figures?
32) Message boards : ATLAS application : Credit V2.0 vs, V1.01 (Message 40799)
Posted 5 Dec 2019 by Profile zepingouin
Post:
I use 3 hosts running ATLAS Simulation v2.73 (native_mt).
It seems the longer the running time, the lower the credit amount.

See :
Host #1 The Good
Host #2 The Bad
Host #3 The Ugly

The Ugly, however, has two ranges of credit, a minor and a major.

Is it some incentive to upgrade for better performance per Watt ? ;-)
33) Message boards : ATLAS application : ATLAS native version 2.73 (Message 40387)
Posted 10 Nov 2019 by Profile zepingouin
Post:
On my Debian GNU/Linux 9.11 (stretch) host, I had to install this missing package: squashfs-tools
34) Message boards : Theory Application : New Native Theory Version 1.1 (Message 39103)
Posted 11 Jun 2019 by Profile zepingouin
Post:
On a never-ending TheoryN task, I got this in runRivet.log, with "Y out of bounds !" repeated over and over:

+----------------------------------+
|                                  |
|      CCC  OOO  M   M I X   X     |
|     C    O   O MM MM I  X X      |
|     C    O   O M M M I   X       |
|     C    O   O M   M I  X X      |
|      CCC  OOO  M   M I X   X     |
|                                  |
+==================================+
|  Color dressed  Matrix Elements  |
|     http://comix.freacafe.de     |
|   please cite  JHEP12(2008)039   |
+----------------------------------+
Matrix_Element_Handler::BuildProcesses(): Looking for processes ............................................................................................ done ( 229284 kB, 11s ).
Matrix_Element_Handler::InitializeProcesses(): Performing tests .................................................................................... done ( 229680 kB, 0s ).
Initialized the Matrix_Element_Handler for the hard processes.
Initialized the Soft_Photon_Handler.
Hadron_Decay_Map::Read:   Initializing HadronDecays.dat. This may take some time.
Initialized the Hadron_Decay_Handler, Decay model = Hadrons
Process_Group::CalculateTotalXSec(): Calculate xs for '2_2__j__j__e-__nu_eb' (Internal)
Starting the calculation. Lean back and enjoy ... .
Updating display...
Display update finished (0 histograms, 0 events).
ISR_Handler::MakeISR(..): s' out of bounds.
  s'_{min}, s'_{max 1,2} vs. s': 0.0049, 4.9e+07, 4.9e+07 vs. 0.0049
Channel_Elements::GenerateYBackward(2.1181500151722e-10,{-8.98847e+307,0,-8.98847e+307,0,0,^H},{-10,10,-5.10672,^H}):  Y out of bounds !
   ymin, ymax vs. y : -10 10 vs. 10
Setting y to upper bound ymax=10
Updating display...
Display update finished (0 histograms, 0 events).
Channel_Elements::GenerateYForward(0.65305151839935,{-8.98847e+307,0,-8.98847e+307,0,0,^H},{-10,10,7.31876,^H}):  Y out of bounds !
   ymin, ymax vs. y : -0.21305 0.21305 vs. -0.21305
Setting y to lower bound  ymin=-0.21305
Channel_Elements::GenerateYForward(0.0374227,{-8.98847e+307,0,-8.98847e+307,0,0,^H},{-10,10,-0.581824,^H}):  Y out of bounds !
etc ...
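
To get an idea of how often the message repeats, the log can simply be counted. A minimal sketch, with a made-up function name and assuming runRivet.log is in the current directory:

```shell
#!/bin/sh
# Hypothetical helper: count the lines of a log file that contain
# the "Y out of bounds" message.
count_y_out_of_bounds() {
    grep -c 'Y out of bounds' "$1"
}

# Example call:
#   count_y_out_of_bounds runRivet.log
```

Running it twice a few minutes apart shows whether the count is still growing while no events are being processed.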
35) Questions and Answers : Getting started : [SOLVED] Unable to agree Terms of Use (Message 37671)
Posted 21 Dec 2018 by Profile zepingouin
Post:
You will find the answer by following this link.

Happy crunching !
36) Questions and Answers : Getting started : Unable to agree Terms of Use (Message 37667)
Posted 20 Dec 2018 by Profile zepingouin
Post:
Finally, I found the workaround ...

You need to check the "Stay logged in" box on login page !
37) Message boards : ATLAS application : native_mt : running time computation question (Message 36540)
Posted 23 Aug 2018 by Profile zepingouin
Post:
Hello,

Looking at this task, it looks like the math is incorrect for the running time compared to the CPU time. Am I wrong?

task # 205157863
38) Message boards : ATLAS application : pilotErrorDiag": "Payload failed: Interrupt failure code: 1201 (Message 36439)
Posted 15 Aug 2018 by Profile zepingouin
Post:
Hello,

I confirm I get the same kind of error after restarting the WU, even with tasks kept in memory. In this case I switched "on the fly" from 1 thread to 2 threads because the task was taking a very long time to complete.

This 1-thread task completed in 21 hours: https://lhcathome.cern.ch/lhcathome/result.php?resultid=204126530
This 2-thread task completed in 18 hours (after switching from 1 thread): https://lhcathome.cern.ch/lhcathome/result.php?resultid=204577991

BTW, this 2-thread task is still in running state according to the PandaID https://bigpanda.cern.ch/job/4023144304/.
39) Message boards : ATLAS application : One Year native-Linux (SL69 and CentOS) (Message 36177)
Posted 1 Aug 2018 by Profile zepingouin
Post:
Do your tasks really suspend?
I use the same setting but it's only the BOINC client that reports the tasks as suspended.
The scientific app always continues running in the background.

You are right: monitoring with htop, I can see the task (python -tt etc ...) still running.
The big difference is that the iteration number doesn't restart from zero after a while.
40) Message boards : ATLAS application : One Year native-Linux (SL69 and CentOS) (Message 36162)
Posted 1 Aug 2018 by Profile zepingouin
Post:

No:
- suspend/resume still doesn't work

Hello,

In the BOINC Manager, I checked the box "Leave non-GPU tasks in memory while suspended" (menu Options / Computing preferences..., tab Disk and memory) and it seems to work.
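
On a headless host without the Manager, the same preference can probably be set through a local override file. This is only a sketch under assumptions: the function name is made up, the data directory is taken to be /var/lib/boinc-client, and the checkbox is assumed to map to the leave_apps_in_memory element.

```shell
#!/bin/sh
# Hypothetical helper: write a minimal global_prefs_override.xml that keeps
# suspended tasks in memory. WARNING: this overwrites the target file, so
# any other local overrides in it are lost. Back it up first.
write_leave_in_memory_override() {
    target="$1"
    cat > "$target" <<'EOF'
<global_preferences>
   <leave_apps_in_memory>1</leave_apps_in_memory>
</global_preferences>
EOF
}

# Example (assumed data directory, run with sufficient permissions):
#   write_leave_in_memory_override /var/lib/boinc-client/global_prefs_override.xml
#   boinccmd --read_global_prefs_override
```

The boinccmd call asks the running client to reread the override file, so no restart should be needed.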

