Message boards : Theory Application : Native Theory 300.08 configuration issue

Petr Malik

Joined: 13 May 22
Posts: 3
Credit: 171,478
RAC: 1
Message 49800 - Posted: 20 Mar 2024, 23:26:08 UTC

Hello.

This is on CentOS Stream 8. I had Theory Application working in legacy mode.
This was after making sure cgroups v2 is enabled and using systemctl edit boinc-client.service to set:

ProtectControlGroups=no


I also added this to /etc/fstab:

tmpfs  /sys/fs/cgroup  tmpfs  rw,nosuid,nodev,noexec,mode=755  0  0


Then I tried to make it work in non-legacy mode...

I ran the script from https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=6075:

sudo /bin/bash -c "export script=\"prepare_theory_native_environment\" && wget https://lhcathome.cern.ch/lhcathome/download/\$script -O /tmp/\$script && chmod u+x /tmp/\$script && /tmp/\$script && rm /tmp/\$script"
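For readability, the one-liner above can be unpacked into separate steps. This is only a sketch: the function name prepare_theory_native and the use of a private temporary directory (instead of the fixed /tmp path) are my own choices, not part of the official instructions.

```shell
#!/bin/sh
# Readable equivalent of the one-liner above. Wrapped in a function so that
# nothing is downloaded or executed until it is called explicitly (as root).
prepare_theory_native() {
    script="prepare_theory_native_environment"
    url="https://lhcathome.cern.ch/lhcathome/download/$script"

    # Assumption: a private temp directory replaces the fixed /tmp/$script path.
    dir="$(mktemp -d)" || return 1

    # Download, mark executable, run; stop at the first failing step.
    wget "$url" -O "$dir/$script" &&
        chmod u+x "$dir/$script" &&
        "$dir/$script"
    status=$?

    # Unlike the one-liner, clean up even when a step failed.
    rm -rf "$dir"
    return "$status"
}
```

To use it, source the file in a root shell and call prepare_theory_native.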


Note that I also upgraded sudo package to version 1.9.15.

My /etc/sudoers.d/50-lhcathome_boinc_theory_native now contains this:

Cmnd_Alias LHCATHOMEBOINC_01 = /usr/bin/cat ^/etc/sudoers.d/50-lhcathome_boinc_theory_native$
Cmnd_Alias LHCATHOMEBOINC_02 = /usr/bin/systemctl ^(freeze|thaw) Theory_[-a-zA-Z0-9_]+\.scope$
Cmnd_Alias LHCATHOMEBOINC_03 = /usr/bin/systemd-run ^--scope --unit=[a-zA-Z0-9_-]+ -p BindsTo=[a-zA-Z0-9_\.@-]+ -p After=[a-zA-Z0-9_\.@-]+ --slice-inherit --uid=[a-zA-Z0-9_-]+ --gid=boinc --same-dir -q -G /[a-zA-Z0-9_\./-]+/(runc|runc\.new|runc\.old) --root state run -b cernvm [a-zA-Z0-9_-]+$

%boinc     ALL = (ALL) NOPASSWD: LHCATHOMEBOINC_01, LHCATHOMEBOINC_02, LHCATHOMEBOINC_03


Note that in LHCATHOMEBOINC_03 I changed -u to --unit=, because -u was giving me an unknown option error on my OS.

I also "activated" the boinc account by running:

usermod -s /bin/bash boinc
passwd boinc


and gave it a password...

Now I keep getting this error from all Theory tasks:

<core_client_version>7.20.2</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
19:26:44 (19050): wrapper (7.15.26016): starting
19:26:44 (19050): wrapper (7.15.26016): starting
19:26:44 (19050): wrapper: running ../../projects/lhcathome.cern.ch_lhcathome/cranky-0.1.4 ()
19:26:44 EDT -04:00 2024-03-19: cranky-0.1.4: [INFO] Detected Theory App
19:26:44 EDT -04:00 2024-03-19: cranky-0.1.4: [INFO] This application must have permanent access to
19:26:44 EDT -04:00 2024-03-19: cranky-0.1.4: [INFO] online repositories via a local CVMFS service.
19:26:44 EDT -04:00 2024-03-19: cranky-0.1.4: [INFO] It supports suspend/resume if a couple of
19:26:44 EDT -04:00 2024-03-19: cranky-0.1.4: [INFO] requirements are fulfilled.
19:26:44 EDT -04:00 2024-03-19: cranky-0.1.4: [INFO] Most important:
19:26:44 EDT -04:00 2024-03-19: cranky-0.1.4: [INFO] - init process is systemd
19:26:44 EDT -04:00 2024-03-19: cranky-0.1.4: [INFO] - cgroups v2 is enabled and 'freezer' is available
19:26:44 EDT -04:00 2024-03-19: cranky-0.1.4: [INFO] - the user running this application is a member of the 'boinc' group
19:26:44 EDT -04:00 2024-03-19: cranky-0.1.4: [INFO] - sudo is at least version 1.9.10
19:26:44 EDT -04:00 2024-03-19: cranky-0.1.4: [INFO] - sudoer file provided by LHC@home is installed
19:26:44 EDT -04:00 2024-03-19: cranky-0.1.4: [INFO] Checking local requirements.
19:27:09 EDT -04:00 2024-03-19: cranky-0.1.4: [INFO] Found Sudo-Version 1.9.15p5.
19:29:15 EDT -04:00 2024-03-19: cranky-0.1.4: [INFO] Can't find '/etc/cvmfs/domain.d/cern.ch.local'.
19:29:15 EDT -04:00 2024-03-19: cranky-0.1.4: [INFO] Can't find '/etc/cvmfs/config.d/cvmfs-config.cern.ch.local'.
19:29:15 EDT -04:00 2024-03-19: cranky-0.1.4: [INFO] Probing /cvmfs/alice.cern.ch... OK
19:29:15 EDT -04:00 2024-03-19: cranky-0.1.4: [INFO] Probing /cvmfs/cernvm-prod.cern.ch... OK
19:29:15 EDT -04:00 2024-03-19: cranky-0.1.4: [INFO] Probing /cvmfs/grid.cern.ch... OK
19:29:15 EDT -04:00 2024-03-19: cranky-0.1.4: [INFO] Probing /cvmfs/sft.cern.ch... OK
19:29:15 EDT -04:00 2024-03-19: cranky-0.1.4: [INFO] Excerpt from "cvmfs_config stat": VERSION HOST PROXY
19:29:15 EDT -04:00 2024-03-19: cranky-0.1.4: [INFO] 2.11.2.0 http://s1bnl-cvmfs.openhtc.io/cvmfs/alice.cern.ch DIRECT
19:29:15 EDT -04:00 2024-03-19: cranky-0.1.4: [INFO] Found a local runc version 1.1.12.
19:29:15 EDT -04:00 2024-03-19: cranky-0.1.4: [INFO] Creating container filesystem.
19:29:15 EDT -04:00 2024-03-19: cranky-0.1.4: [INFO] Using /cvmfs/cernvm-prod.cern.ch/cvm4
19:29:15 EDT -04:00 2024-03-19: cranky-0.1.4: [INFO] Starting runc container.
19:29:15 EDT -04:00 2024-03-19: cranky-0.1.4: [INFO] To get some details on systemd level run
19:29:15 EDT -04:00 2024-03-19: cranky-0.1.4: [INFO] systemctl status Theory_2687-2495123-1160_0.scope
19:29:15 EDT -04:00 2024-03-19: cranky-0.1.4: [INFO] mcplots runspec: boinc pp bbbar 7000 - - pythia8 8.305 CP2-CR1 100000 1160
19:29:15 EDT -04:00 2024-03-19: cranky-0.1.4: [INFO] ----,^^^^,<<<~_____---,^^^,<<~____--,^^,<~__;_
sudo: unable to open /run/sudo/ts/971: Read-only file system
sudo: a password is required
19:29:41 EDT -04:00 2024-03-19: cranky-0.1.4: [INFO] Container Theory_2687-2495123-1160_0 finished with status code 1.
19:29:41 EDT -04:00 2024-03-19: cranky-0.1.4: [INFO] Preparing output.
19:29:41 EDT -04:00 2024-03-19: cranky-0.1.4: [ERROR] No output found.
19:29:41 (19050): cranky exited; CPU time 0.459741
19:29:41 (19050): app exit status: 0xce
19:29:41 (19050): called boinc_finish(195)

</stderr_txt>
]]>


Any ideas?
Thanks.
ID: 49800
maeax

Joined: 2 May 07
Posts: 2158
Credit: 162,586,938
RAC: 122,852
Message 49801 - Posted: 21 Mar 2024, 1:23:50 UTC

You need a local cvmfs installation.
You can find instructions for this here:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5594#44232
ID: 49801
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert

Joined: 15 Jun 08
Posts: 2450
Credit: 232,561,118
RAC: 130,538
Message 49806 - Posted: 21 Mar 2024, 8:23:26 UTC - in response to Message 49800.  
Last modified: 21 Mar 2024, 8:26:06 UTC

Your computers (and hence their logs) are not visible to other volunteers.
Please make them visible in your prefs.


using systemctl edit boinc-client.service to set:

ProtectControlGroups=no

You modified "ProtectControlGroups"?
Why?
The usual suggestion is to replace "ProtectSystem=strict" with "ProtectSystem=full".

Did you set other hardening options?
If so, they may stop BOINC from working.
Start with the settings your Linux vendor ships at installation time.
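To see which hardening options are actually in effect, the unit file and its drop-ins can be filtered for the relevant directives. The sketch below is an illustration only: the function name and the sample unit text are made up, and the list of directive prefixes is not exhaustive.

```shell
#!/bin/sh
# Prints hardening-related directives found in unit file text on stdin.
hardening_directives() {
    grep -E -e '^(Protect|Private|ReadOnly|ReadWrite|Restrict|NoNewPrivileges)[A-Za-z]*='
}

# Hypothetical unit text, for demonstration only.
unit_sample='[Service]
User=boinc
ProtectHome=true
ProtectSystem=strict
ProtectControlGroups=no'

printf '%s\n' "$unit_sample" | hardening_directives
```

On a real host the input would come from "systemctl cat boinc-client.service", which prints the vendor unit together with any overrides created via systemctl edit.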


I also added this to /etc/fstab:

tmpfs  /sys/fs/cgroup  tmpfs  rw,nosuid,nodev,noexec,mode=755  0  0

Why?
Cgroups are kernel internal administrative structures.
If enabled they should automatically be mapped to /sys/fs/cgroup.
There's usually no need to mount them via fstab or force them through tmpfs.
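A quick way to tell which cgroup layout is mounted is the filesystem type of /sys/fs/cgroup. The following sketch assumes GNU stat; the function name is made up for this example.

```shell
#!/bin/sh
# Maps the filesystem type of /sys/fs/cgroup to the cgroup layout in use.
classify_cgroup_fs() {
    case "$1" in
        cgroup2fs) echo "cgroups v2 (unified hierarchy)" ;;
        tmpfs)     echo "cgroups v1 or hybrid hierarchy" ;;
        *)         echo "unknown" ;;
    esac
}

# Demonstration with the two values usually seen on Linux hosts.
for fstype in cgroup2fs tmpfs; do
    echo "$fstype: $(classify_cgroup_fs "$fstype")"
done
```

On a live system the value would come from "stat -fc %T /sys/fs/cgroup". Note that an extra tmpfs line in /etc/fstab, like the one quoted above, would also make this report tmpfs.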



Note that in LHCATHOMEBOINC_03 I changed -u to --unit=, because -u was giving me an unknown option error on my OS.

This looks weird for the following reasons:

1.
The cranky script calls systemd-run with "-u" which MUST match the Cmnd_Alias in the sudoers file.
If there's no match the command will not be recognized by sudo.
But you don't have a match since you did not modify the command within the cranky script, did you?

2.
"--unit" and its short form "-u" have both been introduced in the same systemd version.
Either both are available or neither is.

3.
Systemd-run called by cranky also uses the "-p" option.
That option has been introduced after the "unit" options.
If "-p" works there's no reason why both "unit" options shouldn't.

Please post the output of "systemd-run --version".
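Point 1 can be tried offline, since recent sudo versions treat a ^...$ argument spec as a POSIX extended regular expression, which grep -E approximates. This is a rough sketch: the patterns are shortened prefixes of LHCATHOMEBOINC_03, the "shipped" variant with -u is an assumption about the unmodified file, and the argument string is a hypothetical example of what cranky might pass.

```shell
#!/bin/sh
# Shortened prefixes of the LHCATHOMEBOINC_03 argument pattern.
# pattern_shipped: assumed original form using -u.
# pattern_edited:  form with --unit= as posted in this thread.
pattern_shipped='^--scope -u [a-zA-Z0-9_-]+ -p BindsTo=[a-zA-Z0-9_\.@-]+ '
pattern_edited='^--scope --unit=[a-zA-Z0-9_-]+ -p BindsTo=[a-zA-Z0-9_\.@-]+ '

# Hypothetical argument string in the style cranky would hand to systemd-run.
cranky_args='--scope -u Theory_2687-2495123-1160_0 -p BindsTo=boinc-client.service -p After=boinc-client.service'

# Succeeds when the argument string ($2) matches the extended regex ($1).
args_match() {
    printf '%s\n' "$2" | grep -Eq -e "$1"
}

if args_match "$pattern_shipped" "$cranky_args"; then
    echo "shipped pattern: match"
else
    echo "shipped pattern: no match"
fi

if args_match "$pattern_edited" "$cranky_args"; then
    echo "edited pattern: match"
else
    echo "edited pattern: no match"
fi
```

If the edited pattern does not match what cranky really calls, the NOPASSWD rule never applies, which would be consistent with the "sudo: a password is required" line in the log.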
ID: 49806
Petr Malik

Joined: 13 May 22
Posts: 3
Credit: 171,478
RAC: 1
Message 49808 - Posted: 21 Mar 2024, 18:24:12 UTC - in response to Message 49806.  

Hi and thanks for the response.

I have updated my profile settings; hopefully the logs are now visible.

I think at some point I was getting some errors regarding read-only filesystem and was trying to resolve it following some suggestions from this board such as: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5121
That's where the ProtectControlGroups and tmpfs settings come from.

Here are the outputs from systemd-run --version and systemd-run --help.

$ systemd-run --version

systemd 239 (239-82.el8)
+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=legacy

$ systemd-run --help
systemd-run [OPTIONS...] {COMMAND} [ARGS...]

Run the specified command in a transient scope or service.

  -h --help                       Show this help
     --version                    Show package version
     --no-ask-password            Do not prompt for password
     --user                       Run as user unit
  -H --host=[USER@]HOST           Operate on remote host
  -M --machine=CONTAINER          Operate on local container
     --scope                      Run this as scope rather than service
     --unit=UNIT                  Run under the specified unit name
  -p --property=NAME=VALUE        Set service or scope unit property
     --description=TEXT           Description for unit
     --slice=SLICE                Run in the specified slice
     --no-block                   Do not wait until operation finished
  -r --remain-after-exit          Leave service around until explicitly stopped
     --wait                       Wait until service stopped again
     --send-sighup                Send SIGHUP when terminating
     --service-type=TYPE          Service type
     --uid=USER                   Run as system user
     --gid=GROUP                  Run as system group
     --nice=NICE                  Nice level
  -E --setenv=NAME=VALUE          Set environment
  -t --pty                        Run service on pseudo TTY as STDIN/STDOUT/
                                  STDERR
  -P --pipe                       Pass STDIN/STDOUT/STDERR directly to service
  -q --quiet                      Suppress information messages during runtime
  -G --collect                    Unload unit after it ran, even when failed

Path options:
     --path-property=NAME=VALUE   Set path unit property

Socket options:
     --socket-property=NAME=VALUE Set socket unit property

Timer options:
     --on-active=SECONDS          Run after SECONDS delay
     --on-boot=SECONDS            Run SECONDS after machine was booted up
     --on-startup=SECONDS         Run SECONDS after systemd activation
     --on-unit-active=SECONDS     Run SECONDS after the last activation
     --on-unit-inactive=SECONDS   Run SECONDS after the last deactivation
     --on-calendar=SPEC           Realtime timer
     --timer-property=NAME=VALUE  Set timer unit property
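Rather than reading the help text by eye, it can be scanned for the options in question. A minimal sketch follows; the function name is made up, and the sample lines are copied from the help output above.

```shell
#!/bin/sh
# Succeeds if the help text on stdin lists option $1 as a separate word,
# e.g. "-p", "--unit" or "--slice-inherit".
help_lists_option() {
    grep -Eq -e "(^|[[:space:]])$1(=|[[:space:]]|\$)"
}

# A few lines copied from the systemd 239 help output above.
help_sample='     --scope                      Run this as scope rather than service
     --unit=UNIT                  Run under the specified unit name
  -p --property=NAME=VALUE        Set service or scope unit property
     --slice=SLICE                Run in the specified slice'

for opt in -p -u --unit --slice-inherit; do
    if printf '%s\n' "$help_sample" | help_lists_option "$opt"; then
        echo "$opt: listed"
    else
        echo "$opt: not listed"
    fi
done
```

For this sample it reports -p and --unit as listed, and -u and --slice-inherit as not listed. On a live system the input would be "systemd-run --help".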
ID: 49808
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert

Joined: 15 Jun 08
Posts: 2450
Credit: 232,561,118
RAC: 130,538
Message 49810 - Posted: 21 Mar 2024, 20:08:17 UTC - in response to Message 49808.  

OK, I see where it comes from.
You shouldn't use that any more for (mainly) the following reasons:

1.
The thread explains settings for cgroups v1.
These can't be mixed with cgroups v2.

2.
The recent cranky app uses systemd-run to start its main part as a systemd scope.
This method delegates suspend/resume to systemd which implicitly uses cgroups v2.
Hence, users don't need to directly fiddle around with cgroups(v1) stuff any more.

Furthermore, cgroups v1 support is already deprecated in systemd and as a result in all Linux distros using it.



As for the systemd version

This is what the original maintainer's manpage states.
See:
https://www.freedesktop.org/software/systemd/man/latest/systemd-run.html
--unit=, -u
    Use this unit name instead of an automatically generated one.
    Added in version 206.

--property=, -p
    Sets a property on the scope or service unit that is created. This option takes an assignment in the same format as systemctl(1)'s set-property command.
    Added in version 211.
    
--slice-inherit
    Make the new .service or .scope unit part of the inherited slice. This option can be combined with --slice=.
    An inherited slice is located within systemd-run slice. Example: if systemd-run slice is foo.slice, and the --slice= argument is bar, the unit will be placed under the foo-bar.slice.
    Added in version 246.

The latter might be a problem since cranky uses "--slice-inherit" and your version reports v239.
You may need to upgrade systemd or use a more recent Linux distro.

Hint:
Systemd v246 was released in July 2020.
https://lwn.net/Articles/827675/
Hence, more than 3 years before this cranky version.
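The version requirement can be checked with a few lines of shell. This is a sketch only: the function names are made up, and the threshold 246 is taken from the manpage excerpt above.

```shell
#!/bin/sh
# Extracts the major version from the first line of "systemd-run --version",
# e.g. "systemd 239 (239-82.el8)" gives 239.
systemd_major() {
    printf '%s\n' "$1" | awk 'NR == 1 && $1 == "systemd" { split($2, v, "."); print v[1] }'
}

# Succeeds when the major version is at least 246 (first release with --slice-inherit).
supports_slice_inherit() {
    major="$(systemd_major "$1")"
    [ -n "$major" ] && [ "$major" -ge 246 ] 2>/dev/null
}

# Version line as posted earlier in this thread.
version_line='systemd 239 (239-82.el8)'

if supports_slice_inherit "$version_line"; then
    echo "systemd $(systemd_major "$version_line"): --slice-inherit should be available"
else
    echo "systemd $(systemd_major "$version_line"): too old for --slice-inherit"
fi
```

On a live system, pass "$(systemd-run --version)" instead of the fixed string. Distributions sometimes backport features, so checking "systemd-run --help" for the option itself is the more direct test.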
ID: 49810
Petr Malik

Joined: 13 May 22
Posts: 3
Credit: 171,478
RAC: 1
Message 49850 - Posted: 27 Mar 2024, 15:40:01 UTC

Solved. I ended up installing a new up-to-date OS (Fedora 39). Then I followed this guide to install BOINC on Fedora and also this guide to install CVMFS on Fedora.

I installed CVMFS:

dnf install https://ecsft.cern.ch/dist/cvmfs/cvmfs-2.11.0/cvmfs-2.11.0-1.fc34.x86_64.rpm https://ecsft.cern.ch/dist/cvmfs/cvmfs-config/cvmfs-config-default-latest.noarch.rpm http://ecsft.cern.ch/dist/cvmfs/cvmfs-2.11.0/cvmfs-libs-2.11.0-1.fc34.x86_64.rpm


cvmfs_config setup


Then added CVMFS configuration to /etc/cvmfs/default.local:

CVMFS_REPOSITORIES="atlas,atlas-condb,grid,cernvm-prod,sft,alice"
CVMFS_HTTP_PROXY="auto;DIRECT"
CVMFS_USE_CDN=yes
CVMFS_CLIENT_PROFILE=single
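After changing default.local, it is worth confirming that the repositories still mount. The sketch below filters probe output for anything that is not OK; the function name is made up, and the failure line in the sample is invented (only the "... OK" form appears in the log earlier in this thread).

```shell
#!/bin/sh
# Reads "cvmfs_config probe"-style lines on stdin and prints the repositories
# whose line does not end in "OK".
failed_probes() {
    awk '/^Probing / && $NF != "OK" { sub(/\.\.\.$/, "", $2); print $2 }'
}

# Sample input: the first line matches the log in this thread; the second is
# a made-up failure line (the exact failure wording is an assumption).
probe_sample='Probing /cvmfs/alice.cern.ch... OK
Probing /cvmfs/grid.cern.ch... Failed!'

failed="$(printf '%s\n' "$probe_sample" | failed_probes)"
if [ -n "$failed" ]; then
    echo "Repositories not reachable:"
    echo "$failed"
else
    echo "All probed repositories are OK"
fi
```

On a real host the input would come from "cvmfs_config probe", piped into failed_probes.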


And ran the prepare_theory_native_environment script from this board:

sudo /bin/bash -c "export script=\"prepare_theory_native_environment\" && wget https://lhcathome.cern.ch/lhcathome/download/\$script -O /tmp/\$script && chmod u+x /tmp/\$script && /tmp/\$script && rm /tmp/\$script"
ID: 49850



©2024 CERN