Thread 'Theory in containers'

Author	Message
Laurence Project administrator Project developer Joined: 20 Jun 14 Posts: 431 Credit: 256,248 RAC: 28	Message 52898 - Posted: 28 Jan 2026, 7:49:30 UTC Last modified: 28 Jan 2026, 8:40:52 UTC A new version of the Theory app, which runs in containers, is now available as a beta. To run it you will need BOINC client v8.2 or newer. Podman should also be available on your system. The documentation for this can be found on the BOINC wiki.

Schizm Joined: 30 Sep 21 Posts: 2 Credit: 9,137,398 RAC: 2	Message 52900 - Posted: 28 Jan 2026, 9:26:17 UTC
Hi, I have received a bunch of these workunits and they all seem to fail within seconds. Normally I get native ATLAS workunits and those seem to work fine, i.e. CVMFS and podman seem to be working as intended. Is there a setting I have to change on my end so that these workunits do not fail?
The gist of the error logs:
time="2026-01-28T09:59:58+01:00" level=warning msg="The cgroupv2 manager is set to systemd but there is no systemd user session available"
time="2026-01-28T09:59:58+01:00" level=warning msg="For using systemd, you may need to login using an user session"
time="2026-01-28T09:59:58+01:00" level=warning msg="Alternatively, you can enable lingering with: `loginctl enable-linger 126` (possibly as root)"
time="2026-01-28T09:59:58+01:00" level=warning msg="Falling back to --cgroup-manager=cgroupfs"
time="2026-01-28T09:59:58+01:00" level=warning msg="The cgroupv2 manager is set to systemd but there is no systemd user session available"
time="2026-01-28T09:59:58+01:00" level=warning msg="For using systemd, you may need to login using an user session"
time="2026-01-28T09:59:58+01:00" level=warning msg="Alternatively, you can enable lingering with: `loginctl enable-linger 126` (possibly as root)"
time="2026-01-28T09:59:58+01:00" level=warning msg="Falling back to --cgroup-manager=cgroupfs"
For now I have disabled getting tasks so as not to flood you with broken workunits.
Kind regards

Toggleton Joined: 4 Mar 17 Posts: 46 Credit: 13,036,976 RAC: 2,843	Message 52901 - Posted: 28 Jan 2026, 9:34:22 UTC
On my device (Arch Linux with docker) they run longer and reach the finishing steps, but BOINC counts them as a computation failure.
******* Total number of errors, excluding junctions = 0 *********
***** Total number of errors, including junctions = 0 *********
***** Total number of warnings = 0 *********
***** Fraction of events that fail fragmentation cuts = 0.00000 *******
Generator run finished successfully
INFO: rivet analysis finished: numEvents=100000 crossSection=23.5899
--- the last line of the log data:
REF_ALEPH_2004_I636645_d91-x01-y01.dat -> /scratch/dat/ALEPH_2004_I636645-ee-189/zhad-C-aleph1-d91-x01-y01/ALEPH_2004_I636645.dat
https://lhcathome.cern.ch/lhcathome/result.php?resultid=432008470

Laurence Project administrator Project developer Joined: 20 Jun 14 Posts: 431 Credit: 256,248 RAC: 28	Message 52902 - Posted: 28 Jan 2026, 9:49:25 UTC - in response to Message 52900.
For Linux, you might have to enable linger for the boinc user.
sudo usermod --add-subuids 100000-165535 --add-subgids 100000-165535 boinc
sudo loginctl enable-linger boinc
cat /var/lib/boinc/.config/containers/containers.conf
[engine]
cgroup_manager = "cgroupfs"
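The settings in the post above can be checked after the fact with a small script. This is only a minimal sketch: the function names `cgroup_manager_of` and `linger_state` are made up for illustration, and the `boinc` user name and the `/var/lib/boinc` path are simply taken from the post and may differ on other distributions.

```shell
#!/bin/sh
# Hypothetical helper: print the cgroup_manager value found in the
# [engine] table of a containers.conf file (path passed as $1).
cgroup_manager_of() {
    awk '
        # Track whether the current table header is [engine].
        /^[ \t]*\[/ { in_engine = ($0 ~ /^[ \t]*\[engine\]/) }
        in_engine && /^[ \t]*cgroup_manager[ \t]*=/ {
            sub(/^[^=]*=[ \t]*/, "")   # drop everything up to the "="
            gsub(/"/, "")              # strip the quotes
            sub(/[ \t]+$/, "")         # strip trailing whitespace
            print
            exit
        }
    ' "$1"
}

# Hypothetical helper: print the Linger value (yes/no) that systemd-logind
# reports for user $1. Prints nothing if logind does not know the user.
linger_state() {
    loginctl show-user "$1" --property=Linger 2>/dev/null | cut -d= -f2
}

# Example calls (assumed paths, not run automatically):
#   cgroup_manager_of /var/lib/boinc/.config/containers/containers.conf
#   linger_state boinc
```

If the first call prints `cgroupfs` and the second prints `yes`, the host should match what the post describes. Neither check proves that BOINC will then detect Podman.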

Crystal Pellet Volunteer moderator Volunteer tester Joined: 14 Jan 10 Posts: 1559 Credit: 10,102,357 RAC: 729	Message 52903 - Posted: 28 Jan 2026, 9:55:35 UTC - in response to Message 52900.
In reply to Schizm's message of 28 Jan 2026: Hi, I have received a bunch of these workunits and they all seem to fail within seconds. Normally I get native ATLAS workunits and those seem to work fine, i.e. CVMFS and podman seem to be working as intended. Is there a setting I have to change on my end so that these workunits do not fail?
I'm a layman when it comes to Linux and containers, but I think you have to set "cgroup-manager=cgroupfs" in containers.conf.

Laurence Project administrator Project developer Joined: 20 Jun 14 Posts: 431 Credit: 256,248 RAC: 28	Message 52904 - Posted: 28 Jan 2026, 10:19:33 UTC - in response to Message 52901. I can't see any output from the job. This is strange since even if the job failed, we should still see some output from the client or docker wrapper.

Toggleton Joined: 4 Mar 17 Posts: 46 Credit: 13,036,976 RAC: 2,843	Message 52905 - Posted: 28 Jan 2026, 10:35:58 UTC - in response to Message 52904.
Here are some logs of this task: https://lhcathome.cern.ch/lhcathome/result.php?resultid=431994641
https://pastebin.com/jGVBsSBT stderr.txt
https://pastebin.com/m7YVeS1G runRivet.log
Once the current tasks have finished I will try switching to podman to see if that works better than docker.

Crystal Pellet Volunteer moderator Volunteer tester Joined: 14 Jan 10 Posts: 1559 Credit: 10,102,357 RAC: 729	Message 52906 - Posted: 28 Jan 2026, 12:27:24 UTC
@Laurence: Why are these 2 commands run every 10 seconds?
ps --all -f
and
stats --no-stream --format "{{.CPUPerc}} {{.MemUsage}}"
Would you consider reducing this?

Toggleton Joined: 4 Mar 17 Posts: 46 Credit: 13,036,976 RAC: 2,843	Message 52908 - Posted: 28 Jan 2026, 12:54:00 UTC - in response to Message 52905.
In reply to Toggleton's message of 28 Jan 2026: Once the current tasks have finished I will try switching to podman to see if that works better than docker.
Not sure why docker did not work, but Podman with linger enabled works fine: multiple tasks have finished successfully.

Laurence Project administrator Project developer Joined: 20 Jun 14 Posts: 431 Credit: 256,248 RAC: 28	Message 52910 - Posted: 28 Jan 2026, 13:26:37 UTC - in response to Message 52906. Last modified: 28 Jan 2026, 13:27:42 UTC
In reply to Crystal Pellet's message of 28 Jan 2026: @Laurence: Why are these 2 commands run every 10 seconds? ps --all -f and stats --no-stream --format "{{.CPUPerc}} {{.MemUsage}}" Would you consider reducing this?
Not sure, this is in the upstream code. Feel free to post the question in the issue tracker.

Laurence Project administrator Project developer Joined: 20 Jun 14 Posts: 431 Credit: 256,248 RAC: 28	Message 52911 - Posted: 28 Jan 2026, 13:29:11 UTC - in response to Message 52908.
In reply to Toggleton's message of 28 Jan 2026: Not sure why docker did not work, but Podman with linger enabled works fine: multiple tasks have finished successfully.
Great!

Crystal Pellet Volunteer moderator Volunteer tester Joined: 14 Jan 10 Posts: 1559 Credit: 10,102,357 RAC: 729	Message 52912 - Posted: 28 Jan 2026, 14:22:10 UTC - in response to Message 52910.
In reply to Laurence's message of 28 Jan 2026: Not sure, this is in the upstream code. Feel free to post the question in the issue tracker.
Thanks Laurence. It is supposed to be a check, every 10 seconds, of whether the job has exited.
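The idea of a 10-second exit check can be pictured roughly as follows. This is a sketch of the general polling pattern only, not the docker_wrapper code: `status_is_exited` and `wait_for_exit` are hypothetical names, and it assumes that the container runtime reports a finished container with a status that starts with "Exited", as Docker and Podman normally do.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a "name|status" line, as printed by
#   ps --all --format "{{.Names}}|{{.Status}}"
# describes a container that has exited.
status_is_exited() {
    status=${1#*"|"}          # keep the part after the first "|"
    case $status in
        Exited*) return 0 ;;
        *)       return 1 ;;
    esac
}

# Hypothetical polling loop: $1 is the runtime (podman or docker),
# $2 the container name, $3 the interval in seconds (default 10).
# Note: it loops forever if the container never reaches "Exited".
wait_for_exit() {
    while :; do
        line=$("$1" ps --all --filter "name=^$2\$" \
                    --format '{{.Names}}|{{.Status}}')
        status_is_exited "$line" && return 0
        sleep "${3:-10}"
    done
}

# Example call (not run automatically):
#   wait_for_exit podman my_container 10
```

A longer interval would mean fewer runtime invocations, at the cost of noticing a finished job later.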

[VENETO] boboviz Joined: 7 May 08 Posts: 274 Credit: 2,137,324 RAC: 165	Message 52913 - Posted: 28 Jan 2026, 14:34:39 UTC Up to now, no problems with my Win11 64bit.

Laurence Project administrator Project developer Joined: 20 Jun 14 Posts: 431 Credit: 256,248 RAC: 28	Message 52914 - Posted: 28 Jan 2026, 14:43:57 UTC - in response to Message 52912.
In reply to Crystal Pellet's message of 28 Jan 2026: Thanks Laurence. It is supposed to be a check, every 10 seconds, of whether the job has exited.
I think I can turn the logging verbosity down to remove this from the logs. For now it is good to see more details just in case we have any issues.

Schizm Joined: 30 Sep 21 Posts: 2 Credit: 9,137,398 RAC: 2	Message 52916 - Posted: 28 Jan 2026, 17:24:56 UTC - in response to Message 52902.
Enabled linger locally; this did not change anything. Subuids and subgids were set for both the boinc group and the local user. Using cgroupfs instead of systemd only seems to hide docker (and podman) from the BOINC client (for all projects); as far as I know cgroupv2 was needed for rootless containers. When systemd is used, LHC detects podman and World Community Grid detects docker.
BOINC client used on this machine: 8.2.8
podman version 3.4.4
Docker version 28.2.2, build 28.2.2-0ubuntu1~22.04.1
Running docker's hello-world results in this:
Hello from Docker!
This message shows that your installation appears to be working correctly.
To generate this message, Docker took the following steps:
1. The Docker client contacted the Docker daemon.
2. The Docker daemon pulled the "hello-world" image from the Docker Hub. (amd64)
3. The Docker daemon created a new container from that image which runs the executable that produces the output you are currently reading.
4. The Docker daemon streamed that output to the Docker client, which sent it to your terminal.
How the running processes for podman and docker look:
schizm@Enceladus:~$ ps -ef|grep podman
boinc 1337 1243 0 13:15 ? 00:00:00 /usr/bin/podman
boinc 2510 1 0 13:15 ? 00:00:00 /usr/bin/slirp4netns --disable-host-loopback --mtu=65520 --enable-sandbox --enable-seccomp -c -e 3 -r 4 --netns-type=path /tmp/podman-run-126/netns/cni-c2ec05af-66c0-e5de-7d6e-27e58d346990 tap0
schizm 2888 2796 0 13:15 ? 00:00:00 /usr/bin/podman
schizm 143078 3676 0 16:29 pts/0 00:00:00 grep --color=auto podman
schizm@Enceladus:~$ ps -ef|grep docker
root 1631 1 0 13:15 ? 00:00:01 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
schizm 143090 3676 0 16:30 pts/0 00:00:00 grep --color=auto docker
Another note to add: there was an issue with VirtualBox and podman running simultaneously; could something similar be the case here? I cannot test my latest changes for Theory since I'm getting ATLAS tasks again instead, but I might have to run the docker daemon under another uid as a fix. Will try to update here after I get new Theory tasks.
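One way to double-check the subordinate ID ranges mentioned in the post above is to read /etc/subuid and /etc/subgid directly. A minimal sketch, assuming the standard `name:start:count` line format; `subid_count` is a hypothetical helper name, not part of BOINC or Podman.

```shell
#!/bin/sh
# Hypothetical helper: print the total number of subordinate IDs that
# file $2 (for example /etc/subuid) grants to user $1.
# Each line of the file has the form name:start:count.
subid_count() {
    awk -F: -v user="$1" '
        $1 == user { total += $3 }
        END { print total + 0 }   # prints 0 when the user has no entry
    ' "$2"
}

# Example calls (not run automatically):
#   subid_count boinc /etc/subuid
#   subid_count boinc /etc/subgid
```

The range 100000-165535 from the earlier `usermod` command corresponds to 65536 IDs, so that is the value one would expect to see for the boinc user if the command took effect. A result of 0 would suggest the entry is missing for the account that actually runs the BOINC client.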

Toby Broom Volunteer moderator Joined: 27 Sep 08 Posts: 952 Credit: 784,942,409 RAC: 122,177	Message 52920 - Posted: 28 Jan 2026, 19:19:57 UTC - in response to Message 52916. Last modified: 28 Jan 2026, 21:18:58 UTC I didn't have issues running podman and VirtualBox at the same time before. I just got a few successful tasks while running both. You don't need both docker and podman; I chose podman.

[VENETO] boboviz Joined: 7 May 08 Posts: 274 Credit: 2,137,324 RAC: 165	Message 52924 - Posted: 28 Jan 2026, 20:44:12 UTC - in response to Message 52913.
In reply to [VENETO] boboviz's message of 28 Jan 2026: Up to now, no problems with my Win11 64bit.
Mmm, some errors, like 432010091:
<message> Incorrect function. (0x1) - exit code 1 (0x1)</message>
<stderr_txt>
docker_wrapper 17 starting
docker_wrapper config:
workdir: /boinc_slot_dir
use GPU: no
create args: --cap-add=SYS_ADMIN --device /dev/fuse
verbose: 1
Using WSL distro Ubuntu
Using podman
running docker command: ps --all --filter "name=^boinc__lhcathome.cern.ch_lhcathome__theory_2922-4892047-474_0$" --format "{{.Names}}|{{.Status}}"
program: podman
command output:
EOM
creating container boinc__lhcathome.cern.ch_lhcathome__theory_2922-4892047-474_0
running docker command: images
program: podman
command output:
REPOSITORY TAG IMAGE ID CREATED SIZE
docker.io/library/almalinux 9 df3270cc8bc8 10 months ago 217 MB
EOM
building image
running docker command: build "." -t boinc__lhcathome.cern.ch_lhcathome__theory_2922-4892047-474 -f Dockerfile
program: podman
read_from_pipe() error: timeout
build_image() failed: -182

homer__simpsons Joined: 19 Nov 15 Posts: 3 Credit: 1,229,390 RAC: 2,853	Message 52988 - Posted: 6 Feb 2026, 23:50:32 UTC
Hello, I have a Theory Simulation (docker) task that has been running for 21h+. I'm wondering if I should cancel it. Some of the tasks I received completed in less than 20 minutes; some took 2-6 hours (displaying at 99.987% or 100% for hours). This one (21h+) is displaying at 100%.
The task: https://lhcathome.cern.ch/lhcathome/result.php?resultid=432278682
The workunit: https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=238897618

Crystal Pellet Volunteer moderator Volunteer tester Joined: 14 Jan 10 Posts: 1559 Credit: 10,102,357 RAC: 729	Message 52989 - Posted: 7 Feb 2026, 8:27:42 UTC - in response to Message 52988.
In reply to homer__simpsons's message of 6 Feb 2026: Some of the tasks I received completed in less than 20 minutes; some took 2-6 hours (displaying at 99.987% or 100% for hours). This one (21h+) is displaying at 100%. The task: https://lhcathome.cern.ch/lhcathome/result.php?resultid=432278682 The workunit: https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=238897618
It is quite usual for a Theory task to run for more than 1 day, and in very rare cases for more than 10 days. Your task with the herwig7 generator can last longer than 1 day. Let it run.
You can view the progress using "Show Graphics" in BOINC Manager. Highlight the running task and press the "Show Graphics" button. A local webpage will pop up. Press the Logs link and then running.log.

homer__simpsons Joined: 19 Nov 15 Posts: 3 Credit: 1,229,390 RAC: 2,853	Message 53564 - Posted: 8 May 2026, 10:47:22 UTC Last modified: 8 May 2026, 10:47:51 UTC
Hello,
Quote: It is quite usual for a Theory task to run for more than 1 day, and in very rare cases for more than 10 days.
I have this task https://lhcathome.cern.ch/lhcathome/result.php?resultid=434584415 that has been running for 35d (27d15h of computation time). Should I abandon it? I do not want to lose that much progress for the project.
Quote: You can view the progress using "Show Graphics" in BOINC Manager.
"Show Graphics" is greyed out for me, maybe because I'm using the docker executor?