1) Message boards : Number crunching : Atlas Scoring algos are going insane. (Message 48089)
Posted 12 May 2023 by Sagittarius Lupus
Post:
This "creditnew" nonsense, as it's called, has got to go. These numbers are meaningless since a comparison between any two of them is apples-to-oranges by design -- unless you are comparing those from two tasks run on the same exact machine under the same exact conditions.

It is dismaying that the administrators of LHC@Home are in any way comfortable with this state of affairs, as most other BOINC projects have categorically avoided creating such a problem for themselves.

At any rate, as a CPU-only project it contributes too small a fraction of my overall BOINC credit to make a statistically significant dent in my numbers anyway, so I can't complain that much. The only value credits have here, for my part, _is_ their usefulness in comparing the relative effectiveness of different hardware, but with the way LHC@Home is configured, the one thing that would make these stats useful is mathematically impossible.

When it comes to this project, I just crunch, ignore the statistics, and hope the results are useful. At least we can take comfort in the certain knowledge that the science is real and priceless. And also that it gives me an excuse to exercise and maintain a working Squid/CVMFS/Singularity stack, which is pretty darn cool ngl.
2) Message boards : Theory Application : Theory Native issues with cgroups [SOLVED] (Message 41485)
Posted 6 Feb 2020 by Sagittarius Lupus
Post:
I figured it out. It has everything to do with how you run your BOINC client, and whether you use systemd. Moderators: if you would, please, mark this thread as [SOLVED].

If the BOINC client is installed as a distribution package, and the distribution uses systemd, the distribution may ship the BOINC client with the upstream unit file for running the client as a service. This unit file contains the sandboxing option ProtectControlGroups=true.

This option does what it sounds like it does: it protects control groups from processes started by the service. To prevent modification, it exposes the /sys/fs/cgroup file system tree to these processes as read-only. Thus, BOINC tasks run by a client started as a service configured this way cannot do exactly the thing Cranky -- LHC@Home's wrapper around the runc container runtime -- is trying to do, and its attempt to set up the per-slot control groups is bound to fail.

This is very important for anyone who wants suspend/resume to actually work. You can override this option, e.g., by doing `systemctl edit boinc-client.service` and putting the following chunk into the override file this command creates:

[Service]
ProtectControlGroups=false

This will allow BOINC to modify control groups in the parts of the tree where it is permitted to write -- provided the cgroup filesystem itself is mounted read-write, since on some systems it is mounted globally read-only by default. If that is the case on yours, remount it with `mount -o remount,rw /sys/fs/cgroup` and make the change permanent by adding this to /etc/fstab (the "rw" option is the important part):

tmpfs  /sys/fs/cgroup  tmpfs  rw,nosuid,nodev,noexec,mode=755  0  0

If I may tag Monsieur Laurence, I would kindly advise adding this small piece of information to the Native Theory Application Setup (Linux only) thread, since the pinned post mentions systemd and implies its use in the recommended setup. Though this will probably only be useful to the most technical of users, if one is trying to run theoretical particle physics simulations on Linux using containers on a systemd machine with an AUFS kernel and the most recent builds of Singularity, one is probably already among the more technical BOINC users in the world.

Now that everything tests successfully, I might finally be able to submit my work to Gentoo in the form of more current ebuilds, and take over maintainership of our CVMFS package in particular. I dearly hope this is useful to someone.
3) Message boards : Theory Application : Theory Native issues with cgroups [SOLVED] (Message 41484)
Posted 5 Feb 2020 by Sagittarius Lupus
Post:
Anyone? Please? I've put an awful lot of work into getting this to function on my distribution, but I can't share the payoff with anyone else if that work is incomplete. I need to be able to get the suspend/resume feature to do its job, and it is obviously not doing that.

If someone might at least offer some pointers for running a job standalone, detached from boinc, so that I may debug the container and its interactions with the host filesystem repeatably, I could try to make some progress on my own.
4) Message boards : Theory Application : Theory Native issues with cgroups [SOLVED] (Message 39817)
Posted 4 Sep 2019 by Sagittarius Lupus
Post:
So... I had the thought to look at cranky.

Cranky is a bash script.

Cranky does a few things, on the host Linux machine, to set up the runtime environment for the container it eventually spins up. Among the functions it executes is one called create_cgroup(), which does little more than run mkdir in a loop over the control group hierarchy names. Sounds simple so far.
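For the sake of discussion, here is a minimal sketch of what a create_cgroup()-style helper does -- this is my reconstruction from the errors below, not cranky's actual code, and the CGROUP_ROOT variable is my own addition so the loop can be exercised somewhere other than /sys:

```shell
# Hedged reconstruction of a create_cgroup()-style helper: one mkdir per
# cgroup-v1 controller hierarchy, named after the BOINC slot directory.
# CGROUP_ROOT is an illustrative knob, not part of the real script.
create_cgroup() {
    slot="$1"
    root="${CGROUP_ROOT:-/sys/fs/cgroup}"
    for ctrl in blkio cpu,cpuacct cpuset devices freezer hugetlb memory \
                net_cls net_prio perf_event pids; do
        # Each controller gets a per-slot subdirectory under boinc/.
        mkdir -p "$root/$ctrl/boinc/$slot"
    done
}
```

Running something like `create_cgroup 18` as the boinc user against a writable /sys/fs/cgroup should produce exactly the hierarchy the task log says it cannot create.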

But... here's what I don't understand. Cranky is running as the boinc user. Cranky is just a child of the task process. Cranky runs in his slot folder, and he does his thing, but he prints these "Read-only file system" errors from mkdir inside the create_cgroup() function like he's not looking at the same filesystem I am. And if I su to boinc and run that very same function in my terminal, inside a directory named like a slot number, I get a pretty control group hierarchy right where I'm supposed to.

This is really, really where I need one of the project developers to say something. Why, for the love of all that is breaking my brain, would cranky's execution environment see a read-only file system when I see one that is perfectly writable? Here. I'll prove it.

boinc@pygoscelis ~ $ mount | grep /sys/
tmpfs on /sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,mode=755)
cgroup2 on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,name=systemd)
efivarfs on /sys/firmware/efi/efivars type efivarfs (rw,nosuid,nodev,noexec,noatime)
bpf on /sys/fs/bpf type bpf (rw,nosuid,nodev,noexec,relatime,mode=700)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=46,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=16318)
configfs on /sys/kernel/config type configfs (rw,nosuid,nodev,noexec,relatime)
fusectl on /sys/fs/fuse/connections type fusectl (rw,nosuid,nodev,noexec,relatime)

See? All the /sys/** file systems are mounted with the 'rw' option. Including, and especially, all the control group file systems. There are no read-only file systems here.

I'm missing something. I know I'm missing something, and I think it is somehow particular to BOINC, but I don't know what I'm missing. Please help.
5) Message boards : Theory Application : Theory Native issues with cgroups [SOLVED] (Message 39723)
Posted 24 Aug 2019 by Sagittarius Lupus
Post:
Incidentally, the instructions provided -- in particular, the script we're given to add to the boinc-client.service unit file as an ExecStartPre hook -- don't seem to be the right way to create the control group filesystem hierarchies for the boinc user. The reason is that the unit file, if correctly configured, runs all of its commands as the user "boinc," which ordinarily does not have permission to create these locations. Either root must do the work, or the script acting on the boinc user's behalf must be made setuid and executable by that user -- which is not a good idea, and would not even work, since Linux ignores the setuid bit on interpreted scripts.
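For completeness, systemd itself offers a third way out: prefixing an ExecStartPre= command with "+" makes that one command run with full (root) privileges even though the service's User= is boinc (supported since systemd 231). A hypothetical override -- the install path of the script is my assumption -- would look like:

[Service]
ExecStartPre=+/usr/local/bin/create-boinc-cgroup

But I still think the tmpfiles.d approach below is cleaner, since it keeps the setup declarative and out of the unit file entirely.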

Fortunately, if you are using systemd, there is already a more appropriate way to manage such filesystem objects. The ones we need are, essentially, temporary files in a virtual (tmpfs) filesystem. Systemd's mechanism for managing temporary files is systemd-tmpfiles, configured in /etc/tmpfiles.d. It is responsible for making sure that certain temporary files and directories are present and have the correct modes set at boot time, and for cleaning them up when appropriate. This is what we want to use.

Therefore, I have created a tmpfiles.d drop-in configuration that creates and correctly sets the permissions on each of the cgroup hierarchies required for the boinc user, in a way that is functionally equivalent to the script provided by CERN. Once the file is in place, `systemd-tmpfiles --create /etc/tmpfiles.d/boinc.conf` applies it immediately; otherwise it is processed at the next boot.

# /etc/tmpfiles.d/boinc.conf
# Type  Path                                                    Mode    User    Group
d       /sys/fs/cgroup/freezer/boinc                            0775    root    boinc
f       /sys/fs/cgroup/freezer/boinc/cgroup.procs               0664    root    boinc
f       /sys/fs/cgroup/freezer/boinc/tasks                      0664    root    boinc
f       /sys/fs/cgroup/freezer/boinc/freezer.state              0664    root    boinc
d       /sys/fs/cgroup/cpuset/boinc                             0775    root    boinc
f       /sys/fs/cgroup/cpuset/boinc/cgroup.procs                0664    root    boinc
f       /sys/fs/cgroup/cpuset/boinc/tasks                       0664    root    boinc
f       /sys/fs/cgroup/cpuset/boinc/cpuset.mems                 0664    root    boinc
f       /sys/fs/cgroup/cpuset/boinc/cpuset.cpus                 0664    root    boinc
d       /sys/fs/cgroup/devices/boinc                            0775    root    boinc
f       /sys/fs/cgroup/devices/boinc/cgroup.procs               0664    root    boinc
f       /sys/fs/cgroup/devices/boinc/tasks                      0664    root    boinc
d       /sys/fs/cgroup/pids/boinc                               0775    root    boinc
f       /sys/fs/cgroup/pids/boinc/cgroup.procs                  0664    root    boinc
f       /sys/fs/cgroup/pids/boinc/tasks                         0664    root    boinc
d       /sys/fs/cgroup/hugetlb/boinc                            0775    root    boinc
f       /sys/fs/cgroup/hugetlb/boinc/cgroup.procs               0664    root    boinc
f       /sys/fs/cgroup/hugetlb/boinc/tasks                      0664    root    boinc
d       /sys/fs/cgroup/cpu,cpuacct/boinc                        0775    root    boinc
f       /sys/fs/cgroup/cpu,cpuacct/boinc/cgroup.procs           0664    root    boinc
f       /sys/fs/cgroup/cpu,cpuacct/boinc/tasks                  0664    root    boinc
d       /sys/fs/cgroup/perf_event/boinc                         0775    root    boinc
f       /sys/fs/cgroup/perf_event/boinc/cgroup.procs            0664    root    boinc
f       /sys/fs/cgroup/perf_event/boinc/tasks                   0664    root    boinc
d       /sys/fs/cgroup/net_cls,net_prio/boinc                   0775    root    boinc
f       /sys/fs/cgroup/net_cls,net_prio/boinc/cgroup.procs      0664    root    boinc
f       /sys/fs/cgroup/net_cls,net_prio/boinc/tasks             0664    root    boinc
d       /sys/fs/cgroup/blkio/boinc                              0775    root    boinc
f       /sys/fs/cgroup/blkio/boinc/cgroup.procs                 0664    root    boinc
f       /sys/fs/cgroup/blkio/boinc/tasks                        0664    root    boinc
d       /sys/fs/cgroup/memory/boinc                             0775    root    boinc
f       /sys/fs/cgroup/memory/boinc/cgroup.procs                0664    root    boinc
f       /sys/fs/cgroup/memory/boinc/tasks                       0664    root    boinc


Please let this be my small contribution toward making other volunteers' lives slightly easier in this regard, for the permissions issues implicit in doing it the other way may be somewhat frustrating to the novice.
6) Message boards : Theory Application : Theory Native issues with cgroups [SOLVED] (Message 39681)
Posted 22 Aug 2019 by Sagittarius Lupus
Post:
I've gone to some great lengths to get Theory Native running on Gentoo Linux, including rewriting the abandoned CVMFS ebuild with the eventual goal of taking over proxy maintainership and making the bits available to other Gentoo users. That all seems to be working. I've completed and validated several TheoryN tasks now, but one problem is still troubling me.

Control groups, and suspend/resume support.

I followed the instructions to create the cgroup hierarchies for the boinc user, in particular the http://lhcathome.cern.ch/lhcathome/download/create-boinc-cgroup script. There was an issue, initially: on my distribution with systemd, and probably others, the cgroup root tmpfs filesystem at /sys/fs/cgroup is mounted read-only at boot time, so the script bails. Easily handled by remounting the filesystem read-write (`mount -o remount,rw /sys/fs/cgroup`).

Every TheoryN task I run, however, looks like this:

21:14:47 (14185): wrapper (7.15.26016): starting
21:14:47 (14185): wrapper (7.15.26016): starting
21:14:47 (14185): wrapper: running ../../projects/lhcathome.cern.ch_lhcathome/cranky-0.0.29 ()
21:14:47 EDT -04:00 2019-08-21: cranky-0.0.29: [INFO] Detected TheoryN App
21:14:47 EDT -04:00 2019-08-21: cranky-0.0.29: [INFO] Checking CVMFS.
21:14:47 EDT -04:00 2019-08-21: cranky-0.0.29: [INFO] Checking runc.
21:14:52 EDT -04:00 2019-08-21: cranky-0.0.29: [INFO] Creating the filesystem.
21:14:52 EDT -04:00 2019-08-21: cranky-0.0.29: [INFO] Using /cvmfs/cernvm-prod.cern.ch/cvm3
21:14:52 EDT -04:00 2019-08-21: cranky-0.0.29: [INFO] Creating cgroup for slot 18
mkdir: cannot create directory ‘/sys/fs/cgroup/freezer/boinc/18’: Read-only file system
mkdir: cannot create directory ‘/sys/fs/cgroup/cpuset/boinc/18’: Read-only file system
mkdir: cannot create directory ‘/sys/fs/cgroup/devices/boinc/18’: Read-only file system
mkdir: cannot create directory ‘/sys/fs/cgroup/memory/boinc/18’: Read-only file system
mkdir: cannot create directory ‘/sys/fs/cgroup/cpu,cpuacct/boinc/18’: Read-only file system
mkdir: cannot create directory ‘/sys/fs/cgroup/pids/boinc/18’: Read-only file system
mkdir: cannot create directory ‘/sys/fs/cgroup/blkio/boinc/18’: Read-only file system
mkdir: cannot create directory ‘/sys/fs/cgroup/hugetlb/boinc/18’: Read-only file system
mkdir: cannot create directory ‘/sys/fs/cgroup/net_cls/boinc/18’: Read-only file system
mkdir: cannot create directory ‘/sys/fs/cgroup/net_prio/boinc/18’: Read-only file system
mkdir: cannot create directory ‘/sys/fs/cgroup/perf_event/boinc/18’: Read-only file system
mkdir: cannot create directory ‘/sys/fs/cgroup/freezer/boinc/18’: Read-only file system
21:14:52 EDT -04:00 2019-08-21: cranky-0.0.29: [INFO] Updating config.json.
21:14:52 EDT -04:00 2019-08-21: cranky-0.0.29: [INFO] Running Container 'runc'.
21:14:55 EDT -04:00 2019-08-21: cranky-0.0.29: [INFO] ===> [runRivet] Thu Aug 22 01:14:54 UTC 2019 [boinc pp jets 7000 80,-,1760 - pythia8 8.235 default 100000 90]
21:18:41 EDT -04:00 2019-08-21: cranky-0.0.29: [INFO] Pausing container TheoryN_2279-786411-90_0.
no such directory for freezer.state
21:18:47 EDT -04:00 2019-08-21: cranky-0.0.29: [INFO] Resuming container TheoryN_2279-786411-90_0.
container not paused


Now... this is strange. The errors from mkdir don't make sense: that's not a read-only file system. I already remounted it read-write. I can create those slot cgroup hierarchies myself, as the boinc user. I've spent a few hours trying to find out what could cause such an error, but to no avail. I have no explanation for why the task cannot create these hierarchies on its own.

The job tends to succeed anyway, but that's not really all I'm after; I want to make all of the available features work reliably, including suspend/resume. Does anyone have any ideas? Control groups are fiddly... and upstream wants us on cgroups v2 already, which uses a single unified hierarchy instead of these per-controller paths, so I have a bad feeling that I'm headed down a rabbit hole.
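For anyone following along, the pause/resume lines in the log make sense once you know how the v1 freezer controller works: a cgroup is paused by writing FROZEN to its freezer.state file and resumed by writing THAWED, which is presumably what cranky attempts against the per-slot directory it failed to create. A rough sketch of the mechanism -- my own illustration, not cranky's code, with CGROUP_ROOT as a testing knob I added:

```shell
# Hedged sketch of cgroup-v1 freezer suspend/resume for a BOINC slot.
# CGROUP_ROOT is an illustrative override, not part of the real setup.
pause_slot() {
    # Freezes every process in the slot's freezer cgroup.
    echo FROZEN > "${CGROUP_ROOT:-/sys/fs/cgroup}/freezer/boinc/$1/freezer.state"
}
resume_slot() {
    # Thaws them again.
    echo THAWED > "${CGROUP_ROOT:-/sys/fs/cgroup}/freezer/boinc/$1/freezer.state"
}
```

If the per-slot freezer directory was never created, there is no freezer.state to write to -- which matches the "no such directory for freezer.state" line in the log above, and explains the "container not paused" message on resume.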

Incidentally, I don't want to keep burning through TheoryN tasks while trying to troubleshoot and potentially failing them. Is there a way I can take one offline, outside of BOINC, and retry it a bunch of times without reporting it back to the job server until I get this nailed down?


