1) Message boards : Theory Application : Compute Error with Native Theory jobs when Native Atlas and Sixtrack jobs work fine, (Message 44528)
Posted 22 Mar 2021 by alee67
Post:
Someone elsewhere gave me an interesting fix for my problem. Add the following to /etc/systemd/system/boinc-client.service:

[Service]
. . .
MemoryAccounting=true
IOAccounting=true
BlockIOAccounting=true
CPUAccounting=true


This creates some empty files in some of the /sys/fs/cgroup/*/boinc directories, which appears to ensure that the directories are created and can't be deleted. My earlier fix was to recreate the directories periodically whenever they were missing, so this seems to work better.
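
A periodic check of the kind described above might look roughly like the sketch below. ensure_boinc_cgroups is a made-up helper name, and passing the cgroup root and the controller names as arguments is only for illustration. It recreates missing directories and nothing more; it does not touch the files inside them or their ownership.

```shell
#!/bin/sh
# Sketch only: recreate any missing <root>/<controller>/boinc directory.
# ensure_boinc_cgroups is a hypothetical helper, not part of BOINC.
# Usage: ensure_boinc_cgroups ROOT CONTROLLER...
ensure_boinc_cgroups() {
    root="$1"
    shift
    failed=0
    for controller in "$@"; do
        dir="$root/$controller/boinc"
        if [ ! -d "$dir" ]; then
            # Plain mkdir (no -p): a missing controller directory is
            # reported as an error instead of being silently created.
            if mkdir "$dir"; then
                echo "recreated $dir"
            else
                echo "could not create $dir" >&2
                failed=1
            fi
        fi
    done
    return "$failed"
}

# Example, run as root (e.g. from cron). The controller list is an assumption:
# ensure_boinc_cgroups /sys/fs/cgroup blkio memory freezer
```

The function returns non-zero if any directory could not be created, so a cron job or timer calling it can log the failure.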
2) Message boards : Theory Application : Compute Error with Native Theory jobs when Native Atlas and Sixtrack jobs work fine, (Message 44120)
Posted 16 Jan 2021 by alee67
Post:
It appears that the fixed 5-minute delay before starting BOINC doesn't always prevent the problem that I've been having with Native Theory. It also appears that I'm not alone in having trouble with /sys/fs/cgroup/blkio/boinc. At least on this work unit, someone else seems to have the same problem: https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=151827668
3) Message boards : Theory Application : Compute Error with Native Theory jobs when Native Atlas and Sixtrack jobs work fine, (Message 44102)
Posted 14 Jan 2021 by alee67
Post:
These are preemptible Google Cloud instances, which cost about 1/4 as much as regular non-preemptible instances, so I can get a lot more computing power for my credits. They are only running BOINC projects, and I haven't installed anything on top of the Ubuntu 20.04 image that Google provides besides BOINC and CVMFS. I have no idea what needs to finish before /sys/fs/cgroup/blkio/boinc and the files within it can be created, which keeps me from using anything besides a fixed time delay.
4) Message boards : Theory Application : Compute Error with Native Theory jobs when Native Atlas and Sixtrack jobs work fine, (Message 44093)
Posted 13 Jan 2021 by alee67
Post:
I have two Google Cloud instances (they were giving me lots of free credits for publishing a couple of somewhat lame Google Assistant apps) running Ubuntu 20.04. They are set up to run Native Theory and Atlas jobs, including suspend/resume, following the instructions here: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4971

The Native Atlas and Sixtrack jobs run fine, but the Native Theory jobs were all exiting quickly. The ones I've looked at all seem to have the same problem. This is an example: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4971

What seems to be happening is that when BOINC starts, /sbin/create-boinc-cgroup creates boinc subdirectories, and files within them, in several subdirectories of /sys/fs/cgroup, but this fails for /sys/fs/cgroup/blkio/boinc. Delaying the creation of /sys/fs/cgroup/blkio/boinc and the files within it seems to fix the problem. I'd guess that it needs to wait until something else has completed, but I don't know what that is or how to wait for it, so I just used a simple time delay. One way is to insert sleep commands into /sbin/create-boinc-cgroup, but I ended up creating the file /etc/systemd/system/boinc-client.timer:

[Unit]
Description=Berkeley Open Infrastructure Network Computing Client
Documentation=man:boinc(1)
After=network-online.target

[Timer]
OnBootSec=5min

[Install]
WantedBy=timers.target


I found that a 1 minute delay didn't work, and a 2 minute delay didn't always work, so I just set it to 5 minutes.
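
One possible alternative to a fixed delay is to keep retrying until the directory can actually be created, and give up after a timeout. The following is only a sketch: wait_for_cgroup_dir is a made-up helper, the 300 second limit and 5 second interval are arbitrary defaults, and it assumes the failure is purely a matter of timing, which nothing in this thread establishes.

```shell
#!/bin/sh
# Hypothetical helper: retry mkdir until it succeeds or the timeout expires.
# Usage: wait_for_cgroup_dir DIR [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
wait_for_cgroup_dir() {
    dir="$1"
    timeout="${2:-300}"
    interval="${3:-5}"
    waited=0
    while :; do
        # Succeed as soon as the directory exists or can be created.
        if [ -d "$dir" ] || mkdir "$dir" 2>/dev/null; then
            return 0
        fi
        if [ "$waited" -ge "$timeout" ]; then
            echo "gave up on $dir after ${waited}s" >&2
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
}

# Example, run as root before BOINC starts:
# wait_for_cgroup_dir /sys/fs/cgroup/blkio/boinc 300 5
```

Something like this could go near the top of /sbin/create-boinc-cgroup in place of plain sleep commands, so that the wait ends as soon as the directory can be created rather than after a guessed interval.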

Is there a better way to do this?


