[systemd-devel] Systemd, cgroupsv2, cgrulesengd, and nftables

Andrei Borzenkov arvidjaar at gmail.com
Sat Jun 15 13:49:33 UTC 2024


On 14.06.2024 11:20, Lennart Poettering wrote:
> On Fr, 14.06.24 10:06, Mikhail Morfikov (mmorfikov at gmail.com) wrote:
> 
>>> --
>>> Lennart Poettering, Berlin
>>
>> I don't need any warranty, I need a way to make this work.
> 
> Yeah, but this is the wrong forum to ask for help then. What you are
> doing is strictly against how systemd and cgroup2 are designed. I mean,
> do what you want, but this is not supported, you are on your own.
> 
>> I'm not sure whether I understand the "single-writer rule", so correct me if I'm
>> wrong. I don't want to write pids to systemd services using cgrulesengd. I just
>> want to create my own cgroup tree, for instance
>> /sys/fs/cgroup/morfikownia/ and I
> 
> Yeah, that's not how this works. On systemd systems the top of the
> cgroup tree is managed by systemd. If you want to manage your own
> cgroups, then ask for a delegated subtree, and do your stuff there,
> but don't interfere with the top of tree, you'll step on systemd's
> feet then, and systemd will run over your feet all the time.
> 
>> want to place there all the processes managed by cgrulesengd (via the
>> /etc/cgrules.conf file). So systemd won't be touching anything inside
>> /sys/fs/cgroup/morfikownia/ and cgrulesengd won't be touching anything in the
>> rest of the cgroup tree -- is this "single-writer rule" ?
> 
> Yeah, sorry, that's not how this works.
> 
>>> And you must delegate a subtree to other managers if a
>>> different manager shall also manage cgroups.
>>
>> How can this be done?
> 
> There are plenty of docs around about this, please read them:
> 
> https://systemd.io/CGROUP_DELEGATION
> 

Which does not really solve the problem. So, once again:

- nftables allows filtering based on a cgroupv2 path
- the cgroupv2 path is resolved at the time the rule is loaded, so it is
impossible to configure a rule for a cgroup that does not exist yet
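
To make the ordering constraint concrete, here is a minimal dry-run sketch. It only prints the commands, it does not run them. The table and chain names (inet filter output) are assumptions and must already exist, the helper name is made up for illustration, and the level is taken to be the number of components in the cgroup path:

```shell
# Hypothetical helper: print the check and the nft rule for a cgroupv2 path.
# $1 is the cgroup path relative to /sys/fs/cgroup, e.g. "morfikownia".
cgroup_rule_cmds() {
    cg_path=$1
    # Assumption: "level" is the depth of the path (number of components).
    level=$(printf '%s\n' "$cg_path" | awk -F/ '{print NF}')
    # The cgroup directory has to exist before the rule is loaded.
    printf 'test -d /sys/fs/cgroup/%s && \\\n' "$cg_path"
    # "inet filter output" is an assumed, pre-existing table and chain.
    printf "nft 'add rule inet filter output socket cgroupv2 level %s \"%s\" counter accept'\n" \
        "$level" "$cg_path"
}

cgroup_rule_cmds morfikownia
```

A rule for a transient scope can therefore only be added after that scope has been created, by which time its processes may already be running.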

So no mantra about one ring to rule them all is going to help here as
long as neither of the following is possible:

- systemd (which puts processes in cgroups) also adds the corresponding
nftables rule that refers to the new transient cgroup

- or -

- systemd allows pre-creation of cgroups and *atomic* placement of
processes in them

The former is https://github.com/systemd/systemd/issues/7327 which was
rejected.

The latter is not possible:

bor@bor-Latitude-E5450:~/src/systemd$ systemd-run --user --scope --unit network.scope cat /proc/self/cgroup
Failed to start transient scope unit: Unit network.scope already exists.
bor@bor-Latitude-E5450:~/src/systemd$

The only way currently available to move processes into an existing scope
is not atomic and has the same race condition as using e.g. cgrulesengd.
Just look at
https://unix.stackexchange.com/questions/594798/how-do-i-run-a-command-in-a-different-already-existing-systemd-scope-or-sessio

$ systemd-run --user --scope --unit="app-sleep" --property=Delegate=yes \
    sleep 9999 &
$ disown
$ sleep 8888 &
$ pid=$(jobs -p)
$ busctl --user call org.freedesktop.systemd1 /org/freedesktop/systemd1 \
    org.freedesktop.systemd1.Manager AttachProcessesToUnit ssau \
    "app-sleep.scope" / 1 "$pid"

Are there ways to do it atomically?
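
For reference, the race window in the sequence above can be observed directly. The following is only a sketch and assumes a Linux system with /proc mounted; the helper name cgroup_of is made up for illustration. A freshly started background process reports the same cgroup as the shell that started it, and stays there until something moves it:

```shell
# Hypothetical helper: print the cgroup membership of a PID as the kernel
# reports it (empty output if /proc/<pid>/cgroup is not readable).
cgroup_of() {
    cat "/proc/$1/cgroup" 2>/dev/null
}

sleep 5 &
child=$!

parent_cg=$(cgroup_of $$)
child_cg=$(cgroup_of "$child")

# Nothing has moved the child yet, so it is still in the inherited cgroup.
if [ "$parent_cg" = "$child_cg" ]; then
    echo "PID $child is still in the shell's cgroup"
fi

kill "$child" 2>/dev/null
```

Any socket the process opens during that window is matched against the inherited cgroup, not against the scope it is moved to afterwards.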

