[systemd-devel] Is there a way to find out if Delegate=yes?

Thu Oct 27 10:40:38 UTC 2022

On Thu, Oct 27, 2022 at 11:48:20AM +0300, Yuri Kanivetsky wrote:
> Arseny Maslennikov, for some reason I didn't receive your email.

It had successfully reached this mailing list by 2022-Oct-25, so you are
probably not subscribed to the list. That is odd, though: the list
rejects mail from non-subscribers, so you should not have been able to
post to it at all.

On Thu, Oct 27, 2022 at 11:48:20AM +0300, Yuri Kanivetsky wrote:
> Anyways, indeed on the server with --user:
> 
> $ systemctl --user show -p Delegate run-rcbb44fb2c7774453b18cda8fe03f0f26.scope
> Delegate=yes
> 
> But that's just part of the mystery. Locally, what can I do... I can
> try and query the scope to which my shell belongs to:
> 
> $ systemctl --user show -p Delegate session-2.scope
> Delegate=no

That is expected: the logind machinery, which creates this scope at
the request of pam_systemd.so, does not set that property.
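
For what it's worth, the unit to query can be read off the shell's own
cgroup path. A minimal sketch, assuming a cgroup v2 (unified) hierarchy.
The helper names `unit_from_cgroup_line` and `manager_flag_for` are made
up for illustration, and treating "the path passes through
user@UID.service" as "the unit belongs to the user manager" is my own
heuristic, not something systemd promises:

```shell
#!/bin/sh
# Hypothetical helpers: pure string handling, no systemd calls.

# "0::/a/b/c.scope" -> "c.scope"
unit_from_cgroup_line() {
    path=${1#0::}                  # strip the cgroup v2 "0::" prefix
    printf '%s\n' "${path##*/}"    # keep what follows the last slash
}

# Assumption: cgroups below user@UID.service are run by the per-user
# manager, everything else by the system manager.
manager_flag_for() {
    case $1 in
        */user@*.service/*) printf '%s\n' '--user' ;;
        *)                  printf '%s\n' '--system' ;;
    esac
}

# Intended use on a live system (left as a comment on purpose):
#   line=$(grep '^0::' /proc/self/cgroup)
#   systemctl "$(manager_flag_for "$line")" show -p Delegate \
#       "$(unit_from_cgroup_line "$line")"
```

For a login session scope such as session-2.scope the path normally does
not pass through user@UID.service, so the sketch picks `--system`. If the
process already sits in a sub-cgroup of a delegated unit, the last path
component is not a unit name and the sketch gives the wrong answer.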

> Or the enclosing slice for the scope on the server (the local slice
> that matches the one on the server where the transient scope is
> created):
> 
> $ systemctl --user show -p Delegate app.slice
> Delegate=no

`Delegate=yes` wouldn't make sense for any slice. Slices map to inner
cgroups, which distribute system resources, and not to leaf cgroups,
which can host processes. This means there's no one to delegate to.

The design of slices and scopes (and of every other unit type that can
contain processes) largely follows the cgroup v2 design in the kernel.
A non-root cgroup can _either_ distribute resources to child cgroups
_or_ have member processes. Here is an excerpt from
[1] https://www.kernel.org/doc/Documentation/admin-guide/cgroup-v2.rst:
>> Non-root cgroups can distribute domain resources to their children
>> only when they don't have any processes of their own.  In other words,
>> only domain cgroups which don't contain any processes can have domain
>> controllers enabled in their "cgroup.subtree_control" files.
That document is a valuable reference in general.
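
The rule quoted above can be observed by looking at two interface files
in a cgroup directory. A rough sketch, assuming cgroup v2; the function
name `cgroup_role` and the labels it prints are invented here, and it
ignores the root cgroup and threaded subtrees, to which the rule applies
differently:

```shell
#!/bin/sh
# Hypothetical helper: classify a cgroup directory by whether it has
# member processes and/or controllers enabled for its children.
cgroup_role() {
    dir=$1
    has_procs=no
    has_child_ctrl=no
    # cgroup.procs lists member PIDs, one per line. The file is read
    # rather than stat'ed, since cgroupfs files report a size of 0.
    if grep -q '[0-9]' "$dir/cgroup.procs" 2>/dev/null; then
        has_procs=yes
    fi
    # cgroup.subtree_control lists the controllers enabled for children.
    if grep -q '[a-z]' "$dir/cgroup.subtree_control" 2>/dev/null; then
        has_child_ctrl=yes
    fi
    case $has_procs:$has_child_ctrl in
        no:yes)  echo inner ;;   # distributes resources, hosts nothing
        yes:no)  echo leaf ;;    # hosts processes
        no:no)   echo empty ;;
        yes:yes) echo mixed ;;   # not expected for a non-root domain cgroup
    esac
}
```

On a live system one would call it as, say,
`cgroup_role /sys/fs/cgroup/user.slice`.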

> Somehow I don't need systemd-run for lxc-start and lxc-attach locally.
> Any ideas?

The topic is really quite confusing, in fact.

I'll try to explain what I can. I suppose there's someone out there
who has actually hit the problems described below and is in a better
position to comment, or to provide links to resources where that
experience is documented for the benefit of the community.

(I have little experience with the lxc-* suite.)
It looks like lxc-start(1) and lxc-attach(1) try to manage cgroups
themselves. If they work on processes in a systemd-managed cgroup (or
put new processes in one), the unit that maps to that cgroup should
have `Delegate=yes` for at least the following reasons:
- the permissions on file objects under /sys/fs/cgroup/ (e.g. the
  controller files) are set appropriately;
- the unit manager keeps its hands off the delegated cgroup, so there's a
  single entity managing the cgroup.
This holds for any container manager foreign to systemd.
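
The first of the two points above can be checked from the shell. A
minimal sketch; `can_manage_cgroup` is a made-up name, and the choice of
files to test (the directory itself, cgroup.procs and
cgroup.subtree_control) is my assumption about what a container manager
needs to write to:

```shell
#!/bin/sh
# Hypothetical helper: report whether the calling user may write to the
# files needed to create child cgroups and move processes into them.
can_manage_cgroup() {
    dir=$1
    for f in "$dir" "$dir/cgroup.procs" "$dir/cgroup.subtree_control"; do
        if [ ! -w "$f" ]; then
            printf 'not writable: %s\n' "$f"
            return 1
        fi
    done
    printf 'writable: %s\n' "$dir"
}
```

It would be pointed at the cgroup directory of the unit in question,
i.e. the path from /proc/self/cgroup appended to /sys/fs/cgroup. A
positive answer only covers permissions; it says nothing about whether
the unit manager considers the cgroup delegated.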

If these conditions are not met, the result is undefined: the lxc-*
tools and their payloads may work, fail outright, work only occasionally
or (the scariest option) break only sometimes.
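
One way to satisfy both conditions is to start the tool inside a
transient scope that has delegation turned on, which is what the
run-*.scope unit on the server looks like. A sketch only: the wrapper
`run_delegated` and its `DRY_RUN` switch are hypothetical, and the
container name `c1` in the comment is a placeholder:

```shell
#!/bin/sh
# Hypothetical wrapper: run a command in a transient user scope with
# Delegate=yes. With DRY_RUN=1 it prints the command instead of running it.
run_delegated() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "systemd-run --user --scope -p Delegate=yes -- $*"
    else
        systemd-run --user --scope -p Delegate=yes -- "$@"
    fi
}

# Example (not executed here):
#   run_delegated lxc-start -n c1
```

Afterwards `systemctl --user show -p Delegate` on the resulting scope
should report `Delegate=yes`, as in the output quoted earlier.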

In addition to [1], there's also a systemd-centric document on the topic:
[2] https://systemd.io/CGROUP_DELEGATION/
One of the topics it intends to cover is the semantics of the
`Delegate=` property on units.

Like [1], it is structured more as a reference than as a guide, and
(unfortunately) it often states how things should be done without
explaining why.
>> The single-writer rule: this means that each cgroup only has a single
>> writer, i.e. a single process managing it. It’s OK if different
>> cgroups have different processes managing them. However, only a
>> single process should own a specific cgroup, and when it does that
>> ownership is exclusive, and nothing else should manipulate it at the
>> same time. This rule ensures that various pieces of software don’t
>> step on each other’s toes constantly.
There are no examples of what exactly might go wrong (and if there were
any, your question would be answered).
Also, a scope is a type of unit whose processes are not started by
systemd (for instance, it has no unit file); it only serves to track
PIDs, so it's even harder to imagine a scenario where putting foreign
PIDs in one would break something. Creating child cgroups under a scope
that lacks `Delegate=yes` is definitely off limits, though.

As noted above, someone who dabbles a lot in the cgroup mechanics and/or
deals with the lxc-* project would be in a better position to comment
than me.

Cheers!

[1] https://www.kernel.org/doc/Documentation/admin-guide/cgroup-v2.rst
[2] https://systemd.io/CGROUP_DELEGATION/