[systemd-devel] Feedback sought: can we drop cgroupv1 support soon?

Fri Aug 18 10:15:44 UTC 2023

> What's stopping you from mounting a private "named" cgroup v1
> hierarchy to such containers (i.e. no controllers). systemd will then
> use that when taking over and not bother with mounting anything on its
> own, such as a cgroupv2 tree.
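
For reference, as I understand it, that suggestion would look roughly like
the following on the container-manager side. This is only a sketch: the
helper name is hypothetical, the mount point and hierarchy name are the
conventional ones and are assumed here, and the code only builds the
mount(8) command line rather than running it.

```python
import shlex


def named_v1_mount_argv(target="/sys/fs/cgroup/systemd", name="systemd"):
    """Build a mount(8) command line for a named cgroup v1 hierarchy.

    Hypothetical helper. "none" asks the kernel to attach no controllers,
    and "name=" gives the hierarchy its label.
    """
    options = f"none,name={name}"
    return ["mount", "-t", "cgroup", "-o", options, "cgroup", target]


if __name__ == "__main__":
    # Print the command instead of executing it; running it needs privileges.
    print(shlex.join(named_v1_mount_argv()))
```

Because of "none", a hierarchy mounted this way carries no controllers.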

We specifically want to be able to make use of cgroup controllers within
the container. One example of this would be to use "MemoryLimit" (cgroupv1)
for a systemd unit (I understand this is deprecated in the latest versions
of systemd, but as far as I can see we wouldn't be able to use the cgroupv2
"MemoryMax" config in this scenario anyway).
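
To illustrate the distinction, here is a rough sketch of how a payload
could guess which layout it has been given, and which kernel attribute
holds the hard memory limit in each case. The function names are
hypothetical, and the detection is a file-name heuristic that I am assuming
is good enough for illustration; it is not how systemd itself decides.

```python
from pathlib import Path


def cgroup_layout(root="/sys/fs/cgroup"):
    """Guess the cgroup layout visible under `root` (heuristic, hypothetical).

    A cgroup v2 mount exposes a cgroup.controllers file at its top level.
    """
    root = Path(root)
    if (root / "cgroup.controllers").is_file():
        return "unified"   # cgroup v2 only
    if (root / "unified" / "cgroup.controllers").is_file():
        return "hybrid"    # v1 controllers plus a v2 tree under unified/
    if (root / "systemd").is_dir():
        return "legacy"    # v1 only, with the named systemd hierarchy
    return "unknown"


def memory_limit_attribute(layout):
    """Return the kernel file name that holds the hard memory limit."""
    if layout == "unified":
        return "memory.max"
    if layout in ("hybrid", "legacy"):
        # In both cases the memory controller lives on a v1 hierarchy.
        return "memory.limit_in_bytes"
    raise ValueError(f"no known memory attribute for layout {layout!r}")


if __name__ == "__main__":
    layout = cgroup_layout()
    print(layout)
    if layout != "unknown":
        print(memory_limit_attribute(layout))
```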

> You are doing something half broken and
> outside of the intended model already, I am not sure we need to go the
> extra mile to support this for longer.

I'm slightly surprised and disheartened by this viewpoint. I have paid
close attention to https://systemd.io/CONTAINER_INTERFACE/ and
https://systemd.io/CGROUP_DELEGATION/, and I had read them as saying that
running systemd in a container should be fully supported, and not only on
cgroupv2 (at least when using recent-but-not-latest systemd versions).

In particular, the following:

"Note that it is our intention to make systemd systems work flawlessly and
out-of-the-box in containers. In fact, we are interested to ensure that the
same OS image can be booted on a bare system, in a VM and in a container,
and behave correctly each time. If you notice that some component in
systemd does not work in a container as it should, even though the
container manager implements everything documented above, please contact
us."

"When systemd runs as container payload it will make use of all hierarchies
it has write access to. For legacy mode you need to make at least
/sys/fs/cgroup/systemd/ available, all other hierarchies are optional."
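
As a sanity check of that requirement, a container payload could list the
hierarchies it can write to, along the lines of the sketch below. The
function names are hypothetical, and treating "a writable directory under
/sys/fs/cgroup" as "a hierarchy systemd can use" is my simplifying
assumption.

```python
import os


def writable_hierarchies(root="/sys/fs/cgroup"):
    """Return sorted names of directories under `root` we may write to."""
    try:
        entries = sorted(os.listdir(root))
    except OSError:
        # Nothing mounted or not readable: report no hierarchies.
        return []
    return [
        name
        for name in entries
        if os.path.isdir(os.path.join(root, name))
        and os.access(os.path.join(root, name), os.W_OK)
    ]


def meets_legacy_minimum(root="/sys/fs/cgroup"):
    """Legacy mode needs at least the "systemd" hierarchy to be available."""
    return "systemd" in writable_hierarchies(root)


if __name__ == "__main__":
    print(writable_hierarchies())
    print(meets_legacy_minimum())
```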

I note that point 6 under "Some Don'ts" does line up with what you're
saying:
"Think twice before delegating cgroup v1 controllers to less privileged
containers. It’s not safe, you basically allow your containers to freeze
the system with that and worse."
However, in our case we're talking about a privileged container, so this
doesn't really apply.

I think there's a definite use case here. Unfortunately, once systemd drops
support for cgroupv1, we will simply be unable to upgrade the container's
systemd version until all relevant hosts use cgroupv2 by default (probably
a couple of years away).

Thanks for your time,
Lewis

On Mon, 7 Aug 2023 at 17:26, Lennart Poettering <lennart at poettering.net>
wrote:

> On Thu, 20.07.23 01:59, Dimitri John Ledkov (dimitri.ledkov at canonical.com)
> wrote:
>
> > Some deployments that switch back their modern v2 host to hybrid or v1 are
> > the ones that need to run old workloads that contain old systemd. Said old
> > systemd only has experimental incomplete v2 support that doesn't work with
> > v2-only (the one before current stable magic mount value).
>
> What's stopping you from mounting a private "named" cgroup v1
> hierarchy to such containers (i.e. no controllers). systemd will then
> use that when taking over and not bother with mounting anything on its
> own, such as a cgroupv2 tree.
>
> that should be enough to make old systemd happy.
>
> Lennart
>
> --
> Lennart Poettering, Berlin
>