[systemd-devel] Feedback sought: can we drop cgroupv1 support soon?
Lewis Gaul
lewis.gaul at gmail.com
Wed Jul 19 09:23:45 UTC 2023
Hi Lennart, all,
TL;DR: A container making use of cgroup controllers must use the same
cgroup version as the host. For a systemd container that may run on an
arbitrary host, dropping cgroup v1 support from systemd would therefore
place a cgroup v2 requirement on the host, which is an undesirable property
for a container.
I can totally understand the desire to simplify the codebase/support
matrix, and appreciate this response is coming quite late (almost a year
since cgroups v1 was noted as a future deprecation in systemd). However, I
wanted to share a use-case/argument for keeping cgroups v1 support a little
longer in case it may impact the decision at all.
At my $work we provide a container image to customers, where the container
runs using systemd as the init system. The end-user has some freedom on
how/where to run this container, e.g. using docker/podman on a host of
their choice, or in Kubernetes (e.g. EKS in AWS).
Of course there are bounds on what we officially support, but generally we
would like to support recent LTS releases of major distros, currently
including Ubuntu 20.04, Ubuntu 22.04, RHEL 8, RHEL 9, Amazon Linux 2 (EKS
doesn’t yet support Amazon Linux 2023). Of these, only Ubuntu 22.04 and
RHEL 9 have switched to using cgroups v2 by default, and we are not in a
position to require the end-user to reconfigure their host to enable
running our container. What’s more, since we make use of cgroup controllers
inside the container, we cannot have cgroup v1 controllers enabled on the
host while attempting to use cgroups v2 inside the container.
> Because of that I see no reason why old systemd cgroupv1 payloads
> shouldn't just work on cgroupv2 hosts: as long as you give them a
> pre-set-up cgroupv1 environment, and nothing stops you from doing
> that. In fact, this is something we even documented somewhere: what to
> do if the host only does a subset of the cgroup stuff you want, and
> what you have to do to set up the other stuff (i.e. if host doesn't
> manage your hierarchy of choice, but only others, just follow the same
> structure in the other hierarchy, and clean up after yourself). This
> is what nspawn does: if host is cgroupv2 only it will set up
> name=systemd hierarchy in cgroupv1 itself, and pass that to the
> container.
I don't think this works for us since we need the full cgroup
(v1/v2) filesystem available in the container, with controllers enabled.
This means that we must, for now, continue to support cgroups v1 in our
container image. If systemd were to drop support for cgroups v1 then we may
find ourselves in an awkward position of not being able to upgrade to this
new systemd version, or be forced to pass this restriction on to end-users.
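The difference between a v1 hierarchy with controllers attached and a bare
`name=systemd` hierarchy (the kind Lennart describes nspawn setting up) is
visible in the mount table. The sketch below is illustrative only: the
helper names are hypothetical, and it assumes systemd's conventional layout
(cgroup2 mounted at /sys/fs/cgroup in unified mode, or at
/sys/fs/cgroup/unified in hybrid mode) and the v1 controller names listed
in cgroups(7). It parses /proc/self/mounts text and reports the overall
mode plus the controllers carried by each v1 hierarchy.

```python
from __future__ import annotations

# cgroup v1 controller names as listed in cgroups(7); used to tell a
# controller-backed v1 hierarchy apart from a bare named one.
V1_CONTROLLERS = {
    "cpu", "cpuacct", "cpuset", "memory", "devices", "freezer", "net_cls",
    "blkio", "perf_event", "net_prio", "hugetlb", "pids", "rdma", "misc",
}


def parse_mounts(mounts: str) -> dict[str, tuple[str, list[str]]]:
    """Map mount point -> (fstype, options) from /proc/self/mounts text.

    Later entries win, since a later mount on the same path hides earlier ones.
    """
    table: dict[str, tuple[str, list[str]]] = {}
    for line in mounts.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # not a mount entry
        _source, target, fstype, options = fields[:4]
        table[target] = (fstype, options.split(","))
    return table


def cgroup_mode(mounts: str) -> str:
    """Classify the layout as 'unified', 'hybrid', 'legacy' or 'none'."""
    table = parse_mounts(mounts)
    if table.get("/sys/fs/cgroup", ("", []))[0] == "cgroup2":
        return "unified"
    if table.get("/sys/fs/cgroup/unified", ("", []))[0] == "cgroup2":
        return "hybrid"
    if any(fstype == "cgroup" for fstype, _ in table.values()):
        return "legacy"
    return "none"


def v1_hierarchies(mounts: str) -> dict[str, set[str]]:
    """Map each cgroup v1 mount point to the controllers attached to it.

    A named hierarchy such as name=systemd, mounted without controllers,
    maps to an empty set.
    """
    return {
        target: set(options) & V1_CONTROLLERS
        for target, (fstype, options) in parse_mounts(mounts).items()
        if fstype == "cgroup"
    }
```

Run against /proc/self/mounts inside a container, an empty controller set
for /sys/fs/cgroup/systemd would indicate a named hierarchy only, which is
not enough for a workload that manages cgroup controllers itself.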
The reason we’re uncomfortable about insisting on the use of cgroups v2 is
that as a container app we ideally wouldn’t place such requirements on the
host.
So, while it's true that the container ecosystem does now largely support
cgroups v2, we still have to care about what the host is running, and from
our perspective that should be assumed to be the default configuration for
the chosen distro. With this in mind, we’d
ideally like to have systemd support cgroups v1 a little longer than the
end of this year.
Does this make sense as a use-case and motivation for wanting new systemd
versions to continue supporting cgroups v1? Of course not forever, but
until there are fewer hosts out there using cgroups v1.
Best wishes,
Lewis
On Fri, 22 Jul 2022 at 11:15, Lennart Poettering <mzerqung at 0pointer.de>
wrote:
> On Thu, 21.07.22 16:24, Stéphane Graber (stgraber at ubuntu.com) wrote:
>
> > Hey there,
> >
> > I believe Christian may have relayed some of this already but on my
> > side, as much as I can sympathize with the annoyance of having to
> > support both cgroup1 and cgroup2 side by side, I feel that we're sadly
> > nowhere near the cut off point.
> >
> > From what I can gather from various stats we have, over 90% of LXD
> > users are still on distributions relying on CGroup1.
> > That's because most of them are using LTS releases of server
> > distributions and those only somewhat recently made the jump to
> > cgroup2:
> > - RHEL 9 in May 2022
> > - Ubuntu 22.04 LTS in April 2022
> > - Debian 11 in August 2021
> >
> > OpenSUSE is still on cgroup1 by default in 15.4 for some reason.
> > All this is also excluding our two largest users, Chromebooks and QNAP
> > NASes, neither of them made the switch yet.
>
> At some point I feel no sympathy there. If google/qnap/suse still are
> stuck in cgroupv1 land, then that's on them, we shouldn't allow
> ourselves to be held hostage by that.
>
> I mean, that Google isn't forward looking in these things is well
> known, but I am a bit surprised SUSE is still so far back.
>
> > I honestly wouldn't be holding deprecating cgroup1 on waiting for
> > those few to wake up and transition.
> > Both ChromeOS and QNAP can very quickly roll it out to all their users
> > should they want to.
> > It's a bit trickier for OpenSUSE as it's used as the basis for SLES
> > and so those enterprise users are unlikely to see cgroup2 any time
> > soon.
> >
> > Now all of this is a problem because:
> > - Our users are slow to upgrade. It's common for them to skip an
> > entire LTS release and those that upgrade every time will usually wait
> > 6 months to a year prior to upgrading to a new release.
> > - This deprecation would prevent users of anything but the most
> > recent release from running any newer containers. As it's common to
> > switch to newer containers before upgrading the host, this would cause
> > some issues.
> > - Unfortunately the reverse is a problem too. RHEL 7 and derivatives
> > are still very common as a container workload, as is Ubuntu 16.04 LTS.
> > Unfortunately those releases ship with a systemd version that does not
> > boot under cgroup2.
>
> Hmm, cgroupv1 named hierarchies should still be available even on
> cgroupv2 hosts. I am pretty sure nspawn at least should have no
> problem with running old cgroupv1 payloads on a cgroupv2 host.
>
> Isn't this issue just an artifact of the fact that LXD doesn't
> pre-mount cgroupfs? Or does it do so these days? because systemd's
> PID1 since time began would just use the cgroup setup it finds itself
> in if it's already mounted/set up. And only mount and make a choice
> between cgroup1 or cgroupv2 if there's really nothing set up so far.
>
> Because of that I see no reason why old systemd cgroupv1 payloads
> shouldn't just work on cgroupv2 hosts: as long as you give them a
> pre-set-up cgroupv1 environment, and nothing stops you from doing
> that. In fact, this is something we even documented somewhere: what to
> do if the host only does a subset of the cgroup stuff you want, and
> what you have to do to set up the other stuff (i.e. if host doesn't
> manage your hierarchy of choice, but only others, just follow the same
> structure in the other hierarchy, and clean up after yourself). This
> is what nspawn does: if host is cgroupv2 only it will set up
> name=systemd hierarchy in cgroupv1 itself, and pass that to the
> container.
>
> (I mean, we might have regressed on this, since I guess this kind of
> setup is not as well tested with nspawn, but I distinctly remember
> that I wrote that stuff once upon a time, and it worked fine then.)
>
> > That last issue has been biting us a bit recently but it's something
> > that one can currently workaround by forcing systemd back into hybrid
> > mode on the host.
>
> This should not be necessary, if LXD would do minimal cgroup setup on
> its own.
>
> > With the deprecation of cgroup1, this won't be possible anymore. You
> > simply won't be able to have both CentOS7 and Fedora XYZ running in
> > containers on the same system as one will only work on cgroup1 and the
> > other only on cgroup2.
>
> I am pretty sure this works fine with nspawn...
>
> > I guess that would mean holding on to cgroup1 support until EOY 2023
> > or thereabout?
>
> That does sound OK to me. We can mark it deprecated before though,
> i.e. generate warnings, and remove it from docs, as long as the actual
> code stays around until then.
>
> Thank you, for the input,
>
> Lennart
>
> --
> Lennart Poettering, Berlin
>