[systemd-devel] Systemd cgroup setup issue in containers

Mantas Mikulėnas grawity at gmail.com
Fri Sep 29 11:01:52 UTC 2023


On Fri, Sep 29, 2023, 12:54 Lewis Gaul <lewis.gaul at gmail.com> wrote:

> Hi systemd team,
>
> I've encountered an issue when running systemd inside a container using
> cgroups v2, where if a container exec process is created at the wrong
> moment during early startup then systemd will fail to move all processes
> into a child cgroup, and therefore fail to enable controllers due to the
> "no internal processes" rule introduced in cgroups v2. In other words, a
> systemd container is started and very soon after a process is created via
> e.g. 'podman exec systemd-ctr cmd', where the exec process is placed in the
> container's namespaces (although not a child of the container's PID 1).
> This is not a totally crazy thing to be doing - this was hit when testing a
> systemd container, using a container exec "probe" to check when the
> container is ready.
>

Wouldn't it be better to have the container inform the host via
NOTIFY_SOCKET (the Type=notify mechanism)? I believe systemd has had
support for sending readiness notifications from init to a container
manager for quite a while.
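
As a rough illustration of the host side of that handshake, here is a minimal Python sketch. It assumes the container manager binds a datagram Unix socket, makes it reachable inside the container, and exports its path to the container's init as NOTIFY_SOCKET; how that is wired up is runtime-specific and not shown. The function names (parse_notify, open_notify_socket, wait_for_ready) are hypothetical and not part of any systemd or podman API.

```python
import socket
import time


def parse_notify(datagram: bytes) -> dict:
    """Split an sd_notify-style datagram (newline-separated KEY=VALUE) into a dict."""
    fields = {}
    for line in datagram.decode("utf-8", errors="replace").splitlines():
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that are not KEY=VALUE
            fields[key] = value
    return fields


def open_notify_socket(path: str) -> socket.socket:
    """Bind a datagram Unix socket for the container's init to write to.

    Assumption: the caller exposes `path` inside the container and sets
    NOTIFY_SOCKET to it before init starts.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    sock.bind(path)
    return sock


def wait_for_ready(sock: socket.socket, timeout: float) -> bool:
    """Return True once a datagram carrying READY=1 arrives, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        sock.settimeout(remaining)
        try:
            datagram = sock.recv(4096)
        except socket.timeout:
            return False
        # Other messages (e.g. STATUS=...) may arrive first; keep waiting.
        if parse_notify(datagram).get("READY") == "1":
            return True
```

With something like this the host waits passively for the message and never has to place a process into the container during early startup.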

(Alternatively, connect to the container's systemd or D-Bus Unix socket
from the host and query its state directly, but NOTIFY_SOCKET would avoid
the need to time the connection correctly.)

Other than that – I'm not a container expert but this does seem like a
self-inflicted problem to me. If you spawn processes unknown to systemd, it
makes sense that systemd will fail to handle them.
