<div dir="auto"><div><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Sep 29, 2023, 12:54 Lewis Gaul &lt;<a href="mailto:lewis.gaul@gmail.com">lewis.gaul@gmail.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Hi systemd team,<div><br></div><div>I've encountered an issue when running systemd inside a container using cgroups v2, where if a container exec process is created at the wrong moment during early startup then systemd will fail to move all processes into a child cgroup, and therefore fail to enable controllers due to the "no internal processes" rule introduced in cgroups v2. In other words, a systemd container is started and very soon after a process is created via e.g. 'podman exec systemd-ctr cmd', where the exec process is placed in the container's namespaces (although not a child of the container's PID 1). This is not a totally crazy thing to be doing - this was hit when testing a systemd container, using a container exec "probe" to check when the container is ready.</div></div></blockquote></div></div><div dir="auto"><br></div><div dir="auto">Wouldn't it be better to have the container inform the host via NOTIFY_SOCKET (the Type=notify mechanism)? I believe systemd has had support for sending readiness notifications from init to a container manager for quite a while.</div><div dir="auto"><br></div><div dir="auto">(Alternatively, connect to the container's systemd or D-Bus Unix socket and query it directly, but NOTIFY_SOCKET would avoid the need to time the query correctly.)</div><div dir="auto"><br></div><div dir="auto">Other than that – I'm not a container expert, but this does seem like a self-inflicted problem to me. 
If you spawn processes unknown to systemd, it makes sense that systemd will fail to handle them.</div></div>
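<div dir="auto"><br></div><div dir="auto">For illustration, the NOTIFY_SOCKET suggestion could look roughly like this on the host side. This is only a minimal sketch in Python, assuming the container manager binds a Unix datagram socket and exposes its path to the container's init as $NOTIFY_SOCKET; how that path is passed into the container depends on the runtime and is not shown. The function names (parse_notify_message, open_notify_socket, wait_for_ready, notify_ready) are hypothetical helpers, not part of systemd or podman, and notify_ready only stands in for the message that init would send.</div>

```python
import os
import socket
import tempfile


def parse_notify_message(data: bytes) -> dict:
    """Parse a newline-separated KEY=VALUE datagram (the sd_notify wire format)."""
    fields = {}
    for line in data.decode("utf-8", errors="replace").splitlines():
        key, sep, value = line.partition("=")
        if sep:
            fields[key] = value
    return fields


def open_notify_socket(path: str) -> socket.socket:
    """Bind the datagram socket that the (hypothetical) manager listens on."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    sock.bind(path)
    return sock


def wait_for_ready(sock: socket.socket, timeout: float) -> bool:
    """Return True once a datagram with READY=1 arrives.

    Each receive waits at most `timeout` seconds; if one receive times out,
    the function returns False.
    """
    sock.settimeout(timeout)
    try:
        while True:
            data = sock.recv(4096)
            if parse_notify_message(data).get("READY") == "1":
                return True
    except socket.timeout:
        return False


def notify_ready(path: str) -> None:
    """Stand-in for the sender: what init would write to $NOTIFY_SOCKET."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as client:
        client.sendto(b"READY=1\nSTATUS=Startup finished", path)


if __name__ == "__main__":
    # Demo only: sender and listener run in the same process.
    sock_path = os.path.join(tempfile.mkdtemp(), "notify.sock")
    listener = open_notify_socket(sock_path)
    notify_ready(sock_path)
    print("container ready:", wait_for_ready(listener, timeout=1.0))
    listener.close()
```

<div dir="auto">The point of this design is that the host blocks on a message instead of polling with exec probes, so no extra process has to be created inside the container during early startup.</div>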