[systemd-devel] reexec can cause freeze
Roman Odaisky
roma at qwertty.com
Fri Nov 20 20:05:55 UTC 2020
Hi All,
Suppose systemd is running as PID 1 as
/usr/lib/systemd/systemd --switched-root --system --deserialize 21
and a reexec command happens. systemd serializes its state and prepares to
re-exec itself, passing the state in an FD. Suppose this fails, perhaps because
systemd is now in /lib, not /usr/lib. systemd now tries to execute /sbin/init,
but this time it passes the old arguments verbatim. The new instance can’t
open FD 21 and freezes.
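
To make the failure mode concrete, here is a minimal Python sketch (not systemd's actual code, which is C) of the check the new instance effectively faces: the number after --deserialize only means something if that descriptor was actually inherited. The helper names fd_is_open and deserialize_fd_from_argv are hypothetical, and only the space-separated form of the option shown above is handled.

```python
import os


def fd_is_open(fd: int) -> bool:
    """Return True if fd refers to an open file descriptor in this process."""
    try:
        os.fstat(fd)
    except OSError:
        return False
    return True


def deserialize_fd_from_argv(argv):
    """Return the integer following --deserialize, or None if absent or malformed."""
    for i, arg in enumerate(argv):
        if arg == "--deserialize" and i + 1 < len(argv):
            try:
                return int(argv[i + 1])
            except ValueError:
                return None
    return None
```

With the command line above, deserialize_fd_from_argv yields 21, but fd_is_open(21) would be false in a process that did not inherit that descriptor, which is the situation described here.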
I see several problems here:
1. That systemd passes old args when reexecing itself as /sbin/init
1a. That the only way to change the args is switch-root, which
(undocumentedly!) sends a lot of SIGTERMs
2. That systemd freezes if unable to deserialize
3. That a freeze is unrecoverable: even a reexec can’t be initiated anymore
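
As an illustration of what problem 1 is asking for, here is a rough Python sketch of dropping the stale option before falling back to /sbin/init. This is an assumption about one possible fix, not a description of what systemd does; strip_deserialize is a hypothetical helper.

```python
def strip_deserialize(argv):
    """Return a copy of argv without any --deserialize option and its value."""
    out = []
    skip_next = False
    for arg in argv:
        if skip_next:
            # This is the FD number that followed --deserialize; drop it too.
            skip_next = False
            continue
        if arg == "--deserialize":
            skip_next = True
            continue
        if arg.startswith("--deserialize="):
            continue
        out.append(arg)
    return out
```

Applied to the command line above, this leaves only --switched-root and --system, so a fallback instance would not go looking for an FD it never received.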
I ran into this by attempting to install another distribution onto a server
over SSH. It’s a process that involves some tmpfsing, pivot_rooting and PID 1
reexecing, one with which I’m quite familiar, but apparently systemd doesn’t
like such a use case very much. I get it, rescue systems are a thing, but
sometimes they are unavailable and major changes have to be done without
rebooting.
In general, it seems to me that systemd’s failure handling is based on the
assumption that it’s fresh out of initrd and a user is physically present,
impatiently waiting for the boot process to end. Under these conditions, freezing
with an error message and falling back to exec("/bin/sh") both make sense, but
if a lot of userland is already running and an admin has just issued a
daemon-reexec, then PID 1 turning into a frog is not at all welcome.
Please make daemon-reexec more resilient to unexpected conditions in
cases where the system is fully booted up. In particular, I see no reason to
freeze when the deserialization FD is not available.
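
A rough Python sketch of the behaviour I have in mind: try to read the serialized state, and if the FD is missing or invalid, carry on with empty state instead of freezing. The key=value line format and the name load_state are assumptions made for illustration only, not systemd’s real serialization code.

```python
import os


def load_state(fd):
    """Read 'key=value' lines from fd; return {} if fd is None or unusable."""
    if fd is None:
        return {}
    try:
        # os.fdopen raises OSError (EBADF) if fd is not an open descriptor.
        with os.fdopen(fd, "r") as f:
            lines = f.read().splitlines()
    except OSError:
        # Degrade to a fresh start rather than giving up.
        return {}
    state = {}
    for line in lines:
        key, sep, value = line.partition("=")
        if sep:
            state[key] = value
    return state
```

Starting with empty state loses whatever was serialized, but PID 1 stays responsive and a later reexec remains possible.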
Roman.