[systemd-devel] reexec can cause freeze

Mon Nov 23 09:15:25 UTC 2020

On Fr, 20.11.20 22:05, Roman Odaisky (roma at qwertty.com) wrote:

> Hi All,
>
> Suppose systemd is running as PID 1 as
>
> /usr/lib/systemd/systemd --switched-root --system --deserialize 21
>
> and a reexec command happens. systemd serializes its state and prepares to re-
> exec itself, passing the state in an FD. Suppose this fails, perhaps because
> systemd is now in /lib, not /usr/lib. systemd now tries to execute /sbin/init,
> but this time it passes the old arguments verbatim. The new instance can’t
> open FD 21 and freezes.

I am sorry? What kind of weird construction is this? Why would
systemd move from /usr/lib to /lib? And --deserialize 21 is an option
that has been supported since very shortly after the inception of the
project. It has always been supported, so why would it not be able to
open fd 21?

> I see several problems here:
> 1. That systemd passes old args when reexecing itself as /sbin/init

It doesn't do that. It builds a new cmdline on "systemctl
daemon-reexec" and uses it to pass along some info, for example to
which fd its state was serialized.
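
To illustrate the general idea (this is a minimal sketch, not the
actual systemd code): the state is written to an fd, the close-on-exec
flag is cleared so the fd survives execv(), and a fresh argv is built
that names that fd. The function names and the exact argument list
below are assumptions made for the example.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Clear FD_CLOEXEC so the descriptor stays open across execv(). */
int keep_fd_across_exec(int fd) {
        int flags = fcntl(fd, F_GETFD);
        if (flags < 0)
                return -1;
        return fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC);
}

/* Build a fresh argv that tells the new instance which fd to read.
 * The argument list is illustrative only. */
int build_reexec_argv(const char *path, int fd,
                      char *fdbuf, size_t fdbuf_size,
                      const char *argv[5]) {
        int n = snprintf(fdbuf, fdbuf_size, "%d", fd);
        if (n < 0 || (size_t) n >= fdbuf_size)
                return -1; /* fd number did not fit into the buffer */

        argv[0] = path;
        argv[1] = "--system";
        argv[2] = "--deserialize";
        argv[3] = fdbuf;
        argv[4] = NULL;
        return 0;
}

/* Hypothetical helper: write the state to a temporary file, then
 * replace the current process. Returns -1 if anything fails. */
int reexec_with_state(const char *path, const char *state) {
        char fdbuf[16];
        const char *argv[5];
        FILE *f = tmpfile();
        if (!f)
                return -1;

        int fd = fileno(f);
        if (fputs(state, f) == EOF || fflush(f) != 0 ||
            lseek(fd, 0, SEEK_SET) < 0 ||
            keep_fd_across_exec(fd) < 0 ||
            build_reexec_argv(path, fd, fdbuf, sizeof(fdbuf), argv) < 0) {
                fclose(f);
                return -1;
        }

        execv(path, (char *const *) argv);
        fclose(f); /* only reached if execv() failed */
        return -1;
}
```

The point is that the fd number on the command line is only
meaningful together with the open descriptor inherited from the
process that wrote the state.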

> 1a. That the only way to change the args is switch-root, which
> (undocumentedly!) sends a lot of SIGTERMs

Why would you change the args?

"systemctl switch-root" is for switching root, i.e. the initrd to host
transition. It sends SIGTERM to all remaining processes, except for
those marked with argv[0][0] = '@'. This is documented in detail
here: https://systemd.io/ROOT_STORAGE_DAEMONS
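
As a rough sketch of what that marking looks like from the process's
side (the helper names are made up for this example, and the page
above describes when doing this is actually appropriate):

```c
#include <stddef.h>
#include <string.h>

/* Overwrite the first character of argv[0] with '@', so that the
 * process name seen by the manager starts with '@'.
 * Returns 0 on success, -1 if there is no usable argv[0]. */
int mark_survives_switch_root(char *argv[]) {
        if (!argv || !argv[0] || argv[0][0] == '\0')
                return -1;

        argv[0][0] = '@';
        return 0;
}

/* Illustrative check for the marker on a given argv[0] string. */
int is_marked(const char *argv0) {
        return argv0 != NULL && argv0[0] == '@';
}
```

A daemon would call mark_survives_switch_root(argv) early in main(),
before the transition happens.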

> 2. That systemd freezes if unable to deserialize

Yes, it cannot restore its state, so it freezes with a log message.

> 3. That a freeze is unrecoverable, even a reexec can’t be initiated
> anymore

Yes. But this shouldn't happen.

> I ran into this by attempting to install another distribution onto a server
> over SSH. It’s a process that involves some tmpfsing, pivot_rooting and PID 1
> reexecing, one with which I’m quite familiar, but apparently systemd doesn’t
> like such a use case very much. I get it, rescue systems are a thing, but
> sometimes they are unavailable and major changes have to be done without
> rebooting.

I think you are misunderstanding what "systemctl daemon-reexec" and
"systemctl switch-root" do. They serialize state, pivot_root (in the
switch-root case), reexec and then *deserialize state* again,
restoring how things were before.

You appear to be looking for some model where the state is not
retained? We have no concept for that.

> In general, it seems to me that systemd failure handling is based on an
> assumption that it’s fresh out of initrd, and a user is physically present
> impatiently awaiting the boot process to end. Under these conditions, freezing
> with an error message and falling back to exec("/bin/sh") both make sense, but
> if a lot of userland is already running and an admin has just issued a daemon-
> reexec, then PID 1 turning to frog is not at all welcome.

We simply do not support this. systemd cannot adopt foreign userland,
as it requires processes to be sorted into neatly organized
cgroups. Either systemd is your service/process manager, or it
isn't. It can only adopt services/processes from an earlier systemd
instance (such as in the initrd), but not totally foreign ones.

> Please make daemon-reexec more resilient to various unexpected conditions in
> cases where system is fully booted up. In particular, I see no reason to
> freeze when deserialization FD is not available.

I am not grokking the use case. Please elaborate on what you are
trying to do.

"systemctl daemon-reexec"'s use case is pretty simple so far: update
systemd from one version to the next, and thus make sure to run the
new version of the code.

"systemctl switch-root"'s use case is pretty simple too: transition
from the initrd systemd to the host systemd.

But what are you trying to do?

Lennart

--
Lennart Poettering, Berlin