[systemd-devel] Deadlocks with reloading jobs which are part of current transaction [was: [PATCH] Avoid reloading services when shutting down]

Wed Feb 4 04:27:26 PST 2015

On Wed, 04.02.15 08:56, Martin Pitt (martin.pitt at ubuntu.com) wrote:

> Lennart Poettering [2015-02-03 21:40 +0100]:
> > It's really about synchronous waiting on jobs. If you synchronously
> > wait for completion of jobs that are ordered against the job your are
> > part of yourself, then things will deadlock.
> 
> Indeed. The problem is that if you reload e. g. postfix from a DHCP or
> "network up/down" hook, such a script doesn't have the slightest idea
> whether it was run because the network changed at runtime (i. e. udev
> event or the user just selected a new network) or whether it happens
> as part of a systemd transaction (boot/shutdown). In the former case
> you do want to block, in the latter case you mustn't.

I don't see why you'd want to block in either case. Networking is
asynchronous anyway, there's really no point in blocking here. What
does this give you?

Modern code, that talks directly to systemctl, that uses hook scripts,
should always use --no-block for these cases.

That said, non-crap code does not rely on hook scripts like this
anyway, and just subscribes to rtnl on its own.

> > Now, regardless which option you choose it's always a good idea to
> > keep this change as local as possible. Altering the state engine for
> > all operations is the worst solution...
> 
> Well, it's a problem which can happen in a lot of scenarios and isn't
> specific to which kind of service or hook script you have, so what's
> "local" is actually quite hard to define here.
> 
> I agree with Michael that involving a lot of shell commands which we
> then have to copy to lots of places (and find these places at all) is
> also not the best solution. So perhaps we could have some middle
> ground here and make systemctl a bit more clever?

Hmm? I don't follow here at all?

I mean, you must have some code in Debian that forwards old sysv
restart/reload requests to systemctl. That's probably going to be in
/usr/sbin/service and maybe a few other places. Why can't you just add
the --no-block logic there, conditionalized to shutdown/reboot?

>  - Don't enqueue a reload if the service to be reloaded isn't running.
>    E. g. postfix.service "inactive/dead" in
>    https://bugs.debian.org/635777 or smbd.service "start/waiting" in
>    https://launchpad.net/bugs/1417010.  This would completely avoid
>    the deadlock in most situations already, and doesn't change the
>    semantics for working use cases either, so this should even be
>    applicable for upstream?

No, this would open up the door for races. The guarantee we give
around blocking operations, is that by the time they return the
operation or an equivalent has been executed. More specifically, if
you start a service, and it is in "starting", and then issue a
"reload" or "restart", and it returns you *know* that the
configuration that was on disk at the time you issued the "reload" or
"restart" -- or a newer one -- is in place. If you'd suppress the
reload/restart in this case, then you will not get that guarantee,
because the configuration ultimately loaded might be the one from the
time the daemon was first put into starting mode at.

Or in other words: if a script does this:

   # sed -i -e 's/something/somethingelse/' /etc/mydaemon.conf
   # systemctl restart mydaemon.service

Then we *must* give the guarantee that when the second command returns
the change just made to /etc/mydaemon.conf is in place, and not a
previous config loaded. And we must guarantee that in any case,
regardless if the daemon is about to start, already started, reloaded
or anything else...

(Now, this example is about restarts, but reload is pretty much the
same, except that many people use async ExecReload= commands (for
example send a signal), which will break this guarantee, but that's on
them then, we recommend synchronous reload operations, to make this
race-free).

> 
> And/or
> 
>  - systemctl reload/restart could imply --no-block if the service is
>    already enqueued in the current transaction. That would avoid this
>    deadlock situation in more cases.

Opens the same race, as described above.

Also: *why* for heaven's sake? Just add --no-block when you are
running from these contexts, and all is good. I really don't get why
you don't want to do that.

> With that the remaining deadlock case would be trying to reload an
> already running service which isn't affected by the current
> transaction, but we haven't seen that in practice yet.
> 
> If you don't want this upstream, I'd keep it as a patch in Debian. But
> I can't really imagine that this wouldn't happen in Fedora or other
> distros? I mean, things like the ISC DHCP hooks aren't a Debianism,
> and a lot of existing software wasn't written with this "be careful on
> service reloads and guess whether you need --no-block" approach in
> mind, as it has never been a problem with other init systems.

On Fedora, these hook scripts use --no-block correctly, and I am
completely puzzled why you don't want this on Debian!

Please, can you elaborate what your issue with --no-block is?

Lennart

-- 
Lennart Poettering, Red Hat