[systemd-devel] Deadlocks with reloading jobs which are part of current transaction [was: [PATCH] Avoid reloading services when shutting down]

Wed Feb 4 10:19:47 PST 2015

On Wed, 2015-02-04 at 16:38 +0100, Lennart Poettering wrote:
> On Wed, 04.02.15 15:25, Martin Pitt (martin.pitt at ubuntu.com) wrote:
> > Lennart Poettering [2015-02-04 13:27 +0100]:
> > > On Wed, 04.02.15 08:56, Martin Pitt (martin.pitt at ubuntu.com) wrote:
> > > >  - Don't enqueue a reload if the service to be reloaded isn't running.
> > > >    E. g. postfix.service "inactive/dead" in
> > > >    https://bugs.debian.org/635777 or smbd.service "start/waiting" in
> > > >    https://launchpad.net/bugs/1417010.  This would completely avoid
> > > >    the deadlock in most situations already, and doesn't change the
> > > >    semantics for working use cases either, so this should even be
> > > >    applicable for upstream?
> > > 
> > > No, this would open up the door for races. The guarantee we give
> > > around blocking operations, is that by the time they return the
> > > operation or an equivalent has been executed. More specifically, if
> > > you start a service, and it is in "starting", and then issue a
> > > "reload" or "restart", and it returns you *know* that the
> > > configuration that was on disk at the time you issued the "reload" or
> > > "restart" -- or a newer one -- is in place. If you'd suppress the
> > > reload/restart in this case, then you will not get that guarantee,
> > > because the configuration ultimately loaded might be the one from the
> > > time the daemon was first put into starting mode at.

You're missing an essential point here: there's a distinction between
skipping reloads for services which have not not been dispatched, and
skipping reloads for services for which startup code is already running
(and may be using existing configuration) but which have not reached
full "running" status yet.

The former is the correct behavior (but currently handled wrong by
systemd!), and never causes races. Only the latter can cause races like
described above.

Fixing the systemd semantics should fix most of the bootup deadlock
cases. This is not a "sysv workaround" or anything like that. The
current systemd semantics are wrong and undesirable for new code,
regardless of any legacy compatibility issues. Fixing them would give
semantics that are more logically correct and work better in practice.

IIRC the smbd.service case was just a buggy circular service definition
and can not be reasonably fixed by any systemd-side semantics change; if
I remember that correctly, it should not be used as an example in any
discussion of systemd-side changes.

> > Hm, I don't quite understand this. If you reload a service which isn't
> > currently running, systemctl will fail. It's just currently going to
> > enqueue the reload request, and executing the job will fail due to the
> > "not active" check in unit_reload(). The problem with that is just
> > that systemctl doesn't fail immediately with "the unit is not active",
> > but blocks on the queued request. So if you don't have pending jobs,
> > the net result is the same, but with pending jobs you can easily get a
> > deadlock.
> 
> Again, if a "start" job is already queued and in progress of being
> dispatched, we need to queue the reload job, to get the right
> guarantess.
> 
> It's not that hard to see, is it?

You're correct about the "in progress of being dispatched" case, but the
problem case is when systemd incorrectly blocks when no client side code
using old configuration has been actually dispatched yet (only a queued
start inside systemd). The latter systemd misbehavior causes the
deadlocks. The "in progress of being dispatched" case typically does not
cause deadlocks, because the already running startup will normally
finish without blocking on anything, and then the new reload can run, so
there's no lock.

> We must execute the jobs in order. If there's start job in progress,
> or a reload job, and we queue a reload job, then we need to wait for
> the start job or reload job to finish, to begin with the new reload
> job. Otherwise you cannot give the guarantee I pointed out above.

Again, that's literally correct but talking about the wrong case. The
relevant case is the one where there ISN'T a "job in progress", only a
queued one.

This was discussed before, last mail being:
http://lists.freedesktop.org/archives/systemd-devel/2014-October/024612.html
(no replies to that one).