[systemd-devel] Deadlocks with reloading jobs which are part of current transaction [was: [PATCH] Avoid reloading services when shutting down]

Wed Feb 4 10:36:09 PST 2015

On Wed, 04.02.15 20:19, Uoti Urpala (uoti.urpala at pp1.inet.fi) wrote:

> On Wed, 2015-02-04 at 16:38 +0100, Lennart Poettering wrote:
> > On Wed, 04.02.15 15:25, Martin Pitt (martin.pitt at ubuntu.com) wrote:
> > > Lennart Poettering [2015-02-04 13:27 +0100]:
> > > > On Wed, 04.02.15 08:56, Martin Pitt (martin.pitt at ubuntu.com) wrote:
> > > > >  - Don't enqueue a reload if the service to be reloaded isn't running.
> > > > >    E. g. postfix.service "inactive/dead" in
> > > > >    https://bugs.debian.org/635777 or smbd.service "start/waiting" in
> > > > >    https://launchpad.net/bugs/1417010.  This would completely avoid
> > > > >    the deadlock in most situations already, and doesn't change the
> > > > >    semantics for working use cases either, so this should even be
> > > > >    applicable for upstream?
> > > > 
> > > > No, this would open up the door for races. The guarantee we give
> > > > around blocking operations is that by the time they return the
> > > > operation or an equivalent has been executed. More specifically, if
> > > > you start a service, and it is in "starting", and then issue a
> > > > "reload" or "restart", and it returns you *know* that the
> > > > configuration that was on disk at the time you issued the "reload" or
> > > > "restart" -- or a newer one -- is in place. If you'd suppress the
> > > > reload/restart in this case, then you will not get that guarantee,
> > > > because the configuration ultimately loaded might be the one from the
> > > > time the daemon was first put into starting mode.
> 
> You're missing an essential point here: there's a distinction between
> skipping reloads for services which have not been dispatched, and
> skipping reloads for services for which startup code is already running
> (and may be using existing configuration) but which have not reached
> full "running" status yet.
> 
> The former is the correct behavior (but currently handled wrong by
> systemd!), and never causes races. Only the latter can cause races like
> described above.

These two cases aren't that different. If somebody pushes an
additional job into the queue that wants to run before the reload but
after the service is up, you cannot flush out the reload just
because the service has not started yet...

Whether you change config in your current context, or you do so from a
new unit's context, makes no difference: we cannot move anything that is
supposed to happen after that change before it, and we cannot remove it
either...
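A toy model of that scenario may help. This is a hypothetical sketch, not systemd's job engine: a service reads its config from "disk" at the moment a start or reload job executes, and all three jobs are enqueued while the service is still inactive. The names `Service`, `enqueue`, `execute`, `scenario` and the `drop_reloads_for_inactive` flag are illustrative assumptions; the flag models the proposal of discarding reloads for services that are not running yet.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Service:
    # Hypothetical toy unit, not systemd's data structure.
    active: bool = False
    effective_config: Optional[str] = None  # config the daemon runs with

def enqueue(queue, job, svc, drop_reloads_for_inactive):
    # Assumed policy under discussion: discard a reload if the service
    # is not running at the moment the reload is requested.
    if drop_reloads_for_inactive and job[0] == "reload" and not svc.active:
        return
    queue.append(job)

def execute(queue, svc, disk):
    # Jobs run strictly in queue order in this toy model.
    for kind, arg in queue:
        if kind == "start":
            svc.effective_config = disk["config"]  # daemon reads config now
            svc.active = True
        elif kind == "write_config":
            disk["config"] = arg  # another unit rewrites the config file
        elif kind == "reload":
            svc.effective_config = disk["config"]  # re-read config

def scenario(drop_reloads_for_inactive):
    disk = {"config": "v1"}
    svc = Service()
    queue = []
    # The config writer is ordered after the service is up, and the
    # reload is ordered after the config writer.
    for job in [("start", None), ("write_config", "v2"), ("reload", None)]:
        enqueue(queue, job, svc, drop_reloads_for_inactive)
    execute(queue, svc, disk)
    return svc.effective_config, disk["config"]

print(scenario(drop_reloads_for_inactive=True))   # ('v1', 'v2')
print(scenario(drop_reloads_for_inactive=False))  # ('v2', 'v2')
```

With the reload dropped, the daemon keeps running with `v1` although `v2` was on disk before the reload was requested, which is exactly the guarantee described above being lost.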

There are some forms of coalescing possible, but we already do all of
the ones that are safe...
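For contrast, here is one form of coalescing that stays safe in the same kind of toy model (again a hypothetical sketch, not systemd's actual job-merging rules): a new reload request may be folded into an identical reload that is still pending and sits at the tail of the queue, because that reload has not read the config yet and nothing is ordered between it and the new request. A reload that is already executing is no longer in the pending list, so it can never absorb a later request. `request_reload` and the `(kind, unit)` tuple representation are illustrative assumptions.

```python
from typing import List, Tuple

Job = Tuple[str, str]  # (job kind, unit or argument); toy representation

def request_reload(pending: List[Job], unit: str) -> bool:
    """Queue a reload for `unit`; return False if it was coalesced.

    `pending` holds only jobs that have not started executing yet, in the
    order they must run. Coalescing happens only when the last pending job
    is already a reload of the same unit: it will read the config after
    this request was made, and no other job sits in between.
    """
    if pending and pending[-1] == ("reload", unit):
        return False  # the queued reload already covers this request
    pending.append(("reload", unit))
    return True

pending: List[Job] = [("start", "svc")]
print(request_reload(pending, "svc"))      # True: queued behind the start job
print(request_reload(pending, "svc"))      # False: merged into pending reload
pending.append(("write_config", "other"))  # another unit edits the config
print(request_reload(pending, "svc"))      # True: must run after the write
print(pending)
```

The third request is not merged, since folding it into the earlier reload would move its effect before the config write.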

> Fixing the systemd semantics should fix most of the bootup deadlock
> cases. This is not a "sysv workaround" or anything like that. The
> current systemd semantics are wrong and undesirable for new code,
> regardless of any legacy compatibility issues. Fixing them would give
> semantics that are more logically correct and work better in
> practice.

No, totally not. The current semantics give the necessary guarantee:
if you change a config file from any context you like, or queue a
config file change from any context you like, and then queue a reload,
the change will take effect, regardless of whether there's a job for
the unit already queued, running or anything else.

Lennart

-- 
Lennart Poettering, Red Hat