[systemd-devel] Deadlocks with reloading jobs which are part of current transaction [was: [PATCH] Avoid reloading services when shutting down]

Wed Feb 4 12:10:02 PST 2015

On Wed, 2015-02-04 at 19:36 +0100, Lennart Poettering wrote:
> On Wed, 04.02.15 20:19, Uoti Urpala (uoti.urpala at pp1.inet.fi) wrote:
> > You're missing an essential point here: there's a distinction between
> > skipping reloads for services which have not been dispatched, and
> > skipping reloads for services for which startup code is already running
> > (and may be using existing configuration) but which have not reached
> > full "running" status yet.
> > 
> > The former is the correct behavior (but currently handled wrong by
> > systemd!), and never causes races. Only the latter can cause races like
> > those described above.
> 
> These two cases aren't that different. If somebody pushes an
> additional job into the queue that wants to run before the reload but
> after the service is up you cannot flush out the reload just
> because the service has not started yet...

I cannot parse what you're trying to say here, if it's anything
meaningful. Your "wants to run before the reload" sounds like you're
talking about guaranteeing that a reload NOT happen before something
else runs, but that would be nonsense - the "guarantee" would guarantee
nothing semantically relevant (if systemd only starts executing the
service binary *after* the reload has been queued, it cannot use any
pre-reload-order config at any point; there's no "guaranteed to use old
config" guarantee of any form possible!).

> Whether you change config in your current context, or you do so from a
> new unit's context is no difference: we cannot move anything that is
> supposed to happen after that change before it, and we cannot remove it
> either...

If no code from a service is currently running, it's already guaranteed
that every request issued to the service in the future will use the new
config (no old state exists, and any newly started process will
obviously load the new config). Thus the requirements for a reload are
already fulfilled; the operation is complete, and there is nothing more
to do. Unnecessary waiting only causes deadlocks for no benefit
whatsoever.

> There are some forms of coalescing possible, but we already do all of
> the ones that are safe...

This is not exactly "coalescing" - it's just immediately returning
success if there is no service code running (either in "running" state
or in startup state where a process already exists and could have read
the old config before it was changed).
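
As a rough illustration of the rule described above, here is a minimal
sketch in Python. The state names and return strings are hypothetical
labels chosen for this example; they are not systemd's actual unit states
or API. What to do in the "starting" case (wait for startup, then reload)
is an assumption of the sketch; the point made above is only that the
reload cannot be skipped there.

```python
from enum import Enum


class State(Enum):
    # Hypothetical labels for illustration, not systemd's real unit states.
    INACTIVE = "inactive"  # no start job and no process
    QUEUED = "queued"      # start job queued but not dispatched; no process yet
    STARTING = "starting"  # a process exists and may have read the old config
    RUNNING = "running"    # service is fully up


def reload_action(state):
    """Return what a reload request should do for a service in `state`."""
    if state in (State.INACTIVE, State.QUEUED):
        # No service code is running, so any process started later will
        # read the new config: the reload is already complete.
        return "succeed-immediately"
    if state is State.STARTING:
        # Assumption of this sketch: wait for startup to finish, then reload.
        return "wait-for-startup-then-reload"
    return "reload-now"
```

Only the two states in which a process exists lead to any real work; the
other two return success without waiting on anything.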

Removing the current incorrect blocking and returning success
immediately is 100% safe, in the following strictly defined sense:
All requests handled by the service after "systemctl reload" has
returned will use a version of config equal to or newer than the one that
was in effect when the reload call was started.

If you still want to claim that removing the blocking would not be safe,
please try to construct a sequence of operations where such non-blocking
behavior would lead to failure (failure defined as: the service
processes a request using configuration older than what existed when
"reload" was requested). I'm confident that it is impossible to
construct such a counterexample.
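
To make the challenge concrete, the following is a toy model in Python
that brute-forces short operation sequences looking for such a failure.
Everything in it (the operation names, the integer config version, the
helper names) is a hypothetical simplification for illustration. It is
strictly sequential and does not model the startup window or any
concurrency, so it says nothing definitive about systemd itself.

```python
from itertools import product

# Hypothetical operations of the toy model.
OPS = ("edit", "start", "stop", "reload", "request")


def violates(seq, skip_when_process_exists=False):
    """Return True if `seq` serves a request with config older than a
    completed reload promised."""
    disk = 0       # config version currently on disk
    loaded = None  # version held by the service process; None = no process
    floor = 0      # newest version any completed reload has promised
    for op in seq:
        if op == "edit":
            disk += 1
        elif op == "start":
            if loaded is None:
                loaded = disk  # a new process reads the current config
        elif op == "stop":
            loaded = None
        elif op == "reload":
            # Non-blocking rule: with no process there is nothing to do.
            # With a process, re-read the config (unless deliberately broken).
            if loaded is not None and not skip_when_process_exists:
                loaded = disk
            floor = disk  # reload has returned; this version is now promised
        elif op == "request":
            if loaded is not None and loaded < floor:
                return True
    return False


def find_counterexample(max_len, **kwargs):
    """Search all sequences up to `max_len` operations, shortest first."""
    for n in range(1, max_len + 1):
        for seq in product(OPS, repeat=n):
            if violates(seq, **kwargs):
                return seq
    return None
```

In this model, `find_counterexample(6)` finds nothing, while the
deliberately broken variant `find_counterexample(4,
skip_when_process_exists=True)` returns `("start", "edit", "reload",
"request")`, which is the case where skipping a reload really is unsafe
because a process already holds the old config.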