[systemd-devel] Deadlocks with reloading jobs which are part of current transaction

Fri May 1 13:35:51 PDT 2015

On Mon, 2015-04-27 at 18:07 +0200, Lennart Poettering wrote:
> On Wed, 04.02.15 23:48, Uoti Urpala (uoti.urpala at pp1.inet.fi) wrote:
> > If you mean something like "systemctl restart --no-block
> > mydaemon-convert-config.service; systemctl reload mydaemon.service", I
> > don't see why you'd ever /expect/ this to work with any reload semantics
> > - isn't this clear user error, and will be racy with current systemd
> > code just as much as the proposed fix? 
> 
> Yupp, this is what I mean. (though I'd actually specify the --no-block
> in the second command too, though this doesn't make much of a
> difference...)

> > And in any case I'd consider the semantics of reload to be "switch
> > to configuration equal or newer than what existed *when the reload
> > was requested*", without any guarantees that changes from operations
> > queued but not finished before calling reload would be taken into
> > account.
> 
> The queue is really a work queue, and the After= and Before= deps
> dictate how the work can be parallelized or needs to be serialized. As
> such if I have 5 jobs enqueued that depend on each other, I need to
> make sure they are executed in the order specified and can operate on
> the results of the previous job.
> 
> I hope this makes sense...
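To make the "work queue" description concrete, here is a toy model (not systemd's job engine) that treats After= edges as a dependency graph and computes which queued jobs may run in parallel and which must be serialized. The unit names A.service through D.service and the helper parallel_batches are hypothetical, chosen only for illustration; Python's standard graphlib.TopologicalSorter does the ordering.

```python
from graphlib import TopologicalSorter

# Hypothetical queued jobs: each unit maps to the set of units it is
# ordered After= (the jobs it has to wait for).
deps = {
    "B.service": {"A.service"},
    "C.service": {"A.service"},
    "D.service": {"B.service", "C.service"},
}

def parallel_batches(graph):
    """Group jobs into batches: everything in one batch may run in
    parallel, and each batch only starts after the previous one is done."""
    ts = TopologicalSorter(graph)
    ts.prepare()
    batches = []
    while ts.is_active():
        ready = sorted(ts.get_ready())  # jobs whose After= deps are finished
        batches.append(ready)
        ts.done(*ready)                 # mark them finished, unblocking others
    return batches

print(parallel_batches(deps))
# [['A.service'], ['B.service', 'C.service'], ['D.service']]
```

B and C only depend on A, so they can run side by side, while D has to wait for both; a job later in a chain can therefore rely on the results of the jobs it is ordered after.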

After those clarifications I believe I now understand what kind of
example case you meant, and it does now seem a meaningful case to
consider; however, I still think that you're wrong, as your example case
turns out to work fine and is not actually a counterexample to the kind
of changes I was talking about.

If I understood correctly, you're talking about a case where service B
has "After=A.service", both A and B have queued jobs where the B job is
a reload, and the queued job for A might change the configuration for B
(so the reload needs to happen after that); and you're worried that
immediately returning success for the reload could create a violation of
the "after job A" requirement. Is this reload property of "After"
documented anywhere? The code does seem to apply it to reloads, but
systemd.unit documentation only talks about start/stop. Anyway, when
you consider what actually happens with my suggested change, it turns
out that even these "After" semantics for reload still work.

The situation where my changes would result in different behavior is
when B has a start job queued, but no code for B is running yet, and you
request a reload for B; current code waits for the start of B before the
reload is considered complete, whereas my change makes the reload return
immediate success. This does not actually change the semantics above:
the only difference is when the reload operation is CONSIDERED COMPLETE,
there is NO difference in what operations are actually run or in which
order! [1] Current code merges RELOAD into the existing START and returns
success for the reload after START has completed, whereas my change returns
success immediately; but both run exactly the same START operation with
the same ordering constraints, which already ensure that it happens
after A.service (START already has the ordering constraints from
"After="; merging the RELOAD into START does not add any additional
ordering that START would not already have had).
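The comparison in the paragraph above can be checked with a toy simulation. This is a simplified model under stated assumptions, not systemd's actual job code: A.service and B.service each have a start job queued, B is ordered After=A.service, and a reload for B is requested before B's start has run. The function run and the policy names "merge" (current behaviour) and "immediate" (proposed change) are hypothetical labels.

```python
from typing import List, Optional, Tuple

def run(policy: str) -> Tuple[List[Tuple[str, str]], Optional[int]]:
    """Return (operations actually executed, in order, and the number of
    operations that had run when the reload was reported complete)."""
    # Queue already in dependency order: After=A.service puts B after A.
    queue = [("A.service", "start"), ("B.service", "start")]
    executed: List[Tuple[str, str]] = []
    reload_done_at: Optional[int] = None

    if policy == "immediate":
        # Proposed change: B is not running yet, so report the reload as
        # successful right away, before any queued job has run.
        reload_done_at = 0
    elif policy != "merge":
        raise ValueError(f"unknown policy: {policy}")

    for unit, job in queue:
        executed.append((unit, job))
        if policy == "merge" and (unit, job) == ("B.service", "start"):
            # Current behaviour: the reload was merged into B's start job,
            # so it only counts as complete once that start has finished.
            reload_done_at = len(executed)

    return executed, reload_done_at

merge_ops, merge_done = run("merge")
imm_ops, imm_done = run("immediate")
print(merge_ops == imm_ops)    # True: same operations, same order
print(merge_done, imm_done)    # 2 0: only the completion point differs
```

In both policies B's start still runs after A's start, so the After= guarantee is untouched; the only observable difference is for a caller that blocks waiting on the reload.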

[1] So this difference only really matters when something blocks to wait
until the reload completes.