[systemd-devel] [PATCH] core: collapse JOB_RELOAD on an inactive unit into JOB_NOP

Mon Oct 27 17:03:28 PDT 2014

[Resending to the list, as it seems recipients were wrong in the first attempt]

The discussion on this died down, but I'm bringing it back up because IMO
it's quite a significant problem.

To recap:

The core issue is that if a start job is queued for foo.service,
"systemctl reload foo.service" blocks until the service has started and
is ready. This is wrong: the blocking behavior is unlikely to be useful
for any real use case, but it does cause real deadlocks, and distros
already have to work around it.

The main argument in favor of this misbehavior was that if you issue a
reload, you do so with the intent to use the service, so it is
beneficial to ensure the service is usable once the command returns. I
think that in practice this is not true: it would not be a good idea to
write code that relies on such blocking, and people are unlikely to do
things that way in practice (good idea or not). That is, people
shouldn't, and likely won't, write code with semantics like the
following:

systemctl start --no-block foo.service
systemctl reload foo.service # let's do the blocking here!
# here we can rely on the service being up

In sane code, if you don't want to change the operation of the service,
you should be able to skip the reload call and things shouldn't break.
Any such sane code does not benefit from the extra blocking.

The blocking is actively harmful because it can cause deadlocks. One
case where this is especially likely is during boot, in hook code that
changes the configuration of some service. The hook does not know
whether other components intend to use the service afterward. Thus it
should generally wait for the reload to complete before returning,
rather than use --no-block. But if the hook runs early in boot and
blocks, and the service being reloaded is only started later (for
example, because it is ordered after the unit running the hook), the
service's start job and the hook's reload can end up waiting on each
other, so the service never actually starts.
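To make the cycle concrete, here is a minimal sketch of such a hook as a
shell function. Everything in it is hypothetical: the function name, the
configuration file format, and the assumption that foo.service is
ordered after the unit that runs the hook.

```sh
# Hypothetical early-boot hook (all names are illustrative). Assumed to run
# from a unit that foo.service is ordered After=, so foo.service's queued
# start job cannot begin until this hook has exited.
update_foo_config() {
    conf_file="$1"
    new_value="$2"
    # Change the service's configuration.
    printf 'Option=%s\n' "$new_value" > "$conf_file"
    # Make sure the change is applied before returning, as a hook should.
    # With the current behavior this waits for foo.service's start job,
    # which is itself waiting for this hook to exit: neither can progress.
    systemctl reload foo.service
}
```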

IMO the correct way to view this issue is that "configuration of service
X is guaranteed to say Y" and "service X is up" are orthogonal states.
There are several situations where it makes sense to write code that
deals with the first state only; mixing in waits for the second state to
reach a particular value at best delays things unnecessarily and at
worst causes deadlocks.
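Until systemd itself separates these two states, a hook can approximate
"configuration only" semantics on its own side. A minimal sketch,
assuming a hypothetical helper named reload_if_active and a service that
reads its configuration file when it starts:

```sh
# Hypothetical helper: reload a unit only if it is already running.
# A unit that is not running yet will pick up the (already written)
# configuration when it starts, so there is nothing to reload or wait for.
reload_if_active() {
    unit="$1"
    # "systemctl is-active" exits 0 when the unit is active; an inactive
    # unit, or one still activating under its start job, exits non-zero.
    if systemctl is-active --quiet "$unit"; then
        systemctl reload "$unit"
    fi
}
```

This only approximates the intended semantics: a unit that is still
activating may already have read its old configuration by the time the
hook decides to skip the reload.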