[systemd-devel] [PATCH] core: collapse JOB_RELOAD on an inactive unit into JOB_NOP

Fri Aug 15 12:09:39 PDT 2014

On Fri, 2014-08-15 at 22:22 +0400, Andrei Borzenkov wrote:
> On Fri, 15 Aug 2014 20:25:57 +0300,
> Uoti Urpala <uoti.urpala at pp1.inet.fi> wrote:
> > The problem with this is that it's common for things updating
> > configuration to be separate from things using the daemon. If something
> > changes, the configuration update part wants to guarantee that
> > subsequent requests, *if any*, use the new configuration, but does not
> > itself make any such requests; as such, blocking for the service to be
> > up only causes unnecessary delays and sometimes deadlocks. Ensuring that
> > the service is up belongs to different code paths that actually make
> > requests to the daemon. And they do that whether there's been a reload
> > or not, so they need to handle it regardless of reload behavior anyway.
> > 
> 
> That's not how I interpret "reload", nor how "reload" was traditionally
> implemented by initscripts. "reload" means: request the daemon to do
> whatever is necessary to start using the new configuration. It never
> implied changing this configuration. That happens outside the scope of
> performing the "reload" action. You seem to interpret "reload" as a
> request to update static on-disk configuration of the service. Am I right?

No, I didn't say anything about "systemctl reload" itself modifying
on-disk configuration (if that's really what your "request to update
static on-disk configuration" meant).

The basic difference in desired semantics seems to be:

me: "reload" should ensure that the system has switched to the new
configuration. No other semantics, just that any configuration that is
used after "reload" has returned is the new one.

Lennart: "reload" should ensure the system has switched to the new
configuration, *and* should also wait to ensure that the daemon is up
and is currently responding to requests with the new configuration if
possible.

The latter semantics cause problems for any generic state change code
which writes new configuration for a service, runs "systemctl reload",
and then informs the caller that state was successfully changed.
Changing configuration does not imply that you want the daemon to be
ready to handle requests!

In case this is still not clear, consider this division of code:

1) Event hook which runs in response to some external changes or admin
requests. Writes new configuration for the daemon, and then runs
"systemctl reload foo.service". Does not use --no-block, because it
should be guaranteed that the new configuration is in effect before the
hook returns. Does not itself make any requests to the daemon.

2) Code elsewhere that actually makes requests to the daemon.

Code 1 can run early during the boot before the service itself starts.
If "reload" blocks until the queued start of the daemon is executed,
this causes a deadlock: the hook waits for the daemon to start, but boot
can not progress to the point where the daemon starts because the part
running the hook is blocked in systemctl.
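
The deadlock can be sketched as a wait-for graph. This is only a toy
model, not systemd's actual job engine: the node names "hook" and
"start foo.service" and both helper functions are hypothetical, chosen
just to contrast the two semantics described above.

```python
# Toy model only: node names and helpers are hypothetical and do not
# correspond to systemd's real job engine.

def find_cycle(waits_for, start):
    """Follow wait-for edges from `start`; return the cycle if one is reached."""
    path = []
    node = start
    while node is not None:
        if node in path:
            # We came back to a node already on the path: deadlock.
            return path[path.index(node):] + [node]
        path.append(node)
        node = waits_for.get(node)
    return None


def boot_wait_graph(reload_blocks_on_start):
    """Build the wait-for graph for the early-boot hook scenario."""
    # The daemon's queued start job cannot run until the hook has returned.
    waits_for = {"start foo.service": "hook"}
    if reload_blocks_on_start:
        # "systemctl reload" inside the hook waits for the queued start job.
        waits_for["hook"] = "start foo.service"
    # Otherwise reload on the inactive unit returns at once: no edge added.
    return waits_for


blocking = find_cycle(boot_wait_graph(True), "hook")
collapsed = find_cycle(boot_wait_graph(False), "hook")
print("reload waits for start:", blocking)
print("reload is a no-op:     ", collapsed)
```

With the blocking semantics the hook and the start job wait on each
other; with reload treated as a no-op on the inactive unit, the hook
waits on nothing and boot can proceed.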

Having reload block until a starting service is really up does not
have any positive effect: code 2 has to depend on other ways to ensure
that the daemon is up before making requests anyway, because it cannot
assume that the reload hook has necessarily been triggered at any prior
point (and even if it could make such an assumption, relying on that
would seem like quite a hacky design - there are much better ways to
ensure daemons you require are up).