[systemd-devel] Default on failure dependencies

Fri Oct 5 17:52:53 UTC 2018

On Sa, 15.09.18 22:32, Baudouin Feildel (baudouin_systemd at feildel.fr) wrote:

(Sorry for not responding in a more timely fashion, I have been
travelling and am still catching up with all the email)

> Hello there,
> 
> Few weeks ago I opened the following issue in systemd repository:
> https://github.com/systemd/systemd/issues/9373. Seeing no traction from
> existing systemd developer,

Hmm, so, I figure we should first have a discussion about whether this
really is desirable, because I am not too sure about that, I must say.

So far we are very conservative when it comes to options that are
supposed to affect all units at once, as that tends to create various
problems that are not obvious to solve. For example, if every service
gets this kind of dep, what about the units that these deps are
supposed to start, do you create a cyclic dep there?

Moreover, I figure the services pulled in like this are usually going
to be late boot processes, but this means failures during early boot
would result in a large number of queued services that need to be
dispatched during late boot.

Moreover what happens if a service fails multiple times during early
boot (for example because Restart= is used)? What happens with these
failures, are the earlier ones dropped?

Also, what happens for services that fail during shutdown, would these
also pull in new units? But if they do, then this would result in
cyclic operations if the service to run is a regular service,
i.e. needs all basic system stuff up: we are shutting down, but in
order to process everything that happened then we need to start
services that reverse the shut down process as they require certain
stuff to be up...

In general, there's the "philosophical incompatibility": stuff that
is supposed to process failures in the service dependency logic
should probably not be part of the service dependency logic itself.

This all makes me wonder whether a different approach to all of this
wouldn't be better: maybe we should just consider this a logging
problem: let's make sure we log a recognizable log message (i.e. a
structured journal message with a well-defined MESSAGE_ID=) whenever a
service fails. With that in place it should be relatively easy to
write a system service that can run during regular system uptime and
can look in the journal for all failures, including getting live
notifications when something happens. Moreover, this resolves the
problems during early and late boot: the "cursor" logic of the journal
allows such a service to know exactly which failures it already
processed and which ones are still left, and it can process all
failures that took place while it was not running.
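
To illustrate the idea, here is a rough sketch of what such a consumer
could look like. It is only a sketch under assumptions: the MESSAGE_ID
value is a made-up placeholder (no ID is defined in this mail), the
helper names are hypothetical, and it assumes that failure entries carry
a UNIT= field. It reads entries in "journalctl -o json" format, picks
out the failures, and remembers the last cursor it saw so that a later
run can continue from there with --after-cursor=.

```python
import json
import subprocess

# Hypothetical placeholder: the proposal only says that a well-defined
# MESSAGE_ID= should exist for service failures, not what its value is.
FAILURE_MESSAGE_ID = "0123456789abcdef0123456789abcdef"


def collect_failures(json_lines, message_id=FAILURE_MESSAGE_ID):
    """Scan entries in `journalctl -o json` format (one JSON object per line).

    Returns (failed_units, last_cursor). last_cursor is the __CURSOR of the
    last entry seen, or None if there were no entries.
    """
    failed = []
    last_cursor = None
    for line in json_lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        last_cursor = entry.get("__CURSOR", last_cursor)
        if entry.get("MESSAGE_ID") == message_id:
            # Assumption: the failure message names the unit in UNIT=.
            failed.append(entry.get("UNIT", "<unknown>"))
    return failed, last_cursor


def read_entries_since(cursor=None):
    """Fetch failure entries logged after `cursor` (all of them if None)."""
    cmd = ["journalctl", "--no-pager", "-o", "json"]
    if cursor:
        cmd.append("--after-cursor=" + cursor)
    cmd.append("MESSAGE_ID=" + FAILURE_MESSAGE_ID)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.splitlines()
```

A service built on this would store the returned cursor on disk after
each batch and pass it to read_entries_since() on the next start, so
failures that happened while it was not running are still picked up
exactly once.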

Does that make sense?

Lennart

-- 
Lennart Poettering, Red Hat