[systemd-devel] [PATCH v2 1/3] introduce WatchdogSec and hook up the watchdog with the existing failure logic

Thu Feb 9 01:21:33 PST 2012

On Wed, Feb 08, 2012 at 09:27:04PM +0100, Lennart Poettering wrote:
> On Wed, 08.02.12 10:10, Michael Olbrich (m.olbrich at pengutronix.de) wrote:
> > +                                <citerefentry><refentrytitle>sd_notify</refentrytitle><manvolnum>3</manvolnum></citerefentry>
> > +                                regularly with "WATCHDOG=1". If the time
> > +                                between two such calls is larger than
> > +                                the configured time then the service
> > +                                enters a failure state. By setting
> > +                                <term><varname>Restart=</varname></term>
> > +                                to <option>on-failure</option> or
> > +                                <option>always</option> the service can
> > +                                be restarted. Defaults to 0s, which
> > +                                disables this feature.</para></listitem>
> 
> This makes me think that it might be a good idea to add a new restart
> setting for this: Restart=on-watchdog, for people who only want to
> restart on watchdog failures, but not otherwise? Do you think this could
> be useful to you? If so, happy to merge a patch for that.

Hmmm, I think Restart=on-unexpected-failure (= !SERVICE_SUCCESS &&
!SERVICE_FAILURE_EXIT_CODE) is probably more useful.
In most cases we want to treat a crash the same way as a watchdog failure.
The only reasons not to restart are "we're done" == exit(EXIT_SUCCESS) and
"we cannot recover without user intervention" == exit(EXIT_FAILURE).
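
To make the proposed condition concrete, here is a rough sketch of the
check I have in mind. It is not taken from the patch: only SERVICE_SUCCESS
and SERVICE_FAILURE_EXIT_CODE are the names used above; the other enum
values and the function name are made up for illustration.

```c
#include <stdbool.h>

/* Illustrative result codes. Only SERVICE_SUCCESS and
 * SERVICE_FAILURE_EXIT_CODE appear in the discussion above; the
 * remaining values are assumptions made for this sketch. */
typedef enum {
        SERVICE_SUCCESS,            /* clean exit: "we're done" */
        SERVICE_FAILURE_EXIT_CODE,  /* deliberate non-zero exit */
        SERVICE_FAILURE_SIGNAL,     /* assumed: killed by a signal (crash) */
        SERVICE_FAILURE_WATCHDOG    /* assumed: watchdog timeout */
} ServiceResult;

/* Hypothetical helper: would Restart=on-unexpected-failure restart
 * the service for this result? Everything that is neither a clean
 * exit nor a deliberate failure exit code counts as unexpected. */
static bool restart_on_unexpected_failure(ServiceResult result) {
        return result != SERVICE_SUCCESS &&
               result != SERVICE_FAILURE_EXIT_CODE;
}
```

With this, a crash and a watchdog timeout both trigger a restart, while
exit(EXIT_SUCCESS) and exit(EXIT_FAILURE) do not.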

Michael

-- 
Pengutronix e.K.                           |                             |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |