[systemd-devel] [PATCH v3 3/4] manager: add a global watchdog reboot timestamp

Wed Feb 1 11:22:29 PST 2012

On Wed, Feb 01, 2012 at 08:05:27PM +0100, Lennart Poettering wrote:
> On Wed, 01.02.12 17:17, Michael Olbrich (m.olbrich at pengutronix.de) wrote:
> 
> > This patch adds WatchdogRebootTimestamp[Monotonic] to the systemd
> > manager API. It contains the earliest point in time when systemd might
> > reboot the system because the timer for WatchdogRebootUSec for a
> > service expired.
> > If we assume the system takes Xus to shut down then
> > WatchdogRebootTimestamp + Xus should never be in the past. A watchdog
> > daemon handling the hardware watchdog can use this information to
> > determine when to let the hardware watchdog restart the system.
> > This is convenience information for a watchdog daemon. With this,
> > it can avoid a lot of D-Bus calls that are necessary to calculate the
> > same value.
> 
> I had a longer discussion with Kay about this the other day, i.e. how we
> want the introduction of a watchdog to look in the end. We kinda came to the
> formula that systemd should supervise services and the hw watchdog
> should supervise systemd. That way systemd would just write to the hw
> watchdog in its core event loop (i.e. in PID1) but not make any
> connection between the actual services and the hw watchdog. Now, your
> idea seems to be that watchdog acts as a multiplexer for the hw watchdog
> for services, right?
> 
> I am wondering now what the right way to handle this is in the
> end. Would it make sense to drop the multiplexing thing for you, or is
> there a strong case for doing this?

Well, my use cases come from embedded scenarios. Traditionally there was one
application and it handled the hardware watchdog by itself. Nowadays there
are multiple applications and all of them need to be supervised. Restarting
an application should be the first action to recover, so that the other
applications can continue uninterrupted. However, it must be possible to
restart the whole system (if necessary triggered by the watchdog) in case
an application cannot recover. I'm not sure how this can be achieved
without some kind of watchdog multiplexing.
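
To make the multiplexing idea a bit more concrete, here is a rough sketch of
the decision a watchdog daemon could make from the proposed timestamp. It is
not the patch and not systemd code: the Service class, the field names and
both function names are made up for illustration, and all times are assumed
to be monotonic microseconds. The only rule taken from the patch description
is that WatchdogRebootTimestamp + Xus should never be in the past.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Service:
    """Hypothetical per-service watchdog state (not a systemd type)."""
    name: str
    last_ping_usec: int        # monotonic time of the last keep-alive
    watchdog_reboot_usec: int  # reboot timeout; 0 means none configured


def earliest_reboot_usec(services: Iterable[Service]) -> Optional[int]:
    """Earliest time at which any service's reboot timeout can expire.

    This plays the role of the proposed WatchdogRebootTimestampMonotonic.
    Returns None if no service has a reboot timeout configured.
    """
    deadlines = [
        s.last_ping_usec + s.watchdog_reboot_usec
        for s in services
        if s.watchdog_reboot_usec > 0
    ]
    return min(deadlines) if deadlines else None


def should_feed_hw_watchdog(
    services: Iterable[Service], now_usec: int, shutdown_usec: int
) -> bool:
    """Keep feeding the hardware watchdog while the deadline plus the
    assumed shutdown time (Xus) is not yet in the past."""
    deadline = earliest_reboot_usec(services)
    if deadline is None:
        return True
    return deadline + shutdown_usec >= now_usec
```

With a single aggregated timestamp the daemon only needs one value instead of
querying every service, which is the convenience the patch is after.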

Michael

-- 
Pengutronix e.K.                           |                             |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |