[systemd-devel] [PATCH 1/3] introduce WatchdogSec and hook up the watchdog with the existing failure logic

Michael Olbrich m.olbrich at pengutronix.de
Tue Feb 7 04:18:11 PST 2012


Hi,

I've been thinking a bit more about this, and there are some problems with
the current approach. Consider a service that works like this:

do_setup()
sd_notify("READY=1")
while (true) {
	do_work()
	sd_notify("WATCHDOG=1")
}

Now suppose the failure is that do_work() never finishes, even after a
restart. First the service is restarted. That works (we get "READY=1"). The
old timeout is still there, but the watchdog timer is never started and no
further action is taken. Not good.

I see multiple options here:

1. Clear the timeout on restart, and require the service to send
   "WATCHDOG=1" when started.
2. Consider "READY=1" (and any other startup notification) as the first
   "WATCHDOG=1" when WatchdogSec is set.
3. Same as 2., but only when restarting and at least one "WATCHDOG=1" was
   received before.
4. Add a config option for 1., 2., 3.
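
To make the difference between options 1. to 3. concrete, here is a toy
model in Python. It is only a sketch, not systemd code: the class
WatchdogModel, the Policy names, the method names and the helper
hang_detected_after_restart() are all invented for illustration, and the
model assumes that the stale timestamp is dropped on every restart.

```python
from enum import Enum
from typing import Optional


class Policy(Enum):
    REQUIRE_PING = 1    # option 1: only an explicit WATCHDOG=1 arms the timer
    READY_COUNTS = 2    # option 2: READY=1 counts as the first WATCHDOG=1
    READY_IF_SEEN = 3   # option 3: as 2, but only on restart after an earlier ping


class WatchdogModel:
    """Toy manager-side watchdog (hypothetical, not systemd's implementation)."""

    def __init__(self, watchdog_sec: float, policy: Policy) -> None:
        self.watchdog_sec = watchdog_sec
        self.policy = policy
        self.last_ping: Optional[float] = None  # None means the timer is not armed
        self.seen_ping = False                  # any WATCHDOG=1 received so far?
        self.restarted = False

    def on_restart(self) -> None:
        self.restarted = True
        self.last_ping = None  # assumption: drop the timestamp of the previous run

    def on_message(self, msg: str, now: float) -> None:
        if msg == "WATCHDOG=1":
            self.last_ping = now
            self.seen_ping = True
        elif msg == "READY=1":
            if self.policy is Policy.READY_COUNTS:
                self.last_ping = now
            elif (self.policy is Policy.READY_IF_SEEN
                  and self.restarted and self.seen_ping):
                self.last_ping = now

    def expired(self, now: float) -> bool:
        # A timer that was never armed cannot expire.
        return self.last_ping is not None and now - self.last_ping > self.watchdog_sec


def hang_detected_after_restart(policy: Policy, startup_ping: bool = False) -> bool:
    """First run pings once and hangs; the restarted run hangs right after READY=1."""
    m = WatchdogModel(watchdog_sec=10.0, policy=policy)
    m.on_message("READY=1", now=0.0)
    m.on_message("WATCHDOG=1", now=1.0)
    m.on_restart()                       # the first hang was detected at some point
    if startup_ping:                     # what option 1 requires from the service
        m.on_message("WATCHDOG=1", now=21.0)
    m.on_message("READY=1", now=21.0)
    return m.expired(now=40.0)
```

In this model the second hang goes unnoticed under option 1. unless the
service really sends its startup "WATCHDOG=1", while options 2. and 3. both
catch it. The two differ only for a service that has never sent
"WATCHDOG=1": option 2. arms the timer on "READY=1", option 3. does not.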

I don't like 1.: For services with socket or D-Bus interfaces I'd like to
make it possible to send "WATCHDOG=1" from a separate process. Fork it at
startup, have it access the service via its actual interface, and let it
send "WATCHDOG=1" as appropriate. This makes it possible to monitor
services that do not support the watchdog themselves, without code
modifications. In that setup the startup notification comes from the actual
service (e.g. via BusName) and cannot be combined with the first
"WATCHDOG=1".
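
A minimal sketch of such a separate watchdog process, in Python. The
function names notify() and watchdog_helper() and the probe callback are
hypothetical; the probe stands in for a real request over the service's
socket or D-Bus interface. notify() writes the datagram directly to the
socket named in $NOTIFY_SOCKET, which is what sd_notify() does underneath.

```python
import os
import socket
import time
from typing import Callable, Optional


def notify(state: str) -> bool:
    """Send one notification datagram to the socket named in $NOTIFY_SOCKET."""
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False  # no notification socket was passed to this process
    if path.startswith("@"):
        path = "\0" + path[1:]  # leading '@' denotes an abstract socket on Linux
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode(), path)
    return True


def watchdog_helper(probe: Callable[[], bool], interval: float,
                    max_rounds: Optional[int] = None) -> int:
    """Send WATCHDOG=1 after every successful probe; return the number sent."""
    sent = 0
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        rounds += 1
        # Only vouch for the service if it answered on its real interface.
        if probe() and notify("WATCHDOG=1"):
            sent += 1
        time.sleep(interval)
    return sent
```

The helper would be started with an interval well below WatchdogSec. This
assumes that notifications from processes other than the main one are
accepted (NotifyAccess=all).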

2. might be a bit too restrictive for more relaxed scenarios, especially
when combined with rebooting.

And 3. is just the opposite. It's not enough for critical services. Also, I
think there is a bit too much magic here for my taste.

I'd probably prefer 4. with an option to select between 1. and 2.

Comments?

Regards,
Michael

-- 
Pengutronix e.K.                           |                             |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

