[systemd-devel] [PATCH 0/4] systemd and watchdog
Michael Olbrich
m.olbrich at pengutronix.de
Wed Sep 28 09:59:14 PDT 2011
Hi,
I've been playing with some ideas on how to add watchdog support to
systemd. I don't like talking about vaporware, so here are some patches with
a prototype implementation. It should give you an idea of how this could be
done.
A few words on the ideas behind this: When working with a watchdog in
Linux, the typical scenario is one hardware watchdog, but multiple
processes that should be monitored. Beyond that, the hardware watchdog
should be the last line of defence. A more graceful recovery should be
tried first.
How to implement this in systemd:
systemd already has the concept of a state for each service and a very
simple method (sd_notify) for the service to provide status information to
systemd.
The first patch builds on this: a service can send keep-alive
messages with sd_notify, and the timestamp of the latest message is exposed
as a service property.
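As a sketch of the service side, the helper below sends one keep-alive datagram over the same AF_UNIX socket that sd_notify() writes to (the path in $NOTIFY_SOCKET, with a leading '@' meaning the abstract namespace). The helper name send_keepalive() is hypothetical, and the payload "WATCHDOG=1" is an assumption borrowed from later systemd releases; the WIP patch may use a different key. A real service would simply call sd_notify() from sd-daemon.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: deliver one notification datagram to the socket
 * named in $NOTIFY_SOCKET, the transport sd_notify() uses.
 * Returns 1 if sent, 0 if not running under systemd, -1 on error. */
static int send_keepalive(const char *msg)
{
    const char *path = getenv("NOTIFY_SOCKET");
    struct sockaddr_un sa;
    size_t plen, mlen;
    ssize_t n;
    int fd;

    assert(msg != NULL);
    if (path == NULL || path[0] == '\0')
        return 0;                       /* not started by systemd */

    plen = strlen(path);
    if (plen >= sizeof(sa.sun_path))
        return -1;                      /* path does not fit */

    memset(&sa, 0, sizeof(sa));
    sa.sun_family = AF_UNIX;
    memcpy(sa.sun_path, path, plen);
    if (sa.sun_path[0] == '@')
        sa.sun_path[0] = '\0';          /* '@' selects the abstract namespace */

    fd = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    mlen = strlen(msg);
    n = sendto(fd, msg, mlen, 0, (const struct sockaddr *)&sa,
               (socklen_t)(offsetof(struct sockaddr_un, sun_path) + plen));
    close(fd);
    return n == (ssize_t)mlen ? 1 : -1;
}
```

A service would call send_keepalive("WATCHDOG=1") from its main loop at an interval comfortably shorter than whatever timeout systemd enforces.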
The second patch implements service restart / reboot when no keep-alive
message was received for a certain amount of time.
Note: This only triggers if at least one keep-alive was received. I don't
think anything can be done if a service fails to start. This should be
handled outside of systemd.
I think the watchdog hardware should be handled in a separate service, for
several reasons:
- It's not useful on systems without watchdog hardware. This gives us a
clean way to disable it.
- This is a rather critical part to implement. The code is much simpler
this way.
- There are many different requirements and options on how to handle the
watchdog hardware. It's a lot easier to replace a separate daemon with a
custom implementation, should it be necessary.
The third patch is helper code. It provides a single timestamp for when
systemd will reboot if no more keep-alives are sent.
This way the watchdog service only needs to make one D-Bus call to get the
necessary data.
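Such a global timestamp is essentially a minimum over all armed services. A minimal sketch, assuming microsecond timestamps where 0 means "unset" and a deadline of last keep-alive plus reboot timeout per service; the type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t usec_t;

/* Hypothetical per-service view used only for this computation. */
struct svc_watchdog {
    usec_t last_keepalive;  /* 0 = not armed yet */
    usec_t reboot_timeout;  /* 0 = no reboot action configured */
};

/* Earliest time at which some service would trigger a reboot;
 * 0 means no reboot is currently scheduled. */
static usec_t manager_watchdog_reboot_at(const struct svc_watchdog *svcs,
                                         size_t n)
{
    usec_t earliest = 0;
    size_t i;

    assert(svcs != NULL || n == 0);
    for (i = 0; i < n; i++) {
        usec_t t;

        if (svcs[i].last_keepalive == 0 || svcs[i].reboot_timeout == 0)
            continue;               /* this service cannot trigger a reboot */
        t = svcs[i].last_keepalive + svcs[i].reboot_timeout;
        if (earliest == 0 || t < earliest)
            earliest = t;
    }
    return earliest;
}
```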
The last patch adds a simple daemon that handles the watchdog device.
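On the daemon side, the decision reduces to: keep feeding the hardware while systemd's reboot deadline lies in the future, and stop once it has passed or systemd cannot be queried. This is a hedged sketch, not the daemon from the patch: query_reboot_at() is a hypothetical stand-in for the single D-Bus call, and treating 0 as "nothing scheduled" is an assumption. The only real interface relied on is that any write to /dev/watchdog counts as a keep-alive for Linux watchdog drivers.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

typedef uint64_t usec_t;

/* Hypothetical value returned by the query when systemd cannot be reached. */
#define REBOOT_AT_ERROR ((usec_t)-1)

/* Feed while no reboot is scheduled (0) or the deadline is still ahead. */
static int watchdog_should_feed(usec_t now, usec_t reboot_at)
{
    if (reboot_at == REBOOT_AT_ERROR)
        return 0;                  /* systemd unreachable: let hardware fire */
    return reboot_at == 0 || now < reboot_at;
}

/* Any write to /dev/watchdog counts as a keep-alive for the driver. */
static int watchdog_feed(int fd)
{
    assert(fd >= 0);
    return write(fd, "", 1) == 1 ? 0 : -1;
}

/* Hypothetical main loop; never returns. */
static void watchdog_loop(int fd, usec_t (*query_reboot_at)(void),
                          usec_t (*clock_now)(void), unsigned interval_sec)
{
    for (;;) {
        usec_t reboot_at = query_reboot_at();

        if (watchdog_should_feed(clock_now(), reboot_at))
            (void)watchdog_feed(fd);
        sleep(interval_sec);
    }
}
```

In this sketch the daemon never reboots by itself; it only stops feeding, so systemd gets roughly one hardware timeout to reboot gracefully before the hardware watchdog takes over as the last line of defence.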
What do you think?
Regards,
Michael
Michael Olbrich (4):
WIP: service: add watchdog timestamp
WIP: service: add watchdog restart/reboot timeouts
WIP: manager: add a global watchdog reboot timestamp
WIP: add basic watchdog daemon
Makefile.am | 21 ++++++-
src/99-systemd.rules.in | 2 +
src/dbus-manager.c | 4 +
src/dbus-service.c | 8 +++
src/load-fragment-gperf.gperf.m4 | 2 +
src/manager.c | 20 ++++++
src/manager.h | 3 +
src/service.c | 49 +++++++++++++++
src/service.h | 6 ++
src/watchdogd.c | 119 ++++++++++++++++++++++++++++++++++++
units/systemd-watchdogd.service.in | 16 +++++
11 files changed, 248 insertions(+), 2 deletions(-)
create mode 100644 src/watchdogd.c
create mode 100644 units/systemd-watchdogd.service.in
--
1.7.5.4