[systemd-devel] [PATCH 0/4] systemd and watchdog

Michael Olbrich m.olbrich at pengutronix.de
Wed Sep 28 09:59:14 PDT 2011


Hi,

I've been playing with some ideas on how to add watchdog support to
systemd. I don't like talking about vaporware so here are some patches with
a prototype implementation. It should give you an idea on how this could be
done.

A few words on the ideas behind this: When working with a watchdog in
Linux, the typical scenario is one hardware watchdog, but multiple
processes that should be monitored. Beyond that, the hardware watchdog
should be the last line of defence. A more graceful recovery should be
tried first.

How to implement this in systemd:
systemd already has the concept of a state for each service and a very
simple method (sd_notify) for the service to provide status information to
systemd.
The first patch builds on this. A service can send keep-alive
messages with sd_notify, and the timestamp of the latest message is exposed
as a service property.
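
To make the keep-alive idea concrete, here is a minimal sketch of what a
service could do on its side. It talks to the notification socket directly
(a datagram sent to the AF_UNIX address named in $NOTIFY_SOCKET, which is
how sd_notify delivers its messages) so that it has no library dependency.
The function names are hypothetical, and the message text "WATCHDOG=1" is
an assumption: it is the string later systemd releases use, and the
prototype patches may use something else.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Send one datagram to the notification socket at `path`.
 * Returns 0 on success, -1 on error (errno is set).
 * Hypothetical helper, not part of the patches. */
int notify_send(const char *path, const char *msg)
{
    struct sockaddr_un addr;
    socklen_t len;
    ssize_t n;
    int fd, saved;

    if (!path || !*path || !msg || strlen(path) >= sizeof(addr.sun_path)) {
        errno = EINVAL;
        return -1;
    }

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    memcpy(addr.sun_path, path, strlen(path));
    len = offsetof(struct sockaddr_un, sun_path) + strlen(path);
    if (path[0] == '@')
        addr.sun_path[0] = '\0';  /* Linux abstract socket namespace */
    else
        len += 1;                 /* include the terminating NUL */

    fd = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    n = sendto(fd, msg, strlen(msg), 0, (struct sockaddr *)&addr, len);
    saved = errno;                /* close() may overwrite errno */
    close(fd);
    errno = saved;

    return n == (ssize_t)strlen(msg) ? 0 : -1;
}

/* Keep-alive for the current service. Does nothing (and succeeds) when
 * the process was not started with a notification socket. */
int watchdog_ping(void)
{
    const char *path = getenv("NOTIFY_SOCKET");

    if (!path)
        return 0;
    return notify_send(path, "WATCHDOG=1");  /* assumed message text */
}
```

A service would call watchdog_ping() from its main loop at an interval
comfortably shorter than the configured timeout.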

The second patch implements service restart / reboot when no keep-alive
message was received for a certain amount of time.
Note: This only triggers if at least one keep-alive was received. I don't
think anything can be done if a service fails to start. This should be
handled outside of systemd.
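
The decision described above could look roughly like the following. This
is only a sketch of the logic, not the code from the patch: the type and
function names are made up, times are assumed to be microseconds on a
monotonic clock, and a timeout of 0 is assumed to mean "disabled".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; none of this is taken from the patches. */
typedef enum { WD_OK, WD_RESTART, WD_REBOOT } wd_action;

typedef struct {
    uint64_t last_keepalive;   /* 0 = no keep-alive received yet */
    uint64_t restart_timeout;  /* 0 = restart disabled */
    uint64_t reboot_timeout;   /* 0 = reboot disabled */
} wd_service;

/* Decide what to do with a service at time `now`. */
wd_action wd_check(const wd_service *s, uint64_t now)
{
    uint64_t silent;

    assert(s);

    /* Only armed after the first keep-alive, as noted above. */
    if (s->last_keepalive == 0 || now < s->last_keepalive)
        return WD_OK;

    silent = now - s->last_keepalive;

    /* Check the more drastic action first so it wins if both expired. */
    if (s->reboot_timeout && silent >= s->reboot_timeout)
        return WD_REBOOT;
    if (s->restart_timeout && silent >= s->restart_timeout)
        return WD_RESTART;
    return WD_OK;
}
```

With a restart timeout shorter than the reboot timeout, a silent service
is restarted first and the machine is only rebooted if that does not help.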

I think the watchdog hardware should be handled in a separate service, for
several reasons:
- It's not useful on systems without watchdog hardware. This gives us a
  clean way to disable it.
- This is a rather critical part to implement. The code is much simpler
  this way.
- There are many different requirements and options on how to handle the
  watchdog hardware. It's a lot easier to replace a separate daemon with a
  custom implementation, should it be necessary.

The third patch is helper code. It provides a single time stamp for when
systemd will reboot if no more keep-alives are sent.
This way the watchdog service only needs to make one D-Bus call to get the
necessary data.
The last patch adds a simple daemon that handles the watchdog device.
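
As a rough illustration of how these two pieces could fit together, the
sketch below computes a single reboot deadline as the earliest expiry over
all armed services and shows how a daemon might pet the hardware through
the Linux watchdog device interface (WDIOC_KEEPALIVE from
<linux/watchdog.h>). All names are hypothetical. The policy in
wd_should_ping() is an assumption about how the daemon could use the
timestamp, not a description of the patch, and the D-Bus call that would
fetch the deadline is left out.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/watchdog.h>

/* Hypothetical per-service data, times in microseconds. */
typedef struct {
    uint64_t last_keepalive;  /* 0 = no keep-alive received yet */
    uint64_t reboot_timeout;  /* 0 = no reboot configured */
} wd_service;

/* Earliest time at which some service's reboot timeout would expire,
 * or 0 if no service is currently armed. */
uint64_t wd_reboot_deadline(const wd_service *s, size_t n)
{
    uint64_t best = 0;

    for (size_t i = 0; i < n; i++) {
        uint64_t d;

        if (!s[i].last_keepalive || !s[i].reboot_timeout)
            continue;
        d = s[i].last_keepalive + s[i].reboot_timeout;
        if (!best || d < best)
            best = d;
    }
    return best;
}

/* Assumed policy: keep petting the hardware while no deadline is set or
 * the deadline is still in the future. Once it has passed, stop, so the
 * hardware resets the machine if the software reboot gets stuck. */
int wd_should_ping(uint64_t deadline, uint64_t now)
{
    return deadline == 0 || now < deadline;
}

/* Pet the watchdog device. `fd` is an open descriptor for the device
 * node (typically /dev/watchdog). Returns 0 on success, -1 on error. */
int hw_watchdog_ping(int fd)
{
    return ioctl(fd, WDIOC_KEEPALIVE, 0) < 0 ? -1 : 0;
}
```

Keeping the deadline computation in the manager means the daemon's loop
stays small: fetch one timestamp, compare it with the clock, pet the device
or not.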

What do you think?

Regards,
Michael


Michael Olbrich (4):
  WIP: service: add watchdog timestamp
  WIP: service: add watchdog restart/reboot timeouts
  WIP: manager: add a global watchdog reboot timestamp
  WIP: add basic watchdog daemon

 Makefile.am                        |   21 ++++++-
 src/99-systemd.rules.in            |    2 +
 src/dbus-manager.c                 |    4 +
 src/dbus-service.c                 |    8 +++
 src/load-fragment-gperf.gperf.m4   |    2 +
 src/manager.c                      |   20 ++++++
 src/manager.h                      |    3 +
 src/service.c                      |   49 +++++++++++++++
 src/service.h                      |    6 ++
 src/watchdogd.c                    |  119 ++++++++++++++++++++++++++++++++++++
 units/systemd-watchdogd.service.in |   16 +++++
 11 files changed, 248 insertions(+), 2 deletions(-)
 create mode 100644 src/watchdogd.c
 create mode 100644 units/systemd-watchdogd.service.in

-- 
1.7.5.4


