[systemd-devel] [PATCH 0/4] systemd and watchdog

Albert Strasheim albert.strasheim at gmail.com
Wed Sep 28 10:16:40 PDT 2011


Hello

On Wed, Sep 28, 2011 at 6:59 PM, Michael Olbrich
<m.olbrich at pengutronix.de> wrote:
> How to implement this in systemd:
> systemd already has the concept of a state for each service and a very
> simple method (sd_notify) for the service to provide status information to
> systemd.
> This is implemented in the first patch. A service can send keep-alive
> messages with sd_notify, and the timestamp of the latest message is exposed
> as a service property.

Very cool. I've been wondering how we could restart services that hang
(e.g., deadlock or go into infinite loops) but don't crash.
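
For reference, a minimal sketch of what such a keep-alive could look like
from the service side, written in Python against the plain notify-socket
protocol rather than the C sd_notify() call. The helper names are made up
for illustration. "WATCHDOG=1" is the keep-alive string used by released
systemd; whether this patch series uses exactly that string is an
assumption.

```python
import os
import socket


def sd_notify(state: str) -> bool:
    """Send one notification datagram to the socket named in NOTIFY_SOCKET.

    Returns False when NOTIFY_SOCKET is unset, i.e. the process was not
    started by a manager that listens for notifications.
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    # A leading '@' denotes a Linux abstract-namespace socket.
    if path.startswith("@"):
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode("utf-8"), path)
    return True


def send_keepalive() -> bool:
    # Assumption: the keep-alive message is "WATCHDOG=1", as in released
    # systemd; the string used by the patch under discussion may differ.
    return sd_notify("WATCHDOG=1")
```

A service would call send_keepalive() from its main loop, so a deadlocked
or spinning loop stops sending and the last-keep-alive timestamp goes
stale.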

> The second patch implements service restart / reboot when no keep-alive
> message was received for a certain amount of time.
> Note: This only triggers if at least one keep-alive was received. I don't
> think anything can be done if a service fails to start. This should be
> handled outside of systemd.

A question at this point: are ExecStartPost= commands executed if a
service fails? If they are, and if they can obtain the main process's
exit status (if that's a well-defined concept), they could take further
action.
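
As I read the rule quoted above, the timeout check amounts to something
like the following sketch. The function name and the use of plain float
timestamps are assumptions for illustration, not taken from the patch.

```python
from typing import Optional


def watchdog_expired(last_keepalive: Optional[float],
                     now: float,
                     timeout: float) -> bool:
    """Return True when the service should be restarted (or the box rebooted).

    last_keepalive is None until the first keep-alive arrives; in that case
    the watchdog never triggers, matching the note that a service which
    fails to start is not covered.
    """
    if last_keepalive is None:
        return False
    # Expired once more than `timeout` seconds have passed since the
    # most recent keep-alive.
    return now - last_keepalive > timeout
```

With a monotonic clock for both timestamps, a service that has never
sent a keep-alive is simply left alone, which is why start-up failures
would need to be caught some other way.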

> I think the watchdog hardware should be handled in a separate service, for
> several reasons:

Agreed. We've had good results with an IPMI watchdog and Fedora's
watchdog package. I think it might even include a .service file, or
maybe I wrote a simple one.

Regards

Albert

