[systemd-devel] octeon_wdt: WDT device closed unexpectedly

Lennart Poettering lennart at poettering.net
Fri Jan 15 05:24:22 PST 2016


On Fri, 15.01.16 04:42, Mike Cardillo (mcardill) (mcardill at cisco.com) wrote:

> Hi all,
> 
> We’re seeing the following warning upon reboot of our machines: octeon_wdt: WDT device closed unexpectedly
> 
> I believe it’s caused by the following code block in src/core/main.c:
> 
>     if (arm_reboot_watchdog && arg_shutdown_watchdog > 0) {
>             char *e;
> 
>             /* If we reboot let's set the shutdown
>              * watchdog and tell the shutdown binary to
>              * repeatedly ping it */
>             r = watchdog_set_timeout(&arg_shutdown_watchdog);
>             watchdog_close(r < 0);
> 
>             /* Tell the binary how often to ping, ignore failure */
>             if (asprintf(&e, "WATCHDOG_USEC="USEC_FMT, arg_shutdown_watchdog) > 0)
>                     (void) strv_push(&env_block, e);
>     } else
>             watchdog_close(true);
> 
> It seems that if systemd is closing due to a reboot, the watchdog
> management is passed to systemd-shutdown, however we are using
> systemctl to reboot, so it seems the watchdog management cannot be
> properly passed to systemctl. Any recommendations on the best way to
> solve this issue and get rid of this message?

When you use systemctl to ask systemd/PID 1 to shut down, it will stop
all services, and then exec() the "systemd-shutdown" binary, which
becomes the new PID 1. That binary will then do some final clean-ups
and power off or reboot the machine.

Before the exec() invocation we'll configure the hw watchdog to the
interval configured in ShutdownWatchdogSec= in
/etc/systemd/system.conf. The "systemd-shutdown" binary is supposed to
ping the hw watchdog often enough to not trigger it in the normal
case. The watchdog is meant as a safety net in case this final phase
of the shutdown hangs for some reason (which it might, since we invoke
umount() from PID 1, which might freeze due to NFS failures or
similar reasons).
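
To make the hand-over concrete, here is a minimal sketch (not the actual
systemd-shutdown code) of how a shutdown binary could pick up the
WATCHDOG_USEC value from its environment and keep the device alive. The
helper names are hypothetical, and pinging at half the timeout is an
assumption made for illustration. WDIOC_KEEPALIVE is the kernel's
standard keep-alive ioctl from <linux/watchdog.h>.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/* Hypothetical helper: parse a WATCHDOG_USEC value (decimal microseconds).
 * Returns 0 on success, -1 if the value is missing, malformed or zero. */
int parse_watchdog_usec(const char *s, uint64_t *ret) {
        char *end = NULL;
        unsigned long long v;

        /* Reject NULL, the empty string and anything not starting with a digit. */
        if (!s || *s < '0' || *s > '9')
                return -1;

        errno = 0;
        v = strtoull(s, &end, 10);
        if (errno != 0 || *end != '\0' || v == 0)
                return -1;

        *ret = (uint64_t) v;
        return 0;
}

/* Assumption: pinging at half the timeout leaves enough slack. */
uint64_t ping_interval_usec(uint64_t timeout_usec) {
        return timeout_usec / 2;
}

/* Ping an already opened watchdog device. Returns 0 or a negative errno. */
int ping_watchdog(int fd) {
        return ioctl(fd, WDIOC_KEEPALIVE, 0) < 0 ? -errno : 0;
}
```

A caller would typically run parse_watchdog_usec(getenv("WATCHDOG_USEC"), &t),
open /dev/watchdog, and call ping_watchdog() at least once per
ping_interval_usec(t) while it works through its clean-up steps.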

Some watchdog kernel drivers consider it a problem if we first set up
the watchdog and then close the device without disarming it, and warn
about it. However, this is really intended here, and actually an
explicit part of the kernel watchdog interface (they have a "magic
close" logic to deal with this case).
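
As an illustration of the two ways to close the device, here is a
minimal sketch assuming a driver that implements magic close. Writing
the character 'V' right before close() asks the driver to stop the
timer. Closing without it leaves the timer armed, which is what we want
before handing over to the shutdown binary, and which is the case some
drivers log a warning for. The function name close_watchdog() is
hypothetical and this is not the systemd implementation.

```c
#include <errno.h>
#include <stdbool.h>
#include <unistd.h>

/* Close a watchdog file descriptor.
 * disarm == true:  write the magic character 'V' first, so that a driver
 *                  implementing magic close stops the timer on close().
 * disarm == false: close without it, so the timer keeps running.
 * Returns 0 on success or a negative errno-style value. */
int close_watchdog(int fd, bool disarm) {
        int r = 0;

        if (fd < 0)
                return -EBADF;

        if (disarm) {
                ssize_t n = write(fd, "V", 1);

                /* Remember the failure, but still close the descriptor. */
                if (n < 0)
                        r = -errno;
                else if (n != 1)
                        r = -EIO;
        }

        if (close(fd) < 0 && r == 0)
                r = -errno;

        return r;
}
```

With this shape, the reboot path corresponds to close_watchdog(fd, false)
after the timeout was set successfully, and every other path to
close_watchdog(fd, true).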

Anyway, long story short: your kernel watchdog driver shouldn't warn
about this. It's mostly just a cosmetic problem, and you may usually
just ignore it.

Lennart

-- 
Lennart Poettering, Red Hat

