[systemd-bugs] [Bug 54712] RFE: Simplify watchdog configuration on Servers with IPMI compatible hardware

bugzilla-daemon at freedesktop.org
Tue Sep 11 06:26:43 PDT 2012


https://bugs.freedesktop.org/show_bug.cgi?id=54712

--- Comment #1 from Lennart Poettering <lennart at poettering.net> 2012-09-11 13:26:43 UTC ---
(In reply to comment #0)
> Watchdog hardware on servers can typically be configured in three ways:
> 1. Configured via module parameters
> The OpenIPMI project contains a startup script that loads IPMI kernel modules
> during startup controlled by /etc/sysconfig/ipmi. This script can optionally
> load ipmi_watchdog.ko as well.
>     /etc/sysconfig/ipmi
>     IPMI_WATCHDOG=yes
>     IPMI_WATCHDOG_OPTIONS="timeout=300 action=reset nowayout=0"

Which code will ping the hw in this case?

Having init scripts that load kernel modules is something we really should try
to avoid these days. Modules should be auto-loaded when the hw shows up, which
means the ipmi watchdog module should just be loaded like any other module if
IPMI is available, and that makes configuring it via a configuration file
hard...

I am pretty sure IPMI watchdogs should be configured like any other watchdog,
so I'd prefer if this IPMI-specific config would go away one day...

> 2. Configured pre-boot
> IPMI watchdog hardware supports out-of-band (pre-OS) configuration. This is
> useful where the system admin wants to configure the watchdog on systems from
> a pre-OS configuration utility (e.g. to use factory-set defaults) or remotely
> with tools like bmc-watchdog(8) for hundreds of systems.

Which code is supposed to ping the hw in this case?

> 3. Configured via a watchdog daemon
> Systemd's RuntimeWatchdogSec, bmc-watchdog(8) or watchdog(5)
> 
> In scenarios #1 and #2, the timeout value is already set in the watchdog device
> (the timer is set to Stopped). But systemd does not currently probe/use this.
> 
> For such scenarios, it would be beneficial if systemd could first get the
> current timeout value (WDIOC_GETTIMEOUT) and, only if none is set, set it to
> RuntimeWatchdogSec. This would ensure that timeout values set via other
> mechanisms still hold and the system admin does not have to duplicate the
> timeout values in /etc/systemd/system.conf (especially for a large number of
> systems, remotely).

RuntimeWatchdogSec= has two purposes: configure the hw to some interval, and
make systemd ping the hw at the right frequency. By default both are off. If
you set it to a time value, both are turned on. IIUC, in IPMI setups you want
us to do the latter but not the former, right? This has multiple problems.

One of them is that right now we carefully make sure that people can choose
any watchdog sw implementation they wish. If we automatically detected a
pre-initialized watchdog config and made use of it, we'd take possession of
the device when the user doesn't necessarily want us to.

Also, this would require us to open the watchdog device first to see what is
configured, and, if nothing is, to close it right away again. That is
problematic since some drivers (non-IPMI ones...) don't allow us to close the
watchdog device without triggering an immediate reboot. Hence automatically
discovering a pre-initialized setting is problematic...
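
To make the open/probe/close sequence concrete, here is a rough sketch in
Python of what such a probe would have to do. It is illustrative only and is
not how systemd implements anything. The ioctl numbers are computed with the
_IOC encoding used on x86 and ARM (an assumption; some architectures differ),
and choose_timeout() and probe_and_release() are hypothetical helper names.
Treating a reported timeout of 0 as "not set" is also an assumption, since a
driver may report its own default instead.

```python
import fcntl
import os
import struct

# Linux _IOC direction bits as used on x86 and ARM. Some architectures
# (e.g. PowerPC, MIPS) encode these differently, so this is an assumption.
_IOC_WRITE, _IOC_READ = 1, 2


def _ioc(direction, type_char, nr, size):
    """Build an ioctl request number the way the kernel's _IOC macro does."""
    return (direction << 30) | (size << 16) | (ord(type_char) << 8) | nr


INT_SIZE = struct.calcsize("i")
# Request numbers from <linux/watchdog.h>, ioctl base 'W'.
WDIOC_KEEPALIVE = _ioc(_IOC_READ, "W", 5, INT_SIZE)
WDIOC_SETTIMEOUT = _ioc(_IOC_READ | _IOC_WRITE, "W", 6, INT_SIZE)
WDIOC_GETTIMEOUT = _ioc(_IOC_READ, "W", 7, INT_SIZE)


def choose_timeout(current, configured):
    """Hypothetical policy from comment #0: keep a preset timeout,
    otherwise fall back to the configured RuntimeWatchdogSec value."""
    return current if current > 0 else configured


def probe_and_release(path="/dev/watchdog"):
    """Return the timeout (seconds) the device reports, then try to let go.

    Opening the device typically arms the timer, which is exactly the
    side effect discussed above.
    """
    fd = os.open(path, os.O_WRONLY)
    try:
        raw = fcntl.ioctl(fd, WDIOC_GETTIMEOUT, struct.pack("i", 0))
        return struct.unpack("i", raw)[0]
    finally:
        # "Magic close": writing 'V' asks the driver to stop the timer when
        # the device is closed. Drivers running with nowayout ignore it.
        os.write(fd, b"V")
        os.close(fd)
```

Calling probe_and_release() is only safe on a driver that honours the magic
close. On one that does not, the close at the end leaves an armed timer that
nothing is pinging, which is the failure mode described above.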
