[systemd-bugs] [Bug 54712] RFE: Simplify watchdog configuration on Servers with IPMI compatible hardware

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Thu Sep 13 01:15:33 PDT 2012


https://bugs.freedesktop.org/show_bug.cgi?id=54712

--- Comment #2 from Charles Rose <charles_rose at dell.com> 2012-09-13 08:15:33 UTC ---
(In reply to comment #1)
> (In reply to comment #0)
> > Watchdog hardware on servers can typically be configured in three ways:
> > 1. Configured via module parameters
...
> >     IPMI_WATCHDOG_OPTIONS="timeout=300 action=reset nowayout=0"
> 
> Which code will ping the hw in this case?

In all of these cases, systemd would be the code that pings the hw watchdog.
The values passed to modprobe act more like defaults.

> 
> Having init scripts that load kernel modules is something we really should try
> to avoid these days. Modules should be auto-loading depending on hw showing up.

I agree. We are trying to get IPMI to autoload:
    https://patchwork.kernel.org/patch/1243021/

> Which means the ipmi watchdog module should just be loaded like any other
> module if IPMI is available, and that makes configuration with a configuration
> file hard...
> 
> I am pretty sure IPMI watchdogs should probably be configured like any other,
> so I'd prefer if this IPMI-specific config would go away one day...

Yes. Ideally there would be no config files needed to load ipmi or
ipmi_watchdog functionality; it would all happen automatically. The only
configurable option at that point would be RuntimeWatchdogSec, with just the
timeout.

There are, however, some options unique to IPMI, such as the 'action'
parameter, which sets the timeout action (reboot, shutdown, etc.). The
defaults are probably fine for most cases.

> 
> > 2. Configured pre-boot
> > IPMI Watchdog hardware support out-of-band configuration (pre-OS). This is
> > useful where the system admin wants to configure watchdog on systems from a
> > pre-os configuration utility (like use factory set defaults) or remotely with
> > tools like bmc-watchdog(8) for hundreds of systems.
> 
> Which code is supposed to ping the hw in this case?

systemd.

The user would set the timeout to 300s like this:
   # bmc-watchdog --set -i 300

This only sets the timeout; it does not start the timer. The command goes
through /dev/ipmi0 (not /dev/watchdog), so it does not use the watchdog API
and hence does not implicitly start the timer.

The timer is started only on the first open() (from systemd). This is the
desired behaviour.

We want the timeout, and any other values the user cares about, to be set via
tools like bmc-watchdog or ipmitool, while the open() (which starts the timer)
is done by systemd.
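
A minimal sketch of the pinging side of that split, in Python for brevity
(systemd itself is C). It only pings a watchdog fd that has already been
opened and never issues SETTIMEOUT, so whatever was set out-of-band stays in
effect. The function names, the injectable ioctl/sleep arguments and the
ping-at-half-the-timeout rule are assumptions for illustration. The request
number is WDIOC_KEEPALIVE from <linux/watchdog.h> as encoded on x86/arm;
other architectures may encode it differently.

```python
import time

# _IOR('W', 5, int) with the common _IOC encoding (assumed: x86/arm).
WDIOC_KEEPALIVE = 0x80045705


def ping_interval(timeout_sec):
    """Assumed policy: ping twice per timeout period."""
    return timeout_sec / 2.0


def keepalive_loop(fd, ioctl, timeout_sec, rounds, sleep=time.sleep):
    """Ping an already-armed watchdog `rounds` times.

    `ioctl` is any callable with the shape of fcntl.ioctl(fd, request, arg).
    The timeout itself is never written, only honoured.
    """
    interval = ping_interval(timeout_sec)
    for _ in range(rounds):
        ioctl(fd, WDIOC_KEEPALIVE, 0)  # reset the hardware countdown
        sleep(interval)
    return interval
```

On a real machine the fd would come from os.open("/dev/watchdog",
os.O_WRONLY) and ioctl would be fcntl.ioctl. Note that this open() is what
starts the timer, and some drivers reboot the machine if the fd is closed.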

> 
> > 3. Configured via a watchdog daemon
> > Systemd's RuntimeWatchdogSec, bmc-watchdog(8) or watchdog(5)
> > 
...
> RuntimeWatchdogSec= has two purposes: configure the hw to some interval, and
> make systemd ping the hw in the right frequency. By default both are off. If
> you set the time setting then both are turned on. IIUC you want us to do the
> latter but not the former, right in IPMI setups? This has multiple problems,
> one of them being that right now we carefully made sure that people can choose
> any watchdog sw implementation they wish, but if we shall automatically detect
> a pre-initialized watchdog config and then make use of that we'd take
> possession when the user doesn't necessarily want us to. Also, this would
> require us to open the watchdog device first, to see what is configured, and if
> nothing is close it right-away again. However, that is problematic since some
> drivers (non IPMI...) don't allow us to close the watchdog device without
> triggering an immediate reboot. Hence automatically discovering a
> pre-initialized setting is problematic...

I agree.

Your proposal on the mail thread of an 'auto' option sounds like a reasonable
compromise.

   Auto: I want watchdog functionality, but I am not sure what the timeout is
         or should be, so take it from the driver.
   Flow: open(); GETTIMEOUT; if (!timeout) SETTIMEOUT

In the long term, if we can get ipmi_watchdog to autoload when the hw is
detected, users can set RuntimeWatchdogSec=auto or an explicit timeout value.
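
The 'auto' flow could look roughly like the sketch below (Python for
brevity, not systemd code). It reads the timeout the driver already has and
only writes one if none is configured. resolve_timeout() and fallback_sec
are hypothetical names; the request numbers are WDIOC_GETTIMEOUT and
WDIOC_SETTIMEOUT from <linux/watchdog.h> as encoded on x86/arm, which is an
assumption about the target architecture.

```python
import struct

# Common _IOC encoding (assumed: x86/arm); other architectures may differ.
WDIOC_SETTIMEOUT = 0xC0045706  # _IOWR('W', 6, int)
WDIOC_GETTIMEOUT = 0x80045707  # _IOR('W', 7, int)


def resolve_timeout(fd, ioctl, fallback_sec):
    """GETTIMEOUT; if (!timeout) SETTIMEOUT. Returns the timeout in effect.

    `fd` is an already-opened watchdog device and `ioctl` is any callable
    with the shape of fcntl.ioctl(fd, request, mutable_buffer).
    """
    buf = bytearray(struct.pack("i", 0))
    ioctl(fd, WDIOC_GETTIMEOUT, buf)  # driver fills in the current timeout
    (current,) = struct.unpack("i", buf)
    if current > 0:
        return current  # keep the preconfigured value untouched

    buf = bytearray(struct.pack("i", fallback_sec))
    ioctl(fd, WDIOC_SETTIMEOUT, buf)  # driver may write back a rounded value
    (effective,) = struct.unpack("i", buf)
    return effective
```

With a real device this would be called as resolve_timeout(fd, fcntl.ioctl,
300), where fd comes from open() on /dev/watchdog. That open() already arms
the timer, so the sketch does not sidestep the close-triggers-reboot problem
with some drivers mentioned above.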
