[systemd-devel] Later activation of the HW watchdog
Jan Kundrát
jan.kundrat at cesnet.cz
Thu Jun 14 17:10:36 UTC 2018
On Tuesday, 24 October 2017 17:10:39 CEST, Jan Kundrát wrote:
> Hi,
> is it possible to change systemd's global settings for
> RuntimeWatchdogSec at runtime? I would like to have the early
> boot "guarded" by the HW watchdog started by my platform code,
> and for systemd to take over only after a certain target has
> been reached. I was thinking about an extra unit which simply
> writes an appropriate config file, but the docs for `systemctl
> daemon-reload` or `daemon-reexec` do not talk about these
> top-level settings. How do I tell systemd to notice a new value?
>
> Context: I'm using systemd on an embedded ARM box with reliable
> network connectivity. The system has two fully separate
> rootfs/kernel/devicetree instances, A and B. The bootloader
> starts a HW watchdog timer, and the bootloader keeps a counter
> tracking how many times a particular A/B "boot slot" has
> attempted to boot. The kernel ignores the watchdog, and once
> systemd gets launched and reads its system.conf file, it
> periodically restarts the WD timer. Finally, a unit
> which is pulled in by my default target updates the bootloader's
> environment, resetting the boot counter.
>
> My goal is to be able to boot a possibly broken image (but not
> a malicious one, of course) without fearing that it's going to
> lock me out of my device. If the new image "fails" for some
> reason, I expect the HW watchdog to reset the system, the boot
> attempt counter to eventually reach zero, and the whole system
> to roll back to the previous image. In my scenario, it's
> preferable to reboot rather than to wait for human intervention
> to solve the actual problem. The once-failed slot can be
> re-flashed very cheaply, and an updated version can be retried
> during the next update attempt.
>
> During my testing, I was able to unplug the system's SD card at
> a "wrong" moment which resulted in systemd trying to boot into
> emergency.target and ultimately failing due to a missing rootfs.
> I ended up with an unusable system which did not reboot
> automatically because systemd was periodically pinging the HW
> watchdog timer. [1]
>
> I got a suggestion to adjust the important units so that they
> specify a FailureAction. I do not like that solution because it
> means additional work (identifying which units might fail, coming
> up with the various possible failure scenarios, etc.), and the
> result is hard to test and to get "right" in the face of future
> systemd updates. It also feels like I am attacking the wrong
> problem. I already *have* a watchdog which will shoot the system
> in the head if something goes wrong. Wouldn't it make more sense
> to rely on this piece of infrastructure and start telling the
> watchdog "hey, I'm OK" only after the system has fully booted and
> my ultimate target has been *reached*?
>
> Suggestions which offer additional possibilities are welcome. I
> like systemd's feature set, and I won't pretend that I know all
> of it :).
>
> With kind regards,
> Jan
>
> [1] https://github.com/systemd/systemd/issues/7063
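The A/B boot-attempt scheme described in the quoted message can be modelled as a tiny state machine. The sketch below is illustrative only: it is written in POSIX shell, and the attempt limit (`MAX_ATTEMPTS=3`), the slot names `A`/`B` and the function names `other_slot`, `bootloader_step` and `mark_boot_ok` are assumptions made for this example, not the interface of any real bootloader.

```shell
#!/bin/sh
# Illustrative model of the A/B boot-attempt counter. All names here are
# hypothetical, not a real bootloader interface. State is "SLOT REMAINING".

MAX_ATTEMPTS=3   # assumed number of boot attempts per slot

other_slot() {
    if [ "$1" = A ]; then echo B; else echo A; fi
}

# Run by the bootloader before every boot attempt: if the current slot has
# no attempts left, roll back to the other slot with a fresh counter, then
# consume one attempt. Prints the new state.
bootloader_step() {
    slot=$1
    remaining=$2
    if [ "$remaining" -le 0 ]; then
        slot=$(other_slot "$slot")
        remaining=$MAX_ATTEMPTS
    fi
    echo "$slot $((remaining - 1))"
}

# Run from userspace once the system is known to be healthy (the unit pulled
# in by the default target): reset the counter for the running slot.
mark_boot_ok() {
    echo "$1 $MAX_ATTEMPTS"
}
```

Starting from `A 3`, successive calls to `bootloader_step` print `A 2`, `A 1` and `A 0` (three attempts of slot A, each ended by the HW watchdog); the fourth call, `bootloader_step A 0`, prints `B 2`, i.e. the system has rolled back to slot B. A successful boot instead runs `mark_boot_ok A`, which prints `A 3`. The scheme only works if nothing keeps the watchdog happy on a broken system, which is exactly the problem discussed here.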
I more or less solved this by *not* configuring systemd to start pinging
the watchdog on its own. Instead, I added another unit which depends on,
and is wanted by, multi-user.target. It checks whether everything is OK so
far and, only if it is, tells systemd to start pinging the watchdog:
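[Unit]
Description=Pinging the HW watchdog
Requires=multi-user.target
After=multi-user.target

[Service]
Type=oneshot
# Fail (and skip ExecStart) if any unit has entered the "failed" state.
# "-z" is used instead of "==", which plain POSIX sh (e.g. dash) rejects.
ExecStartPre=/bin/sh -c '[ -z "$(/bin/systemctl list-units --failed --all \
    --no-legend --no-pager)" ]'
# Only then let systemd ping the HW watchdog itself.
# RuntimeWatchdogUSec is in microseconds: 30000000 = 30 s.
ExecStart=/bin/busctl set-property org.freedesktop.systemd1 \
    /org/freedesktop/systemd1 org.freedesktop.systemd1.Manager \
    RuntimeWatchdogUSec t 30000000

[Install]
WantedBy=multi-user.target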
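The same check can also be moved out of the unit into a small standalone script, which is easier to test and extend. This is a sketch under assumptions: the function names are hypothetical, `systemctl` and `busctl` are assumed to be in `PATH`, and the running systemd is assumed to accept writes to `RuntimeWatchdogUSec`, just as the `busctl` call in the unit above relies on.

```shell
#!/bin/sh
# Hypothetical helper: let systemd take over the HW watchdog only if no
# unit has failed. Function names are illustrative, not part of systemd.

# Succeeds only if its argument (the output of
# `systemctl list-units --failed --all --no-legend --no-pager`) is empty.
no_failed_units() {
    [ -z "$1" ]
}

# RuntimeWatchdogUSec is a uint64 ("t") expressed in microseconds.
sec_to_usec() {
    echo $(( $1 * 1000000 ))
}

# arm_watchdog SECONDS: refuse if any unit failed, otherwise ask PID 1
# to start pinging the watchdog with the given timeout.
arm_watchdog() {
    failed=$(systemctl list-units --failed --all --no-legend --no-pager) || return 1
    if ! no_failed_units "$failed"; then
        echo "not arming the watchdog, failed units:" >&2
        echo "$failed" >&2
        return 1
    fi
    busctl set-property org.freedesktop.systemd1 /org/freedesktop/systemd1 \
        org.freedesktop.systemd1.Manager RuntimeWatchdogUSec t "$(sec_to_usec "$1")"
}

# Act only when a timeout is passed, e.g. `arm-watchdog 30`.
if [ "$#" -gt 0 ]; then
    arm_watchdog "$1"
fi
```

With the script installed at a hypothetical path such as `/usr/local/sbin/arm-watchdog`, the `ExecStartPre=`/`ExecStart=` pair could be replaced by a single `ExecStart=/usr/local/sbin/arm-watchdog 30`. A non-zero exit status still marks the oneshot service as failed, so the behaviour matches the inline version, and the script additionally logs which units blocked the hand-over.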
For more details, see the original bug report at
https://github.com/systemd/systemd/issues/7063
Cheers,
Jan