[systemd-devel] is the watchdog useful?

Zbigniew Jędrzejewski-Szmek zbyszek at in.waw.pl
Mon Oct 21 17:50:44 UTC 2019


In principle, the watchdog for services is nice. But in practice it seems
to bring only grief. The Fedora bugtracker is full of automated reports of ABRTs,
and of those that were fired by the watchdog, pretty much 100% are bogus, in
the sense that the machine was resource starved and the watchdog fired.
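
For context, here is a rough sketch of the service side of the watchdog, to
show why starvation alone is enough to trip it. It assumes the documented
sd_notify protocol (datagrams sent to $NOTIFY_SOCKET, timeout passed in
$WATCHDOG_USEC); the helper names notify(), ping_interval() and main() are
made up for illustration and are not taken from any real service.

```python
import os
import socket
import time


def notify(message, env=os.environ):
    """Send one sd_notify-style datagram; return False if no socket is set."""
    path = env.get("NOTIFY_SOCKET")
    if not path:
        return False
    # A leading '@' denotes an abstract-namespace socket.
    if path.startswith("@"):
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode(), path)
    return True


def ping_interval(env=os.environ):
    """Return half of WATCHDOG_USEC in seconds, or None if no watchdog is set."""
    usec = env.get("WATCHDOG_USEC")
    if not usec:
        return None
    return int(usec) / 2 / 1_000_000


def main():
    """Hypothetical service loop: announce readiness, then send keep-alives."""
    interval = ping_interval()
    notify("READY=1")
    while interval is not None:
        # If the machine is starved and this loop is not scheduled in time,
        # the keep-alive is late and the manager treats the service as hung.
        notify("WATCHDOG=1")
        time.sleep(interval)
```

A service started with WatchdogSec= set would call main(). Nothing in the
loop distinguishes "the service is hung" from "the service did not get CPU
time", which is presumably why so many of the reports turn out to be bogus.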

There are a few downsides to the watchdog killing the service:
1. if it is something like logind, it is possible that it will cause user-visible
failure of other services
2. restarting of the service causes additional load on the machine
3. coredump handling causes additional, quite significant, load on the machine
4. those failures are reported in bugtrackers and waste everyone's time.

I had the following ideas:
1. disable coredumps for watchdog abrts: systemd could set some flag
on the unit or otherwise notify systemd-coredump about this, and it could just
log the occurrence but not dump the core file.
2. generally disable watchdogs and make them opt-in. We have 'systemd-analyze service-watchdogs',
and we could make the default configurable to "yes|no".

What do you think?
Zbyszek

