[systemd-devel] is the watchdog useful?
Zbigniew Jędrzejewski-Szmek
zbyszek at in.waw.pl
Mon Oct 21 17:50:44 UTC 2019
In principle, the watchdog for services is nice. But in practice it seems
to bring only grief. The Fedora bugtracker is full of automated reports of ABRTs,
and of those that were fired by the watchdog, pretty much 100% are bogus, in
the sense that the machine was resource-starved and the watchdog fired.
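
For context, here is a minimal sketch of the keep-alive side of the watchdog, as a service might implement it. It assumes the documented sd_notify interface: with WatchdogSec= set on the unit, systemd exports WATCHDOG_USEC and NOTIFY_SOCKET, and the service is expected to send "WATCHDOG=1" regularly (half the timeout is the usual recommendation). The function names below are hypothetical and only illustrate the idea. If the machine is starved and the loop is not scheduled in time, the ping arrives late and the service is aborted even though it is not actually hung.

```python
import os
import socket
import time


def watchdog_interval_seconds(env=os.environ):
    """Return half of WATCHDOG_USEC in seconds, or None if no watchdog applies."""
    usec = env.get("WATCHDOG_USEC")
    if not usec:
        return None
    # If WATCHDOG_PID is set, the watchdog is only meant for that process.
    pid = env.get("WATCHDOG_PID")
    if pid and int(pid) != os.getpid():
        return None
    return int(usec) / 1_000_000 / 2


def sd_notify(message, env=os.environ):
    """Send one datagram to NOTIFY_SOCKET; return False if it is not set."""
    path = env.get("NOTIFY_SOCKET")
    if not path:
        return False
    addr = path.encode()
    if addr.startswith(b"@"):
        # A leading "@" denotes a Linux abstract-namespace socket.
        addr = b"\0" + addr[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode(), addr)
    return True


def main_loop(do_work):
    """Hypothetical service loop: do one unit of work, then ping the watchdog."""
    interval = watchdog_interval_seconds()
    sd_notify("READY=1")
    while True:
        do_work()
        if interval is not None:
            sd_notify("WATCHDOG=1")
        # Without a watchdog, fall back to an arbitrary 1 s pause.
        time.sleep(interval if interval is not None else 1.0)
```

With WatchdogSec=30 on the unit, WATCHDOG_USEC is 30000000 and the sketch pings every 15 seconds; a single iteration that stalls for longer than 30 seconds is enough to trigger the abort.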
There are a few downsides to the watchdog killing the service:
1. if it is something like logind, it is possible that it will cause user-visible
failure of other services
2. restarting of the service causes additional load on the machine
3. coredump handling causes additional, quite significant, load on the machine
4. those failures are reported in bugtrackers and waste everyone's time.
I had the following ideas:
1. disable coredumps for watchdog aborts: systemd could set some flag
on the unit or otherwise notify systemd-coredump about this, and it could just
log the occurrence but not dump the core file.
2. generally disable watchdogs and make them opt-in. We have
'systemd-analyze service-watchdogs', and we could make the default
configurable as "yes|no".
What do you think?
Zbyszek