[systemd-devel] is the watchdog useful?

Zbigniew Jędrzejewski-Szmek zbyszek at in.waw.pl
Tue Oct 22 10:51:49 UTC 2019


On Tue, Oct 22, 2019 at 12:34:45PM +0200, Umut Tezduyar Lindskog wrote:
> I am curious Zbigniew of how you find out if the coredump was on a starved
> process?

A very common case is systemd-journald which gets SIGABRT when in a
read() or write() or similar syscall. Another case is when
systemd-udevd workers get ABRT when doing open() on a device.

> This is common for our embedded devices. I didn't think it is common for
> desktop too.


> It is really useful for getting coredumps on deadlocked applications. For
> that reason I don't think it is good to remove this functionality
> completely.

Yes, I never suggested removing it completely. I'm just saying that for
the type of systems that Fedora targets, I don't recall any actual deadlock.
For more specialized systems, where the workload is more predictable,
it makes sense to have the watchdog.
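
For reference, here is a minimal sketch of the service side of the watchdog
protocol, assuming the service is started by systemd with WatchdogSec= set, so
that NOTIFY_SOCKET and WATCHDOG_USEC are present in its environment. The helper
names (watchdog_interval_seconds, notify) are made up for illustration, and the
WATCHDOG_PID check that sd_watchdog_enabled() performs is left out. A service
that is starved and cannot send the ping in time looks the same to the manager
as one that is deadlocked.

```python
import os
import socket


def watchdog_interval_seconds(environ=os.environ):
    """Return half the watchdog timeout in seconds, or None if no watchdog is set."""
    usec = environ.get("WATCHDOG_USEC")
    if not usec:
        return None
    # Pinging at half the timeout leaves some slack for scheduling delays.
    return int(usec) / 1_000_000 / 2


def notify(message, environ=os.environ):
    """Send one datagram to the socket named by NOTIFY_SOCKET; return False if unset."""
    path = environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):
        # Abstract-namespace socket: the leading '@' stands for a NUL byte.
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode(), path)
    return True
```

The main loop of the service would call notify("WATCHDOG=1") at least once per
interval returned by watchdog_interval_seconds().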

There might be cases where the kernel is dead-locked internally, and e.g.
open() or modprobe() never returns. For those cases it might be useful to
get the backtrace, but actually killing the process and/or storing the
coredump is not useful.

Zbyszek

> 
> Umut
> 
> On Mon, Oct 21, 2019 at 7:51 PM Zbigniew Jędrzejewski-Szmek <
> zbyszek at in.waw.pl> wrote:
> 
> > In principle, the watchdog for services is nice. But in practice it seems
> > to bring only grief. The Fedora bugtracker is full of automated reports of
> > ABRTs, and of those that were fired by the watchdog, pretty much 100% are
> > bogus, in the sense that the machine was resource starved and the watchdog
> > fired.
> >
> > There are a few downsides to the watchdog killing the service:
> > 1. if it is something like logind, it is possible that it will cause
> >    user-visible failure of other services
> > 2. restarting of the service causes additional load on the machine
> > 3. coredump handling causes additional load on the machine, quite
> >    significant
> > 4. those failures are reported in bugtrackers and waste everyone's time.
> >
> > I had the following ideas:
> > 1. disable coredumps for watchdog abrts: systemd could set some flag
> >    on the unit or otherwise notify systemd-coredump about this, and it
> >    could just log the occurrence but not dump the core file.
> > 2. generally disable watchdogs and make them opt in. We have
> >    'systemd-analyze service-watchdogs', and we could make the default
> >    configurable to "yes|no".
> >
> > What do you think?
> > Zbyszek

