[systemd-devel] is the watchdog useful?

Zbigniew Jędrzejewski-Szmek zbyszek at in.waw.pl
Tue Oct 22 09:17:42 UTC 2019


On Tue, Oct 22, 2019 at 09:54:31AM +0300, Pekka Paalanen wrote:
> On Mon, 21 Oct 2019 17:50:44 +0000
> Zbigniew Jędrzejewski-Szmek <zbyszek at in.waw.pl> wrote:
> 
> > In principle, the watchdog for services is nice. But in practice it seems
> > to bring only grief. The Fedora bugtracker is full of automated reports of ABRTs,
> > and of those that were fired by the watchdog, pretty much 100% are bogus, in 
> > the sense that the machine was resource starved and the watchdog fired.
> 
> Hi,
> 
> just curious, is that resource starvation caused by something big, e.g.
> a browser, using too much memory which leads to the kernel reclaiming
> also pages of program text sections because they can be reloaded from
> disk at any time, however those pages are needed again immediately
> after when some CPU core switches process context, leading to something
> that looks like a hard freeze to a user, while the kernel is furiously
> loading pages from disk just to drop them again, and can take from
> minutes to hours before any progress is visible?

I don't really know. Unfortunately, abrt in Fedora does not collect log
messages. In the old syslog days, a snippet of /var/log/messages for the
last 20 minutes or something like that before a crash would be copied
into the bug report, and this would include kernel messages about disk
errors, or kernel stalls, or other interesting hints. Unfortunately
nowadays, because of privacy concerns (?) and an effort to make things
more efficient (?), just some heavily-filtered journalctl output is
attached. In practice, usually this is at most a few lines and
completely useless. In particular, it does not give any hints about the
overall state of the system.
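
For anyone who wants to attach that kind of context by hand, here is a
minimal sketch of pulling the 20 minutes of journal before a crash. It is
only an illustration: the function names are made up for this mail and are
not part of abrt or systemd, and it assumes the crash time is known and
that the caller is allowed to read the system journal.

```python
import subprocess
from datetime import datetime, timedelta

# journalctl accepts absolute timestamps in this form for --since/--until.
JOURNAL_TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def journal_window_cmd(crash_time, minutes=20):
    """Build a journalctl command line covering `minutes` before crash_time.

    Hypothetical helper: the 20 minute default just mirrors what the old
    /var/log/messages snippet used to cover.
    """
    since = crash_time - timedelta(minutes=minutes)
    return [
        "journalctl",
        "--no-pager",
        "--since", since.strftime(JOURNAL_TIME_FORMAT),
        "--until", crash_time.strftime(JOURNAL_TIME_FORMAT),
    ]


def collect_journal_window(crash_time, minutes=20):
    """Run journalctl for the window and return its output as text.

    Only works on a machine with systemd and enough privileges to read
    the journal; the output is unfiltered, so review it before posting.
    """
    cmd = journal_window_cmd(crash_time, minutes)
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout
```

Calling collect_journal_window(datetime.now()) right after a watchdog abort
would give the kernel and service messages from the preceding 20 minutes,
which is roughly the context that is missing from the reports.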

I have spoken to abrt maintainers about this, but it seems that this
problem is specific to systemd, and for most other applications it is
OK to get a backtrace without any system-wide context. So I don't see
this changing any time soon ;(

Sometimes I ask people for logs, and sometimes I get them, and in those
cases it seems that either hardware issues (e.g. a failing disk) or memory
exhaustion are often involved. In some cases there is no clear reason.
And since in the great majority of cases we don't have any logs, it is
hard to say anything.

> It has happened to me on Fedora in the past. I could probably dig up
> discussions about the problem in general if you want, they explain it
> better than I ever could.
> 
> Does Fedora prevent that situation by tuning some kernel knobs nowadays
> for desktops?

I don't think so.

Zbyszek

