[systemd-devel] is the watchdog useful?
Vito Caputo
vcaputo at pengaru.com
Fri Oct 25 18:16:07 UTC 2019
On Fri, Oct 25, 2019 at 10:26:44AM +0000, Zbigniew Jędrzejewski-Szmek wrote:
> On Thu, Oct 24, 2019 at 02:56:55PM -0700, Vito Caputo wrote:
> > On Thu, Oct 24, 2019 at 10:45:32AM +0000, Zbigniew Jędrzejewski-Szmek wrote:
> > > On Tue, Oct 22, 2019 at 04:35:13AM -0700, Vito Caputo wrote:
> > > > On Tue, Oct 22, 2019 at 10:51:49AM +0000, Zbigniew Jędrzejewski-Szmek wrote:
> > > > > On Tue, Oct 22, 2019 at 12:34:45PM +0200, Umut Tezduyar Lindskog wrote:
> > > > > > I am curious Zbigniew of how you find out if the coredump was on a starved
> > > > > > process?
> > > > >
> > > > > A very common case is systemd-journald which gets SIGABRT when in a
> > > > > read() or write() or similar syscall. Another case is when
> > > > > systemd-udevd workers get ABRT when doing open() on a device.
> > > > >
> > > >
> > > > In the case of journald, is it really in read()/write() syscalls you're
> > > > seeing the SIGABRTs?
> > >
> > > I was sloppy here — it's not read/write, but various other syscalls.
> > > In particular clone(), which makes sense, because it involves memory
> > > allocation.
> > >
> >
> > That's interesting, it's not like journald calls clone() a lot.
>
> Hm, maybe it was udevd that was calling clone(), not journald.
> All the reports are available here:
> https://bugzilla.redhat.com/show_bug.cgi?id=1300212
>
Yeah, nearly all the 2019 reports you marked as duplicate were
udev/logind, at least of the ones I could access. Surprisingly few were
journald-related at all; are those the ones marked private?
> I opened a pull request to make the watchdog setting configurable
> for our own internal services: https://github.com/systemd/systemd/pull/13843.
>
It will be interesting to see how many aborts continue to come in with
the timeout at 1h.
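
For anyone following along who hasn't looked at the service side of the
watchdog: the manager exports WATCHDOG_USEC (and WATCHDOG_PID) to the
service, and the service is expected to send "WATCHDOG=1" datagrams to
NOTIFY_SOCKET, conventionally at half the timeout. Below is a rough Python
sketch of that protocol, not the actual C code in systemd; the function
names watchdog_ping_interval() and notify() are made up for illustration.

```python
import os
import socket


def watchdog_ping_interval(environ=os.environ):
    """Return seconds between keep-alive pings, or None if no watchdog applies.

    Hypothetical helper: mirrors the documented WATCHDOG_USEC / WATCHDOG_PID
    environment protocol, pinging at half the configured timeout.
    """
    usec = environ.get("WATCHDOG_USEC")
    if not usec:
        return None
    pid = environ.get("WATCHDOG_PID")
    if pid and int(pid) != os.getpid():
        # The watchdog was configured for a different process.
        return None
    return int(usec) / 1_000_000 / 2


def notify(message, environ=os.environ):
    """Send one datagram to NOTIFY_SOCKET; return False if it is not set."""
    path = environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    # A leading '@' denotes a Linux abstract-namespace socket.
    addr = b"\0" + path[1:].encode() if path.startswith("@") else path
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode(), addr)
    return True
```

With a 3 minute timeout (WATCHDOG_USEC=180000000) this works out to a ping
every 90 seconds, so a process has to miss two consecutive pings before the
manager sends SIGABRT.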
Seeing aborts in epoll_wait() suggests these processes either weren't
getting scheduled to run for 3 minutes despite having watchdog events to
service on an fd, or were paged out and took longer than 3 minutes to
page in and actually run.
If it's the latter, maybe we can lock some of these in memory, controlled
by another build-time configuration option. Desktop distros can spare
the memory for logind to not get paged out, for instance. I know I
wouldn't object to having the DRM arbiter process always resident on my
machine; it's pretty close to being a kernel component in that role.
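
To make the "lock in memory" idea concrete, here is a minimal sketch of
what such a process would do at startup, namely call mlockall(2). It is
written in Python via ctypes purely for illustration (the real daemons are
C), lock_process_memory() is a hypothetical name, and the MCL_* values are
the generic Linux ones (some architectures define them differently).

```python
import ctypes
import ctypes.util
import os

# Generic Linux values from <sys/mman.h>; assumed here, not portable
# to every architecture.
MCL_CURRENT = 1  # lock pages that are currently mapped
MCL_FUTURE = 2   # also lock pages mapped later


def lock_process_memory():
    """Try to pin the whole process in RAM; return (ok, error_message)."""
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    if libc.mlockall(MCL_CURRENT | MCL_FUTURE) != 0:
        # Typically ENOMEM or EPERM when RLIMIT_MEMLOCK is too low and
        # the process lacks CAP_IPC_LOCK.
        return False, os.strerror(ctypes.get_errno())
    return True, ""
```

Without CAP_IPC_LOCK or a raised RLIMIT_MEMLOCK the call will usually fail,
so a unit doing this would presumably also need its memlock limit adjusted.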
Regards,
Vito Caputo