[systemd-devel] heads-up: chasing journal(?) related regression in 219 causing boot hang/fail

Zbigniew Jędrzejewski-Szmek zbyszek at in.waw.pl
Fri Feb 20 07:08:32 PST 2015


On Fri, Feb 20, 2015 at 04:01:34PM +0100, Didier Roche wrote:
> On 20/02/2015 15:41, Michael Biebl wrote:
> >2015-02-20 15:36 GMT+01:00 Martin Pitt <martin.pitt at ubuntu.com>:
> >>Hello all,
> >>
> >>Since we updated to 219 in Ubuntu, several people reported boot
> >>failures. Booting hangs a long time after starting D-Bus, in the
> >>journal you get a lot of error messages like
> >>
> >>    systemd[1]: Failed to register match for Disconnected message: Connection timed out
> >>    systemd-logind[749]: Failed to fully start up daemon: Connection timed out
> >>    dbus[800]: [system] Failed to activate service 'org.freedesktop.PolicyKit1': timed out
> >>
> >>polkitd isn't running. This causes lots of jobs (logind, NetworkManager, avahi,
> >>etc.) to get stuck in an eternal retry loop.
> >>
> >>Unfortunately reproducing this is a real nuisance, classic heisenbug.
> >>I'm now able to trigger it (sometimes) in a VM, but I still haven't
> >>found a reliable recipe for reproducing it, so that bisecting just
> >>takes ages.
> >>
> >>I'm keeping debug log, notes, and progress in
> >>https://launchpad.net/bugs/1423811 FTR. This is mostly a heads-up for
> >>other distros in case they also get reports like this, to shortcut the
> >>debugging exercise (I already wasted 7 hours on this, and I'm not even
> >>close to the solution). Quite surprisingly it's somewhere in journald.
> >>Running 218 with journald from 219 causes the hang, 219 with journald
> >>from 218 is fine.
> >>
> >I noticed this as well. Interestingly, it only ever happened after
> >applying the fsckd patches and running with plymouth enabled.
> >
> 
> We get it with "quiet splash" removed as well. It's just that it's
> random, depending on the machine load. We already ruled out the fsckd
> patch yesterday, but I retried today after this comment on my own
> VMs: with the systemd/udev/… Ubuntu package "218-8ubuntu2", which
> doesn't contain the fsckd patch (and has the 218 and 219
> systemd-fsck writing to /dev/console), plus the systemd-journal binary
> copied from 219-1ubuntu1, I was able to reproduce the hang after 15
> boots.

Anything interesting if you attach gdb to systemd-journald? Can you
paste bt and the Server variable (IIRC, it's *s in main)? How many
open fds does sd-journald have?
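
For the fd count, a minimal sketch, assuming a Linux /proc filesystem.
The helper names count_open_fds and describe_fds are made up for
illustration and are not part of systemd. The pid of the running
journald has to be looked up separately, and reading another process's
fd directory generally requires root.

```python
import os


def count_open_fds(pid="self"):
    """Return the number of entries in /proc/<pid>/fd (Linux only)."""
    # When pid is "self", the directory handle used by listdir() is
    # itself counted, so the result is one higher than the steady state.
    return len(os.listdir(f"/proc/{pid}/fd"))


def describe_fds(pid="self"):
    """Map each fd number to what it points at (file, socket, pipe, ...)."""
    fd_dir = f"/proc/{pid}/fd"
    targets = {}
    for name in os.listdir(fd_dir):
        try:
            targets[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The fd was closed between listdir() and readlink(); skip it.
            continue
    return targets
```

Calling count_open_fds(pid) with journald's pid gives the total, and
describe_fds(pid) shows whether the fds are mostly sockets, journal
files, or something else. A count close to the process's RLIMIT_NOFILE
would be worth noting.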

Zbyszek

