[systemd-devel] systemd hangs on shutdown

Daniel Drake dsd at laptop.org
Thu Sep 29 06:29:51 PDT 2011


On Wed, Sep 28, 2011 at 4:03 PM, Lennart Poettering
<lennart at poettering.net> wrote:
> Hmm, so I don't see any obvious either.

I dug a little further. I have enabled lockdep to see if it can find
any deadlocks, and nothing comes up.

I also turned to sysrq-D (show locks held) - no locks are held at the
point of hang.

sysrq-T (show task state) shows the following userspace processes running:
systemd, udevd (several threads/instances), systemd-stdout-,
systemd-kmsg-sy, plymouthd. (they are all epolling stuff)

Can you confirm that at this point, those tasks should still be
running (and should not have been killed by the mass kill of processes
that just happened)? Just wondering if systemd is waiting for them to
die, or is waiting on some other condition.


Another observation: after waiting several minutes,
systemd-stdout-syslog-bridge.service generates a notification message
and seems to nudge systemd a little closer to completing the shutdown
routine (please see 387.600571 in the full logs below).

Full log of startup, shutdown, and sysrq dumps at point of first hang
(before systemd-stdout-syslog-bridge.service wakeup), and second hang,
then sysrq dumps again:
http://dev.laptop.org/~dsd/20110929/systemd-shutdown-hang-debug.txt


(Just FYI: A couple of things have changed since the last mails: I
updated to systemd-36 and I updated the kernel to linus master, and
patched in https://lkml.org/lkml/2011/8/9/318 which fixed warnings
caused when I enabled lockdep)

> It might make sense to figure out which of the services hang on
> shutdown. Starting at 60.141079 you'll find the jobs that systemd
> determines it needs to execute for shut down. Each time one of these
> jobs is finished you see this logged, for example in 87.920164. It would
> make sense to see which jobs from the initial transaction never are
> finished, i.e. never get the matching "Job ... finished" message.

OK, done this for the new log. here are my observations (sorry if any
of them are stupid!):

NetworkManager job is marked done at 60.850964 but logs messages afterwards?

Weird ordering of messages here:
[   61.831071] systemd[1]: Child 1464 belongs to var-lib-random\x2dseed.mount
[   61.837978] systemd[1]: var-lib-random\x2dseed.mount mount process
exited, code=exited status=0
[   61.847778] systemd[1]: var-lib-random\x2dseed.mount changed
unmounting -> dead
[   61.890550] systemd[1]: Job var-lib-random\x2dseed.mount/stop
finished, result=done
snip
[   62.298212] systemd[1]: Got SIGCHLD for process 1477 (umount)
[   62.305426] systemd[1]: Child 1477 died (code=exited, status=0/SUCCESS)
[   62.312138] systemd[1]: Child 1477 belongs to var-lib-dbus.mount
[   62.318182] systemd[1]: var-lib-dbus.mount mount process exited,
code=exited status=0
[   62.327267] systemd[1]: var-lib-dbus.mount changed unmounting -> dead
[   62.370552] systemd[1]: Job var-lib-dbus.mount/stop finished, result=done
snip
[   62.955961] systemd[1]: var-lib-random\x2dseed.mount changed dead -> mounted
[   62.967152] systemd[1]: var-lib-dbus.mount changed dead -> mounted

Why do they get marked as mounted after being unmounted?
The other similar mounts all got 'collected', these 2 did not.


Is it normal for the systemd-kmsg-syslogd service to be *started*
quite late in the shutdown process?

[   63.062468] systemd[1]: Incoming traffic on syslog.socket
[   63.067936] systemd[1]: Trying to enqueue job
systemd-kmsg-syslogd.service/start/replace
[   63.076120] systemd[1]: Installed new job
systemd-kmsg-syslogd.service/start as 343
[   63.083842] systemd[1]: Enqueued job
systemd-kmsg-syslogd.service/start as 343
[   63.091135] systemd[1]: syslog.socket changed listening -> running
[   63.097566] systemd[1]: About to execute: /lib/systemd/systemd-kmsg-syslogd
[   63.140299] systemd[1]: Forked /lib/systemd/systemd-kmsg-syslogd as 1510
[   63.148024] systemd[1]: systemd-kmsg-syslogd.service changed dead -> running

Every single "Installed new job" line has a corresponding "Job foo
finished, result=done" line (I checked carefully).

However, there are some other inconsistencies (not sure how relevant):

Some services have a "cgroup is empty" message logged before or after
they are marked as done (such as prefdm.service, avahi-daemon.service_
bu some services have no such line logged (e.g. sshd-keygen.service,
console-kit-daemon.service)

Some mounts/services are "collected" (e.g. console-kit-daemon.service,
var-spool.mount) and others are not (e.g. crond.service,
var-log.mount, var-lib-dbus.mount)

I can't see anything that tries to shut down
systemd-stdout-syslog-bridge.service. Is that normal? Might systemd be
waiting for it?

Thats all I see.

> What's going on with those "serial8250: too much work for irq4" message?
> Note  that by default systemd will log short pretty status messages to
> /dev/console. If that points to your serial driver and that driver is
> hosed then maybe that causes systemd to freeze?

Not sure. Its the standard x86 legacy serial driver - nothing special.
The default console is indeed a serial console (thats how I'm
logging). Those messages go away if I use tty0 as the default console
(but the hang doesn't). And in fact they don't always appear (e.g. no
occurrences in the log referenced in this mail).


Any new ideas?

Thanks a lot for your help on this.
Daniel


More information about the systemd-devel mailing list