[systemd-bugs] [Bug 76418] New: systemd unusable after segfault, zombies everywhere, unable to shutdown

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Thu Mar 20 15:13:54 PDT 2014


https://bugs.freedesktop.org/show_bug.cgi?id=76418

          Priority: medium
            Bug ID: 76418
          Assignee: systemd-bugs at lists.freedesktop.org
           Summary: systemd unusable after segfault, zombies everywhere,
                    unable to shutdown
        QA Contact: systemd-bugs at lists.freedesktop.org
          Severity: major
    Classification: Unclassified
                OS: Linux (All)
          Reporter: lekensteyn at gmail.com
          Hardware: x86-64 (AMD64)
            Status: NEW
           Version: unspecified
         Component: general
           Product: systemd

When systemd (init!) 210-2 (Arch Linux x86_64) segfaulted some days ago, I was
brought to tty1. This happened out of nowhere; I was just reading and not
touching anything. After that segfault, systemd was totally unusable:

 - systemctl <stop | start | status> <anything> timed out
 - NetworkManager dispatcher services also timed out
 - unable to suspend (not by lid close, not by systemctl suspend, not by
suspend key)
 - unable to shut down (systemctl poweroff; shutdown -h now)
 - Ignores the documented SIGRTMIN+4 signal to shutdown the machine.
 - Ignores SIGTERM, SIGKILL (ok), but after sending another SIGSEGV, I got a
kernel panic.
 - Zombies everywhere. By the time I shut down (via the panic), there were 3.2k
zombie processes.
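The zombie count can be checked with a small C function like the one below. This is an illustrative sketch, not part of the original report. It assumes a Linux procfs mounted at /proc and counts every process whose state field in /proc/<pid>/stat is 'Z'. The function name count_zombies() is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Count processes whose state in /proc/<pid>/stat is 'Z' (zombie).
 * Returns -1 if proc_root cannot be opened. */
int count_zombies(const char *proc_root)
{
    assert(proc_root != NULL);

    DIR *dir = opendir(proc_root);
    if (dir == NULL)
        return -1;

    int zombies = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        /* Only numeric directory names are processes; checking the first
         * character is enough to skip "self", "sys", etc. */
        if (!isdigit((unsigned char)ent->d_name[0]))
            continue;

        char path[512];
        snprintf(path, sizeof path, "%s/%s/stat", proc_root, ent->d_name);

        FILE *f = fopen(path, "r");
        if (f == NULL)
            continue; /* process vanished between readdir() and fopen() */

        char buf[1024];
        size_t n = fread(buf, 1, sizeof buf - 1, f);
        fclose(f);
        buf[n] = '\0';

        /* Format: "pid (comm) state ...". comm may contain spaces or ')',
         * so locate the LAST ')' and read the state character after it. */
        const char *rparen = strrchr(buf, ')');
        if (rparen != NULL && rparen[1] == ' ' && rparen[2] == 'Z')
            zombies++;
    }
    closedir(dir);
    return zombies;
}
```

Called from a small main(), for example printf("%d\n", count_zombies("/proc")), it prints the current number of zombies on the machine.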

I still have a tiny core dump, but without debugging symbols it is quite
useless. This report is not about that specific crash, but more about handling
segfaults in general.

init is supposed to be unkillable, right? It ignores SIGTERM and SIGKILL, but
sending SIGSEGV twice results in a kernel panic because it killed itself.
sysvinit on Debian does not have this issue, when it receives a segfault, it
sleeps for 30 seconds, ignoring any signals. Due to its different architecture,
services can still be started and stopped.
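The sysvinit behavior described above can be sketched as a plain sigaction() handler. On SIGSEGV it logs a message, then sleeps with every other catchable signal blocked, then returns. This is a hypothetical sketch, not sysvinit's actual code. The names segv_handler(), install_segv_handler() and segv_sleep_seconds are made up for illustration. Returning from the handler only makes sense for a signal sent with kill, as in the reproduction below. After a genuine memory fault, the faulting instruction is re-executed and faults again.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical knob: the report says sysvinit sleeps for 30 seconds.
 * Kept as a variable only so the sketch can be exercised quickly. */
static unsigned int segv_sleep_seconds = 30;

/* Number of times the handler ran; sig_atomic_t so it is safe to
 * update from a signal handler. */
static volatile sig_atomic_t segv_count = 0;

static void segv_handler(int sig)
{
    static const char msg[] = "init: caught SIGSEGV, sleeping\n";
    (void)sig;

    segv_count++;
    /* write() and sleep() are both async-signal-safe. */
    ssize_t w = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)w;
    sleep(segv_sleep_seconds);
    /* Returning resumes whatever the main loop was doing. Only sensible
     * for a SIGSEGV sent with kill(); a real fault would recur. */
}

/* Install segv_handler for SIGSEGV. Returns 0 on success, -1 on error. */
int install_segv_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = segv_handler;
    /* Block every catchable signal while the handler runs, so other
     * signals stay pending until it returns. */
    sigfillset(&sa.sa_mask);
    /* No SA_RESETHAND: a second SIGSEGV runs the handler again instead
     * of reverting to the default (fatal) action. */
    sa.sa_flags = 0;
    return sigaction(SIGSEGV, &sa, NULL);
}
```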

What is the expected behavior:
systemd should handle SIGSEGV gracefully, especially since it can be triggered
by any root program. It should not let zombies walk over /proc/. It should not
make it impossible to start/stop/query services.
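One possible way to stop zombies from piling up while PID 1 is "frozen" is to keep calling waitpid() instead of sleeping in pause(). The sketch below makes no claim about systemd's internals. reap_exited_children() and freeze_but_reap() are hypothetical helpers. Orphaned processes are re-parented to PID 1, so a loop like this would keep collecting them even if the rest of init were stuck.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Reap every child that has already exited, without blocking.
 * Returns how many children were reaped. */
int reap_exited_children(void)
{
    int reaped = 0;
    for (;;) {
        pid_t pid = waitpid(-1, NULL, WNOHANG);
        if (pid > 0) {
            reaped++;          /* one zombie collected; look for more */
            continue;
        }
        if (pid < 0 && errno == EINTR)
            continue;          /* interrupted by a signal; retry */
        /* pid == 0: children exist but none has exited yet.
         * pid < 0 (ECHILD): there are no children at all. */
        return reaped;
    }
}

/* Hypothetical frozen state for PID 1: instead of pause()-ing forever,
 * keep blocking in waitpid() so orphans re-parented to PID 1 are still
 * reaped. Never returns. */
void freeze_but_reap(void)
{
    for (;;) {
        if (waitpid(-1, NULL, 0) < 0 && errno == ECHILD)
            sleep(1);          /* no children right now; check again later */
    }
}
```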

Reproduced with QEMU:

 0. Install Arch in QEMU. Edit /etc/systemd/journald.conf to log to
/dev/ttyS1[1], and edit /etc/default/grub to add `console=ttyS0 loglevel=7` to
the kernel command line.
 1. qemu-system-x86_64 -enable-kvm -hda arch.qcow2 -m 1G -serial file:dmesg.txt -serial file:journal.txt
 2. tailf journal.txt
 3. kill -SEGV 1
 4. Observe the following in out.txt:

[  218.557179] systemd[1]: Caught <SEGV>, dumped core as pid 289.
[  218.558909] systemd[1]: Freezing execution.
[  218.567627] systemd-coredump[290]: Process 289 (systemd) dumped core.

 5. kill -SEGV 1
 6. Observe a kernel panic (VM frozen, journal.txt possibly only partially
written). dmesg.txt contains:

[  229.817252] systemd[1]: segfault at 7fff77d31e68 ip 00007ffb412e4fd0 sp 00007fff77d31e70 error 6 in libc-2.19.so[7ffb4129d000+19e000]
[  229.833598] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000008b
[  229.833598] 
[  229.834798] CPU: 0 PID: 1 Comm: systemd Not tainted 3.13.6-1-ARCH #1
[  229.835608] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  229.836353]  ffff880037a6a760 ffff880119a4fc90 ffffffff81513274 ffffffff81700080
[  229.836658]  ffff880119a4fd08 ffffffff8150fe3a ffff880100000010 ffff880119a4fd18
[  229.836658]  ffff880119a4fcb8 0000000000000282 000000000000008b ffff880119a683c0
[  229.836658] Call Trace:
[  229.836658]  [<ffffffff81513274>] dump_stack+0x4d/0x6f
[  229.836658]  [<ffffffff8150fe3a>] panic+0xc8/0x1d7
[  229.836658]  [<ffffffff81064628>] do_exit+0xa78/0xa80
[  229.836658]  [<ffffffff810646af>] do_group_exit+0x3f/0xa0
[  229.836658]  [<ffffffff81073255>] get_signal_to_deliver+0x295/0x5f0
[  229.836658]  [<ffffffff81014498>] do_signal+0x48/0x950
[  229.836658]  [<ffffffff8140d0d0>] ? sockfd_lookup_light+0x20/0x80
[  229.836658]  [<ffffffff81014e08>] do_notify_resume+0x68/0xa0
[  229.836658]  [<ffffffff8151a37c>] retint_signal+0x48/0x8c
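The exitcode=0x0000008b in the panic line can be decoded by hand. This assumes it uses the same layout as a Linux wait() status, where the low 7 bits hold the terminating signal and bit 0x80 is the core-dump flag. Under that assumption, 0x8b = 0x80 | 0x0b is signal 11 (SIGSEGV) with the core-dump bit set, so PID 1 was killed by the second SIGSEGV. The helper below uses the standard <sys/wait.h> macros. decode_exit_status() and struct exit_info are hypothetical names, and WCOREDUMP is a common extension rather than POSIX.

```c
#define _DEFAULT_SOURCE          /* expose WCOREDUMP in glibc */
#include <assert.h>
#include <signal.h>
#include <sys/wait.h>

/* Hypothetical summary of a raw wait()-style status word. */
struct exit_info {
    int exited;       /* nonzero if the process called exit() */
    int code;         /* exit code, valid when exited */
    int signaled;     /* nonzero if a signal terminated the process */
    int signo;        /* terminating signal, valid when signaled */
    int core_dumped;  /* nonzero if the 0x80 core-dump flag is set */
};

struct exit_info decode_exit_status(int status)
{
    struct exit_info info = {0};
    if (WIFEXITED(status)) {
        info.exited = 1;
        info.code = WEXITSTATUS(status);
    } else if (WIFSIGNALED(status)) {
        info.signaled = 1;
        info.signo = WTERMSIG(status);
#ifdef WCOREDUMP
        info.core_dumped = WCOREDUMP(status) != 0;
#endif
    }
    return info;
}
```

For the value from the panic, decode_exit_status(0x8b) reports signaled = 1, signo = 11 (SIGSEGV on x86-64 Linux) and core_dumped = 1.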

Reproducibility: 100%


Distro: Arch Linux x86_64
Kernel: Linux 3.14-rc5 (reproduced in 3.13.6-1-ARCH)
systemd: 211-2

 [1]: https://wiki.archlinux.org/index.php/systemd#Forward_journald_to_.2Fdev.2Ftty12
