<html>
    <head>
      <base href="https://bugs.freedesktop.org/" />
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Priority</th>
          <td>medium
          </td>
        </tr>

        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW --- - systemd unusable after segfault, zombies everywhere, unable to shutdown"
   href="https://bugs.freedesktop.org/show_bug.cgi?id=76418">76418</a>
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>systemd-bugs@lists.freedesktop.org
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>systemd unusable after segfault, zombies everywhere, unable to shutdown
          </td>
        </tr>

        <tr>
          <th>QA Contact</th>
          <td>systemd-bugs@lists.freedesktop.org
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>major
          </td>
        </tr>

        <tr>
          <th>Classification</th>
          <td>Unclassified
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>Linux (All)
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>lekensteyn@gmail.com
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>x86-64 (AMD64)
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>unspecified
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>general
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>systemd
          </td>
        </tr></table>
      <p>
        <div>
        <pre>When systemd (init!) 210-2 (Arch Linux x86_64) segfaulted some days ago, I was
brought to tty1. This happened out of nowhere; I was just reading and not
touching anything. After that segfault, systemd was totally unusable:

 - systemctl &lt;stop | start | status&gt; &lt;anything&gt; timed out
 - NetworkManager dispatcher services also timed out
 - unable to suspend (not via lid close, systemctl suspend, or the suspend key)
 - unable to shut down (systemctl poweroff; shutdown -h now)
 - Ignores the documented SIGRTMIN+4 signal to shut down the machine.
 - Ignores SIGTERM and SIGKILL (ok), but after sending another SIGSEGV, I got a
kernel panic.
 - Zombies everywhere. By the time I shut the machine down (via the panic),
there were 3.2k zombie processes.
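
To put a number on the zombie count, a minimal Python sketch can scan
/proc/[pid]/stat and count entries whose state field (field 3) is Z. It
assumes the Linux /proc format; count_zombies is a hypothetical helper, not
part of any tool mentioned in this report.

```python
import os


def count_zombies(proc_root="/proc"):
    """Return how many processes under proc_root are in state 'Z' (zombie).

    Hypothetical helper; assumes the Linux /proc/[pid]/stat format.
    """
    zombies = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory (e.g. /proc/meminfo)
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                stat = f.read()
        except OSError:
            continue  # the process vanished between listdir() and open()
        # Field 2 is the command name in parentheses and may itself contain
        # spaces or ')', so split after the LAST ')' to reach field 3 (state).
        state = stat.rsplit(")", 1)[1].split()[0]
        if state == "Z":
            zombies += 1
    return zombies


if __name__ == "__main__":
    print(f"zombie processes: {count_zombies()}")
```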

I still have a tiny core dump, but without debugging symbols it is quite
useless. This report is not about that specific crash, but more about handling
segfaults in general.

init is supposed to be unkillable, right? It ignores SIGTERM and SIGKILL, but
sending SIGSEGV twice results in a kernel panic because init killed itself.
sysvinit on Debian does not have this issue: when it receives a segfault, it
sleeps for 30 seconds, ignoring any signals. Due to its different architecture,
services can still be started and stopped.
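
The sysvinit behavior described above (log, sleep, carry on) can be sketched in
Python. This is a minimal illustration under two assumptions: the SIGSEGV is
sent from outside with kill, as in this report, rather than raised by a real
memory fault (returning from a handler after a genuine fault would just re-run
the faulting instruction), and only the log-sleep-resume part is modeled, not
sysvinit's handling of other signals. make_segv_handler, install_segv_handler
and the caught list are hypothetical names.

```python
import signal
import sys
import time

caught = []  # signal numbers absorbed by the handler (hypothetical, for inspection)


def make_segv_handler(sleep_seconds=30):
    """Build a handler that logs the signal, sleeps, then returns.

    The 30-second default mirrors the sysvinit behavior described above; it is
    a parameter only so the sketch can be exercised quickly.
    """
    def handler(signum, frame):
        caught.append(signum)
        print(f"caught signal {signum}, sleeping for {sleep_seconds} seconds",
              file=sys.stderr)
        time.sleep(sleep_seconds)
        # Returning (instead of re-raising) lets the process keep running.
    return handler


def install_segv_handler(sleep_seconds=30):
    """Replace the default SIGSEGV action (terminate + core dump) with the handler."""
    signal.signal(signal.SIGSEGV, make_segv_handler(sleep_seconds))
```

For example, after install_segv_handler(sleep_seconds=1), sending SIGSEGV to
that process with kill makes it log one line and continue instead of dying.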

Expected behavior:
systemd should handle SIGSEGV gracefully, especially since it can be triggered
by any root program. It should not let zombies walk over /proc/. It should not
make it impossible to start/stop/query services.
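
The "no zombies" expectation could be met even while init is frozen, provided
the frozen loop keeps collecting exited children. Orphaned processes are
normally re-parented to PID 1, so if PID 1 stops calling waitpid() they remain
zombies. A minimal Python sketch of that idea follows; it is an assumption
about one possible approach, not a description of systemd's crash path, and
reap_once and freeze_and_reap are hypothetical names.

```python
import os
import time


def reap_once():
    """Collect every child that has already exited; return their PIDs.

    Uses WNOHANG so the call never blocks.
    """
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # no children at all
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append(pid)
    return reaped


def freeze_and_reap(poll_seconds=1.0):
    """Never return, but keep reaping so exited children do not linger as zombies."""
    while True:
        reap_once()
        time.sleep(poll_seconds)
```

A blocking os.waitpid(-1, 0) would avoid the polling interval, but it raises
ChildProcessError immediately when there are no children, so a sleep fallback
is needed either way.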

Reproduced with QEMU:

 0. Install Arch in QEMU and edit /etc/systemd/journald.conf to forward the
journal to /dev/ttyS1[1]. Edit /etc/default/grub and add `console=ttyS0
loglevel=7` to the kernel command line.
 1. qemu-system-x86_64 -enable-kvm -hda arch.qcow2 -m 1G -serial file:dmesg.txt -serial file:journal.txt
 2. tailf journal.txt
 3. kill -SEGV 1
 4. Observe the following in out.txt:

[  218.557179] systemd[1]: Caught &lt;SEGV&gt;, dumped core as pid 289.
[  218.558909] systemd[1]: Freezing execution.
[  218.567627] systemd-coredump[290]: Process 289 (systemd) dumped core.

 5. kill -SEGV 1
 6. Observe a kernel panic (VM frozen, journal.txt possibly only partially
written). dmesg.txt contains:

[  229.817252] systemd[1]: segfault at 7fff77d31e68 ip 00007ffb412e4fd0 sp 00007fff77d31e70 error 6 in libc-2.19.so[7ffb4129d000+19e000]
[  229.833598] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000008b
[  229.833598]
[  229.834798] CPU: 0 PID: 1 Comm: systemd Not tainted 3.13.6-1-ARCH #1
[  229.835608] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  229.836353]  ffff880037a6a760 ffff880119a4fc90 ffffffff81513274 ffffffff81700080
[  229.836658]  ffff880119a4fd08 ffffffff8150fe3a ffff880100000010 ffff880119a4fd18
[  229.836658]  ffff880119a4fcb8 0000000000000282 000000000000008b ffff880119a683c0
[  229.836658] Call Trace:
[  229.836658]  [&lt;ffffffff81513274&gt;] dump_stack+0x4d/0x6f
[  229.836658]  [&lt;ffffffff8150fe3a&gt;] panic+0xc8/0x1d7
[  229.836658]  [&lt;ffffffff81064628&gt;] do_exit+0xa78/0xa80
[  229.836658]  [&lt;ffffffff810646af&gt;] do_group_exit+0x3f/0xa0
[  229.836658]  [&lt;ffffffff81073255&gt;] get_signal_to_deliver+0x295/0x5f0
[  229.836658]  [&lt;ffffffff81014498&gt;] do_signal+0x48/0x950
[  229.836658]  [&lt;ffffffff8140d0d0&gt;] ? sockfd_lookup_light+0x20/0x80
[  229.836658]  [&lt;ffffffff81014e08&gt;] do_notify_resume+0x68/0xa0
[  229.836658]  [&lt;ffffffff8151a37c&gt;] retint_signal+0x48/0x8c
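
The exitcode in the panic message appears to use the same layout as a wait(2)
status word: the low seven bits hold the terminating signal and bit 0x80 is the
core-dump flag. A small Python check using the standard os.W* helpers decodes
0x8b as signal 11 (SIGSEGV) with the core-dump bit set, consistent with init
dying from SIGSEGV. decode_exit_status is a hypothetical helper name.

```python
import os
import signal


def decode_exit_status(status):
    """Interpret a wait(2)-style status word as (signal name, core dumped)."""
    if not os.WIFSIGNALED(status):
        return None  # normal exit, not a death by signal
    sig = os.WTERMSIG(status)  # low 7 bits: the signal number
    return signal.Signals(sig).name, os.WCOREDUMP(status)  # 0x80 bit: core dump


print(decode_exit_status(0x0000008B))  # prints ('SIGSEGV', True)
```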

Reproducibility: 100%


Distro: Arch Linux x86_64
Kernel: Linux 3.14-rc5 (reproduced in 3.13.6-1-ARCH)
systemd: 211-2

 [1]:
<a href="https://wiki.archlinux.org/index.php/systemd#Forward_journald_to_.2Fdev.2Ftty12">https://wiki.archlinux.org/index.php/systemd#Forward_journald_to_.2Fdev.2Ftty12</a></pre>
        </div>
      </p>
    </body>
</html>