[systemd-devel] Zombie process still exists after stopping gdm.service

Daniel Drake drake at endlessm.com
Mon Apr 20 12:16:57 PDT 2015


On Mon, Apr 20, 2015 at 9:04 AM, Lennart Poettering
<lennart at poettering.net> wrote:
> maybe the main gdm process is not the one waiting, but a worker
> process is, and the main process kills the worker process without the
> worker process handling that nicely?

Not really. I removed all the process-killing code from gdm and the
problem is still there.

I have stepped through the code and I think that systemd is being too
aggressive. Still running with the default KillMode=control-group, here
is what happens:

1. service_enter_stop() is entered, which calls:
                service_enter_signal(s, SERVICE_STOP_SIGTERM, SERVICE_SUCCESS);

2. service_enter_signal sends SIGTERM to all gdm processes.

3. gdm simple-slave's signal handler triggers, which causes the
mainloop to exit, and it starts to kill the X server and wait for it
to die. I'm not exactly sure why, but quitting the glib mainloop also
causes the signal handler to be destroyed, so sigaction() is called
here to return SIGTERM to its default behaviour.

4. Moments later we arrive in systemd's service_sigchld_event(),
presumably because the main gdm process exited due to SIGTERM.
s->main_pid == pid. We respond as follows:

                        case SERVICE_STOP_SIGTERM:
                        case SERVICE_STOP_SIGKILL:
                                if (!control_pid_good(s))
                                        service_enter_stop_post(s, f);

5. Inside service_enter_stop_post(), there is no command to execute, so we call:
                service_enter_signal(s, SERVICE_FINAL_SIGTERM, SERVICE_SUCCESS);

6. service_enter_signal causes all remaining gdm processes to receive
SIGTERM again, only moments after the previous one. As gdm
simple-slave now has the default SIGTERM handler (instant death), it
dies, before it has finished the X server cleanup :(

7. To make things even worse, after sending the SIGTERMs,
service_enter_signal hits:
        } else if (state == SERVICE_FINAL_SIGTERM)
                service_enter_signal(s, SERVICE_FINAL_SIGKILL, SERVICE_SUCCESS);

So, moments after sending two SIGTERMs, SIGKILL is sent to all gdm
processes. There does not seem to be any consideration of giving the
processes some time to respond to the SIGTERMs, nor of the fact that I
have hacked gdm.service to have SendSIGKILL=no as an experiment.

Daniel

