[systemd-devel] Zombie process still exists after stopping gdm.service

Lennart Poettering lennart at poettering.net
Mon Apr 20 17:04:51 PDT 2015


On Mon, 20.04.15 13:16, Daniel Drake (drake at endlessm.com) wrote:

> On Mon, Apr 20, 2015 at 9:04 AM, Lennart Poettering
> <lennart at poettering.net> wrote:
> > maybe the main gdm process is not the one waiting, but a worker
> > process is, and the main process kills the worker process without the
> > worker process handling that nicely?
> 
> Not really. I removed all the process-killing code from gdm and the
> problem is still there.
> 
> I have stepped through and I think that systemd is being too
> aggressive. Still running with the default KillMode=cgroup, here is
> what happens:
> 
> 1. service_enter_stop() is entered which calls:
>                 service_enter_signal(s, SERVICE_STOP_SIGTERM, SERVICE_SUCCESS);
> 
> 2. service_enter_signal sends SIGTERM to all gdm processes.

No, if you use KillMode=mixed (as you say you do) it will only send
SIGTERM to the main process of gdm.
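
To illustrate the difference, here is a toy model of who gets SIGTERM in
the first stop round for each KillMode= setting. This is only a sketch:
the enum and function names are made up for this example and are not
systemd's internal types.

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only; not systemd's internal types. */
typedef enum {
        TOY_KILL_CONTROL_GROUP,
        TOY_KILL_MIXED,
        TOY_KILL_PROCESS,
        TOY_KILL_NONE
} ToyKillMode;

/* Returns true if a process receives SIGTERM in the first stop round. */
bool toy_gets_sigterm(ToyKillMode mode, bool is_main_process) {
        switch (mode) {
        case TOY_KILL_CONTROL_GROUP:
                return true;             /* every process in the cgroup */
        case TOY_KILL_MIXED:
        case TOY_KILL_PROCESS:
                return is_main_process;  /* only the main process */
        case TOY_KILL_NONE:
        default:
                return false;            /* nothing is signalled */
        }
}
```

With TOY_KILL_MIXED, a worker process (is_main_process == false) is not
signalled in this round; with TOY_KILL_CONTROL_GROUP it is.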

> 3. gdm simple-slave's signal handler triggers, which causes the
> mainloop to exit, and it starts to kill and wait for the X server
> death. I'm not exactly sure why, but quitting the glib mainloop also
> causes the signal handler to be destroyed, so sigaction() is called
> here to return SIGTERM to its default behaviour.
> 
> 4. Moments later we arrive in systemd's service_sigchld_event(),
> presumably because the main gdm process exited due to SIGTERM.
> s->main_pid == pid. 

If PID 1 gets the SIGCHLD for the main process then it assumes the
service has finished correctly, and will kill the rest that might remain.
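
For reference, this is roughly what the parent side of that looks like
in plain POSIX terms. It is a minimal sketch, not systemd's code, and
the helper name is made up: a stand-in "main process" with the default
SIGTERM disposition is terminated, and the parent reads the reason from
the wait status once the child has died.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a stand-in "main process", stop it with
 * SIGTERM, and return the signal that terminated it (-1 on error or
 * if it exited in some other way). */
int toy_stop_main_process(void) {
        pid_t pid = fork();
        if (pid < 0)
                return -1;

        if (pid == 0) {
                /* Child: make sure SIGTERM has its default behaviour,
                 * then just sit there until we are killed. */
                signal(SIGTERM, SIG_DFL);
                for (;;)
                        pause();
        }

        /* Parent: ask the child to terminate. */
        if (kill(pid, SIGTERM) < 0)
                return -1;

        /* Reap the child; this is the information SIGCHLD announces. */
        int status;
        if (waitpid(pid, &status, 0) != pid)
                return -1;

        return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

A process whose SIGTERM handler has been reset to the default, as
described in step 3 above, would show up to its parent the same way.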

> 7. To make things even worse, after sending the SIGTERMs,
> service_enter_signal hits:
>         } else if (state == SERVICE_FINAL_SIGTERM)
>                 service_enter_signal(s, SERVICE_FINAL_SIGKILL,
> SERVICE_SUCCESS);

Hmm? If we managed to kill something we'll arm the timeout and wait
for SIGCHLD or cgroup empty or similar.

These shortcuts only take place if we couldn't kill anything because
there was nothing left to kill. Hence the second kill will have no
effect either, but at least we go through the state engine...
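
As a toy model of that logic (a sketch only, with made-up names, and
covering just the two final states quoted above rather than the real
state engine):

```c
#include <stdbool.h>

/* Hypothetical names; this is not systemd's actual state machine. */
typedef enum { TOY_FINAL_SIGTERM, TOY_FINAL_SIGKILL, TOY_DEAD } ToyState;

typedef struct {
        ToyState state;    /* state the engine ends up in */
        bool timer_armed;  /* true if we now wait for SIGCHLD/empty cgroup */
} ToyResult;

/* n_killed[s] is the number of processes the signal for state s reaches. */
ToyResult toy_enter_signal(ToyState state, const int n_killed[2]) {
        while (state != TOY_DEAD) {
                if (n_killed[state] > 0)
                        /* Something was signalled: arm the timeout and wait. */
                        return (ToyResult) { state, true };

                /* Nothing left to kill: take the shortcut to the next state. */
                state = (ToyState) (state + 1);
        }

        return (ToyResult) { TOY_DEAD, false };
}
```

If the SIGTERM round reaches at least one process the model stays in
TOY_FINAL_SIGTERM with the timer armed; only when both rounds reach
nothing does it fall straight through to TOY_DEAD.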

Lennart

-- 
Lennart Poettering, Red Hat

