[systemd-devel] Zombie process still exists after stopping gdm.service

Daniel Drake drake at endlessm.com
Mon Apr 20 17:13:33 PDT 2015


On Mon, Apr 20, 2015 at 6:04 PM, Lennart Poettering
<lennart at poettering.net> wrote:
>> I have stepped through and I think that systemd is being too
>> aggressive. Still running with the default KillMode=cgroup, here is
>> what happens:
>>
>> 1. service_enter_stop() is entered which calls:
>>                 service_enter_signal(s, SERVICE_STOP_SIGTERM, SERVICE_SUCCESS);
>>
>> 2. service_enter_signal sends SIGTERM to all gdm processes.
>
> No, if you use KillMode=mixed (as you say you do) it will only send
> SIGTERM to the main process of gdm.

Only bleeding-edge gdm has KillMode=mixed. I'm using a slightly older
version whose unit file does not set KillMode, so it gets the default
KillMode=cgroup. Sorry for the confusion.

>> 3. gdm simple-slave's signal handler triggers, which causes the
>> mainloop to exit, and it starts to kill and wait for the X server
>> death. I'm not exactly sure why, but quitting the glib mainloop also
>> causes the signal handler to be destroyed, so sigaction() is called
>> here to return SIGTERM to its default behaviour.
>>
>> 4. Moments later we arrive in systemd's service_sigchld_event(),
>> presumably because the main gdm process exited due to SIGTERM.
>> s->main_pid == pid.
>
> If PID 1 gets the SIGCHLD for the main process then it assumes the
> service has finished correctly, and will kill the rest that might remain.

Even if we already sent SIGTERM to the rest just a few milliseconds ago
(in step 2)?

>> 7. To make things even worse, after sending the SIGTERMs,
>> service_enter_signal hits:
>>         } else if (state == SERVICE_FINAL_SIGTERM)
>>                 service_enter_signal(s, SERVICE_FINAL_SIGKILL,
>> SERVICE_SUCCESS);
>
> Hmm? if we managed to kill something we'll arm the timeout and wait
> for sigchld or cgroup empty or similar.
>
> These shortcuts only take place if we couldn't kill anything because
> there was nothing. And hence the second killing will have no effect
> either, but at least we go through the state engine...

I added logging to sys_kill at the kernel level, and I definitely
observe "systemctl stop gdm" causing PID 1 to kill gdm-simple-slave 3
times (TERM, TERM, KILL) within the space of a few milliseconds.
I will look more closely tomorrow so that I can explain in more detail
what is going on at the code level.

Thanks for your help!
Daniel

