[systemd-devel] Regression in v209: SIGKILL sent immediately after SIGTERM

Fri Oct 24 00:12:50 PDT 2014

On 24.10.2014 01:51, Lennart Poettering wrote:
> On Fri, 12.09.14 11:57, Stef Walter (stefw at redhat.com) wrote:
> 
>> This commit breaks cockpit orderly shutdown:
>> 
>>> commit 743970d2ea6d08aa7c7bff8220f6b7702f2b1db7 Author: Lennart
>>> Poettering <lennart at poettering.net> Date:   Fri Feb 7 16:12:09
>>> 2014 +0100
>>> 
>>> core: one step back again, for nspawn we actually can't wait
>>> for cgroups running empty since systemd will get exactly zero 
>>> notifications about it
>> 
>> The children of a cockpit login session all get SIGKILL
>> immediately after SIGTERM (less than a tenth of a second apart).
>> cockpit-agent and cockpit-session take more than a tenth of a
>> second to shut down cleanly.
>> 
>> The easiest way to reproduce this here is a system shutdown.
>> Even the 'reboot' that started the system shutdown (executed via
>> ssh) gets a SIGKILL before it can exit().
>> 
>> Here's some output from a simple systemtap probe which
>> demonstrates this:
>> 
>> https://github.com/cockpit-project/cockpit/issues/1155#issuecomment-55374240
>> 
>> Here you can see a cockpit unit, its login session scope, unit file,
>> and unit properties:
>> 
>> https://github.com/cockpit-project/cockpit/issues/1155#issuecomment-55381385
>> 
>> This commit was introduced in v209, so (for example) the problem is
>> present in Fedora 21. Reverting the commit resolves the problem.
> 
> Well, this is a whack-a-mole game: there's currently no reliable
> way to get notifications when scopes run empty. In some situations
> we get them, in others we don't, hence we better not wait for them.
> 
> I am not entirely sure though why this is a problem for cockpit?

Because our entire user session gets a SIGKILL immediately, which
obviously could lead to data loss.

> Cockpit opens its own PAM sessions, has its own PAM session client 
> code? What's the current logic for ending such a session? Do you 
> properly invoke the PAM session end hooks? Can you elaborate on
> the way cockpit currently uses PAM?

Here's our call to pam_close_session().

https://github.com/cockpit-project/cockpit/blob/master/src/ws/session.c#L974

For local sessions, we use a process called cockpit-session to "do"
our PAM stack and switch to the right user.

The cockpit-session process calls pam_open_session() and then
forks cockpit-bridge, which in turn forks other user processes. When
cockpit-bridge exits or terminates on a signal, the cockpit-session
process calls pam_close_session(). Nothing fancy.
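
As a rough illustration of that sequence (not the actual session.c
code, which is C), here is a minimal Python sketch. The open_session
and close_session callables are hypothetical stand-ins for
pam_open_session() and pam_close_session(); no real PAM call is made,
and run_session is a name invented for this example.

```python
import os
import sys


def run_session(open_session, close_session, child_argv):
    """Open a session, run child_argv in a forked child, then close the session.

    open_session and close_session are hypothetical stand-ins for
    pam_open_session() and pam_close_session().
    """
    open_session()
    pid = os.fork()
    if pid == 0:
        # Child: replace ourselves with the bridge program.
        try:
            os.execvp(child_argv[0], child_argv)
        finally:
            os._exit(127)  # only reached if exec failed
    # Parent: block until the child exits or dies on a signal.
    _, status = os.waitpid(pid, 0)
    # The session is closed in both cases, normal exit or signal.
    close_session()
    if os.WIFSIGNALED(status):
        return ("signaled", os.WTERMSIG(status))
    return ("exited", os.WEXITSTATUS(status))


if __name__ == "__main__":
    outcome = run_session(
        lambda: print("session opened"),
        lambda: print("session closed"),
        [sys.executable, "-c", "pass"],
    )
    print(outcome)
```

The point of the sketch is that the close step only runs if the parent
itself survives long enough to reap the child, which is exactly what an
immediate SIGKILL to the whole session prevents.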

In fact the exact same issue happens when sshd is opening/closing the
session and launching cockpit-bridge. So it's unlikely this has
anything to do with our PAM code.

With systemd v209 and later, everything in the user session and all its
children get SIGKILL immediately after SIGTERM. In fact the two
signals come so fast after each other that they sometimes seem to race
(well, at least the logging of the events does ... hard to tell).
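
The effect is easy to reproduce outside systemd. The following Python
sketch is an illustration only: it does not involve systemd or cockpit,
and terminate_with_grace and the child program are invented for this
example. It starts a child whose SIGTERM handler needs some time to
clean up, sends SIGTERM, waits a configurable grace period, and then
sends SIGKILL.

```python
import signal
import subprocess
import sys
import time

# Child program: on SIGTERM it spends argv[1] seconds "cleaning up", then exits 0.
CHILD_SOURCE = """
import signal, sys, time

def on_term(signum, frame):
    time.sleep(float(sys.argv[1]))  # simulated cleanup work
    sys.exit(0)

signal.signal(signal.SIGTERM, on_term)
print("ready", flush=True)
while True:
    signal.pause()
"""


def terminate_with_grace(cleanup_seconds, grace_seconds):
    """Send SIGTERM, wait grace_seconds, send SIGKILL; report how the child ended."""
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE, str(cleanup_seconds)],
        stdout=subprocess.PIPE,
    )
    child.stdout.readline()  # wait until the SIGTERM handler is installed
    child.send_signal(signal.SIGTERM)
    time.sleep(grace_seconds)
    child.kill()  # SIGKILL; harmless if the child has already exited
    returncode = child.wait()
    child.stdout.close()
    if returncode == 0:
        return "clean exit"
    if returncode < 0:
        return "killed by " + signal.Signals(-returncode).name
    return "exit status %d" % returncode


if __name__ == "__main__":
    # Cleanup needs 0.5 s but only 0.05 s of grace is given.
    print(terminate_with_grace(0.5, 0.05))
    # Cleanup needs 0.05 s and a full second of grace is given.
    print(terminate_with_grace(0.05, 1.0))
```

With a grace period shorter than the cleanup time, the child never
reaches its exit path, which matches what we see for cockpit-session
and cockpit-bridge during shutdown.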

Stef