[systemd-devel] Allow stop jobs to be killed during shutdown

Thu Feb 6 01:35:45 CET 2014

On Wed, Feb 05, 2014 at 12:10:51AM +0100, Lennart Poettering wrote:
> On Wed, 29.01.14 19:29, Andrey Borzenkov (arvidjaar at gmail.com) wrote:
> 
> > > Thanks for tracking this down, this really sounds like you nailed the
> > > problem. Now, how to fix it?
> > > 
> > > Hmm, so, I would claim this is a shortcoming of
> > > KillMode=control-group, which is the default for everything. There has
> > > been an item on the TODO list to maybe introduce a KillMode=mixed
> > > setting, which would send SIGTERM only to the main process, and the
> > > SIGKILL later on to all processes. I am pretty sure that this would
> > > solve the issue at hand quite nicely here, because the systemd user
> > > instance would get a nice chance to clean up its own act, before the
> > > systemd system instance would make tabula rasa...
> > > 
> > 
> > I still favor the alternative approach: let systemd wait for the main
> > PID to exit after ExecStop instead. This is functionally equivalent to
> > the above, with slight advantages
> 
> I am really not convinced that ExecStop= should be allowed to be
> asynchronous. (Which is what you suggest we do, right?) In fact, it's
> already problem enough that we pretend we allow ExecReload= to be
> asynchronous like that... It's a question of allowing bad code
> through... Either people let us shut down a service, or they do it
> themselves, but allowing a crappy (asynchronous) shutdown routine sounds
> wrong to me...
> 
> At the hackfest in BRU I have now implemented KillMode=mixed, which
> should fix the issue mostly... Could you test, please?

Another bug, perhaps related: I did not have time to confirm it, but the
logic is shared and the proposed KillMode=mixed patch did "fix" it. So:

If "KillUserProcesses=yes" is set in logind.conf and its conditions are
met for the corresponding user, then running
# loginctl terminate-user $user

logs:
Feb 05 23:35:00 fedora-tree-20 systemd[1]: session-1.scope stopping timed out. Killing.
Feb 05 23:35:00 fedora-tree-20 systemd[1]: Stopped Session 1 of user root.
Feb 05 23:35:00 fedora-tree-20 systemd[1]: Unit session-1.scope entered failed state.
...
Feb 05 23:35:03 fedora-tree-20 systemd[1]: Starting User Manager for UID 0...
Feb 05 23:35:03 fedora-tree-20 login[259]: pam_systemd(login:session): Failed to create session: Input/output error
Feb 05 23:35:03 fedora-tree-20 login[259]: pam_unix(login:session): session opened for user root by LOGIN(uid=0)
Feb 05 23:35:03 fedora-tree-20 systemd-logind[21]: Failed to start session scope session-1.scope: Unit session-1.scope already exists. org.freedesktop.systemd1.UnitExists
Feb 05 23:35:03 fedora-tree-20 systemd[260]: pam_unix(systemd-user:session): session opened for user root by (uid=0)
...

terminate-user => user_stop() will try to terminate the scope and service
of the user; however, the scope will time out and enter a failed state.

If you then log in again as the same user, the session id of the previous
session is available again and is reused to name the scope of the new
session, 'session-1.scope'. This makes session_start() fail, since it
gets the same scope name as the previous session, which is still in the
failed state...

session_start()
  => session_start_scope()
     => manager_start_scope() will fail

pam_systemd will then not register the session, and the results of the
logind functions will be wrong...
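
The name collision described above can be modelled in a few lines. This
is only a toy sketch in Python, not logind or systemd code: the
UnitManager class, its methods and the UnitExists exception are
hypothetical names that mimic the behaviour seen in the logs, under the
assumption that a unit in the failed state stays loaded and keeps its
name reserved until it is explicitly dropped.

```python
class UnitExists(Exception):
    """Raised when a scope name is already taken (toy stand-in)."""


class UnitManager:
    """Toy model of a manager that keeps failed units loaded."""

    def __init__(self):
        self.units = {}  # unit name -> state string

    def start_scope(self, name):
        # Assumption: any loaded unit, even a failed one, blocks name reuse.
        if name in self.units:
            raise UnitExists(f"Unit {name} already exists.")
        self.units[name] = "running"

    def stop_scope(self, name, timed_out):
        # A clean stop unloads the unit; a timed-out stop leaves it failed.
        if timed_out:
            self.units[name] = "failed"
        else:
            del self.units[name]

    def reset_failed(self, name):
        # Drop a failed unit so that its name becomes available again.
        if self.units.get(name) == "failed":
            del self.units[name]


def login(manager, session_id):
    """Try to create the scope for a session; return True on success."""
    try:
        manager.start_scope(f"session-{session_id}.scope")
        return True
    except UnitExists:
        return False


manager = UnitManager()
first = login(manager, 1)                              # first login works
manager.stop_scope("session-1.scope", timed_out=True)  # stop times out
second = login(manager, 1)                             # id reused, name collides
manager.reset_failed("session-1.scope")
third = login(manager, 1)                              # name is free again
print(first, second, third)
```

In this model the second login fails only because the failed scope still
holds the name; once the failed unit is dropped, the same session id
works again.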

Anyway, it seems that this also got fixed. Is this the correct fix?
I did not have time to debug, but after a "git pull" I did a quick
test using a bash signal trap: I got the correct SIGTERM+SIGHUP, but
we still do not wait for the session processes...
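
A test of that kind could be sketched as follows. This is written in
Python rather than bash and is not the script that was actually used;
the names "received", "record" and "install_traps" are made up for the
example. It only records which signals arrive; to repeat the experiment
it would have to run inside the session being terminated.

```python
import os
import signal

received = []  # names of the signals seen so far


def record(signum, frame):
    # Only record the signal; a real test would also log a timestamp.
    received.append(signal.Signals(signum).name)


def install_traps():
    # SIGKILL cannot be trapped, so only SIGTERM and SIGHUP are handled.
    for sig in (signal.SIGTERM, signal.SIGHUP):
        signal.signal(sig, record)


install_traps()

# Self-check: deliver both signals to this process to exercise the traps.
os.kill(os.getpid(), signal.SIGTERM)
os.kill(os.getpid(), signal.SIGHUP)
print(received)
```

A process with such traps survives SIGTERM and SIGHUP, so it also shows
whether anything waits for it before the final SIGKILL arrives.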

Lennart, please, one more thing:

In src/core/unit.c:unit_kill_context(), in the KILL_CONTROL_GROUP or
KILL_MIXED test:

"sig" can be SIGKILL; in particular, on the next call after the first
SIGTERM + SIGHUP, sig will for sure be SIGKILL. So we have
cg_kill_recursive() sending a SIGKILL. What if it returns > 0?
We'll end up sending another SIGHUP after the SIGKILL...
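
To make the concern concrete, here is a toy model of that sequence in
Python. It is not the code in src/core/unit.c: kill_context(), the
send_sighup and guard_sigkill parameters and the kill_cgroup callback
are hypothetical stand-ins, and the model assumes that a SIGHUP follows
whenever the cgroup kill reports that it signalled at least one process.

```python
import signal


def kill_context(sig, kill_cgroup, send_sighup=True, guard_sigkill=False):
    """Toy model: signal the whole cgroup, then maybe follow up with SIGHUP.

    kill_cgroup(sig) is a hypothetical callback returning how many
    processes it signalled (mirroring a "> 0" return value).
    Returns the list of signals sent, in order.
    """
    sent = []
    killed = kill_cgroup(sig)
    sent.append(sig)
    if killed > 0 and send_sighup:
        # Without the guard, SIGHUP is sent even when sig is already SIGKILL.
        if not (guard_sigkill and sig == signal.SIGKILL):
            kill_cgroup(signal.SIGHUP)
            sent.append(signal.SIGHUP)
    return sent


def fake_cgroup(sig):
    # Pretend two processes are still alive in the cgroup.
    return 2


first_pass = kill_context(signal.SIGTERM, fake_cgroup)
second_pass = kill_context(signal.SIGKILL, fake_cgroup)
guarded = kill_context(signal.SIGKILL, fake_cgroup, guard_sigkill=True)
print([s.name for s in first_pass])
print([s.name for s in second_pass])
print([s.name for s in guarded])
```

In the unguarded model the second pass sends SIGKILL followed by SIGHUP,
which is the sequence in question. The guard_sigkill flag shows one
possible way to skip the extra SIGHUP; whether the real code behaves
like the unguarded model still needs to be checked.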

Not sure, I'll try to test it tomorrow.

Thanks!

-- 
Djalal Harouni
http://opendz.org