[systemd-devel] Allow stop jobs to be killed during shutdown

Andrey Borzenkov arvidjaar at gmail.com
Sun Jan 26 07:21:16 PST 2014


On Sun, 26 Jan 2014 17:18:28 +0400
Andrey Borzenkov <arvidjaar at gmail.com> wrote:

> On Sun, 26 Jan 2014 12:09:23 +0400
> Andrey Borzenkov <arvidjaar at gmail.com> wrote:
> 
> > On Fri, 24 Jan 2014 18:46:06 +0100
> > Lennart Poettering <lennart at poettering.net> wrote:
> > 
> > > On Fri, 24.01.14 21:10, Ivan Shapovalov (intelfx100 at gmail.com) wrote:
> > > 
> > > > > > > However, something like that can never be the default, we need to give
> > > > > > > services the chance to shut down cleanly and in the right order
> > > > > > 
> > > > > > then bugs like https://bugzilla.redhat.com/show_bug.cgi?id=1023820
> > > > > 
> > > > > I have so far never encountered this issue, but I fear this is a bug
> > > > > where somebody who can reproduce this needs to sit down and debug a
> > > > > bit...
> > > > > 
> > > > > Lennart
> > > > 
> > > > Any advice on how to do that?
> > > > I have both the issue (reproducible on each shutdown) and the will to debug.
> > > 
> > > Well, enable the debug shell, and then from there try to figure out why
> > > things are stuck. i.e. whether it is systemd --user that really never
> > > exits. Or whether it actually exits but PID 1 doesn't notice it. And
> > > then if you figured out which of the two cases, you'd have to figure out
> > > why that is...
> > > 
> > 
> > 
> > I finally managed to reproduce it with the user instance running at debug
> > log level (before this, *any* attempt to add debugging, strace, or whatever
> > made the problem disappear).
> > 
> > It seems that /bin/kill -RTMIN+24 is itself being killed. I wonder: is
> > it possible that it is hit by the same SIGTERM that PID 1 uses to stop
> > user@0.service?
> > 
> 
> I'm almost sure it is. cg_kill_recursive is in no way atomic, so it can
> easily hit a new process that was spawned after the service stop was
> initiated.
> 
> Unfortunately, setting KillMode=process is not allowed:
> 
> Jan 26 17:12:30 linux-1a7f systemd[1]: user@0.service has PAM enabled. Kill mode must be set to 'control-group'. Refusing.
> 
> Probably user@.service should be exempt from this rule. It is supposed
> to handle all the services it started by itself; it *is* a service
> manager, after all.
> 

I rebuilt systemd without this restriction, set KillMode=process for
user@.service, and this fixed things here.
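
To illustrate how a control-group kill can hit a process that did not
exist when the stop began, here is a toy model in Python. It is not
systemd's cg_kill_recursive, only a loop of the same general shape (read
the member list, signal each PID, repeat until nothing new shows up);
that shape is an assumption made for the sketch. The names kill_cgroup
and simulate are made up, and the PIDs are borrowed from the log below.

```python
# Toy model only: NOT systemd code. It mimics a non-atomic
# "read members, signal each, repeat until nothing new" kill loop.

def kill_cgroup(read_pids, send_sigterm):
    """Signal every PID in a group, re-reading until no unseen PID remains."""
    signalled = []
    seen = set()
    while True:
        new = [pid for pid in read_pids() if pid not in seen]
        if not new:
            return signalled
        for pid in new:
            seen.add(pid)
            send_sigterm(pid)
            signalled.append(pid)


def simulate():
    # PIDs taken from the log: 1942 is the user manager, 1943 is
    # (sd-pam), 1951 is the /usr/bin/kill helper of systemd-exit.service.
    members = [1942, 1943]

    def read_pids():
        return list(members)

    def send_sigterm(pid):
        # Assumption: the manager reacts to SIGTERM by forking its exit
        # helper, which joins the same group while the loop still runs.
        if pid == 1942 and 1951 not in members:
            members.append(1951)

    return kill_cgroup(read_pids, send_sigterm)


print(simulate())  # [1942, 1943, 1951]
```

The helper is picked up on the second pass of the loop and gets SIGTERM
too, which would match the "Child 1951 died (code=killed, status=15/TERM)"
line in the log. With KillMode=process only the main PID would be
signalled in this model.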

So there are two problems associated with the user instance.

1. Using KillMode=control-group is wrong. Each service managed by the user
instance has its own requirements for how it is stopped. Just sending
SIGTERM to everything, without even trying the service's ExecStop first,
is obviously incorrect.

2. user@.service has a single timeout, but it manages a number of
services that is not known in advance, each needing a timeout that is not
known either. While some cap on the total timeout looks sensible, only
the user can estimate the value. But user@.service is a system-level
service, which the user cannot configure ...
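
The timeout problem in point 2 can be put in numbers with a small Python
sketch. It assumes the worst case, where the user services stop one after
another so that their waits add up; the function name and all the values
are hypothetical and do not come from any real configuration.

```python
def fits_in_outer_timeout(inner_timeouts, outer_timeout):
    """Worst case for stops that run one after another: the waits add up.

    Returns (fits, worst_case_seconds).
    """
    worst_case = sum(inner_timeouts)
    return worst_case <= outer_timeout, worst_case


# Hypothetical values in seconds: three user services, and one outer
# stop timeout on the system-level unit that wraps the user instance.
print(fits_in_outer_timeout([90, 90, 30], outer_timeout=90))  # (False, 210)
```

In this example the outer timeout expires long before the user services
have used up the time they were each allowed.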


> > Jan 26 11:53:58 linux-1a7f systemd[1942]: Received SIGTERM from PID 1 (systemd).
> > Jan 26 11:53:58 linux-1a7f systemd[1942]: Activating special unit exit.target
> > Jan 26 11:53:58 linux-1a7f systemd[1942]: Trying to enqueue job exit.target/start/replace
> > Jan 26 11:53:58 linux-1a7f systemd[1942]: Installed new job exit.target/start as 3
> > Jan 26 11:53:58 linux-1a7f systemd[1942]: Installed new job systemd-exit.service/start as 4
> > Jan 26 11:53:58 linux-1a7f systemd[1942]: Installed new job shutdown.target/start as 5
> > Jan 26 11:53:58 linux-1a7f systemd[1942]: Installed new job default.target/stop as 7
> > Jan 26 11:53:58 linux-1a7f systemd[1942]: Enqueued job exit.target/start as 3
> > Jan 26 11:53:58 linux-1a7f systemd[1942]: Stopping Default.
> > Jan 26 11:53:58 linux-1a7f systemd[1942]: default.target changed active -> dead
> > Jan 26 11:53:58 linux-1a7f systemd[1942]: Job default.target/stop finished, result=done
> > Jan 26 11:53:58 linux-1a7f systemd[1942]: Stopped target Default.
> > Jan 26 11:53:58 linux-1a7f systemd[1942]: Starting Shutdown.
> > Jan 26 11:53:58 linux-1a7f systemd[1942]: shutdown.target changed dead -> active
> > Jan 26 11:53:58 linux-1a7f systemd[1942]: Job shutdown.target/start finished, result=done
> > Jan 26 11:53:59 linux-1a7f systemd[1942]: Reached target Shutdown.
> > Jan 26 11:53:59 linux-1a7f systemd[1942]: Starting Exit the Session...
> > Jan 26 11:53:59 linux-1a7f systemd[1942]: About to execute: /usr/bin/kill -s 58 $MANAGERPID
> > Jan 26 11:53:59 linux-1a7f systemd[1942]: Forked /usr/bin/kill as 1951
> > Jan 26 11:53:59 linux-1a7f systemd[1942]: systemd-exit.service changed dead -> start
> > Jan 26 11:53:59 linux-1a7f systemd[1942]: Set up jobs progress timerfd.
> > Jan 26 11:53:59 linux-1a7f systemd[1942]: Collecting default.target
> > Jan 26 11:53:59 linux-1a7f systemd[1942]: Received SIGCHLD from PID 1943 ((sd-pam)).
> > Jan 26 11:53:59 linux-1a7f systemd[1942]: Got SIGCHLD for process 1943 ((sd-pam))
> > Jan 26 11:53:59 linux-1a7f systemd[1942]: Child 1943 died (code=exited, status=0/SUCCESS)
> > Jan 26 11:53:59 linux-1a7f systemd[1942]: Received SIGCHLD from PID 1951 ((kill)).
> > Jan 26 11:53:59 linux-1a7f systemd[1942]: Got SIGCHLD for process 1951 ((kill))
> > Jan 26 11:53:59 linux-1a7f systemd[1942]: Child 1951 died (code=killed, status=15/TERM)
> > Jan 26 11:53:59 linux-1a7f systemd[1942]: Child 1951 belongs to systemd-exit.service
> > Jan 26 11:53:59 linux-1a7f systemd[1942]: systemd-exit.service: main process exited, code=killed, status=15/TERM
> > Jan 26 11:53:59 linux-1a7f systemd[1942]: systemd-exit.service changed start -> dead
> > Jan 26 11:53:59 linux-1a7f systemd[1942]: Job systemd-exit.service/start finished, result=done
> > Jan 26 11:53:59 linux-1a7f systemd[1942]: Started Exit the Session.
> > Jan 26 11:53:59 linux-1a7f systemd[1942]: Closed jobs progress timerfd.
> > Jan 26 11:53:59 linux-1a7f systemd[1942]: Starting Exit the Session.
> > Jan 26 11:53:59 linux-1a7f systemd[1942]: exit.target changed dead -> active
> > Jan 26 11:53:59 linux-1a7f systemd[1942]: Job exit.target/start finished, result=done
> > Jan 26 11:53:59 linux-1a7f systemd[1942]: Reached target Exit the Session.
> 
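
For reading the signal numbers in the log above, a small Python helper
can turn them into names. This is only a convenience sketch:
describe_signal is a made-up name, and it needs a platform that defines
real-time signals (Linux). On a glibc system where SIGRTMIN is 34, the
58 passed to /usr/bin/kill corresponds to RTMIN+24, and status 15 is
SIGTERM.

```python
import signal


def describe_signal(number):
    """Name a signal number, using RTMIN+n for real-time signals."""
    if signal.SIGRTMIN <= number <= signal.SIGRTMAX:
        return "RTMIN+%d" % (number - signal.SIGRTMIN)
    return signal.Signals(number).name


print(describe_signal(58))  # the signal systemd-exit.service sends
print(describe_signal(15))  # the signal that killed PID 1951
```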


