[systemd-devel] How can you differentiate behavior between stop and stop as part of restart?

Wed Oct 16 13:02:58 UTC 2024

On Mo, 14.10.24 17:54, Kelsey Cummings (kelsey.cummings at sonic.com) wrote:

> I've come up to a problem that I haven't been able to figure out how to
> solve.  I have a well behaved daemon process that manages its own children
> and has both the ability to gracefully reload its config and gracefully
> restart.  A graceful restart would work best by starting a new parent and
> then signalling the old parent which would exit once the children complete
> any in-process requests.

So you have two main processes run in parallel? That's a bit icky...

> Restart as stop && start is close enough for my use case but in order to
> ensure the new parent process starts without waiting for the graceful stop
> to complete I had to set "KillMode=none" and explicitly add the "default"
> "ExecStop=/bin/kill -s QUIT $MAINPID" to the unit file.

SIGQUIT is supposed to make a process exit and generate a coredump at
the user's request. It's really weird to use that for anything else. I wouldn't
call a daemon which uses the signal in a completely different way
"well behaved" tbh.

KillMode=none is generally advised against (and you should see noisy
log messages complaining about left-over processes after restart when
you use it), because it means we cannot reset cgroup accounting
properly, cannot apply clean security settings and so on. You'll
always have a mixture of correctly and incorrectly accounted and
secured processes in the cgroup, and that's really not how things
should be done.

If subprocesses of a service shall survive the main process of a
service, then I'd always recommend running them in a scope unit (or
multiple of them) instead. i.e. use the StartTransientUnit() bus call
to allocate a scope unit for your child processes that shall survive
the parent, and keep them running afterwards. This would mean they are
accounted properly, but have an independent lifecycle.
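A rough sketch of the scope idea, in Python. It does not call
StartTransientUnit() directly; it shells out to systemd-run --scope, which
asks the service manager to allocate the transient scope on its behalf. The
unit name and worker command are made up for illustration, the helper names
are mine, and the caller is assumed to have the privileges needed to create
units on the manager it talks to.

```python
import subprocess
from typing import List, Sequence


def scope_argv(unit: str, command: Sequence[str]) -> List[str]:
    """Build a systemd-run command line that runs `command` in its own scope."""
    return [
        "systemd-run",
        "--scope",         # transient .scope unit instead of a .service
        "--collect",       # unload the unit afterwards even if it failed
        f"--unit={unit}",  # hypothetical unit name chosen by the caller
        "--",
        *command,
    ]


def spawn_in_scope(unit: str, command: Sequence[str]) -> subprocess.Popen:
    """Start `command` so that it lives in a scope unit, not the service cgroup."""
    return subprocess.Popen(scope_argv(unit, command))
```

Something like spawn_in_scope("mydaemon-worker-1.scope", ["/usr/bin/worker"])
would then give that worker a lifecycle independent of the service that
started it, while still being accounted for in its own unit.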

The other option you have is ExitType=cgroup which will tell systemd
to wait until the last process of the service has exited before starting
the new instance.

> This behaves as expected and restarts are fast enough for my needs.
>
> HOWEVER, I would like stop to wait for all of the children to gracefully
> exit should stop be called explicitly and/or as part of system
> shutdown/reboot.
>
> I see some long past discussion about adding support for ExecRestart
> which could theoretically solve the problem but I figure I'm stuck thinking
> about it the wrong way.

Sorry, we do not support that, the concept would mean we wouldn't be
able to provide the service with a clean execution environment on
start, and that's really what we want to do. Hence, restarting is
under control of the service manager on purpose, and not replaceable
via unit file definitions.

> What's the systemd canonical way to get the desired behavior?

If you cannot fix your service to follow a more regular restart logic,
then you could simply expose this kind of "service-defined restart"
via the ExecReload= logic, i.e. just do regular reloads instead of
proper restarts for your service. It appears to me this would fit your
model much better, because the assumption there is that reloads happen
under service control, that processes survive, and that the execution
context is not reset. It's even explicitly supported to change the main
PID of the service (for example via sd_notify("MAINPID=")) during
reloads, hence this should cover things reasonably nicely for you.
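To illustrate the MAINPID= handover, here is a minimal sketch in Python. It
is not libsystemd's sd_notify() itself but a stand-in that speaks the same
datagram protocol on the socket named in $NOTIFY_SOCKET. The helper names are
mine, and the unit is assumed to be set up so that the service manager passes
that socket to the service (for example Type=notify).

```python
import os
import socket


def sd_notify(state: str) -> bool:
    """Send a notification string to the service manager.

    Returns False if $NOTIFY_SOCKET is unset, i.e. nobody is listening.
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):
        # A leading "@" denotes a Linux abstract socket: replace it with NUL.
        addr = b"\0" + path[1:].encode()
    else:
        addr = path
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode(), addr)
    return True


def announce_new_main_pid(pid: int) -> bool:
    """Tell the service manager which process is the main one from now on."""
    return sd_notify(f"MAINPID={pid}")
```

During a reload, once the new parent is up, one of the service's processes
would call announce_new_main_pid() with the new parent's PID. The message has
to come from a process that the unit's NotifyAccess= setting permits,
otherwise the service manager ignores it.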

Lennart

--
Lennart Poettering, Berlin