[systemd-devel] socket unit refusing connection when JOB_STOP is pending

Lennart Poettering lennart at poettering.net
Mon May 29 15:44:59 UTC 2017


On Tue, 16.05.17 11:28, Moravec, Stanislav (ERT) (stanislav.moravec at hpe.com) wrote:

> Hello all,
> 
> I wanted to seek your opinion about correctness of the current behavior
> of socket activated units.
> 
> Let's assume we have a socket-activated service (for example authd, via auth.socket) and
> some other background service (called authtest.service for the purpose of this test)
> that needs to connect to the socket-activated service in order to stop itself properly.
> 
> authtest.service declares its dependency on auth.socket as expected:
> 
> # cat /usr/lib/systemd/system/authtest.service
> [Unit]
> Description=Test Script to connect auth during shutdown
> After=auth.socket
> Requires=auth.socket
> 
> [Service]
> ExecStart=/bin/true
> ExecStop=/usr/bin/connect_authd
> Type=oneshot
> RemainAfterExit=yes
> 
> [Install]
> WantedBy=multi-user.target
> 
> Yet authtest doesn't stop correctly (in our test case, the connection just fails,
> not a real failure), because auth.socket refuses connections as soon as the pending job
> on auth.socket is JOB_STOP, even if it's not yet time to really stop the unit.
> 
> The auth.socket:
> May 16 11:23:41 pra0097 systemd[1]: Installed new job auth.socket/stop as 9395
> May 16 11:23:41 pra0097 systemd[1]: Incoming traffic on auth.socket
> May 16 11:23:41 pra0097 systemd[1]: Suppressing connection request on auth.socket since unit stop is scheduled.
> // NOTE the above
> May 16 11:24:44 pra0097 systemd[1]: auth.socket changed listening -> dead
> May 16 11:24:44 pra0097 systemd[1]: Job auth.socket/stop finished, result=done
> May 16 11:24:44 pra0097 systemd[1]: Closed Authd Activation Socket.
> May 16 11:24:44 pra0097 systemd[1]: Stopping Authd Activation Socket.
> 
> The authtest:
> May 16 11:23:41 pra0097 systemd[1]: Installed new job authtest.service/stop as 9337
> May 16 11:23:41 pra0097 systemd[1]: About to execute: /usr/bin/connect_authd
> May 16 11:23:41 pra0097 systemd[1]: Forked /usr/bin/connect_authd as 7051
> May 16 11:23:41 pra0097 systemd[1]: authtest.service changed exited -> stop
> May 16 11:23:41 pra0097 systemd[1]: Stopping Test Script to connect auth during shutdown...
> May 16 11:23:41 pra0097 systemd[7051]: Executing: /usr/bin/connect_authd
> May 16 11:23:41 pra0097 connect_authd[7051]: Tue May 16 11:23:41 CEST 2017
> May 16 11:23:41 pra0097 connect_authd[7051]: COMMAND PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
> May 16 11:23:41 pra0097 connect_authd[7051]: systemd   1 root   38u  IPv6  19431      0t0  TCP *:auth (LISTEN)
> May 16 11:23:41 pra0097 connect_authd[7051]: ERROR reading from socket: Connection reset by peer
> May 16 11:23:41 pra0097 connect_authd[7051]: sending message: 80,80
> May 16 11:23:41 pra0097 systemd[1]: Child 7051 belongs to authtest.service
> May 16 11:23:41 pra0097 systemd[1]: authtest.service: control process exited, code=exited status=0
> May 16 11:23:41 pra0097 systemd[1]: authtest.service got final SIGCHLD for state stop
> May 16 11:23:41 pra0097 systemd[1]: authtest.service changed stop -> dead
> May 16 11:23:41 pra0097 systemd[1]: Job authtest.service/stop finished, result=done
> May 16 11:23:41 pra0097 systemd[1]: Stopped Test Script to connect auth during shutdown.
> May 16 11:23:41 pra0097 systemd[1]: authtest.service: cgroup is empty
> 
> 
> The relevant piece of code:
> static void socket_enter_running(Socket *s, int cfd) {
> ...
>         /* We don't take connections anymore if we are supposed to shut down anyway */
>         if (unit_stop_pending(UNIT(s))) {
>             log_unit_debug(UNIT(s), "Suppressing connection request since unit stop is scheduled.");
> ...
> 
> 
> bool unit_stop_pending(Unit *u) {
> ...
>         return u->job && u->job->type == JOB_STOP;
> }
> 
> Wouldn't it make sense to still allow connections while the unit is still running?
> Or maybe, for compatibility, a boolean could be added to the socket unit definition to allow
> the socket to keep answering connections until it really is stopped.
> 
> If it were not a socket-activated unit, the two services would be ordered and work just fine,
> so why should a socket unit be different?
> 
> Opinions?

This is indeed a shortcoming in systemd's model right now: we don't
permit a start and a stop job to be enqueued for the same unit at the
same time. But to do what you want to do we'd need to permit that: the
service is supposed to stop, but also temporarily start.
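
To make the restriction concrete, here is a toy model in C. It is only
a sketch, not systemd source: the Toy* types and toy_* functions are
made up for illustration, and real job handling (merging, replacing,
transactions) is far more involved. It assumes nothing beyond what is
said above, namely one job slot per unit, so a pending stop job leaves
no room for a start job.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not systemd's data structures. */
typedef enum { TOY_JOB_START, TOY_JOB_STOP } ToyJobType;

typedef struct {
        ToyJobType type;
} ToyJob;

typedef struct {
        const char *id;
        ToyJob *job;            /* single slot; NULL when nothing is queued */
} ToyUnit;

/* Same shape as the unit_stop_pending() excerpt quoted above. */
bool toy_stop_pending(const ToyUnit *u) {
        return u->job && u->job->type == TOY_JOB_STOP;
}

/* Simplification: refuse a job whose type conflicts with the queued one. */
bool toy_enqueue(ToyUnit *u, ToyJob *j) {
        if (u->job && u->job->type != j->type)
                return false;
        u->job = j;
        return true;
}

/* Serving a connection would require starting something, so it is
 * suppressed as soon as a stop job occupies the slot. */
bool toy_accept_connection(const ToyUnit *socket_unit) {
        return !toy_stop_pending(socket_unit);
}
```

Once a stop job sits in the slot, toy_enqueue() rejects a start job
and toy_accept_connection() returns false, which corresponds to the
"Suppressing connection request" line in the log.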

I fear I don't have a nice way out to recommend to you. Permitting
multiple jobs to be enqueued for the same unit would be a major change
in the design of systemd, and would result in a number of complex
problems (for example, detecting cycles and deadlocks would become
much more complex).

The best I can offer is to change the design of the services in
question: instead of connecting to the other service only at shutdown,
establish the connection when starting up, and keep it open. This way
abnormal exits could be detected as well, and no activation would be
necessary anymore at shutdown.
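
As a rough sketch of what that could look like on the client side, in
C: connect once at startup, keep the fd, and at shutdown only write to
the connection that already exists. The function names peer_connect()
and peer_say_goodbye() are hypothetical, as is the idea that a short
message is all the peer needs; error handling is kept minimal.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <errno.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Called once at startup. Returns a connected fd, or -1 on failure. */
int peer_connect(const char *host, const char *port) {
        struct addrinfo hints, *res, *ai;
        int fd = -1;

        memset(&hints, 0, sizeof(hints));
        hints.ai_family = AF_UNSPEC;
        hints.ai_socktype = SOCK_STREAM;

        if (getaddrinfo(host, port, &hints, &res) != 0)
                return -1;

        /* Try each resolved address until one accepts the connection. */
        for (ai = res; ai; ai = ai->ai_next) {
                fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
                if (fd < 0)
                        continue;
                if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
                        break;
                close(fd);
                fd = -1;
        }

        freeaddrinfo(res);
        return fd;
}

/* Called at shutdown: reuses the fd from startup, so no new connection
 * (and hence no socket activation) is needed. Returns 0 on success. */
int peer_say_goodbye(int fd, const char *msg) {
        size_t len = strlen(msg);

        while (len > 0) {
                ssize_t n = send(fd, msg, len, MSG_NOSIGNAL);
                if (n < 0) {
                        if (errno == EINTR)
                                continue;
                        return -1;
                }
                msg += n;
                len -= (size_t) n;
        }
        return 0;
}
```

The client would then presumably run as a long-lived service that
calls peer_connect() early in its main loop and peer_say_goodbye()
when it receives SIGTERM, rather than doing the work from ExecStop=.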

I hope that helps in some way.

Lennart

-- 
Lennart Poettering, Red Hat

