[systemd-devel] Activation socket overwritten while socket service running
Trent Lloyd
trent at lloyd.id.au
Tue Apr 25 02:17:14 UTC 2017
Hi Folks,
I ran into an issue when doing local development on Avahi where, after
the following sequence of events, I could no longer connect to
/run/avahi-daemon/socket, which is activated by avahi-daemon.socket:
(1) systemctl disable avahi-daemon.service
(2) systemctl stop avahi-daemon.service
(3) sudo avahi-daemon --debug, then exit the process
(4) systemctl start avahi-daemon.service
[Side note: if you try to reproduce this, there is a chance a hostname
lookup will trigger, connect to avahi-daemon.socket and activate the
service, meaning the manual avahi-daemon start won't work. You'll have
to avoid that some other way; stopping avahi-daemon.socket would
invalidate the issue described below.]
When avahi-daemon is started manually (without systemd), it does not
receive an FD and instead unlinks and replaces the socket.
avahi-daemon.socket is left running and is never restarted, so the
now-stale socket (no longer actually linked to the on-disk file) is
still passed to avahi-daemon.service when it is started again, and the
socket does not work. Restarting avahi-daemon.socket does fix that.
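
To illustrate the failure mode, here is a minimal Python sketch. It is
not Avahi or systemd code: the function name and the temporary path are
made up for the example, and it assumes Linux AF_UNIX semantics. One
listener stands in for the socket held by avahi-daemon.socket, a second
one for the manually started daemon that unlinks and re-binds the path.

```python
import os
import socket
import tempfile


def demo_stale_listener():
    """Return (old_got_connection, new_got_connection) after a re-bind."""
    path = os.path.join(tempfile.mkdtemp(), "socket")

    # Stand-in for the listening socket held by avahi-daemon.socket.
    old = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    old.bind(path)
    old.listen(1)
    old.setblocking(False)

    # Stand-in for the manually started daemon: unlink and replace.
    os.unlink(path)
    new = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    new.bind(path)
    new.listen(1)
    new.setblocking(False)

    # A client connecting by path reaches whichever socket the
    # on-disk file currently refers to.
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client.connect(path)

    def got_connection(listener):
        try:
            conn, _ = listener.accept()
        except BlockingIOError:
            return False
        conn.close()
        return True

    result = (got_connection(old), got_connection(new))
    for sock in (client, old, new):
        sock.close()
    os.unlink(path)
    return result
```

The old descriptor stays open and valid, but nothing that connects via
the path can reach it any more, which matches the symptom described
above.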
I'm wondering whether I can improve this situation and what the "Right
Way" is. It's a little muddy, since I am effectively messing with the
system outside of systemd, but given the generally recommended socket
activation workflow of binding the socket yourself if it's not passed
in, this does not seem unlikely to happen in some cases. The result is
surprising and confusing to the average user on inspection: the socket
exists but does not work, even after restarting
avahi-daemon.service.
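
For reference, the "bind it yourself if it's not passed in" workflow
could be sketched roughly as follows in Python. This is an illustration
only, not how avahi-daemon is implemented: get_listener is a
hypothetical helper, and it assumes a single stream socket passed via
the LISTEN_PID / LISTEN_FDS environment variables, with passed
descriptors starting at 3 as documented for sd_listen_fds(3).

```python
import os
import socket

# First passed descriptor under the sd_listen_fds(3) convention.
SD_LISTEN_FDS_START = 3


def get_listener(path):
    """Return (listener, activated) for a Unix stream socket at path."""
    pid_matches = os.environ.get("LISTEN_PID") == str(os.getpid())
    fd_count = int(os.environ.get("LISTEN_FDS", "0"))
    if pid_matches and fd_count >= 1:
        # Socket-activated: reuse the descriptor the manager passed in.
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM,
                             fileno=SD_LISTEN_FDS_START)
        return sock, True

    # Started by hand: unlink whatever is at the path and bind a new
    # socket. This is the step that orphans a listener still held by
    # a running .socket unit.
    try:
        os.unlink(path)
    except FileNotFoundError:
        pass
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.bind(path)
    sock.listen(16)
    return sock, False
```

The fallback branch is where the trouble starts: it has no way of
knowing that another process still holds a listener for the same path.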
I had two general thoughts:
(a) Can I make a unit change to "improve" the situation, for example
adding PartOf=avahi-daemon.service to avahi-daemon.socket? I have
noticed that CUPS and Docker (err, moby) seem to ship this, though most
other things don't. However, that does seem to take away the ability to
actually use socket activation, since the two units will always
activate together, making it mostly pointless (even though in most
cases with avahi specifically we want to start up right away rather
than wait to be activated, so that the network side is active). So this
seems non-ideal and I think it probably doesn't make sense. It does
make me wonder whether others had the same thought process and/or
problems, though.
(b) Would it make sense to improve systemd to monitor the socket status
and alert or "exit" the socket service if it is no longer actually
bound to the on-disk path? That would make it eligible to be restarted;
in particular, it would be restarted automatically when
avahi-daemon.service is started again. Or could the situation otherwise
be improved directly from the systemd side in some way, such as
checking the socket status at least when the service is restarting?
Restarting avahi-daemon.socket does in fact restart
avahi-daemon.service (if nothing else by way of necessity, I guess,
since a new FD has to be passed in), but the reverse is not true by
default. That does make sense for short-lived activated services: you
don't want to re-bind the socket every time, since that leaves a window
during which the service is unavailable.
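
As a rough idea of what "checking the socket status" could mean, here
is a Python sketch of a probe-based check. It is not an existing
systemd feature or API; listener_is_stale is a hypothetical helper. It
connects to the path and looks at whether the held listener sees the
connection. The probe is intrusive (it leaves a connection in the
accept queue that hits EOF immediately), and the blocking connect could
stall if the current listener's backlog is full, so treat it purely as
an illustration.

```python
import select
import socket


def listener_is_stale(listener, path, timeout=0.2):
    """Return True if connecting to path does not reach listener.

    Returns None when connections are already pending on listener,
    because the probe cannot be told apart from them.
    """
    readable, _, _ = select.select([listener], [], [], 0)
    if readable:
        return None

    probe = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        probe.connect(path)
    except (FileNotFoundError, ConnectionRefusedError):
        # Path is gone, or nothing is listening behind it any more.
        return True
    else:
        # If the probe reached our listener, it is now readable.
        readable, _, _ = select.select([listener], [], [], timeout)
        return not readable
    finally:
        probe.close()
```

A check that does not open a connection would need to compare the
filesystem object behind the path with the one the listener was bound
to, which this sketch does not attempt.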
From my view the "real problem" is that the issue is entirely
invisible. The socket does not work, there are no errors visible on
either the socket or the service, and restarting avahi-daemon.service
does not fix it. Restarting avahi-daemon.socket does fix it, and I
appreciate that, but I think that is confusing in many cases. My
feeling is that having the socket service exit if the path becomes
invalid may be a sensible improvement, but I thought I'd float the idea
before working on it.
Any input appreciated.
Cheers,
Trent