[systemd-devel] Activation socket overwritten while socket service running

Tue Apr 25 02:17:14 UTC 2017

Hi Folks,

I ran into an issue while doing local development on Avahi.  After the 
following sequence of events I could no longer connect to 
/run/avahi-daemon/socket, which is activated by avahi-daemon.socket:
  (1) systemctl disable avahi-daemon.service
  (2) systemctl stop avahi-daemon.service
  (3) sudo avahi-daemon --debug, then exit the process
  (4) systemctl start avahi-daemon.service
  [side note: if you try to reproduce this, there is a chance a hostname 
lookup will connect to avahi-daemon.socket and activate the service, 
meaning avahi-daemon won't start manually.  You'll have to avoid that; 
stopping avahi-daemon.socket would invalidate the issues noted below]

When avahi-daemon is started manually (without systemd), it does not 
receive an FD, so it unlinks and replaces the socket file. 
avahi-daemon.socket is left running and is never restarted.  The socket 
it holds is now stale (no longer linked to the on-disk file), yet it is 
still passed to avahi-daemon.service when that is started again, so the 
socket does not work.  Restarting avahi-daemon.socket does fix that.
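
To illustrate the failure mode described above outside of systemd and 
Avahi, here is a minimal Python sketch.  The helper names (listen_unix, 
has_pending_connection, demo) and the temporary path are made up for the 
illustration, and it assumes Linux behaviour and that the replacement 
process exits without removing the socket file.  It is not how either 
project implements anything.

```python
import os
import socket
import tempfile


def listen_unix(path):
    """Bind a listening AF_UNIX socket at path, replacing any existing file."""
    try:
        os.unlink(path)
    except FileNotFoundError:
        pass
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.bind(path)
    sock.listen(1)
    sock.setblocking(False)
    return sock


def has_pending_connection(sock):
    """Return True if a client is waiting in this listener's accept queue."""
    try:
        conn, _ = sock.accept()
    except BlockingIOError:
        return False
    conn.close()
    return True


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "socket")
        original = listen_unix(path)     # stands in for avahi-daemon.socket
        replacement = listen_unix(path)  # stands in for the manual daemon

        client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        client.connect(path)
        stale_got_it = has_pending_connection(original)
        new_got_it = has_pending_connection(replacement)
        client.close()

        # The manually started process exits; the file stays on disk.
        replacement.close()
        late = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        refused = False
        try:
            late.connect(path)
        except ConnectionRefusedError:
            refused = True
        finally:
            late.close()

        file_exists = os.path.exists(path)
        original.close()
        return stale_got_it, new_got_it, refused, file_exists
```

In this sketch the original listener never sees the client even though 
it is still open, and once the replacement is closed the path still 
exists but connections to it are refused.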

I'm wondering whether I can improve this situation and what the "Right 
Way" is.  It's a little muddy, since I am effectively messing with the 
system outside of systemd.  But given that the generally recommended 
socket activation workflow is to bind the socket yourself if one is not 
passed in, this does not seem unlikely to happen in some cases.  And the 
result is surprising and confusing on inspection to the average user: 
the socket exists but does not work, even after restarting 
avahi-daemon.service.
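
For reference, the "use the passed FD, otherwise bind it yourself" 
workflow mentioned above can be sketched roughly as follows in Python.  
It follows the LISTEN_PID / LISTEN_FDS environment protocol used by 
sd_listen_fds(), where passed descriptors start at FD 3.  The function 
names passed_fd_count and get_listener are hypothetical, and this is 
only an approximation of what a daemon such as avahi-daemon does, not 
its actual code.

```python
import os
import socket

SD_LISTEN_FDS_START = 3  # first descriptor passed under the sd_listen_fds() protocol


def passed_fd_count(environ, pid):
    """Return how many FDs the service manager passed to this process (0 if none)."""
    if environ.get("LISTEN_PID") != str(pid):
        return 0  # variables are absent or were meant for another process
    try:
        return max(int(environ.get("LISTEN_FDS", "0")), 0)
    except ValueError:
        return 0


def get_listener(path, environ=None):
    """Return (listening_socket, inherited) for the given socket path."""
    environ = os.environ if environ is None else environ
    if passed_fd_count(environ, os.getpid()) >= 1:
        # Socket activated: reuse the already bound and listening socket.
        return socket.socket(fileno=SD_LISTEN_FDS_START), True

    # Started by hand: replace whatever is at the path.  This is the step
    # that silently orphans a listener still held by the socket unit.
    try:
        os.unlink(path)
    except FileNotFoundError:
        pass
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.bind(path)
    sock.listen(16)
    return sock, False
```

The fallback branch is the one that matters here: nothing in it can tell 
that another process still holds a listener for the same path.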

I had two general thoughts:

(a) Can I make a unit change to "improve" the situation, for example 
adding PartOf=avahi-daemon.service to avahi-daemon.socket?  I have 
noticed that CUPS and Docker (err, moby) seem to ship this, though most 
other things don't.  However, that does seem to take away the ability to 
actually use socket activation, since they'll always activate together, 
making it mostly pointless (even though in most cases with avahi 
specifically, we want to actually start up and not wait to be activated, 
so that the network side is active).  So this seems non-ideal and I 
think it probably doesn't make sense.  It does make me wonder whether 
others had the same thought process and/or problems, though.

(b) Would it make sense to improve systemd to monitor the socket status 
and alert or "exit" the socket unit (making it eligible to be restarted; 
in particular, it would be restarted automatically when 
avahi-daemon.service is next started) if it is no longer actually bound 
to the on-disk path?  Or otherwise improve the situation directly from 
the systemd side in some way, such as checking the socket status at 
least when the service is restarted?  Restarting avahi-daemon.socket 
does in fact restart avahi-daemon.service (if nothing else by 
necessity, I guess, since a new FD has to be passed in), but the reverse 
is not true by default.  That does make sense for short-lived activated 
services: you don't want to re-bind the socket every time, since that 
leaves a window during which the service is unavailable.
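
As a rough idea of what "no longer bound to the on-disk path" could mean 
in practice, here is a small Python sketch of one possible check: 
remember the device and inode of the socket file right after bind(), and 
later compare them with whatever is at the path.  This is purely an 
assumption about how such a check might look, not something systemd does 
today as far as I know, and the names bind_and_remember and 
path_still_ours are made up.  There is also a small window between 
bind() and stat() that a real implementation would have to think about.

```python
import os
import socket
import stat
import tempfile


def bind_and_remember(path):
    """Bind a listener and record the (device, inode) identity of its socket file."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.bind(path)
    sock.listen(1)
    st = os.stat(path)
    return sock, (st.st_dev, st.st_ino)


def path_still_ours(path, identity):
    """Return True if path still refers to the socket file we created."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return False
    return stat.S_ISSOCK(st.st_mode) and (st.st_dev, st.st_ino) == identity


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "socket")
        listener, identity = bind_and_remember(path)
        before = path_still_ours(path, identity)

        # Another process unlinks the path and binds its own socket there.
        os.unlink(path)
        intruder, _ = bind_and_remember(path)
        after = path_still_ours(path, identity)

        intruder.close()
        listener.close()
        return before, after
```

A check like this could run when the service is (re)started, at which 
point the socket unit could be failed or re-bound instead of handing out 
a dead FD.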

From my view the "real problem" is that the issue is entirely 
invisible.  The socket does not work, there are no errors visible on 
either the socket or the service, and restarting avahi-daemon.service 
does not fix it.  Restarting avahi-daemon.socket does fix it, and I 
appreciate that, but I think that is confusing in many cases.  My 
feeling is that having the socket unit exit if the path becomes invalid 
may be a sensible improvement, but I thought I'd float the idea before 
working on it.

Any input appreciated.

Cheers,
Trent