[systemd-devel] systemd-notify --ready is not reliable

Wed Apr 23 06:15:13 PDT 2014

Hi all,

I've noticed that the command "systemd-notify --ready" does not work reliably to
signal that a service is ready. It works sometimes, but most of the time you get
a message like:

  systemd[1]: Cannot find unit for notify message of PID 3137.

in the journal, and the service stays in the "activating" state.

The reason is that systemd-notify sends its message asynchronously and exits
immediately. So by the time systemd processes the message, systemd-notify has
probably already exited, and systemd can no longer look up its cgroup (and hence
its unit). (Note that this affects other systemd-notify messages as well, but
for --ready it's particularly bad because it causes services to "hang" in the
"activating" state.)
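To make the race concrete, here is a minimal Linux-only sketch that mimics the sequence without libsystemd. It is not systemd's code. A child sends READY=1 as a single datagram to a socket at a made-up temporary path (standing in for /run/systemd/notify) and exits. The receiver has SO_PASSCRED enabled and only reads the datagram after the child has been reaped. The PID in SCM_CREDENTIALS is still right, but /proc/<pid>/cgroup is already gone. The sketch assumes that PID-to-cgroup step is the lookup that fails, which is roughly what the journal message suggests. All helper names are hypothetical.

```c
#define _GNU_SOURCE /* struct ucred, SCM_CREDENTIALS */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fire-and-forget, like systemd-notify: send one datagram, never wait. */
static int notify_send(const char *path, const char *msg) {
    struct sockaddr_un sa = { .sun_family = AF_UNIX };
    strncpy(sa.sun_path, path, sizeof(sa.sun_path) - 1);
    int fd = socket(AF_UNIX, SOCK_DGRAM | SOCK_CLOEXEC, 0);
    if (fd < 0)
        return -1;
    ssize_t n = sendto(fd, msg, strlen(msg), 0, (struct sockaddr *)&sa, sizeof(sa));
    close(fd);
    return n < 0 ? -1 : 0;
}

/* Receive one datagram and return the sender PID the kernel attached. */
static pid_t recv_sender_pid(int fd, char *buf, size_t len) {
    union {
        struct cmsghdr align;
        char data[CMSG_SPACE(sizeof(struct ucred))];
    } ctl;
    struct iovec iov = { .iov_base = buf, .iov_len = len - 1 };
    struct msghdr mh = { .msg_iov = &iov, .msg_iovlen = 1,
                         .msg_control = ctl.data, .msg_controllen = sizeof(ctl.data) };
    ssize_t n = recvmsg(fd, &mh, 0);
    if (n < 0)
        return -1;
    buf[n] = '\0';
    for (struct cmsghdr *c = CMSG_FIRSTHDR(&mh); c; c = CMSG_NXTHDR(&mh, c))
        if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_CREDENTIALS) {
            struct ucred uc;
            memcpy(&uc, CMSG_DATA(c), sizeof(uc));
            return uc.pid;
        }
    return -1;
}

/* The PID -> cgroup step: only works while /proc/<pid> still exists. */
static int cgroup_visible(pid_t pid) {
    char p[64];
    snprintf(p, sizeof(p), "/proc/%d/cgroup", (int)pid);
    int fd = open(p, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return 0;
    close(fd);
    return 1;
}

/* Returns 1 if the race reproduced (message received, sender's cgroup gone). */
static int demo_race(const char *path) {
    struct sockaddr_un sa = { .sun_family = AF_UNIX };
    strncpy(sa.sun_path, path, sizeof(sa.sun_path) - 1);
    int one = 1, result = -1;
    unlink(path);
    int srv = socket(AF_UNIX, SOCK_DGRAM | SOCK_CLOEXEC, 0);
    if (srv < 0)
        return -1;
    if (setsockopt(srv, SOL_SOCKET, SO_PASSCRED, &one, sizeof(one)) == 0 &&
        bind(srv, (struct sockaddr *)&sa, sizeof(sa)) == 0) {
        pid_t child = fork();
        if (child == 0)
            _exit(notify_send(path, "READY=1") == 0 ? 0 : 1);
        if (child > 0) {
            /* Simulate a busy manager: read the datagram only after the
               sender has exited and been reaped. */
            waitpid(child, NULL, 0);
            char buf[256];
            pid_t sender = recv_sender_pid(srv, buf, sizeof(buf));
            result = sender == child && strcmp(buf, "READY=1") == 0 &&
                     !cgroup_visible(sender);
        }
    }
    close(srv);
    unlink(path);
    return result;
}
```

demo_race() returns 1 when the message arrived intact from the right PID but the cgroup lookup could no longer succeed.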

Any suggestions on what to do about this? I can see a few solutions:

* Have sd_notify() include its own unit name in the notification message. This
would be insecure, since any process could then claim to be any unit (though
it's probably fine if the sender is root). However, it could be made secure by
having systemd pass some random cookie to services via an environment variable,
which sd_notify() could then include in its notification messages to
authenticate itself.
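A rough sketch of the cookie idea from the manager's side follows, assuming messages stay in the newline-separated KEY=VALUE format. The COOKIE= field, the $NOTIFY_COOKIE variable mentioned in the comments, and all helper names are hypothetical; nothing like this exists in systemd today. The point is that the unit is resolved from the cookie alone, so it no longer matters whether the sender is still alive.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define COOKIE_HEX_LEN 32 /* 16 random bytes, hex-encoded */

/* Hypothetical manager-side table: which cookie was handed to which unit. */
struct unit_cookie {
    const char *unit;
    char cookie[COOKIE_HEX_LEN + 1];
};

/* Generate a cookie; the manager would export it to the service,
   e.g. as a hypothetical $NOTIFY_COOKIE, before exec(). */
static int make_cookie(char out[COOKIE_HEX_LEN + 1]) {
    unsigned char raw[COOKIE_HEX_LEN / 2];
    FILE *f = fopen("/dev/urandom", "rb");
    if (!f)
        return -1;
    size_t got = fread(raw, 1, sizeof(raw), f);
    fclose(f);
    if (got != sizeof(raw))
        return -1;
    for (size_t i = 0; i < sizeof(raw); i++)
        snprintf(out + 2 * i, 3, "%02x", raw[i]);
    return 0;
}

/* Compare without an early exit, so timing does not leak how much matched. */
static int cookie_equal(const char *a, const char *b) {
    unsigned char diff = 0;
    for (size_t i = 0; i < COOKIE_HEX_LEN; i++)
        diff |= (unsigned char)(a[i] ^ b[i]);
    return diff == 0;
}

/* Find the COOKIE= line in a KEY=VALUE message and map it to a unit;
   no PID or cgroup lookup is involved. */
static const char *unit_for_message(const char *msg,
                                    const struct unit_cookie *tab, size_t n) {
    const size_t klen = strlen("COOKIE=");
    for (const char *p = msg; *p; ) {
        const char *eol = strchr(p, '\n');
        size_t len = eol ? (size_t)(eol - p) : strlen(p);
        if (len == klen + COOKIE_HEX_LEN && strncmp(p, "COOKIE=", klen) == 0) {
            for (size_t i = 0; i < n; i++)
                if (cookie_equal(p + klen, tab[i].cookie))
                    return tab[i].unit;
            return NULL; /* unknown cookie: reject */
        }
        if (!eol)
            break;
        p = eol + 1;
    }
    return NULL; /* no cookie: fall back to (or refuse) the PID-based lookup */
}
```

A service holding bar.service's cookie would send something like "READY=1\nCOOKIE=<hex>\n", and unit_for_message() resolves it to bar.service even if the sender exited long before.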

* Make systemd-notify synchronous, i.e., have it wait for a message back from
systemd after it has determined the client's unit. Not entirely trivial given
that sd_notify() uses SOCK_DGRAM.
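One way around the SOCK_DGRAM problem, sketched below for Linux only: an unbound client socket has no address, so the manager has nowhere to send a reply. Linux autobind (bind() with only the address family, see unix(7)) gives the client a unique abstract address. The manager can then reply to whatever recvfrom() reports. The "ACK" reply and the function names are hypothetical, not an existing systemd protocol.

```c
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/* Client: send msg, then block until the manager acknowledges it or
   timeout_ms expires. Returns 0 on "ACK", -1 otherwise. */
static int notify_sync(const char *path, const char *msg, int timeout_ms) {
    struct sockaddr_un self = { .sun_family = AF_UNIX };
    struct sockaddr_un dst = { .sun_family = AF_UNIX };
    strncpy(dst.sun_path, path, sizeof(dst.sun_path) - 1);
    int fd = socket(AF_UNIX, SOCK_DGRAM | SOCK_CLOEXEC, 0);
    if (fd < 0)
        return -1;
    int rc = -1;
    /* Linux autobind: binding with only the family gives this socket a
       unique abstract address, so the manager has somewhere to reply. */
    if (bind(fd, (struct sockaddr *)&self, sizeof(sa_family_t)) == 0 &&
        sendto(fd, msg, strlen(msg), 0, (struct sockaddr *)&dst, sizeof(dst)) >= 0) {
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        char reply[16];
        if (poll(&pfd, 1, timeout_ms) == 1) {
            ssize_t n = recv(fd, reply, sizeof(reply) - 1, 0);
            if (n >= 0) {
                reply[n] = '\0';
                rc = strcmp(reply, "ACK") == 0 ? 0 : -1;
            }
        }
    }
    close(fd);
    return rc;
}

/* Manager: handle one message, then acknowledge it to the sender's address.
   Unless the client timed out, it is still blocked in poll() and therefore
   still alive while its unit is being resolved. */
static int serve_one(int srv) {
    char buf[256];
    struct sockaddr_un from;
    socklen_t fromlen = sizeof(from);
    ssize_t n = recvfrom(srv, buf, sizeof(buf) - 1, 0,
                         (struct sockaddr *)&from, &fromlen);
    if (n < 0)
        return -1;
    buf[n] = '\0';
    /* ... PID -> cgroup -> unit lookup would happen here ... */
    return sendto(srv, "ACK", 3, 0, (struct sockaddr *)&from, fromlen) == 3 ? 0 : -1;
}
```

The timeout matters: if the manager never answers (an older systemd, say), the client should give up rather than hang the service's startup.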

* Give each service its own notification socket, rather than using the global
/run/systemd/notify. That is, in the service, set $NOTIFY_SOCKET to something
like /run/systemd/notify-foo.service, and have systemd listen on that socket. By
making the socket private to the service's mount namespace, you would know for
sure that any message arriving on the socket comes from the service.
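A minimal sketch of the per-service socket idea, with the mount-namespace part left out. That part is what would actually make it secure, and it isn't shown here. Each unit gets its own datagram socket named after the unit, and the manager learns the unit from which socket became readable, so it never needs the sender's PID or cgroup. The directory, struct, and helper names are made up for illustration.

```c
#include <poll.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical per-unit state: one listening socket per service. */
struct unit_sock {
    const char *unit;
    struct sockaddr_un addr; /* what the service would get as $NOTIFY_SOCKET */
    int fd;
};

/* Bind <dir>/notify-<unit>, e.g. /run/systemd/notify-foo.service. */
static int unit_sock_open(struct unit_sock *u, const char *dir, const char *unit) {
    memset(&u->addr, 0, sizeof(u->addr));
    u->addr.sun_family = AF_UNIX;
    int len = snprintf(u->addr.sun_path, sizeof(u->addr.sun_path),
                       "%s/notify-%s", dir, unit);
    if (len < 0 || (size_t)len >= sizeof(u->addr.sun_path))
        return -1;
    u->unit = unit;
    unlink(u->addr.sun_path);
    u->fd = socket(AF_UNIX, SOCK_DGRAM | SOCK_CLOEXEC, 0);
    if (u->fd < 0)
        return -1;
    if (bind(u->fd, (struct sockaddr *)&u->addr, sizeof(u->addr)) < 0) {
        close(u->fd);
        u->fd = -1;
        return -1;
    }
    return 0;
}

/* Wait for a message on any unit socket. The socket it arrived on names
   the unit, so nothing depends on the sender still being alive. */
static const char *wait_for_notify(const struct unit_sock *units, size_t n,
                                   char *buf, size_t len, int timeout_ms) {
    struct pollfd pfds[16];
    if (n == 0 || n > 16)
        return NULL;
    for (size_t i = 0; i < n; i++)
        pfds[i] = (struct pollfd){ .fd = units[i].fd, .events = POLLIN };
    if (poll(pfds, n, timeout_ms) <= 0)
        return NULL;
    for (size_t i = 0; i < n; i++) {
        if (!(pfds[i].revents & POLLIN))
            continue;
        ssize_t r = recv(units[i].fd, buf, len - 1, 0);
        if (r < 0)
            return NULL;
        buf[r] = '\0';
        return units[i].unit;
    }
    return NULL;
}
```

In the child, between fork() and exec(), the manager would call setenv("NOTIFY_SOCKET", u->addr.sun_path, 1), so an unmodified sd_notify() would pick up the per-unit socket.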

* Document that you shouldn't use systemd-notify. Not an ideal solution :-)

What do you think?

-- 
Eelco Dolstra | LogicBlox, Inc. | http://nixos.org/~eelco/