[systemd-devel] systemd-notify --ready is not reliable

Lennart Poettering lennart at poettering.net
Sun May 18 09:54:47 PDT 2014


On Thu, 24.04.14 23:51, Zbigniew Jędrzejewski-Szmek (zbyszek at in.waw.pl) wrote:

> 
> On Wed, Apr 23, 2014 at 08:50:34PM +0200, Lennart Poettering wrote:
> > On Wed, 23.04.14 15:15, Eelco Dolstra (eelco.dolstra at logicblox.com) wrote:
> > 
> > > Hi all,
> > > 
> > > I've noticed that the command "systemd-notify --ready" does not work reliably to
> > > signal that a service is ready. It works sometimes, but most of the time you get
> > > a message like:
> > > 
> > >   systemd[1]: Cannot find unit for notify message of PID 3137.
> > > 
> > > in the journal, and the service stays in the "activating" state.
> > > 
> > > The reason is that systemd-notify sends its message asynchronously and exits
> > > immediately. So by the time systemd processes the message, systemd-notify has
> > > probably already exited, and so systemd cannot get its cgroup. (Note that this
> > > affects other systemd-notify messages as well, but for --ready it's particularly
> > > bad because it causes services to "hang" in the "activating" state.)
> > > 
> > > Any suggestions what to do about this? I can see a few solutions:
> > 
> > There is ongoing work on the kernel to add SCM_CGROUPS to messages for
> > us. With that in place we have a race-free way to get this data for
> > incoming messages. I have some hope that this will enter the kernel
> > soonish, but then again, this story has been cooking for the past 5
> > years to no success...
> What about having systemd-notify simply wait in the background for 10s?
> An ugly workaround, but it should fix the issue until we have something
> better.
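To make the race concrete, here is a minimal Linux-only Python sketch. It is not systemd's code: the function names are hypothetical, and it assumes, as a simplification, that the manager maps the PID from the received SCM_CREDENTIALS to a cgroup by reading /proc/<pid>/cgroup. A forked notifier sends READY=1 and exits before the "manager" reads the datagram, so the credentials still name a PID but there is no process left to look up.

```python
import os
import socket
import struct
import tempfile

UCRED = struct.Struct("3i")  # Linux struct ucred: pid, uid, gid


def send_state(path: str, state: str) -> None:
    """Send one sd_notify-style datagram, then return (the caller may exit at once)."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as s:
        s.sendto(state.encode(), path)


def make_listener(path: str) -> socket.socket:
    """Manager side: a datagram socket that asks the kernel for sender credentials."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_PASSCRED, 1)
    s.bind(path)
    return s


def receive_and_resolve(s: socket.socket):
    """Return (payload, sender_pid, cgroup_text); cgroup_text is None if the sender is gone."""
    data, ancdata, _flags, _addr = s.recvmsg(4096, socket.CMSG_SPACE(UCRED.size))
    pid = None
    for level, kind, cdata in ancdata:
        if level == socket.SOL_SOCKET and kind == socket.SCM_CREDENTIALS:
            pid, _uid, _gid = UCRED.unpack(cdata[:UCRED.size])
    try:
        # Simplified PID -> cgroup lookup; it only works while the sender still exists.
        with open(f"/proc/{pid}/cgroup") as f:
            return data, pid, f.read()
    except OSError:
        return data, pid, None  # cf. "Cannot find unit for notify message of PID ..."


def demo_race():
    """Let a short-lived notifier exit before the manager handles its message."""
    path = os.path.join(tempfile.mkdtemp(), "notify")
    with make_listener(path) as listener:
        child = os.fork()
        if child == 0:  # notifier: send READY=1 and exit immediately
            try:
                send_state(path, "READY=1")
            finally:
                os._exit(0)
        os.waitpid(child, 0)  # the manager only gets around to reading it now
        return child, receive_and_resolve(listener)
```

demo_race() should report the child's PID in the credentials but no cgroup. If the sender is still alive when the message is read, which is what the 10s background wait buys, the same lookup succeeds.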

What about this idea instead:

Instead of sending the datagram with our own PID in the ucred field, we
could simply try to override it with our parent's PID. This will not fix
100% of the cases, but I am quite sure it will fix most, since the parent
process is usually the one that stays around: if it wants to send
READY=1, it is likely to stay around for longer anyway, so the parent
PID should be good enough.

I am pretty sure we should make this change regardless of whether it
fixes all or only some of the cases, simply because I think it is the
right thing to do. After all, we also send MAINPID= with the parent's PID
instead of our own, for a reason...

Does this make sense?

Lennart

-- 
Lennart Poettering, Red Hat
