[systemd-devel] socket activation systemd holding a reference to an accepted socket

Lennart Poettering lennart at poettering.net
Wed Feb 24 12:29:56 UTC 2016


On Tue, 23.02.16 14:58, Ben Woodard (woodard at redhat.com) wrote:

> On Thu, Feb 18, 2016 at 9:08 AM, Lennart Poettering <lennart at poettering.net>
> wrote:
> 
> > On Wed, 17.02.16 17:44, Ben Woodard (woodard at redhat.com) wrote:
> >
> > > Is it intentional that systemd holds a reference to a socket it has
> > > just accepted even though it just handed the open socket over to a
> > > socket.activated service that it has just started.
> >
> > Yes, we do that currently. While there's currently no strict reason to
> > do this, I think we should continue to do so, as there was always the
> > plan to provide a bus interface so that services can rerequest their
> > activation and fdstore fds at any time. Also, for the listening socket
> > case (i.e. the Accept=no case) we do the same and have to, hence it
> > kinda makes sure we expose the same behaviour here on all kinds of
> > sockets.
> >
> 
> I'm having trouble believing that consistency of behavior is an ideal to
> strive for in this case. Consistency with xinetd's behavior would seem to
> be a better benchmark here.

Well, we are implementing our own socket passing protocol anyway, and
support the inetd style one only for completeness.
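
For reference, the receiving side of the native protocol roughly looks
like this (a minimal sketch, assuming an Accept=yes socket unit, so
that the accepted connection is passed as the first fd after
stdin/stdout/stderr instead of showing up as fd 0/1 like in the inetd
style):

    /* Minimal sketch of a per-connection service using the native
     * socket passing protocol (sd_listen_fds()), as opposed to the
     * inetd style where the connection arrives on fd 0/1. */
    #include <systemd/sd-daemon.h>
    #include <unistd.h>

    int main(void) {
            int n = sd_listen_fds(0);       /* passed fds start at SD_LISTEN_FDS_START (3) */
            if (n < 1)
                    return 1;               /* not socket activated */

            int conn = SD_LISTEN_FDS_START; /* the accepted connection */

            /* ... talk to the client on 'conn' ... */

            close(conn);
            return 0;
    }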

> And I'm not sure that I understand the value of being able to rerequest the
> fdstore fds. To me this sounds like it would be a very rarely used
> feature. Could this be made an option that could be enabled when you add
> the bus service that allows a service to rerequest its activation and
> fdstore fds?

Apple's launchd does activation that way. They have a "check-in"
protocol, where instead of passing fds through exec() the activated
process asks for them via an IPC call. This has quite a few benefits, as
it allows the service manager to update the set of sockets
dynamically, and the service can query them at any time to acquire the
newest set.
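
To be clear, that "re-request at any time" part is just the plan so
far; what exists today is the other direction: a service can push fds
into the fd store with sd_pid_notify_with_fds() and gets them back via
sd_listen_fds() the next time it is started. Roughly (a sketch,
assuming FileDescriptorStoreMax= is set on the service):

    /* Sketch: hand an fd to systemd's fd store so it survives a
     * service restart. Requires FileDescriptorStoreMax= > 0 in the
     * unit; the fd is passed back via sd_listen_fds() on the next
     * invocation. */
    #include <systemd/sd-daemon.h>

    int store_fd(int fd) {
            /* pid 0 = ourselves, 0 = keep $NOTIFY_SOCKET, "FDSTORE=1" = store it */
            return sd_pid_notify_with_fds(0, 0, "FDSTORE=1", &fd, 1);
    }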

Another reason, btw, to keep the fd open in systemd is that we allow a
series of programs to be invoked via multiple ExecStartPre=,
ExecStart=, ExecStartPost=, ExecStop=, ExecStopPost= lines, and we
hence have to keep the fd open at least until the last of them has
been invoked, so that we have something to pass to that last
invocation.
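
As an illustration (a sketched unit, names made up), think of a
per-connection service like this:

    # foo.socket
    [Socket]
    ListenStream=2222
    Accept=yes

    # foo@.service
    [Service]
    ExecStartPre=/usr/local/bin/foo-prepare
    ExecStart=/usr/local/bin/foo-handle-connection

Here systemd has to keep the connection fd open while foo-prepare
runs, as otherwise there would be nothing left to pass to
foo-handle-connection afterwards.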

> > Did you run into problems with this behaviour?
> >
> 
> Oh yes. It breaks a large number of management tools that we use to do
> various things on clusters. It is a kind of pseudoconcurrency. Think of
> it like this:
> 
> foreach compute-node;
>    rsh node daemonize quick-but-not-instant-task
> 
> With xinetd the daemonization would close the accepted socket. Then the
> foreach loop would nearly instantly move on to the next node. We could zip
> through 8000 nodes in a couple of seconds.
> 
> With systemd holding onto the socket, the "rsh" hangs until the
> quick-but-not-instant-task completes. This causes the foreach loop to take
> anywhere from 45 min to several hours. Because it isn't really rsh and
> daemonize (I just used those to make it easy to understand what is going
> on), rewriting several of our tools is non-trivial and would end up
> violating all sorts of implicit logical layering within the tools and
> libraries that we use to build them.

Hmm, let me get this right: the activated service closes its fd much
earlier than it calls exit(), and you want this to be propagated back
to the client instead of being delayed until the service actually
calls exit()?

Normally one would solve this by inserting shutdown(fd, SHUT_RDWR) at
the right place, since that *really* terminates the connection,
regardless of whether anyone else still has an fd open. Is that an
option here?
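
Concretely, that would look roughly like this in the activated service
(a rough sketch, assuming the native passing protocol so that the
connection is the first passed fd; with inetd-style passing it would
be fd 0/1 instead):

    /* Rough sketch: terminate the TCP connection explicitly before the
     * long-running part of the work, so that the remote client sees the
     * connection close right away even though systemd still holds a
     * reference to the socket. */
    #include <sys/socket.h>
    #include <unistd.h>
    #include <systemd/sd-daemon.h>

    int main(void) {
            if (sd_listen_fds(0) < 1)
                    return 1;

            int conn = SD_LISTEN_FDS_START;

            /* ... read the request, send the reply ... */

            shutdown(conn, SHUT_RDWR);      /* really terminates the connection */
            close(conn);

            /* ... now do the quick-but-not-instant work ... */
            return 0;
    }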

I figure I'd be open to adding an option that makes sure the
connection socket is closed as soon as the last ExecXYZ= process we
need is forked off.

> Where is this in the source code? I've been planning to send you a patch to
> change the behavior but I haven't quite connected the dots from where a job
> within a transaction is inserted on the run queue to where the fork for the
> "ExecStart" actually happens.

It's in src/core/service.c.

Lennart

-- 
Lennart Poettering, Red Hat

