[systemd-devel] Early review request: socket activation bridge

Lennart Poettering lennart at poettering.net
Thu Oct 10 19:07:40 PDT 2013


On Thu, 10.10.13 13:12, David Strauss (david at davidstrauss.net) wrote:

> I was actually planning to rewrite on top of libuv today, but I'm
> happy to port to the new, native event library.
> 
> Is there any best-practice for using it with multiple threads?

We are pretty conservative about threads so far, but I guess in this case
it makes some sense to distribute the work across CPUs. Here's how I would do it:

You start with one thread (the main thread, that is). You run an
event loop, and add all listening sockets to it. When a connection
comes in you process it as usual. As soon as you notice you are
processing more than, let's say, 5 connections at the same time, you
spawn a new thread and disable the watches on the listening sockets
(use sd_event_source_set_enabled(source, SD_EVENT_OFF) for this). That
new thread then also runs an event loop of its own, completely
independent of the original one, and also adds the listening sockets
to it; it basically takes over from the original main thread.
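
Very roughly, one worker's loop could look like the untested sketch
below. It assumes the sd-event API (sd_event_add_io(), sd_event_loop(),
sd_event_source_set_enabled()); on_connection() and
MAX_CONNECTIONS_PER_THREAD are made-up names, and the actual bridging
of the connection is left out.

/* Untested sketch of a single worker's event loop. Build with -lsystemd. */
#define _GNU_SOURCE
#include <sys/epoll.h>
#include <sys/socket.h>
#include <systemd/sd-daemon.h>
#include <systemd/sd-event.h>

#define MAX_CONNECTIONS_PER_THREAD 5

static sd_event_source *listen_source = NULL;  /* kept so we can re-enable it later */
static unsigned n_connections = 0;

static int on_connection(sd_event_source *s, int fd, uint32_t revents, void *userdata) {
        int cfd;

        cfd = accept4(fd, NULL, NULL, SOCK_NONBLOCK|SOCK_CLOEXEC);
        if (cfd < 0)
                return 0;

        n_connections++;
        /* ... register cfd with this thread's event loop and start bridging ... */

        /* Past the per-thread limit: stop watching the listening socket so
         * that another thread can take over. */
        if (n_connections >= MAX_CONNECTIONS_PER_THREAD)
                sd_event_source_set_enabled(s, SD_EVENT_OFF);

        return 0;
}

int main(int argc, char *argv[]) {
        sd_event *e = NULL;
        int listen_fd = SD_LISTEN_FDS_START; /* first fd passed in via socket activation */

        sd_event_default(&e);
        sd_event_add_io(e, &listen_source, listen_fd, EPOLLIN, on_connection, NULL);

        return sd_event_loop(e) < 0 ? 1 : 0;
}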

Eventually this second thread will also reach its limit of 5
connections. Now, we could just fork off yet another thread, again
pass control of the listening socket to it, and so on, but we cannot
do this unbounded, and we should try to give work back to the older
threads that have become idle again.

To do this, we keep a (mutex-protected) global list of thread
information structs. Each structure contains two things: a counter of
how many connections that thread currently processes, and an fd
referring to a per-thread eventfd(). The eventfd() is hooked into the
thread's event loop, and we use it to pass control of the listening
socket from one thread to another.
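
The bookkeeping can be as simple as this (again a sketch; thread_table,
thread_info_init() and MAX_THREADS are made-up names):

/* Each worker thread owns one slot; the eventfd is hooked into that
 * worker's event loop and is how other threads poke it to take the
 * listening socket back. */
#include <pthread.h>
#include <stdbool.h>
#include <sys/eventfd.h>

#define MAX_THREADS 16   /* e.g. 2-3x the CPUs in the affinity set */

struct thread_info {
        bool in_use;             /* slot is owned by a running thread */
        unsigned n_connections;  /* connections this thread currently processes */
        int event_fd;            /* per-thread eventfd() */
};

static pthread_mutex_t thread_table_lock = PTHREAD_MUTEX_INITIALIZER;
static struct thread_info thread_table[MAX_THREADS];

static int thread_info_init(struct thread_info *ti) {
        ti->in_use = true;
        ti->n_connections = 0;
        ti->event_fd = eventfd(0, EFD_NONBLOCK|EFD_CLOEXEC);
        return ti->event_fd < 0 ? -1 : 0;
}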

So with this in place we can now alter our thread allocation scheme:
instead of stupidly forking off a new thread from a thread that
reached its connection limit, we simply sweep through the thread info
struct array and look for the thread with the lowest number of
connections, then trigger its eventfd. When that thread sees the
trigger in its event loop it re-enables the watches on the listening
fds and goes on until it reaches the limit again, at which point it
tries to find another thread to take control of the listening socket.
If during the sweep a thread recognizes that all threads are at their
limit, it forks off a new one, as described above. If the maximum
number of threads is reached (which we should put at 2x or 3x the
number of CPUs in the CPU affinity set of the process), the thread in
control of the listening socket simply turns off the poll flags for
the listening socket, stops processing it for one event loop
iteration, and then tries to pass it on to somebody else on the next
iteration.
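
The sweep itself is then just a linear scan under the lock. Continuing
the sketches above (thread_table, MAX_CONNECTIONS_PER_THREAD), with
pass_listening_socket() as a made-up name:

/* Find the least loaded thread below the limit and trigger its eventfd;
 * returns the slot index, or -1 if everybody is at the limit (the caller
 * then spawns a new thread, or skips the listening socket for one event
 * loop iteration if MAX_THREADS is reached). */
#include <pthread.h>
#include <stdint.h>
#include <unistd.h>

static int pass_listening_socket(void) {
        int best = -1;

        pthread_mutex_lock(&thread_table_lock);

        for (int i = 0; i < MAX_THREADS; i++) {
                if (!thread_table[i].in_use)
                        continue;
                if (thread_table[i].n_connections >= MAX_CONNECTIONS_PER_THREAD)
                        continue;
                if (best < 0 || thread_table[i].n_connections < thread_table[best].n_connections)
                        best = i;
        }

        if (best >= 0) {
                uint64_t one = 1;
                /* The target thread's event loop wakes up on this fd and
                 * re-enables its listening socket watches. */
                (void) write(thread_table[best].event_fd, &one, sizeof(one));
        }

        pthread_mutex_unlock(&thread_table_lock);
        return best;
}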

With this scheme you should get a pretty good distribution of work if
a large number of long-running TCP connections are made. It will not
be as good if a lot of short ones are made.

That all said, I am not convinced this is really something to
implement in the service itself. Instead we could also beef up support
for the new SO_REUSEPORT socket option in systemd. For example, we
could add a new option in .socket files: Distribute=$NUMBER. If set to
some number, systemd would create that many socket fds and bind them
all to the same configured address with SO_REUSEPORT. Then, when a
connection comes in on any of these, we'd instantiate a new service
instance for each and pass that one listening socket to it, which that
daemon instance would then process. The daemon would invoke accept()
on the fd a couple of times and process everything it finds there.
After being idle for a while it would exit.
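
To illustrate what the kernel feature does, here's what binding one of
these sockets looks like in plain socket terms, independent of systemd
(a sketch; listen_reuseport() is a made-up helper, and SO_REUSEPORT
needs Linux >= 3.9):

/* Each instance (or, in the proposal above, systemd, $NUMBER times)
 * binds its own socket to the same port with SO_REUSEPORT set; the
 * kernel then distributes incoming connections across all of them. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

static int listen_reuseport(uint16_t port) {
        struct sockaddr_in6 sa;
        int fd, one = 1;

        fd = socket(AF_INET6, SOCK_STREAM|SOCK_NONBLOCK|SOCK_CLOEXEC, 0);
        if (fd < 0)
                return -1;

        if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one)) < 0) {
                close(fd);
                return -1;
        }

        memset(&sa, 0, sizeof(sa));
        sa.sin6_family = AF_INET6;
        sa.sin6_port = htons(port);
        sa.sin6_addr = in6addr_any;

        if (bind(fd, (struct sockaddr *) &sa, sizeof(sa)) < 0 ||
            listen(fd, SOMAXCONN) < 0) {
                close(fd);
                return -1;
        }

        return fd;
}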

With the SO_REUSEPORT scheme your daemon can stay single-threaded
(making things much simpler), and you'd get much better performance
too... (Oh, and of course, with that work we'd have something powerful
for other use cases too.) All load balancing would be done by the
kernel, and that's kinda cool, because the kernel folks are actually
good at these things...

So, if you ask me, I vote for the SO_REUSEPORT logic.

For more information on SO_REUSEPORT:

https://lwn.net/Articles/542629/

Lennart

-- 
Lennart Poettering - Red Hat, Inc.

