[systemd-devel] systemd's connections to /run/systemd/private ?

Brian Reichert reichert at numachi.com
Tue Jul 2 13:57:44 UTC 2019


At $JOB, on some of our SLES12 boxes, our logs are getting swamped
with messages saying:

  "Too many concurrent connections, refusing"

It's hampering our ability to manage services, e.g.:

  # systemctl status ntpd
  Failed to get properties: Connection reset by peer

Near as I can tell from a quick read of the source of dbus.c, we're
hitting a hard-coded limit of CONNECTIONS_MAX (set to 4096).  I
think this is related to the number of connections systemd (pid 1)
has to /run/systemd/private, but I'm guessing here:

  # ss -x | grep /run/systemd/private | wc -l
  4015

But, despite the almost 4k connections, 'ss' shows that there are
no connected peers:

  # ss -x | grep /run/systemd/private | grep -v -e '* 0' | wc -l
  0
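
For anyone wanting to reproduce the two counts above in one pass, here is a
rough sketch.  The function name summarize_private is made up for
illustration, and it assumes the peer inode is the last whitespace-separated
field of each line, which is how 'ss -x' prints when run without -p; adjust
it if your ss formats things differently.

```shell
# Summarize `ss -x` output read from stdin: how many lines mention the
# private socket, and how many of those show a non-zero peer inode.
# Assumption: the peer inode is the last field on each line.
summarize_private() {
    awk '
        index($0, "/run/systemd/private") {
            total++
            if ($NF != "0") peered++
        }
        END { printf "total=%d peered=%d\n", total, peered }
    '
}
```

Usage would be 'ss -x | summarize_private'.  Matching on the last field
rather than grepping for '* 0' avoids depending on the exact spacing of the
columns.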

The symptom here is that, depending on system activity, systemd stops
being able to process new requests.  systemd allows requests to come
in (e.g. via an invocation of 'systemctl'), but if I understand the
source of dbus.c, when there are too many connections to its
outgoing stream, systemd rejects the request, apparently with no
retry.

When we first spin up a new SLES12 host with our custom services,
the number of connections to /run/systemd/private numbers in the
mere hundreds.  As workloads increase, the number of connections
rises to the thousands.  Some hosts are plagued with the 'Too many
concurrent connections' message, some are not.  Empirically, all
I've been able to see is that the number of systemd's connections to
/run/systemd/private tips over 4k.

Is my guess about CONNECTIONS_MAX's relationship to /run/systemd/private
correct?

- I can't demonstrate that there are any consumers of this stream.
- I can't explain why the connection count increases over time.
- The CONNECTIONS_MAX constant is hard-coded, and it gets increased
  every few months/years, but never seems to be expressed as something
  you can set in a config file.
- I don't know what tunables affect the lifetime/culling of those
  connections.

I have a hypothesis that this may be some resource leak in systemd,
but I've not found a way to test that.
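
One crude way to at least watch for the growth would be to log the count
at a fixed interval.  This is only a sketch: the function names and the
60-second default are arbitrary choices for illustration, and a count that
climbs without ever falling back would be consistent with a leak, not proof
of one.

```shell
# Count lines on stdin that mention the private socket path.
count_private() {
    # grep -c still prints 0 when nothing matches, but exits non-zero;
    # the "|| true" keeps that from tripping up callers using set -e.
    grep -c -- '/run/systemd/private' || true
}

# Print one "UTC-timestamp count" sample every $1 seconds (default 60).
# Runs until interrupted; defined here but not invoked.
sample_private() {
    interval=${1:-60}
    while :; do
        printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
            "$(ss -x | count_private)"
        sleep "$interval"
    done
}
```

Something like 'sample_private 300 >> some-log-file' left running on both
an affected and an unaffected host would give a series to compare against
service restarts and workload changes.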

-- 
Brian Reichert				<reichert at numachi.com>
BSD admin/developer at large	

