D-Bus Versus Varlink

Tue Apr 2 14:19:34 UTC 2024

On Thu, 21.03.24 12:58, Thomas Kluyver (thomas at kluyver.me.uk) wrote:

> I'm struggling to figure out how much varlink is actually used in
> the wild. Red Hat seem to be involved in it, and things they work on
> usually show up in Fedora pretty quickly, but there are no varlink
> libraries or tools installed by default, and the varlink resolver
> doesn't seem to be running. Podman (another project with RH
> involved) seems to have thought in 2020 [1] that varlink is more or
> less deprecated and ripped it out.

There's not much use in the wild that I am aware of, with the exception
of recent systemd versions. With the upcoming systemd v256 we'll have 14
different Varlink services. Because it's so much easier to add than
D-Bus support, it's quite likely we'll add a lot more Varlink APIs in
the future, and fewer new D-Bus APIs.

> On the other hand, systemd seems to be adopting varlink more in the
> last few months - including creating its own 'varlinkctl' CLI [2] -
> but I haven't found any overall explanation of why, or what the
> long-term plan is, just PRs working on varlink support.

There are a variety of reasons. Here are three of the biggest ones:

1. D-Bus is a *pain* in early boot. Until dbus-broker starts up, we can
   only do direct connections, and that's just a mess quite
   frankly. In PID1 we basically reimplement a shitty version of
   dbus-broker so that clients can talk to us during early boot. We do
   this only in PID1 though, because it's such a mess, even though we
   have various other services in early boot which really should have
   an IPC. Note that many of systemd's components run in early boot,
   though, hence this is a major problem for us.

2. We cannot use D-Bus sanely for functionality that dbus-broker
   itself uses. Specifically, D-Bus wants to use NSS to resolve
   user/group names, it wants to log via syslog(), and wants to get
   lifecycle managed by PID 1. Now, these three subsystems are pretty
   much IPC interfaces, i.e. would be candidates for D-Bus. But we
   cannot use D-Bus for them, since that would become a cyclic
   dep: a service A would use and block on a service B which would use
   and block on A again. This too could be dealt with by using direct
   connections, i.e. implementing shitty local versions of D-Bus
   broker again, but yuck. That means using D-Bus without really using
   D-Bus.

3. The fact that D-Bus multiplexes many calls onto one connection
   really complicates service development. Consider that systemd has
   numerous tools that work like UNIX tools: you invoke them, they do
   one operation, and exit (e.g. bootctl, systemd-cryptsetup,
   systemd-cryptenroll, …). We'd like to open these up via IPC (to get
   security isolation, stable interfaces and user friendliness towards
   programmers). If we wanted to do this via D-Bus we'd have to turn
   these little tools into daemons that can process multiple requests
   in parallel. That is a *lot* of work, you have to switch them to
   operate asynchronously, have an event loop and so on. And the end
   result isn't even that efficient, since in C you'd typically not
   distribute your work on multiple threads/CPUs, and given the global
   ordering guarantees D-Bus doesn't really want you to do that
   anyway. Varlink makes this all a *ton* easier: because there is no
   method call multiplexing, each parallel ongoing method call is
   fired over a separate AF_UNIX stream (i.e. in our model connections
   are *cheap*). And that means you can just use systemd's socket
   activation: if 5 operations are coming in, this translates to 5
   forked off instances of the tool, each processing one
   connection. And this makes things very very easy. Each tool just
   reads the method call from its connection, processes it, and writes
   the response back to the connection. Done. This sounds like a minor
   thing, but I think it's actually the *most* important facet. It has
   resulted in something of an explosion of IPC interfaces in systemd:
   a much larger number of tools suddenly gained varlink IPC, because
   it's so much easier to add than D-Bus, which forces you into a
   different programming model. After all, in a way, you know all
   those tools that in the recent years gained a switch to output JSON
   data? You can think of Varlink being a small extension to that: it
   also allows taking *input* via JSON (rather than the cmdline), and
   boom, you can suddenly bind that tool to a socket and turn it
   into an IPC service. That's a much smaller step for such tools than
   making them become D-Bus services.
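
To make the "read the call, process it, write the response, done" model
from point 3 concrete, here is a minimal sketch in Python. It is not how
systemd's own tools are implemented (those are written in C); it only
illustrates the shape. It assumes the Varlink wire format of one JSON
object per message terminated by a NUL byte, and a socket unit with
Accept=yes, so that each instance is handed exactly one connection as
fd 3. The method name org.example.Tool.Describe, the describe() handler
and the HANDLERS table are made up for illustration.

```python
import json
import os
import socket

SD_LISTEN_FDS_START = 3  # first fd handed over by systemd socket activation


def read_message(conn):
    """Read one NUL-terminated JSON message from the connection.

    Bytes after the first NUL are dropped; that is fine for this sketch
    because each connection carries exactly one method call.
    """
    buf = bytearray()
    while True:
        chunk = conn.recv(4096)
        if not chunk:
            raise EOFError("peer closed before a full message arrived")
        buf += chunk
        end = buf.find(b"\0")
        if end >= 0:
            return json.loads(buf[:end].decode("utf-8"))


def serve_one(conn, handlers):
    """Handle exactly one method call on conn, then return."""
    call = read_message(conn)
    method = call.get("method")
    handler = handlers.get(method)
    if handler is None:
        reply = {
            "error": "org.varlink.service.MethodNotFound",
            "parameters": {"method": method},
        }
    else:
        reply = {"parameters": handler(call.get("parameters") or {})}
    conn.sendall(json.dumps(reply).encode("utf-8") + b"\0")


def describe(params):
    # Hypothetical method body: whatever the tool would otherwise have
    # printed as JSON on stdout is returned as the reply parameters.
    return {"greeting": "hello " + params.get("name", "world")}


# Hypothetical interface and method name, for illustration only.
HANDLERS = {"org.example.Tool.Describe": describe}


def main():
    # With Accept=yes the connection itself arrives as fd 3.
    with socket.socket(fileno=SD_LISTEN_FDS_START) as conn:
        serve_one(conn, HANDLERS)


if __name__ == "__main__":
    # Only touch fd 3 if we were actually socket-activated.
    if os.environ.get("LISTEN_FDS") == "1":
        main()
```

There is no event loop, no threads and no state shared between requests
here: if five calls come in, five such processes run side by side, each
with its own connection.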

If that makes any sense?

Lennart

--
Lennart Poettering, Berlin