D-Bus across networks?

Fri Nov 16 12:48:28 UTC 2018

On Fri, 16 Nov 2018 at 14:08:03 +1300, Lawrence D'Oliveiro wrote:
> On Thu, 15 Nov 2018 19:33:04 -0500, Felipe Gasper wrote:
> > I know D-Bus *can* work over TCP to facilitate this kind of workflow,
> > but I’ve not read much about D-Bus’s actually being widely used in
> > that way.
> 
> Probably because it opens a whole new can of worms.

Yes. D-Bus was designed for the two use cases described in the spec:

* the *session bus* that lets an ordinary user's programs within a session
  (normally a GUI session) communicate with each other, for example
  Notifications (many implementations), the Secrets service
  (gnome-keyring/KWallet/etc.), Tracker, dconf and its predecessor GConf,
  and other session infrastructure

* the *system bus* that lets an ordinary user's programs communicate with
  system services (NetworkManager, Avahi, CUPS etc.), and lets system
  services communicate with each other

Anything that creates more maintenance effort without directly helping
those is indirectly harming them.

> The fundamental problem, as Bruce Schneier would put it, is that
> security does not compose: you can take two systems which have
> individually been demonstrated to be “secure”, try to connect them
> together, and end up introducing new security holes

Yes, that's one aspect of what I would call the fundamental problem:
D-Bus' design assumptions and trade-offs are just not set up for
generic networked communication.

On the performance side, D-Bus essentially assumes that the sockets it
uses are arbitrarily fast, and CPU/memory bandwidth is the limiting
factor. For example, the message encoding is a reasonably efficient
in-memory representation (C-like languages can read strings and arrays
out of a message without copying or packing/unpacking), at the cost of
not being very byte-count-efficient (there is alignment padding between
payload items, and booleans are 31 bits longer than they need to be).
The reference implementation does a lot of copying on the sending side,
but a more optimized implementation wouldn't necessarily need to do that.
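
To make the padding cost concrete, here is a minimal Python sketch of the
marshalling rules for a handful of basic types (little-endian only, no
containers). The helper names pad_to() and marshal() are made up for this
illustration and are not taken from any D-Bus library.

```python
import struct

# Natural alignment (in bytes) of a few D-Bus basic types, keyed by type code.
ALIGNMENT = {"y": 1, "b": 4, "u": 4, "t": 8, "s": 4}


def pad_to(buf: bytearray, alignment: int) -> None:
    """Append zero bytes until len(buf) is a multiple of alignment."""
    buf.extend(b"\0" * (-len(buf) % alignment))


def marshal(signature: str, values) -> bytes:
    """Little-endian marshalling of a flat sequence of basic types."""
    buf = bytearray()
    for code, value in zip(signature, values):
        pad_to(buf, ALIGNMENT[code])
        if code == "y":      # BYTE: one byte
            buf += struct.pack("<B", value)
        elif code == "b":    # BOOLEAN: a whole 32-bit word holding 0 or 1
            buf += struct.pack("<I", 1 if value else 0)
        elif code == "u":    # UINT32
            buf += struct.pack("<I", value)
        elif code == "t":    # UINT64
            buf += struct.pack("<Q", value)
        elif code == "s":    # STRING: uint32 length, UTF-8 bytes, NUL
            data = value.encode("utf-8")
            buf += struct.pack("<I", len(data)) + data + b"\0"
        else:
            raise ValueError(f"unsupported type code {code!r}")
    return bytes(buf)


# byte, boolean, byte, uint64
body = marshal("ybyt", (1, True, 2, 3))
print(len(body), body.hex())
```

With this signature the body comes to 24 bytes: two single bytes, a
boolean and a 64-bit integer, plus 10 bytes of alignment padding.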

D-Bus messages are also relatively "heavy": their headers are large and
variable-length, with a lot of namespaced strings. This is great for
D-Bus' designed role as a flexible
integration layer between many independently-designed services, but a
terrible design if your problem space is small and heavily constrained,
particularly if every byte counts. A fixed-length, fixed-layout header
with enums rather than strings is always going to be more efficient if
your protocol has a single authority in charge of design, as long as it
will never need to grow in unexpected directions or without bound.
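
As a rough comparison, the sketch below marshals the header of a
method call to org.freedesktop.DBus.Hello following the layout in the
specification, and sets it next to a fixed-layout header. The fixed
layout (one-byte type, one-byte flags, 16-bit method number, body length,
serial) is purely hypothetical and only stands in for "a protocol with a
single authority in charge of design".

```python
import struct

# Header field codes from the D-Bus specification.
PATH, INTERFACE, MEMBER, DESTINATION = 1, 2, 3, 6


def pad_to(buf: bytearray, alignment: int) -> None:
    """Append zero bytes until len(buf) is a multiple of alignment."""
    buf.extend(b"\0" * (-len(buf) % alignment))


def dbus_method_call_header(path, interface, member, destination,
                            serial=1, body_length=0) -> bytes:
    """Marshal a little-endian METHOD_CALL header with four string fields."""
    buf = bytearray()
    # Fixed part: endianness 'l', message type 1 (METHOD_CALL), flags,
    # protocol version 1, body length, serial.
    buf += struct.pack("<cBBBII", b"l", 1, 0, 1, body_length, serial)
    buf += struct.pack("<I", 0)   # placeholder for the field array length
    array_start = len(buf)        # offset 16, already 8-byte aligned
    for code, sig, value in ((PATH, b"o", path),
                             (DESTINATION, b"s", destination),
                             (INTERFACE, b"s", interface),
                             (MEMBER, b"s", member)):
        pad_to(buf, 8)                                    # struct alignment
        buf += struct.pack("<B", code)                    # field code
        buf += struct.pack("<B", len(sig)) + sig + b"\0"  # variant signature
        pad_to(buf, 4)                                    # string alignment
        data = value.encode("utf-8")
        buf += struct.pack("<I", len(data)) + data + b"\0"
    # The array length excludes the padding that follows the last field.
    struct.pack_into("<I", buf, 12, len(buf) - array_start)
    pad_to(buf, 8)                # the header ends on an 8-byte boundary
    return bytes(buf)


dbus_header = dbus_method_call_header(
    "/org/freedesktop/DBus", "org.freedesktop.DBus",
    "Hello", "org.freedesktop.DBus")

# Hypothetical compact header: type, flags, method number, body length, serial.
fixed_header = struct.pack("<BBHII", 1, 0, 7, 0, 1)

print(len(dbus_header), len(fixed_header))
```

Under these assumptions the D-Bus header is 128 bytes and the fixed
one is 12, but the fixed one can only ever name methods that its 16-bit
number space was assigned to in advance.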

These are excellent trade-offs on AF_UNIX (where the underlying socket
can transfer large messages nearly as fast as memcpy()), acceptable on
TCP via localhost (which is what D-Bus on Windows uses, because that
platform doesn't have AF_UNIX), start to show serious limitations on
multi-machine LANs, and become very problematic on wide-area networks
where two relatively fast machines (for example a cloud server and a
smartphone) are connected by a relatively slow link.

The bias towards large, extensible message headers makes it a lot
more efficient to send fewer, larger messages (often a single message
per logical transaction), and as a result message bodies also end up
relatively large and variable-length. For example, in the standard
Properties interface, if 10 properties change at the same time, that
should normally be represented in a single PropertiesChanged message, not
in 10 individual messages. D-Bus API designers are encouraged to behave
similarly in their own interfaces, for example sending a single message
with an array of things instead of a message per thing, or sending the
name and details of a newly created thing in a single message instead
of having one message to announce the thing's creation and a series of
follow-up messages to query its details.
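
A small sketch of that batching idea, independent of any particular
D-Bus binding: pending updates are grouped so that each interface gets
one argument tuple in the shape of PropertiesChanged (interface name,
dict of changed properties, list of invalidated properties). The
function name coalesce() and the interface com.example.Player are
invented for the example.

```python
from collections import defaultdict


def coalesce(changes):
    """Group (interface, property, value) updates into one
    PropertiesChanged-shaped argument tuple per interface."""
    by_interface = defaultdict(dict)
    for interface, prop, value in changes:
        by_interface[interface][prop] = value   # a later value wins
    # Third element: invalidated properties, left empty in this sketch.
    return [(interface, props, [])
            for interface, props in by_interface.items()]


# Ten properties of a hypothetical interface change at the same time.
pending = [("com.example.Player", f"Prop{i}", i) for i in range(10)]
signals = coalesce(pending)
print(len(pending), "changes ->", len(signals), "signal")
```

The ten changes pay the header cost once rather than ten times.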

(Conversely, D-Bus is designed for use cases where the efficiency of
message encoding matters at least a bit. If human-debuggability is more
important to you than efficiency, then something like JSON, YAML or XML
might make more sense than the D-Bus message encoding - in fact D-Bus
itself uses XML for the "introspection" feature, which is intended for
debugging, prototyping and similar non-critical purposes.)
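
For a feel of what the introspection XML looks like to a tool, here is
a short Python sketch that summarizes a hand-written fragment describing
part of the standard Properties interface. The fragment is abbreviated
for illustration and the describe() helper is hypothetical.

```python
import xml.etree.ElementTree as ET

INTROSPECTION_XML = """
<node>
  <interface name="org.freedesktop.DBus.Properties">
    <method name="Get">
      <arg name="interface_name" type="s" direction="in"/>
      <arg name="property_name" type="s" direction="in"/>
      <arg name="value" type="v" direction="out"/>
    </method>
    <signal name="PropertiesChanged">
      <arg name="interface_name" type="s"/>
      <arg name="changed_properties" type="a{sv}"/>
      <arg name="invalidated_properties" type="as"/>
    </signal>
  </interface>
</node>
"""


def describe(xml_text):
    """Return one summary line per method or signal in the XML."""
    lines = []
    for interface in ET.fromstring(xml_text).findall("interface"):
        prefix = interface.get("name")
        for method in interface.findall("method"):
            args = method.findall("arg")
            # Method arguments default to direction "in".
            ins = "".join(a.get("type") for a in args
                          if a.get("direction", "in") == "in")
            outs = "".join(a.get("type") for a in args
                           if a.get("direction") == "out")
            lines.append(f"{prefix}.{method.get('name')}({ins}) -> ({outs})")
        for signal in interface.findall("signal"):
            sig = "".join(a.get("type") for a in signal.findall("arg"))
            lines.append(f"signal {prefix}.{signal.get('name')}({sig})")
    return lines


for line in describe(INTROSPECTION_XML):
    print(line)
```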

On the security side of things, the fact that the design assumptions were
based on the two core use cases (the system bus and the session bus) leads
to things like the core concept of identity being in terms of Unix uids
(or Windows SIDs on Windows): the session bus is owned by a single uid
(or SID), whereas the system bus is a shared resource between multiple
connections that are primarily identified by their uids. That's an
excellent simplifying assumption if you are working within one machine,
but is not a concept that naturally scales out to a network.

There is also no link-layer confidentiality or integrity protection in
any D-Bus implementation that I'm aware of, and very limited support
for authentication, because it was designed for situations where that
would be wasted effort. When we use AF_UNIX sockets, we trust the
kernel we're running on to give us confidentiality, integrity, and
(on supported platforms: at least Linux, *BSD, Solaris and QNX) secure
authentication that cannot be forged. When using TCP over localhost on
Windows, or AF_UNIX sockets on more deficient Unix platforms, the kernel
still gives us confidentiality and integrity, and we have a relatively
simple authentication mechanism based on proving ownership of files
in the user's home directory (which is not, in general, applicable to
networked communication, but is sufficient for the core use-cases).
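
To illustrate the kind of kernel-provided authentication meant here,
the following Linux-only Python sketch asks the kernel for the
credentials of the process at the other end of an AF_UNIX socket using
SO_PEERCRED. It is a minimal demonstration on a socketpair, not how any
particular D-Bus implementation is written, and other platforms expose
the same idea through different interfaces.

```python
import os
import socket
import struct


def peer_credentials(sock: socket.socket):
    """Return (pid, uid, gid) of the peer of a connected AF_UNIX socket.

    Linux-specific: relies on SO_PEERCRED and the layout of struct ucred
    (pid_t, uid_t, gid_t).
    """
    fmt = "iII"
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED,
                          struct.calcsize(fmt))
    return struct.unpack(fmt, raw)


# Both ends belong to this process, so the peer should be ourselves.
left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
pid, uid, gid = peer_credentials(left)
print(pid, uid, gid)
left.close()
right.close()
```

The values are filled in by the kernel, not sent by the peer, which is
why they cannot be forged by an unprivileged process. Nothing
equivalent exists for a TCP connection from another machine.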

Please use D-Bus where it fits, and don't use D-Bus where it doesn't fit.
Encouraging wide adoption is fine up to a point, but I would prefer to
see a project not use D-Bus than to see it using D-Bus where something
else would have been more appropriate.

For networked IPC, interesting alternatives include (in no particular
order) WAMP, ZeroMQ, nanomsg, AMQP, MQTT, gRPC, Cap'n Proto and XMPP.
If you investigate any of those in detail you'll notice that they
have different assumptions and trade-offs, with different levels of
centralization, different trust models, and different levels of efficiency
vs. human-readability.

    smcv