[systemd-devel] Compatibility between D-Bus and kdbus

Wed Oct 1 09:58:04 PDT 2014

Hi Simon,

On 10/01/2014 03:33 PM, Simon McVittie wrote:
> (Cc'd to the systemd mailing list because sd-bus is the reference
> implementation of the user-space side of kdbus, but please join the dbus
> list and follow-up there if you are interested in D-Bus.)
> 
> I've recently been looking at kdbus as a transport for D-Bus messages,
> and how compatible or otherwise it is with traditional stream-based
> D-Bus (and in particular, dbus-daemon, the reference implementation of
> stream-based D-Bus). My intention here is to replace dbus-daemon on
> Linux with something that does not have dbus-daemon's limitations,
> but avoid making existing software non-functional or insecure in the
> process. See <https://bugs.freedesktop.org/show_bug.cgi?id=84188> for my
> attempts to document the current state of kdbus in the D-Bus Specification.

Thanks a lot for doing this! Much appreciated.

> I have not reviewed quality-of-implementation except where I happened to
> notice things as I went past, and I have not yet looked at
> Samsung/Tizen's patches adding a kdbus transport to libdbus and GDBus;
> for now I'm using sd-bus as my only reference for the user-space parts.

Let me just state that we didn't post kernel patches to the D-Bus list
in the past, as we didn't intend to discuss the internals of the kernel
code on primarily user-space mailing lists. That, of course, doesn't
mean we didn't want to discuss the external userspace API. We absolutely
want to make sure that kdbus can serve as a transport-level interface
that eventually allows us to implement the same set of features that we
have now, and it seems like a good time to get into the details.

While in the past, we were saying that kdbus is not ready yet and people
should not yet look at it closely, we're now at a stage in which we
consider the interface more or less stable. Some internal details might
still change, however.

So, for everyone interested, please have a look at the following git
repository:

  https://code.google.com/p/d-bus/

There's fairly comprehensive documentation of the public kernel API in
kdbus.txt; the API itself is defined in kdbus.h.

If you have any questions that are not answered by the existing
documentation, please feel free to ask. We also accept patches, of course.

For a long time now, it has also been possible to boot machines with
systemd on kdbus, provided that --enable-kdbus is passed to configure,
the module is built and installed, and the option 'kdbus' is added to
the kernel command line.

> System bus access-control policy
> ================================
> 
> I think this is the biggest point of incompatibility. In dbus-daemon
> there is a needlessly elaborate access-control language; in kdbus, there
> is a much simpler and more realistic access-control language, which
> specifically does not look into message payloads.

Exactly. In contrast to the traditional dbus-daemon, the transport layer
in kdbus has no understanding of what it actually transports in its data
blobs. There's no code in the kernel that is able to marshal or
demarshal the payload, that's entirely left to the users and the
libraries they use.

> This is far easier to reason about than what dbus-daemon does, but it's
> a problem for any system service that assumes that its existing
> per-interface or per-method access-control will be applied, such as
> Avahi denying access to SetHostName. It is not acceptable for such
> services to get instant security flaws as a result of their D-Bus
> implementation being upgraded from a version that does not support kdbus
> to a version that does. Unfortunately, last time we discussed this, we
> didn't have a particularly good solution.

Yes, that needs to be addressed. However, we won't add support to kdbus
for method filtering or anything alike, so the kernel won't be able to
help enforce such policies.

> If I remember correctly, the least bad solution anyone could think of
> was to introduce a new pseudo-bus-type alongside DBUS_BUS_SYSTEM (and
> its equivalent in other libraries like GDBus), perhaps called
> "DBUS_BUS_SYSTEM_UNTRUSTED" or something (better names welcome), with
> its own shared connection: connections to that bus type are not assumed
> to filter messages by their payload, and method-call recipients are
> expected to use Polkit or similar, or do their own simplistic
> access-controls like "must be uid 0" by calling GetConnectionUnixUser or
> GetConnectionCredentials on the sender's unique name.
> 
> On non-kdbus systems, it would just be the same thing as
> DBUS_BUS_SYSTEM; on kdbus systems, DBUS_BUS_SYSTEM would refuse to use
> the kdbus transport, and fall back to stream-based D-Bus compatibility
> mechanisms like systemd-bus-proxyd. That would enable individual
> libraries and applications to opt-in to using this new shared connection
> when they have been audited for safety, with the goal being that
> everything eventually moved to it, and nothing connected to
> DBUS_BUS_SYSTEM any more.

Exactly, we would fall back to legacy translators for such clients.
systemd-bus-proxyd is currently learning to understand the dbus-daemon
xml policies, so it can enforce them. However, it's quite easy to
express something contradictory in that language, and I guess I have to
carefully read more of the dbus-daemon sources in order to implement the
same behavior. I might get back to you on this later.

> Session bus access-control policy
> =================================
> 
> In principle, people could configure the session bus to do the same
> elaborate access-control as the system bus. In practice, this is not a
> particularly useful thing to do, because there are many ways for
> processes running under the same uid to subvert each other, particularly
> if a LSM like SELinux or AppArmor is not used.
>
> kdbus does not appear to make any attempt to protect a uid from itself:
> the uid that created a bus is considered to be privileged on that bus. I
> assume this means that the intention is that app sandboxing will use a
> separate Unix uid, like it does on Android?

No, that's not the plan. For custom endpoints, we will enforce the
policy even if the uid of the connection is identical to that of the bus
owner. You're right, the implementation does not currently reflect that.
I'm working on it.

> Unless there's an outcry from people who like LSMs, I'm inclined to say
> that protecting same-uid session processes from each other is doomed to
> failure, and hence that it's OK for DBUS_BUS_SESSION to connect to kdbus
> without special precautions.

Yes, on the bus itself, it is. However, the idea of app sandboxing is
currently based on our concept of custom endpoints. A custom endpoint
can be created in parallel to the default endpoint on a bus, and its own
set of policies has to be passed to white-list permissions that apply to
connections on that endpoint. As described above, on such a custom
endpoint, we have to protect same-uid processes from each other.

The custom endpoint will then be bind-mounted over the original default
endpoint location, so the app won't notice. Code for that already exists
in systemd.
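
To make the intended rule concrete, here is a rough user-space model of
the decision, not the kernel code. All names (struct endpoint,
endpoint_may_talk, the example bus names) are made up for illustration,
and the policy is reduced to a plain list of allowed names:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

enum endpoint_kind { ENDPOINT_DEFAULT, ENDPOINT_CUSTOM };

/* Hypothetical model of an endpoint; not a kdbus structure. */
struct endpoint {
    enum endpoint_kind kind;
    uid_t bus_owner;                  /* uid that created the bus */
    const char *const *allowed_names; /* policy, reduced to a name list */
    size_t n_allowed;
};

/* May a connection with conn_uid talk to 'name' through this endpoint? */
bool endpoint_may_talk(const struct endpoint *ep, uid_t conn_uid,
                       const char *name)
{
    /* Only on the default endpoint is the bus owner privileged. */
    if (ep->kind == ENDPOINT_DEFAULT && conn_uid == ep->bus_owner)
        return true;

    /* Everyone else, including the bus owner on a custom endpoint,
     * is limited to what the policy white-lists. */
    for (size_t i = 0; i < ep->n_allowed; i++)
        if (strcmp(ep->allowed_names[i], name) == 0)
            return true;

    return false;
}
```

With a one-entry white-list, the bus owner can reach any name through
the default endpoint, but only the white-listed name through the custom
one.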

> Resource limits
> ===============
> 
> Some resource limits are lower in kdbus than in dbus-daemon.
> 
> In kdbus, the number of unread messages per recipient is limited to 256,
> with up to 16 per uid; subsequent broadcasts are silently dropped, and
> subsequent unicast messages cause the sender to block.

No, the sender does not block; the send fails with -ENOBUFS instead.
Nothing blocks in kdbus, except for messages that explicitly wait for a
synchronous reply with KDBUS_MSG_FLAGS_SYNC_REPLY.
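
As an illustration of that behaviour, here is a small user-space model
of the accounting. It is only a sketch: the struct and function names
are invented, and the numbers 256 and 16 are simply the ones quoted
above, read here as "per recipient" and "per sending uid".

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

#define MAX_QUEUED_TOTAL   256 /* per recipient, as quoted above */
#define MAX_QUEUED_PER_UID 16  /* per sending uid, as quoted above */
#define MAX_TRACKED_UIDS   64  /* arbitrary bound for this model */

struct uid_quota { uid_t uid; unsigned queued; };

/* Hypothetical model of one recipient's queue accounting. */
struct model_queue {
    unsigned total;
    size_t n_uids;
    struct uid_quota uids[MAX_TRACKED_UIDS];
};

/* Account one unicast message from sender_uid.
 * Returns 0 on success or -ENOBUFS if a quota is exhausted.
 * It never waits for the recipient to drain its queue. */
int model_queue_send(struct model_queue *q, uid_t sender_uid)
{
    struct uid_quota *slot = NULL;

    if (q->total >= MAX_QUEUED_TOTAL)
        return -ENOBUFS;

    for (size_t i = 0; i < q->n_uids; i++) {
        if (q->uids[i].uid == sender_uid) {
            slot = &q->uids[i];
            break;
        }
    }

    if (!slot) {
        if (q->n_uids >= MAX_TRACKED_UIDS)
            return -ENOBUFS;
        slot = &q->uids[q->n_uids++];
        slot->uid = sender_uid;
        slot->queued = 0;
    }

    if (slot->queued >= MAX_QUEUED_PER_UID)
        return -ENOBUFS;

    slot->queued++;
    q->total++;
    return 0;
}
```

A caller that gets -ENOBUFS has to decide for itself whether to retry
later or give up; the model, like the description above, never puts the
sender to sleep.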

> The message header (fixed-length header and header fields) is limited to
> 2 MiB in kdbus, whereas on dbus-daemon it may be up to a configurable
> limit, by default 32 MiB for the system bus or 128 MiB for the session
> bus. Broadcast messages (header + body) have the same 2 MiB limit, but
> unicast message bodies may be any size: kdbus itself does not impose any
> limit. I don't know whether anything sends broadcasts as large as 2 MiB
> (Tracker perhaps?): if you do, please share.

The intention is that larger payloads should be sent using memfds, which
kdbus doesn't have to copy around at all. sd-bus does the magic and
offloads larger payloads automatically.
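
For readers who have not used memfds, the following sketch shows the
general idea with the generic Linux memfd_create() and file-sealing
interfaces. It is not taken from sd-bus; the helper name is invented,
and how sd-bus decides when to offload is not shown here.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: copy a payload into a sealed memfd.
 * Returns the file descriptor, or -1 on error with errno set. */
int payload_to_sealed_memfd(const void *buf, size_t len)
{
    const char *p = buf;
    size_t left = len;

    int fd = memfd_create("payload", MFD_CLOEXEC | MFD_ALLOW_SEALING);
    if (fd < 0)
        return -1;

    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            close(fd);
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }

    /* Once sealed, the contents can no longer be resized or modified,
     * so a receiver can map them without copying. */
    if (fcntl(fd, F_ADD_SEALS,
              F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_WRITE | F_SEAL_SEAL) < 0) {
        close(fd);
        return -1;
    }

    return fd;
}
```

The payload is written once by the sender; only the file descriptor
then needs to be handed over, not the bytes themselves.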

> In kdbus, each uid may create up to 16 buses. In dbus-daemon there is no
> limit. I do not anticipate this being a problem: the reference
> implementation of kdbus' user-space, in systemd, seems to be using a
> per-uid user bus instead of a session bus. Also, even if we continued to
> use a session bus per login, 16 sounds like a reasonable number.

Note that all the limits currently enforced by kdbus are just rough
ideas that can be tweaked. Our priority was to have the proper checks
in place in the various areas, so that we can fine-tune the actual
values later.

But it's important to not forget we're allocating kernel memory here, so
we have to be very careful, and also consider the full matrix of
possibilities.

> In kdbus, each uid may connect to each bus up to 256 times. I think this
> is actually somewhat likely to be a practical problem: I currently have
> 46 connections to my session bus, so I'm only an order of magnitude away
> from the session breaking.
> <https://code.google.com/p/d-bus/issues/detail?id=9>

Yes, you're right. We should increase that limit. I'll comment on the
bug. Thanks!

> In kdbus, each connection may own up to 64 well-known names; the system
> dbus-daemon defaults to 512, and the session to 50 000. 64 is *probably*
> enough, but I could potentially see this becoming an issue for services
> that allocate one well-known name per $app_specific_thing, like
> Telepathy (one name per Connection).

Same here, probably.

> In kdbus, each connection may have 256 bloom filter entries, which AIUI
> are slightly less expressive than match rules (one match rule maps to
> one or more match rules). The system dbus-daemon defaults to allowing
> 512 match rules, and the session to 50 000. Again, I could potentially
> see this being an issue for existing code: Alban's benchmarking for
> GNOME + Telepathy back in 2011 revealed a peak of 81 match rules,
> although admittedly some of those were due to a dbus-glib bug[2]. QtDBus
> adds more / finer-grained match rules than GDBus or dbus-glib, so it
> might get this worse. I've opened a bug with a possible mitigation:
> <https://code.google.com/p/d-bus/issues/detail?id=10>

I'll look into that too, thanks!
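
For anyone unfamiliar with bloom-filter matching, here is a toy sketch
of the general technique. It is not the kdbus implementation: the 64-bit
filter size, the hash function, the number of hashes and the item
strings are all invented for illustration. The idea shown is that a
message carries a filter of its properties, a subscription carries a
mask, and the subscription matches when every bit of the mask is set in
the message's filter.

```c
#include <stdbool.h>
#include <stdint.h>

#define BLOOM_HASHES 3 /* arbitrary choice for this sketch */

/* Seeded FNV-1a-style string hash; illustrative only. */
static uint64_t hash_item(const char *s, uint64_t seed)
{
    uint64_t h = 14695981039346656037ULL ^ seed;

    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= 1099511628211ULL;
    }
    return h;
}

/* Set the bits for one item in a 64-bit bloom filter. */
void bloom_add(uint64_t *bloom, const char *item)
{
    for (uint64_t i = 0; i < BLOOM_HASHES; i++)
        *bloom |= UINT64_C(1) << (hash_item(item, i) % 64);
}

/* True if every bit of the mask is set in the message filter.
 * False positives are possible, false negatives are not. */
bool bloom_matches(uint64_t msg_filter, uint64_t match_mask)
{
    return (msg_filter & match_mask) == match_mask;
}
```

Because false positives are possible, a receiver still has to check the
real message against the real match rule after the filter has passed it.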

> kdbus has a hard maximum on the reply timeout, whereas stream-based
> D-Bus has DBUS_TIMEOUT_INFINITE. However, the hard maximum is nearly 585
> years, so I don't see this being a practical problem :-)

:)
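
For the curious, the figure can be reproduced under the assumption that
the timeout is an unsigned 64-bit count of nanoseconds and that a year
has 365 days. Neither assumption is stated above, and the function name
is invented:

```c
#include <stdint.h>

/* Largest value of an unsigned 64-bit nanosecond counter, expressed in
 * 365-day years. Assumes the timeout really is such a counter. */
double max_timeout_years(void)
{
    const double ns_per_year = 1e9 * 60.0 * 60.0 * 24.0 * 365.0;

    return (double)UINT64_MAX / ns_per_year;
}
```

Under those assumptions the result lies between 584 and 585 years,
which agrees with "nearly 585".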

> fd-passing
> ==========
> 
> In stream-based D-Bus, any file descriptor may be attached to a message
> whose transport is a Unix domain socket, including another Unix domain
> socket. In kdbus, kdbus file descriptors and Unix domain sockets are
> currently specifically disallowed, to avoid recursion. I am not aware of
> any applications that actually do this: the developers of Tracker
> considered it, but ended up using a pipe() instead.

This is a problem we can solve much better once kdbus is merged.
Eventually, we'd like to support all combinations, of course, but we
need a way to track recursion, and we currently can't do that as an
external module.

> In stream-based D-Bus, it is valid to attach a file descriptor to a
> broadcast message. In kdbus, it is not. I am not aware of any
> applications that actually do this.

We don't see any reason for this, and it's a dangerous feature because
file descriptors are a precious resource. If we later figure that we
need to have such a feature, we can still add it under certain
circumstances. We cannot, however, remove a feature later if we add it now.

> Bi-endian systems
> =================
> 
> The reference implementation of kdbus' userland part, in systemd,
> specifically requires that message payloads must be in native endianness
> for simplicity.

sd-bus is actually able to cope with different endiannesses because it
supports the traditional D-Bus protocol. But there's an explicit check
that disallows anything other than the native endianness for kdbus.
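
A check of this kind can be sketched as follows. This is not the sd-bus
code; the function names are invented. It relies only on the
traditional D-Bus wire format, in which the first byte of a message is
'l' for little-endian and 'B' for big-endian:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* D-Bus endianness flag of the machine this code runs on. */
char native_endian_flag(void)
{
    const uint16_t probe = 1;

    /* On little-endian machines the low byte is stored first. */
    return *(const unsigned char *)&probe == 1 ? 'l' : 'B';
}

/* Hypothetical check: accept a message only if its endianness flag
 * matches the native byte order. */
bool message_is_native_endian(const unsigned char *msg, size_t len)
{
    return len >= 1 && msg[0] == (unsigned char)native_endian_flag();
}
```

A transport that accepts both byte orders would byteswap instead of
rejecting the message at this point.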

> It is not clear to me what this would mean for CPU architectures that
> have runtime-switchable endianness, namely arm, powerpc and possibly
> mips. On these architectures, does Linux effectively impose the
> additional restriction that every process must run in the same
> endianness as the kernel, or what?

Yes. I don't know how any other kernel interface would work if userspace
and kernel space run in different endianness. If anyone's crazy enough to
fix all those interfaces, we can also teach kdbus to swap fields, but
that should probably be done on the userspace side anyway.

> Similarly, a big-endian per-process emulator like qemu-m68k on a
> little-endian system like x86-64 would have to either not implement the
> kdbus ioctls (resulting in emulated processes falling back to
> stream-based D-Bus), or byteswap the message. It presumably already has
> to know how to byteswap struct contents.

I think so too.

> Message ordering
> ================
> 
> The D-Bus Specification still doesn't define message ordering, which is
> a significant omission. However, dbus-daemon has always imposed a "total
> order" on messages: if two connections A and B both observe messages M1
> and M2, and A observes M1 before M2, then B cannot observe M2 before M1,
> unless it uses a library API that "jumps the queue" by using
> pseudo-blocking[3].
> 
> [3] http://smcv.pseudorandom.co.uk/2008/11/nonblocking/
> 
> kdbus has these departures from total ordering:
> 
> * If A is the addressed recipient of M1 and M2, and B is an
>   eavesdropper, B might see M2 first. This could be addressed,
>   at some performance cost, by making sure to hold a lock while
>   delivering messages if there is at least one eavesdropper.

That's true, it's different, and we want to avoid such a big lock here.
However, we do have a per-domain sequence number that is filled into the
message once it hits the kernel via KDBUS_CMD_MSG_SEND. As it is per
domain, it even guarantees message ordering across different buses such
as the system and the session bus.

Ultimately, it's up to the userspace implementation to order the
messages if that's wanted.
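
A userspace implementation that wants a total order could, for example,
sort what it has received by that sequence number. The sketch below uses
an invented struct; in particular, the field name seqnum is an
assumption and not necessarily what kdbus.h calls it.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record of a received message. */
struct seen_msg {
    uint64_t seqnum;   /* sequence number stamped by the kernel */
    const char *label; /* stand-in for the actual message */
};

/* qsort() comparator: ascending sequence number, without overflow. */
static int by_seqnum(const void *a, const void *b)
{
    const struct seen_msg *x = a;
    const struct seen_msg *y = b;

    return (x->seqnum > y->seqnum) - (x->seqnum < y->seqnum);
}

/* Bring messages back into the order in which the kernel saw them. */
void order_by_seqnum(struct seen_msg *msgs, size_t n)
{
    qsort(msgs, n, sizeof *msgs, by_seqnum);
}
```

An eavesdropper that dequeued M2 before M1 would get M1 first again
after sorting, provided it has already received both messages.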

> * There are ioctl APIs that cause messages to "jump the queue",
>   either based on priority or by making a synchronous method call.

Priority based message retrieval will of course jump the queue order,
that's its whole purpose :) However, every message dequeued that way
will still have its sequence number untouched, so it can be brought into
context by userspace if that's intended.

> * Some operations that are method calls in stream-based D-Bus are
>   synchronous ioctls in kdbus. This can result in apparently
>   paradoxical situations like seeing a name in the equivalent of
>   ListNames before receiving the notification that it has an owner,
>   because the notification is processed asynchronously.
>   (Mitigation: it is fairly common to use "pseudo-blocking"
>   for these calls anyway.)

Right. The ioctl will return on the fd before the notification can be
dequeued from the message pool. That's a design change. But as you say
yourself, that's hardly a problem if userspace waits for notifications
to arrive in its message pool rather than operating on the return value
of the ioctl. So I don't think that's much of a problem.

Again, thanks for looking at the code from this perspective. I'm sure we
can sort out the issues you pointed out.

Thanks,
Daniel