[systemd-devel] Compatibility between D-Bus and kdbus

Thiago Macieira thiago at kde.org
Mon Nov 24 18:40:10 PST 2014


Sorry for the delay in replying. I didn't have time until after KLF...

On Wednesday 01 October 2014 14:33:01 Simon McVittie wrote:
> System bus access-control policy
> ================================
[snip]
> If I remember correctly, the least bad solution anyone could think of
> was to introduce a new pseudo-bus-type alongside DBUS_BUS_SYSTEM (and
> its equivalent in other libraries like GDBus), perhaps called
> "DBUS_BUS_SYSTEM_UNTRUSTED" or something (better names welcome), with
> its own shared connection: connections to that bus type are not assumed
> to filter messages by their payload, and method-call recipients are
> expected to use Polkit or similar, or do their own simplistic
> access-controls like "must be uid 0" by calling GetConnectionUnixUser or
> GetConnectionCredentials on the sender's unique name.

I'm wondering if the same solution should be applied to the session bus. That 
would have the unfortunate effect that applications that aren't ported to know 
about kdbus will always fall back to proxy functionality. It would be 
unfortunate because the number of applications that need policy decisions on 
the session bus must be asymptotically close to zero.

An alternative solution would be for the "trusted" connection to check if 
there are any files in /etc/dbus-1/session.d. If there aren't, it can assume 
that trusted == untrusted.
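
A minimal sketch of that check, assuming that any regular file in the 
directory counts as policy and that a missing directory means "no policy". 
The function name is made up for illustration, and a real implementation 
would probably also have to look at the main session configuration file, 
which this ignores.

```python
import os

def session_policy_is_empty(policy_dir="/etc/dbus-1/session.d"):
    """Return True if policy_dir contains no files (hypothetical helper).

    A directory that does not exist is treated as containing no policy.
    Other errors (for example, permission denied) are left to propagate,
    since guessing "no policy" in that case would be unsafe.
    """
    try:
        with os.scandir(policy_dir) as entries:
            return not any(entry.is_file() for entry in entries)
    except FileNotFoundError:
        return True
```

If this returns True, the "trusted" connection could skip the proxy and 
behave exactly like the "untrusted" one.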

PS: we should also find a name that conveys that you *want* the new bus type.

> I'm also very tempted to propose a syntax for an opt-in kdbus-like
> security model (which would take precedence over system.conf/system.d)
> via adding lines to .service files, so that individual services can have
> a sane security model on non-kdbus or non-Linux systems, and systemd's
> systemd-dbus1-generator could use those lines as input. If we get far
> enough with moving system services to that, maybe the transition can be
> easier.

I like this idea, with the proviso that Lennart pointed out: we need to first 
include the metadata in the dbus1 stream messages so the application can sort 
out the policy decisions like the kdbus implementation would.

> Session bus access-control policy
> =================================
> 
> In principle, people could configure the session bus to do the same
> elaborate access-control as the system bus. In practice, this is not a
> particularly useful thing to do, because there are many ways for
> processes running under the same uid to subvert each other, particularly
> if a LSM like SELinux or AppArmor is not used.
> 
> kdbus does not appear to make any attempt to protect a uid from itself:
> the uid that created a bus is considered to be privileged on that bus. I
> assume this means that the intention is that app sandboxing will use a
> separate Unix uid, like it does on Android?
> 
> Unless there's an outcry from people who like LSMs, I'm inclined to say
> that protecting same-uid session processes from each other is doomed to
> failure, and hence that it's OK for DBUS_BUS_SESSION to connect to kdbus
> without special precautions.

I don't understand this domain enough to be able to offer an opinion. I know 
that Tizen will want SMACK security applied even between processes of the same 
UID. I just don't know whether that maps to what Lennart said about "labels".

> Resource limits
> ===============
[snip]
> In kdbus, each uid may create up to 16 buses. In dbus-daemon there is no
> limit. I do not anticipate this being a problem: the reference
> implementation of kdbus' user-space, in systemd, seems to be using a
> per-uid user bus instead of a session bus. Also, even if we continued to
> use a session bus per login, 16 sounds like a reasonable number.

Agreed. Most people I know who start more than one bus for the same user are 
doing it for nested sessions, usually for testing purposes or to keep separate 
implementations of equivalent services (early KDE Frameworks 5 releases 
recommended being kept separate from KDE 4, though not any more).

> In kdbus, each uid may connect to each bus up to 256 times. I think this
> is actually somewhat likely to be a practical problem: I currently have
> 46 connections to my session bus, so I'm only an order of magnitude away
> from the session breaking.
> <https://code.google.com/p/d-bus/issues/detail?id=9>

Agreed again.

$ qdbus | grep -c \:
72

That's over 25% of the limit. Can this be made runtime-configurable?
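
The same arithmetic as a small sketch. The sample listing is made up, and 
counting lines that start with ":" only approximates what the grep above 
does (grep counts any line containing a colon).

```python
KDBUS_MAX_CONNECTIONS_PER_UID = 256  # per-bus limit quoted above

def count_unique_names(listing):
    """Count lines that start with ':' (unique connection names)."""
    return sum(1 for line in listing.splitlines() if line.startswith(":"))

def limit_usage(connections, limit=KDBUS_MAX_CONNECTIONS_PER_UID):
    """Return the fraction of the limit that is already in use."""
    return connections / limit

# Hypothetical listing: unique names at column 0, owned names indented.
sample = ":1.0\n org.example.Foo\n:1.1\n:1.2\n org.example.Bar\n"
print(count_unique_names(sample))   # 3
print(f"{limit_usage(72):.0%}")     # 28%
```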

> In kdbus, each connection may own up to 64 well-known names; the system
> dbus-daemon defaults to 512, and the session to 50 000. 64 is *probably*
> enough, but I could potentially see this becoming an issue for services
> that allocate one well-known name per $app_specific_thing, like
> Telepathy (one name per Connection).

This should also be made configurable, but please raise the default to 256 or 512.

> In kdbus, each connection may have 256 bloom filter entries, which AIUI
> are slightly less expressive than match rules (one match rule maps to
> one or more match rules). The system dbus-daemon defaults to allowing
> 512 match rules, and the session to 50 000. Again, I could potentially
> see this being an issue for existing code: Alban's benchmarking for
> GNOME + Telepathy back in 2011 revealed a peak of 81 match rules,
> although admittedly some of those were due to a dbus-glib bug[2]. QtDBus
> adds more / finer-grained match rules than GDBus or dbus-glib, so it
> might get this worse. I've opened a bug with a possible mitigation:
> <https://code.google.com/p/d-bus/issues/detail?id=10>
> 
> [1] http://people.collabora.com/~alban/d/2011/02/gnome-telepathy.csv
> [2] https://bugs.freedesktop.org/show_bug.cgi?id=33646

This might be a problem. Right now, QtDBus assumes that any match rule it adds 
will be handled successfully. If the resource limit is low enough that an 
application could hit it, we'll need to start handling the failure case. 

I'm not thinking of a misbehaving application that causes runaway uses of 
entries, but a legitimate one that is simply very large or causes a slow leak 
of resources over an extended period of time. That would cause hard-to-debug 
issues because they'd often happen in production.

> Bi-endian systems
> =================
> 
> The reference implementation of kdbus' userland part, in systemd,
> specifically requires that message payloads must be in native endianness
> for simplicity.
> 
> It is not clear to me what this would mean for CPU architectures that
> have runtime-switchable endianness, namely arm, powerpc and possibly
> mips. On these architectures, does Linux effectively impose the
> additional restriction that every process must run in the same
> endianness as the kernel, or what?
> 
> Similarly, a big-endian per-process emulator like qemu-m68k on a
> little-endian system like x86-64 would have to either not implement the
> kdbus ioctls (resulting in emulated processes falling back to
> stream-based D-Bus), or byteswap the message. It presumably already has
> to know how to byteswap struct contents.

I agree with other replies here to punt this problem to the people who are 
really affected by it. Since the kernel does not care about content, there 
should be no trouble with compatibility. Only the userspace bits would need 
updating.

> Message ordering
> ================
> 
> The D-Bus Specification still doesn't define message ordering, which is
> a significant omission. However, dbus-daemon has always imposed a "total
> order" on messages: if two connections A and B both observe messages M1
> and M2, and A observes M1 before M2, then B cannot observe M2 before M1,
> unless it uses a library API that "jumps the queue" by using
> pseudo-blocking[3].

I'm not sure we need to keep the total ordering like you described. We can 
probably relax it a bit to simply ensure causality. In the above case, you 
didn't say what sent those two messages.

If M1 and M2 are sent from the same connection, then all receivers should 
receive M1 and M2 in order.

If M2 is sent by a connection in response to receiving M1 in the first place, 
then all receivers should also observe M1 and M2 in the same order (M1 first).

Other than that, if M1 and M2 are sent by two independent connections with no 
causal relationship between them, then the order in which receivers observe 
them is arbitrary.
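
The three rules above can be sketched as a checker for one receiver's 
observed order. Everything here (the Msg record, the per-sender sequence 
number, the reply_to field) is a hypothetical model for illustration, not 
anything kdbus or dbus-daemon exposes. Only direct cause-and-effect pairs are 
checked, not chains through messages the receiver never saw.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Msg:
    name: str
    sender: str
    seq: int                        # per-sender send counter
    reply_to: Optional[str] = None  # message that triggered this one, if any

def respects_causality(observed, messages):
    """Check one receiver's observed order against the rules above.

    observed: message names in the order the receiver saw them.
    messages: dict mapping name -> Msg for every message involved.
    """
    position = {name: i for i, name in enumerate(observed)}
    seen = [messages[name] for name in observed]
    for a in seen:
        for b in seen:
            # Same sender: the send order must be preserved.
            if a.sender == b.sender and a.seq < b.seq:
                if position[a.name] > position[b.name]:
                    return False
        # A message sent in response to another must come after it,
        # provided this receiver observed the cause at all.
        if a.reply_to in position and position[a.reply_to] > position[a.name]:
            return False
    return True

msgs = {
    "M1": Msg("M1", sender=":1.1", seq=1),
    "M2": Msg("M2", sender=":1.2", seq=1, reply_to="M1"),
    "M3": Msg("M3", sender=":1.3", seq=1),
}
print(respects_causality(["M1", "M3", "M2"], msgs))  # True
print(respects_causality(["M2", "M1"], msgs))        # False
```

M3 has no causal link to the other two, so it may appear anywhere in the 
order without making the check fail.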

Now, the question is how to implement that, especially given the causality 
requirement. The simplest way is to have a single queue, like the dbus-daemon 
currently does. I don't know the internals of kdbus well enough to say whether 
it provides this guarantee, but it'll have to.

A couple of other items to discuss:

== DBUS_xxxx_BUS_ADDRESS ==

We probably discussed this already. Should we specify that the address in the 
environment variable be of the form:

 kdbus:path=/sys/fs/kdbus/xxxx,uuid=<uuid from hello>[;fall back addresses]
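
A rough sketch of how a binding might split such a value, assuming the usual 
D-Bus address layout (entries separated by ";", a transport name before the 
first ":", then comma-separated key=value pairs with %XX escapes). The 
"kdbus" transport and its keys are the proposal above, not an established 
format, and the uuid and fallback address in the example are placeholders.

```python
from urllib.parse import unquote

def parse_bus_address(address):
    """Split an address string into a list of (transport, params) pairs.

    The list keeps the original order, so a caller can try each entry in
    turn and fall back to the next one on failure.
    """
    entries = []
    for entry in address.split(";"):
        if not entry:
            continue
        transport, _, rest = entry.partition(":")
        params = {}
        for pair in rest.split(","):
            if pair:
                key, _, value = pair.partition("=")
                params[key] = unquote(value)  # decode %XX escapes
        entries.append((transport, params))
    return entries

example = ("kdbus:path=/sys/fs/kdbus/xxxx,uuid=0123456789abcdef;"
           "unix:path=/run/user/1000/bus")
for transport, params in parse_bus_address(example):
    print(transport, params)
```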

== org.freedesktop.DBus connection ==

Will systemd-kdbus provide that name on the bus so that applications that make 
calls to it directly will be able to continue working? I imagine the following 
methods would be interesting to have:

org.freedesktop.DBus.GetAdtAuditSessionData
org.freedesktop.DBus.GetConnectionCredentials
org.freedesktop.DBus.GetConnectionSELinuxSecurityContext
org.freedesktop.DBus.GetConnectionUnixProcessID
org.freedesktop.DBus.GetConnectionUnixUser
org.freedesktop.DBus.GetId
org.freedesktop.DBus.GetNameOwner
org.freedesktop.DBus.ListActivatableNames
org.freedesktop.DBus.ListNames
org.freedesktop.DBus.ListQueuedOwners
org.freedesktop.DBus.NameHasOwner
org.freedesktop.DBus.ReloadConfig
org.freedesktop.DBus.StartServiceByName
org.freedesktop.DBus.UpdateActivationEnvironment

Most of those would be just convenience for other, existing kdbus low-level 
calls, but ReloadConfig and UpdateActivationEnvironment are not available 
anywhere else. It's true that there's nothing stopping more CAP_IPC_OWNER 
connections from installing more activators, but the question is whether 
systemd will provide those for the activations it holds.

== Kernel API ==
=== Custom endpoints ===

The docs say "To create a custom endpoint, use the KDBUS_CMD_ENDPOINT_MAKE 
ioctl". On what file descriptor? The one for the control file? Or can it be sent 
on any kdbus endpoint? I'm asking because I'm not sure what the permissions of 
the control file will be -- will any process be allowed to open it and create 
endpoints?

Will custom endpoints have IDs too? The documentation for the UUID is in the 
bus section (5.1), not the endpoint section (5.2). Or is the UUID of the 
endpoints shared with the bus endpoint's ID?

Aside from the policy rules, there's no discussion on what a custom endpoint 
can do. Given that, I assume that custom endpoints are fully capable and, 
therefore, can be used for custom application buses (i.e., multiple 
applications, owning names, etc.). 

But if that's the case, how would one implement a peer-to-peer connection? Or 
should it simply be a convention that P2P connections are really regular 
buses, except that no one owns any names, there are no policy restrictions and 
that the only two connections are :1.1 and :1.2?

=== Unique Connection IDs ===

Are 64-bit counters without reuse enough?

Qt used to use a 32-bit (signed) counter for its timer IDs and we used to 
think it was enough to always increment it and never reuse. We were wrong and 
we had to implement an allocator like the PID allocator in the kernel. Now, 
we're talking 33 bits more than the Qt case here, but is it possible that a 
long-running bus would run out of bits? Think of a bus running for a couple of 
years on a server, like Apache httpd using a bus to communicate with its 
workers.
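
A back-of-the-envelope sketch of that question. The connection rates are 
arbitrary assumptions chosen for illustration, not measurements of any real 
bus.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

def years_until_exhausted(bits, ids_per_second):
    """Years before a never-reused counter of the given width runs out."""
    return (2 ** bits) / ids_per_second / SECONDS_PER_YEAR

# 31 usable bits (a signed 32-bit counter) at an assumed 1000 IDs per second
print(years_until_exhausted(31, 1_000) * 365, "days")

# 64 bits at an assumed one million new connections per second
print(years_until_exhausted(64, 1_000_000), "years")
```

Under these assumptions the 31-bit counter lasts under a month, while the 
64-bit counter lasts several hundred thousand years, so a bus that runs for a 
couple of years is nowhere near the limit even at a very high connection rate.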

=== Multicasting ===

Another thought that comes to mind: should we reserve the entire highest bit 
in connection IDs for broadcasts? It would allow for the existence of 
multicast groups in the future.

This could be a neat solution for matching interface names.
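
To make the suggestion concrete, a sketch of what reserving the top bit could 
look like. The constant names and the idea that the remaining 63 bits carry a 
group number are assumptions for illustration only. Treating the all-ones 
value as "broadcast to everyone" is likewise an assumption here.

```python
GROUP_BIT = 1 << 63            # hypothetical: top bit marks a group destination
BROADCAST_ID = (1 << 64) - 1   # all bits set, assumed to mean "everyone"

def is_group_destination(dst_id):
    """True if the destination ID has the reserved top bit set."""
    return bool(dst_id & GROUP_BIT)

def group_number(dst_id):
    """Return the low 63 bits of a group destination ID."""
    if not is_group_destination(dst_id):
        raise ValueError("not a group destination")
    return dst_id & (GROUP_BIT - 1)

print(is_group_destination(42))            # False: ordinary connection ID
print(is_group_destination(BROADCAST_ID))  # True
print(group_number(GROUP_BIT | 7))         # 7
```

With this layout, ordinary connection IDs would be limited to 63 bits, and a 
group number could in principle stand for an interface name.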

=== The "1." in unique connection names ===

It's not really necessary. Just because dbus-daemon does it does not mean that 
kdbus needs to. It's not necessary to satisfy the rule that all connection 
names contain at least one dot since unique connection names do not pass the 
validation anyway (the ":" character is not allowed).

Of course this is a simple convention, but why perpetuate the 1?

=== Activation ===

This is not really well-documented in the kdbus.txt file. Can someone expand on 
it, please?

From what I can read, service activation is performed by one or more 
connections saying they're an "activator" at their Hello time. Therefore, they 
will receive all messages that are directed at a given well-known name. 
Supposedly, they will peek (without dropping) messages to find out what the 
destination is and then launch a process that implements the service (the 
implementor). When the implementor exits, all queued messages to that service 
are transferred back to the activator.

How does the hand-off to the implementor happen? Is it automatic that as soon 
as a connection requests a name in the bus that was previously held by the 
activator, it will start receiving all queued messages?

If that is so, how does the activator read past the activation message to get 
to the next one, without dropping it?

=== Matching string parts of the message ===

I'm not really clear how a connection declares its interest in matching the 
interface name, object path, member name or even string arguments of the 
message. The kdbus_cmd_match structure only seems to have a way of matching 
the sender's name. The bloom filter is not really explained.

This is really crucial, since the most common scenario of signal matching is 
to match against the interface and member names. Dropping this feature is a 
really bad idea, since it would cause applications to listen to every 
broadcast a given connection sends, causing unnecessary wakeups.

A secondary use-case is to match a given string in an argument, like the 
current NameOwnerChanged signal really requires. I'm guessing this won't be 
possible.

=== KDBUS_CMD_BYEBYE ===

The docs say that it only succeeds if there are no more messages, at which 
point no further messages will be accepted. There doesn't seem to be a way of 
doing a shutdown()-equivalent: stop reception of new messages but still 
process the queued ones.

=== kdbus_msg timeout ===

The docs say that the timeout is expressed as a timestamp of the deadline, as 
opposed to an actual timeout. I would much prefer it be provided in number of 
nanoseconds to wait, since that's the normal use-case (25 seconds). To do it 
the way that it's proposed would require a call to clock_gettime() and some 
math before every KDBUS_CMD_MSG_SEND in order to calculate the deadline.

Can this be changed?

If this isn't to be changed, please change it at least to be a struct 
timespec, so it's easier to calculate it from the output of clock_gettime().
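
The extra work a sender would have to do can be sketched as follows. This 
assumes the deadline is a count of nanoseconds on CLOCK_MONOTONIC, as the 
docs are described above. The helper names are made up, and the conversion 
helpers only show the math that a timespec-based field would avoid.

```python
import time

NSEC_PER_SEC = 1_000_000_000

def deadline_from_timeout(timeout_ns, now_ns=None):
    """Turn a relative timeout into an absolute CLOCK_MONOTONIC deadline."""
    if now_ns is None:
        # One clock read per send: the overhead the text objects to.
        now_ns = time.clock_gettime_ns(time.CLOCK_MONOTONIC)
    return now_ns + timeout_ns

def timespec_to_ns(tv_sec, tv_nsec):
    """Flatten a (seconds, nanoseconds) pair into nanoseconds."""
    return tv_sec * NSEC_PER_SEC + tv_nsec

def ns_to_timespec(ns):
    """Split nanoseconds into a (seconds, nanoseconds) pair."""
    return divmod(ns, NSEC_PER_SEC)

# The usual 25-second method call timeout, expressed as a deadline.
deadline = deadline_from_timeout(25 * NSEC_PER_SEC)
print(ns_to_timespec(deadline))
```

With a relative timeout in the message, none of this would be needed on the 
sending side: the caller would just pass 25 * NSEC_PER_SEC.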

PS: the documentation says that it's on CLOCK_MONOTONIC, but glibc does not 
define _POSIX_MONOTONIC_CLOCK to be larger than zero. That implies that there 
are Linux systems where no monotonic clock is present. Either kdbus or glibc 
needs to be fixed.

=== KDBUS_CMD_MSG_CANCEL ===

Clarification: the docs say that this call cancels a blocking wait. Does it 
mean that this command is to be used from a different thread to cause the 
blocking call to return? If it has other purposes, this needs documenting.

=== KDBUS_CMD_NAME_ACQUIRE and RELEASE ===

These calls could allow multiple names to be requested or released in one go. 
Low-priority future improvement, I guess.

I also don't see anything specifying how a connection is notified that it got a 
name it had queued for. Should it use KDBUS_ITEM_NAME_{ADD,CHANGE} and look 
for its own connection ID?

If one requests a name and allows queueing, is there any way to tell from the 
return value whether the acquisition happened immediately? Not that important, 
I guess, since queueing implies async anyway.

=== kdbus_cmd_match user data ===

One of the problems we have in userspace bindings is to figure out what to do 
with a message that was received, especially broadcasts. It would really, 
really help us if we could specify some user data in the match rule and have 
that be provided by the kernel when the message is received.

A single u64 per match rule would be enough, since we can store a pointer 
there (i.e., the cookie that is already there). The received message should 
contain a list of user data (cookies) that it matched.

This could be a future improvement, since right now we make do without it.
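
A sketch of how a binding could use such cookies, assuming the kernel hands 
back the list of matched cookies with each message. That is the requested 
behavior, not something kdbus is known to do, and the class and method names 
are invented for illustration.

```python
class MatchDispatcher:
    """Maps per-match-rule cookies to handlers (hypothetical design)."""

    def __init__(self):
        self._next_cookie = 1
        self._handlers = {}

    def add_match(self, handler):
        """Register a handler and return the cookie to store in the rule."""
        cookie = self._next_cookie
        self._next_cookie += 1
        self._handlers[cookie] = handler
        return cookie

    def remove_match(self, cookie):
        """Forget a rule; late messages carrying its cookie are ignored."""
        self._handlers.pop(cookie, None)

    def dispatch(self, message, matched_cookies):
        """Call the handler of every rule the message matched."""
        results = []
        for cookie in matched_cookies:
            handler = self._handlers.get(cookie)
            if handler is not None:
                results.append(handler(message))
        return results

dispatcher = MatchDispatcher()
cookie = dispatcher.add_match(lambda msg: f"handled {msg}")
print(dispatcher.dispatch("NameOwnerChanged", [cookie]))
```

The point of the design is that dispatch becomes a dictionary lookup instead 
of re-evaluating every match rule in userspace for each received broadcast.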

=== kdbus_policy_access ===

After reading through the description, you can be sure that there will be a 
request to add the ability to match a given SMACK label, not just UID or GID. 
Hopefully a patch will be sent alongside it, but please make it possible to 
pass labels, not just IDs.

=== Wildcards ===

Are you sure that * not matching a dot is a good idea? What is the rationale 
behind it?

Can I suggest using IMAP wildcard matching instead? * matches anything 
including dots, % matches everything except dots.
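
The difference between the two schemes can be sketched by translating each 
to a regular expression. This assumes both wildcards may match an empty 
string and that patterns must match the whole name. The function names are 
made up.

```python
import re

def dotless_star_to_regex(pattern):
    """Scheme questioned above: '*' matches anything except '.'."""
    parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile("[^.]*".join(parts))

def imap_style_to_regex(pattern):
    """Suggested scheme: '*' matches anything, '%' anything except '.'."""
    out = []
    for ch in pattern:
        if ch == "*":
            out.append(".*")
        elif ch == "%":
            out.append("[^.]*")
        else:
            out.append(re.escape(ch))
    return re.compile("".join(out))

name = "org.example.Foo.Bar"
print(bool(dotless_star_to_regex("org.example.*").fullmatch(name)))  # False
print(bool(imap_style_to_regex("org.example.*").fullmatch(name)))    # True
print(bool(imap_style_to_regex("org.example.%").fullmatch(name)))    # False
```

With the IMAP-style pair, a policy can express both "this name and 
everything below it" and "exactly one more component".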

=== KDBUS_ATTACH_NAMES ===

Documentation for metadata says that userspace must cope with some metadata 
not being delivered. Can we at least require that KDBUS_ATTACH_NAMES be 
delivered if requested? If the cookie in the match rule isn't provided in the 
message reception, having the source's names would help solve the problem of 
the signal delivery.

The timestamp should also be mandatory.
 
=== KDBUS_PAYLOAD_DBUS ===

The docs say it reads "DBusDBus", but on little-endian systems it actually 
reads "suBDsuBD" :-)

Not relevant, though.
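
For the curious, the byte-order effect is easy to reproduce. This assumes the 
constant is the 64-bit integer whose most-significant-byte-first encoding 
spells "DBusDBus"; the value is derived from that string here rather than 
copied from the kdbus header.

```python
# Build the 64-bit value from the string, most significant byte first.
PAYLOAD_TAG = int.from_bytes(b"DBusDBus", "big")

print(hex(PAYLOAD_TAG))                    # 0x4442757344427573
print(PAYLOAD_TAG.to_bytes(8, "big"))      # b'DBusDBus'
print(PAYLOAD_TAG.to_bytes(8, "little"))   # b'suBDsuBD'
```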

-- 
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel Open Source Technology Center
      PGP/GPG: 0x6EF45358; fingerprint:
      E067 918B B660 DBD1 105C  966C 33F5 F005 6EF4 5358


