[systemd-devel] Compatibility between D-Bus and kdbus

Tue Nov 25 15:46:50 PST 2014

Hi Thiago

On Tue, Nov 25, 2014 at 9:01 PM, Thiago Macieira <thiago at kde.org> wrote:
> On Tuesday 25 November 2014 17:11:36 Lennart Poettering wrote:
>> > == org.freedesktop.DBus connection ==
>> >
>> > Will systemd-kdbus provide that name on the bus so applications that make
>> > calls directly be able to continue working? I imagine the following
>> > methods would be interesting to have:
>>
>> No, this is not supported in the current versions of kdbus
>> anymore. Emulation of these calls must happen client side if it shall
>> be supported.
>
> That wouldn't be kdbus, but systemd doing it. Since systemd is the one that
> opens the bus, it can register the first connection and claim the
> org.freedesktop.DBus service name, providing compatibility. So this isn't a
> feature request for kdbus but a feature request for systemd.

We had "systemd-bus-driverd", which implemented org.freedesktop.DBus
as a normal service. However, this didn't work out, as many dbus
clients rely on calls to this service not being re-ordered with regard
to external requests.

In particular, if gdbus runs AddMatch(), it assumes the match takes
effect immediately. If it sends a method call to another service after
installing the match, and this triggers a signal, gdbus assumes the
AddMatch() call to have succeeded (without waiting for the reply).
However, if org.freedesktop.DBus is not implemented by the bus, but by
an external service, you cannot guarantee that messages targeted at
different receivers don't get re-ordered, and there are no guarantees
about which process gets scheduled first.

This is a real bug and we couldn't figure out a way to fix it. Current
DBus applications depend on org.freedesktop.DBus being handled by the
bus entity _in-order_. Therefore, we dropped systemd-bus-driverd and
all the kdbus ioctls that we added to support it.

I strongly recommend either dropping support for org.freedesktop.DBus
in any kdbus-aware DBus API, or faking it in the library. sd-bus
doesn't support it, and IIRC Ryan didn't want to fake it in gdbus
either. Applications are required to use explicit
add_match/remove_match library calls, instead of sending messages to
org.freedesktop.DBus.

Note that for legacy applications, we emulate org.freedesktop.DBus in
our proxy. So this is really just about applications that want to use
kdbus directly.

> By the way, is there a way to ensure that a given connection is the first
> connection? As soon as the bus creator is able to connect to the /sys/fs/kdbus
> path, so is another process and therefore this other process could maliciously
> acquire names it shouldn't.

Acquiring names requires matching policies. If you set up your policies
in a way that two applications can acquire the same name, you're doing
something wrong. Or maybe I don't understand your use-case?

>> > == Kernel API ==
>> > === Custom endpoints ===
>> >
>> > The docs say "To create a custom endpoint, use the KDBUS_CMD_ENDPOINT_MAKE
>> > ioctl". On what file descriptor? The one for the control file? Or can it
>> > be sent on any kdbus endpoint? I'm asking because I'm not sure what the
>> > permissions of the control file will be -- will any process be allowed to
>> > open it and create endpoints?
>>
>> if you want to create a new endpoint for an existing bus, then invoke
>> that ioctl on the bus fd. The control file after all is unrelated to
>> any bus, and thus wouldn't know which bus you mean if we'd allow
>> invoking that ioctl on it.
>
> Ok, so any application that connected to the "bus" bus can then create custom
> endpoints. Correct?

No, custom endpoints can only be created by privileged users. On the
system bus, this means root; on the user bus, it means processes of
the user itself.

> How does one get to install policies or activators on this custom bus if the
> opening connection is a regular, non-privileged process?

Becoming a policy holder or an activator is a privileged operation,
like creating custom endpoints. You need to open an endpoint and pass
POLICY_HOLDER or ACTIVATOR in KDBUS_CMD_HELLO to become a policy-holder
or activator. You will not be an ordinary connection, so you will not
be announced on the bus, nor can you send messages.

>> > But if that's the case, how would one implement a peer-to-peer connection?
>> > Or should it simply be a convention that P2P connections are really
>> > regular buses, except that no one owns any names, there are no policy
>> > restrictions and that the only two connections are :1.1 and :1.2?
>>
>> kdbus is not for peer-to-peer connections. If you want that use
>> AF_UNIX.
>
> Why?

kdbus implements bus-based IPC. If you want P2P IPC, use one of the
established transports.

Yes, kdbus has some handy features people would like to see on unix
sockets (like flexible metadata transports), but our current policy is
"fix unix sockets!", and "this can optionally be implemented later on".
There is no plan to support P2P connections in the initial kdbus
draft.

>> There's really no need for peer-to-peer connections really, at least
>> performance-wise.
>
> The need is that we can avoid loading the code that does AF_UNIX transport if
> we detect a kdbus-capable bus. It would be nice to use kdbus for P2P too.
>
> Do you see any reason why we couldn't (ab)use the custom endpoints for P2P?
> Are the unique connection IDs shared among all custom endpoints of the bus or
> are they reset to 1?

Custom endpoints do _not_ create new buses. Really. You could create a
custom bus and use it for just 2 connections, but then you could also
just use socketpair(2). Note that there was some discussion on
"anonymous buses", which would allow creating such buses on the fly.
But again, this will not be part of the initial kdbus draft. If anyone
cares, submit it as patches once kdbus is upstream.
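
As a rough sketch of the socketpair(2) route for two peers (this is not
kdbus code; the helper name p2p_roundtrip is made up for illustration,
and SOCK_SEQPACKET is just one reasonable choice to keep message
boundaries):

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: send one message across a connected AF_UNIX
 * pair and read it back on the other end.
 * Returns 0 on success, -1 on failure. */
int p2p_roundtrip(void)
{
    int fds[2];
    char buf[16] = {0};
    const char msg[] = "ping";
    int ret = -1;

    /* Both ends are connected to each other; no bus, no names. */
    if (socketpair(AF_UNIX, SOCK_SEQPACKET | SOCK_CLOEXEC, 0, fds) < 0)
        return -1;

    if (send(fds[0], msg, sizeof(msg), 0) == (ssize_t)sizeof(msg) &&
        recv(fds[1], buf, sizeof(buf), 0) == (ssize_t)sizeof(msg) &&
        strcmp(buf, msg) == 0)
        ret = 0;

    close(fds[0]);
    close(fds[1]);
    return ret;
}
```

In a real setup you would hand one of the two fds to the other peer
(fork, or SCM_RIGHTS) instead of using both in the same process.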

> Also, is there any way to ask an endpoint to stop accepting new connections
> without tearing down the existing ones?

No.

>> > if that is so, how does the activator read past the activation message to
>> > get to the next one, without dropping it?
>>
>> Why would it want that? The idea is that the activator actually
>> *never* really processes any message. It just waits for POLLIN, then
>> activates, stops listening for POLLIN, and activates the daemon which
>> then processes the messages.
>
> Because I thought that the activator may be one process for all possible
> services. I'm guessing this is not the way you'd envisioned it. Otherwise, if
> you have 200 activatable services, there are 200 connections by one or more
> process. There's no bus daemon to run out of fd's here, but they would count
> towards the user's system-wide file descriptor limit.

You need to open 1 fd for each name you want to be an activator for.

>> > === KDBUS_CMD_BYEBYE ===
>> >
>> > The docs say that it only succeeds if there are no more messages, at which
>> > point no further messages will be accepted. There doesn't seem to be a way
>> > of doing a shutdown()-equivalent: stop reception of new messages but
>> > still process the queued ones.
>>
>> What's the precise usecase for this?
>
> "I've been requested to exit, so I am going to exit now" This tells the kernel
> to stop sending me messages, so I am able to exit. If there are more after
> this, they'll be queued for the activator again, if there's one, rejected
> otherwise.
>
> KDBUS_CMD_BYEBYE seems to be something an on-demand service would use after
> every message it receives. If the call succeeds, it exits; if it fails, it
> parses more. But that doesn't take into account the request to exit coming
> from the user, since it could never do that if more messages kept getting
> received.

Use close(2).

>> > === kdbus_msg timeout ===
>> >
>> > The docs say that the timeout is expressed as a timestamp of the deadline,
>> > as opposed to an actual timeout. I would much prefer it be provided in
>> > number of nanoseconds to wait, since that's the normal use-case (25
>> > seconds). To do it the way that it's proposed would require a call to
>> > clock_gettime() and some math before every KDBUS_CMD_MSG_SEND in order to
>> > calculate the deadline.
>> >
>> > Can this be changed?
>>
>> No. This was actually recently changed only. The reason here is about
>> restartable syscalls: when the blocking method call ioctl() is used,
>> and a signal is received, clients must be able to restart the ioctl()
>> where it left off, and for that relative timestamps are really awful,
>> as the client side could not just invoke the syscall with the same
>> args again.
>
> A solution for that is to update the timeout with the remaining time like
> select(2) does (though glibc hides that). This is a Linux-specific syscall, so
> there's no POSIX compatibility to take into account.

No. If you use SA_RESTART (see signal(7)), the kernel restarts your
syscall _before_ returning to user-space. Hence, the syscall will be
entered with the exact same arguments. At first, we intended to drop
support for SA_RESTART (just like any socket ioctls with timeouts lack
support for it), but using an absolute timeout is just fine.
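
To illustrate why the absolute form survives a restart, here is a small
sketch. It is not kdbus code and both helper names are hypothetical.
With an absolute deadline, the remaining wait can be recomputed from
unchanged arguments on every entry; a relative timeout re-entered with
the same value pushes the wake-up time further out.

```c
#include <stdint.h>

/* Absolute deadline: time left on (re-)entry; 0 once the deadline
 * has passed. */
uint64_t remaining_ns(uint64_t deadline_ns, uint64_t now_ns)
{
    return deadline_ns > now_ns ? deadline_ns - now_ns : 0;
}

/* Relative timeout: wake-up time if the call is (re-)entered at
 * entry_ns with the same timeout argument. */
uint64_t relative_wakeup_ns(uint64_t entry_ns, uint64_t timeout_ns)
{
    return entry_ns + timeout_ns;
}
```

For a call entered at t=0 with a budget of 1000 ns and restarted at
t=400, remaining_ns(1000, 400) is 600, while relative_wakeup_ns(400,
1000) moves the wake-up from 1000 to 1400.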

>> It's actually a strict rule now for userspace interfaces of the
>> kernel: all timeouts should be absolute, not relative, and we need to
>> follow here, too.
>
> Understood, but coming from userspace's perspective this seems ill-advised and
> optimising for the wrong use-case. The common case is that syscalls are not
> interrupted: with a timeout, there's one clock_gettime() call *after* the
> interruption if any; with the timestamp, there's always one before the
> syscall.

Use clock_gettime(CLOCK_MONOTONIC_COARSE). This reads the time from
the VDSO, which is a simple memory read. No trap/syscall is involved.
The downside is that you lose precision, but that should really be fine
for this use-case.
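
A minimal sketch of turning a relative timeout (e.g. the usual 25
seconds) into a deadline that way. The helper name and the nanosecond
unit are assumptions for illustration; check the kdbus docs for the
clock and the field the deadline actually goes into.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC UINT64_C(1000000000)

/* Hypothetical helper: absolute deadline on the monotonic timebase,
 * timeout_ns from now, in nanoseconds. Returns 0 on failure. */
uint64_t deadline_from_timeout(uint64_t timeout_ns)
{
    struct timespec now;

    /* Coarse clock: cheap to read, tick-level resolution only. */
    if (clock_gettime(CLOCK_MONOTONIC_COARSE, &now) < 0)
        return 0;

    return (uint64_t)now.tv_sec * NSEC_PER_SEC
         + (uint64_t)now.tv_nsec
         + timeout_ns;
}
```

The coarse clock lags the precise one by up to a tick, so the deadline
may end up slightly earlier than one computed from CLOCK_MONOTONIC.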

Thanks!
David