[systemd-devel] Compatibility between D-Bus and kdbus

Tue Nov 25 12:01:41 PST 2014

On Tuesday 25 November 2014 17:11:36 Lennart Poettering wrote:
[snip]

Thanks for raising the resource limits.

> > == DBUS_xxxx_BUS_ADDRESS ==
> > 
> > We probably discussed this. Should we specify that the address on the
> > environment variable should be of the form:
> >  kdbus:path=/sys/fs/kdbus/xxxx,uuid=<uuid from hello>[;fall back addresses]
> 
> Well, we don't need any env var really, as we enforce that the UID of
> the user is included in the name of their busses, and the busses are
> cleaned up when the registrar dies. We don't have the risk of leaving
> old busses around, or even by other users, hence all code can just
> imply the path to use is kernel:path=/sys/fs/kdbus/0-system and
> kernel:path=/sys/fs/kdbus/$UID-user and all is good, without ever
> having to deal with env vars at all.
> 
> (of course, if env-vars are set they should be used, but the normal
> codepaths in the distros should work without them.)

Thinking of non-system buses here.

If the variable is empty, I agree that it should have an equivalent of an 
"autostart" mechanism, but I disagree on the solution and I also disagree that 
distros should leave it empty.

For one thing, the fallback address is expected to be there if there's a proxy 
bus running. The current autostart mechanism relies on X being present, so the 
fallback won't be found unless X is running and something registered the 
proxy's socket address there.

For another, it's good practice to have it set and not depend on autostart.

For a third, hardcoding kernel paths in userspace sounds like a poor idea. The 
kdbus mountpoint may be elsewhere, and whatever is creating buses may not do it 
per user, but per session or according to some other rule of its own.

So we should make sure code works when the env vars are missing, but we should 
recommend that they always be set.
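
To make the lookup order I have in mind concrete, here is a minimal sketch in Python. It is an illustration only, not code from any existing library: the function name is made up, the variable names are the usual dbus1 ones, and the default paths are simply the ones from your mail, which I'd still rather not see hardcoded.

```python
def resolve_bus_address(env, uid, session=True):
    """Return the list of addresses to try, in order (hypothetical helper)."""
    var = "DBUS_SESSION_BUS_ADDRESS" if session else "DBUS_SYSTEM_BUS_ADDRESS"
    value = env.get(var, "")
    if value:
        # An explicitly set variable always wins; fallback addresses
        # follow the first one, separated by ';'.
        return value.split(";")
    # Variable missing or empty: assume the conventional kdbus paths
    # quoted above (an assumption, the mountpoint may differ).
    if session:
        return ["kernel:path=/sys/fs/kdbus/%d-user" % uid]
    return ["kernel:path=/sys/fs/kdbus/0-system"]
```

With the variable set to "kdbus:path=/x;unix:path=/y" this returns both addresses in order, so the proxy's socket is found without asking X.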

> > == org.freedesktop.DBus connection ==
> > 
> > Will systemd-kdbus provide that name on the bus so applications that make
> > calls directly be able to continue working? I imagine the following
> > methods would be interesting to have:
> 
> No, this is not supported in the current versions of kdbus
> anymore. Emulation of these calls must happen client side if it shall
> be supported.

That wouldn't be kdbus, but systemd doing it. Since systemd is the one that 
opens the bus, it can register the first connection and claim the 
org.freedesktop.DBus service name, providing compatibility. So this isn't a 
feature request for kdbus but a feature request for systemd.

By the way, is there a way to ensure that a given connection is the first 
connection? As soon as the bus creator is able to connect to the /sys/fs/kdbus 
path, so is another process and therefore this other process could maliciously 
acquire names it shouldn't.

> > org.freedesktop.DBus.ReloadConfig
> > org.freedesktop.DBus.StartServiceByName
> > org.freedesktop.DBus.UpdateActivationEnvironment
> > 
> > Most of those would be just convenience for other, existing kdbus
> > low-level calls, but ReloadConfig and UpdateActivationEnvironment are not
> > available anywhere else. It's true that there's nothing stopping more
> > CAP_IPC_OWNER connections from installing more activators, but the
> > question is whether systemd will provide those for the activations it
> > holds.
> 
> The client side emulation can choose to either forward ReloadConfig
> and UpdateActivationEnvironment to the respective systemd calls, or just
> return some "not supported" error.

Can't do that. What if it's a kdbus system that is not systemd?

I don't mind forwarding to a well-known bus name, as long as we establish that 
there is such a service running on the bus that will accept those calls. But 
if such a service exists, why can't it claim the org.freedesktop.DBus name?

> > == Kernel API ==
> > === Custom endpoints ===
> > 
> > The docs say "To create a custom endpoint, use the KDBUS_CMD_ENDPOINT_MAKE
> > ioctl". On what file descriptor? The one for the control file? Or can it
> > be sent on any kdbus endpoint? I'm asking because I'm not sure what the
> > permissions of the control file will be -- will any process be allowed to
> > open it and create endpoints?
> 
> if you want to create a new endpoint for an existing bus, then invoke
> that ioctl on the bus fd. The control file after all is unrelated to
> any bus, and thus wouldn't know which bus you mean if we'd allow
> invoking that ioctl on it.

Ok, so any application that connected to the "bus" bus can then create custom 
endpoints. Correct?

How does one get to install policies or activators on this custom bus if the 
opening connection is a regular, non-privileged process?

> > But if that's the case, how would one implement a peer-to-peer connection?
> > Or should it simply be a convention that P2P connections are really
> > regular buses, except that no one owns any names, there are no policy
> > restrictions and that the only two connections are :1.1 and :1.2?
> 
> kdbus is not for peer-to-peer connections. If you want that use
> AF_UNIX.

Why?

> There's really no need for peer-to-peer connections really, at least
> performance-wise.

The need is that we can avoid loading the code that does AF_UNIX transport if 
we detect a kdbus-capable bus. It would be nice to use kdbus for P2P too.

Do you see any reason why we couldn't (ab)use the custom endpoints for P2P? 
Are the unique connection IDs shared among all custom endpoints of the bus or 
are they reset to 1?

Also, is there any way to ask an endpoint to stop accepting new connections 
without tearing down the existing ones?

> > === The "1." in unique connection names ===
> > 
> > It's not really necessary. Just because dbus-daemon does it does not mean
> > that kdbus needs to. It's not necessary to satisfy the rule that all
> > connection names contain at least one dot since unique connection names
> > do not pass the validation anyway (the ":" character is not allowed).
> > 
> > Of course this is a simple convention, but why perpetuate the 1?
> 
> well, we should generate names following the same naming scheme as
> dbus1 does. Otherwise implementing the compat proxy is nasty.
>
> I really don't see much of a problem with the weird ":1." prefix. It's
> just a prefix, that is all...

The naming scheme is ":<numbers and dots>". The "1." is implementation-specific 
and not really required.

But it's possible some implementations depend on it, so we may as well keep it 
forever.

> > if that is so, how does the activator read past the activation message to
> > get to the next one, without dropping it?
> 
> Why would it want that? The idea is that the activator actually
> *never* really processes any message. It just waits for POLLIN, then
> activates, stops listening for POLLIN, and activates the daemon which
> then processes the messages.

Because I thought that the activator may be one process for all possible 
services. I'm guessing this is not the way you'd envisioned it. Otherwise, if 
you have 200 activatable services, there are 200 connections by one or more 
processes. There's no bus daemon to run out of fd's here, but they would count 
towards the user's system-wide file descriptor limit.

> Well, the reason that this is not documented in kdbus.txt is really
> that the kernel doesn't care about the bloom filter much. All it does
> is ultimately make an AND check of the bitfields, and that's about
> it. How the filter is calculated and what is included in it is
> completely up to userspace. Or in other words: the bloom filter
> calculation should be documented in the dbus spec, not in the kdbus
> docs in the kernel.
> 
> Unfortunately that part of the dbus spec is not written yet.

Fair enough, I'll eagerly await more info. But I'm more reassured that it's 
being worked on -- I'd known this was taken into account, but I couldn't see 
where.
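
For my own understanding, this is how I picture the kernel-side check you describe, as a Python sketch. Everything about how the bits are computed here (filter size, number of bits per string, SHA-256 as the hash) is an assumption of mine for illustration, since that part is exactly what isn't specified yet. Only the final AND comparison reflects what you wrote.

```python
import hashlib

BLOOM_BITS = 512    # assumed filter size, not taken from any spec
BLOOM_HASHES = 4    # assumed number of bits set per string

def bloom_bits(s):
    """Map one string to a few bit positions (illustrative hash only)."""
    digest = hashlib.sha256(s.encode()).digest()
    bits = 0
    for i in range(BLOOM_HASHES):
        pos = int.from_bytes(digest[4 * i:4 * i + 4], "little") % BLOOM_BITS
        bits |= 1 << pos
    return bits

def make_filter(strings):
    """OR together the bits of every string describing a message or rule."""
    f = 0
    for s in strings:
        f |= bloom_bits(s)
    return f

def kernel_side_match(msg_filter, rule_mask):
    """The AND check: every bit of the rule must be set in the message."""
    return (msg_filter & rule_mask) == rule_mask
```

A match here only means "maybe": two different strings can set the same bits, so the receiver still has to do the precise matching in userspace.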

> > === KDBUS_CMD_BYEBYE ===
> > 
> > The docs say that it only succeeds if there are no more messages, at which
> > point no further messages will be accepted. There doesn't seem to be a way
> > of doing a shutdown()-equivalent: stop reception of new messages but
> > still process the queued ones.
> 
> What's the precise usecase for this?

"I've been requested to exit, so I am going to exit now" This tells the kernel 
to stop sending me messages, so I am able to exit. If there are more after 
this, they'll be queued for the activator again, if there's one, rejected 
otherwise.

KDBUS_CMD_BYEBYE seems to be something an on-demand service would use after 
every message it receives. If the call succeeds, it exits; if it fails, it 
parses more. But that doesn't take into account the request to exit coming 
from the user, since it could never do that if more messages kept getting 
received.
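
The pattern I'm describing looks roughly like this. It is a Python sketch against an in-memory stand-in, not real kdbus calls: the class and its methods are hypothetical, and byebye() only models the documented behaviour that the call succeeds when the queue is empty.

```python
from collections import deque

class FakeConnection:
    """In-memory stand-in for a kdbus connection (hypothetical)."""

    def __init__(self, messages):
        self.queue = deque(messages)
        self.closed = False

    def recv(self):
        return self.queue.popleft()

    def byebye(self):
        # Models the documented rule: fails while messages are queued;
        # on success the connection accepts no further messages.
        if self.queue:
            return False
        self.closed = True
        return True

def serve_until_idle(conn, handle):
    """Try to say goodbye; whenever that fails, process one more message."""
    handled = 0
    while not conn.byebye():
        handle(conn.recv())
        handled += 1
    return handled
```

The loop only terminates once the queue drains. If senders keep the queue non-empty, a service that was asked to exit never gets a successful byebye(), which is the gap I'm pointing at.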

> > === kdbus_msg timeout ===
> > 
> > The docs say that the timeout is expressed as a timestamp of the deadline,
> > as opposed to an actual timeout. I would much prefer it be provided in
> > number of nanoseconds to wait, since that's the normal use-case (25
> > seconds). To do it the way that it's proposed would require a call to
> > clock_gettime() and some math before every KDBUS_CMD_MSG_SEND in order to
> > calculate the deadline.
> > 
> > Can this be changed?
> 
> No. This was actually recently changed only. The reason here is about
> restartable syscalls: when the blocking method call ioctl() is used,
> and a signal is received, clients must be able to restart the ioctl()
> where it left off, and for that relative timestamps are really awful,
> as the client side could not just invoke the syscall with the same
> args again.

A solution for that is to update the timeout with the remaining time like 
select(2) does (though glibc hides that). This is a Linux-specific syscall, so 
there's no POSIX compatibility to take into account.

> It's actually a strict rule now for userspace interfaces of the
> kernel: all timeouts should be absolute, not relative, and we need to
> follow here, too.

Understood, but coming from userspace's perspective this seems ill-advised and 
optimising for the wrong use-case. The common case is that syscalls are not 
interrupted: with a timeout, there's one clock_gettime() call *after* the 
interruption if any; with the timestamp, there's always one before the 
syscall.
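
To spell out the two call patterns, here is a Python sketch. The helper names are mine, and time.monotonic_ns() stands in for clock_gettime(CLOCK_MONOTONIC); nothing here is kdbus API.

```python
import time

NSEC_PER_SEC = 1_000_000_000

def deadline_from_timeout(timeout_ns, now_ns=None):
    """Absolute-deadline style: one clock read before *every* send."""
    if now_ns is None:
        now_ns = time.monotonic_ns()
    return now_ns + timeout_ns

def remaining_after_interrupt(deadline_ns, now_ns=None):
    """Relative-timeout style: a clock read only after an interruption,
    to work out how much of the original timeout is left."""
    if now_ns is None:
        now_ns = time.monotonic_ns()
    return max(0, deadline_ns - now_ns)

# The usual 25-second method call timeout, expressed as a deadline.
deadline = deadline_from_timeout(25 * NSEC_PER_SEC)
```

With the deadline style the same value can be passed again unchanged when the ioctl is restarted, which is the property you want; the cost is the clock read in the common, uninterrupted case.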

> > If this isn't to be changed, please change it at least to be a struct
> > timespec, so it's easier to calculate it from the output of
> > clock_gettime().
> 
> Conversion is trivial actually...

timespec→nanosecs is easy, granted (just multiplication and addition). Doing 
the reverse would require divisions and modulus, but I can't think of where it 
would be necessary.
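
For reference, both directions as a Python sketch (function names are mine; a timespec is represented as a plain (tv_sec, tv_nsec) pair):

```python
NSEC_PER_SEC = 1_000_000_000

def timespec_to_ns(tv_sec, tv_nsec):
    """One multiplication and one addition."""
    return tv_sec * NSEC_PER_SEC + tv_nsec

def ns_to_timespec(ns):
    """The reverse needs a division and a modulus."""
    return divmod(ns, NSEC_PER_SEC)

print(timespec_to_ns(25, 500))         # 25000000500
print(ns_to_timespec(25_000_000_500))  # (25, 500)
```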

> > PS: the documentation says that it's on CLOCK_MONOTONIC, but glibc does
> > not define _POSIX_MONOTONIC_CLOCK to be larger than zero. That implies
> > that there are Linux systems where no monotonic clock is present. Either
> > kdbus or glibc needs to be fixed.
> 
> No, the monotonic clock is *not* optional on Linux.

Then glibc should be fixed to have _POSIX_MONOTONIC_CLOCK set to 200809L. That 
saves us a sysconf() call to verify whether it's present or not.

http://osxr.org/glibc/source/sysdeps/unix/sysv/linux/bits/posix_opt.h#0093
http://osxr.org/glibc/source/nptl/sysdeps/unix/sysv/linux/bits/posix_opt.h#0161

If you know someone influential there who can make it happen, it would be most 
welcome.

> Well, this has been requested before, but this has problems. For
> starters this would mean that each receiver would have to receive a
> different message, which we currently try to avoid, everybody gets the
> same. Also, it means that the kernel would always have to iterate
> through all rules that are installed, instead of being able to return
> quickly if the first rule that matches is found (why? because there
> might be two rules that match the same message).
> 
> Also note that bloom filters are probabilistic anyway, hence you have
> to match in userspace anyway, in order not to get false positives. But
> if you do that you can just build the matching data structure so that
> you also use it to find the appropriate binding for each
> message. sd-bus does that actually pretty neatly by building a
> decision tree that solves both problems with the minimal number of
> checks traversing a decision tree.

QtDBus has a hashing table based on the most common rules for dbus1 and then 
does more precise matching. For kdbus support, I suppose we'll use the hashing 
table on the "definite" information and then build the decision tree on the 
rest of the data.

The user cookie would allow us to bypass one or both steps. I understand it 
would require parsing all rules, but is that so bad?

As for having the same message, I was thinking that the cookies should be 
present for all destinations along with the IDs of those destinations. Except 
that that might balloon the size of the message (100 services listening for a 
signal, each matching 4 rules, with 16 bytes per match rule = 6400 bytes of 
overhead).
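
The arithmetic, as a trivial Python check (the function name is mine, and it assumes every matching rule of every listener contributes one fixed-size entry):

```python
def cookie_overhead(listeners, rules_per_listener, bytes_per_rule):
    """Extra bytes if every matching rule of every listener is attached."""
    return listeners * rules_per_listener * bytes_per_rule

print(cookie_overhead(100, 4, 16))  # 6400
```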

See below on the names.

> > === Wildcards ===
> > 
> > Are you sure that * not matching a dot is a good idea? What is the
> > rationale behind it?
> 
> Hmm, what precisely is this about? wildcards about?

Just wondering why the * does not match the dot. I'd assume the more common 
case is to match a full prefix, and that includes matching dots.
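
To show the difference between the two behaviours, here is a Python sketch that turns a wildcard into a regular expression either way. It is my own illustration, not the kdbus matching code, and the names used are examples.

```python
import re

def wildcard_to_regex(pattern, star_matches_dot):
    """Translate '*' either to 'anything' or to 'anything but a dot'."""
    star = ".*" if star_matches_dot else "[^.]*"
    literal_parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile(star.join(literal_parts))

def matches(pattern, name, star_matches_dot):
    regex = wildcard_to_regex(pattern, star_matches_dot)
    return regex.fullmatch(name) is not None

# '*' stopping at dots: only one more name element is accepted.
print(matches("org.example.*", "org.example.Foo", False))      # True
print(matches("org.example.*", "org.example.Foo.Bar", False))  # False
# '*' crossing dots: the whole prefix matches.
print(matches("org.example.*", "org.example.Foo.Bar", True))   # True
```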

> > === KDBUS_ATTACH_NAMES ===
> > 
> > Documentation for metadata says that userspace must cope with some
> > metadata not being delivered. Can we at least require that
> > KDBUS_ATTACH_NAMES be delivered if requested? If the cookie in the match
> > rule isn't provided in the message reception, having the source's names
> > would help solve the problem of the signal delivery.
> > 
> > The timestamp should also be mandatory.
> 
> Yes, they are mandatory. process credentials might be suppressed
> however, for example if they cannot be translated due to namespaces.

Thanks. Could you clarify in the docs?

-- 
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel Open Source Technology Center
      PGP/GPG: 0x6EF45358; fingerprint:
      E067 918B B660 DBD1 105C  966C 33F5 F005 6EF4 5358