[ANNOUNCE] D-Bus Broker Project

Mon Sep 4 16:08:41 UTC 2017

Hi Colin

On Wed, Aug 23, 2017 at 7:40 PM, Colin Walters <walters at verbum.org> wrote:
> Have you given any thought to addressing the idle exit issues?
> https://lists.freedesktop.org/archives/dbus/2015-May/016671.html

Yes, a lot. Let me try to summarize some of our thoughts.

1) Completeness

Our foremost priority was to never drop messages under normal
operation. That is, unless you violate the spec or exceed your
quotas, the daemon will never drop any message destined for you. This
gets tricky during disconnect, though. Even if your incoming queue is
empty when you check it, another message might have been queued by the
time you call close(2).
Therefore, dbus-broker supports graceful disconnects. Whenever a
client closes its *write*-side (i.e., the client calls
shutdown(SHUT_WR)), the daemon will treat this as a graceful disconnect.
It will read all pending messages in the incoming queue from the
client until EOF (this is actually implicit, since SHUT_WR is only
visible via EOF to the daemon), it will then actually perform the
client disconnect (releasing names, sending signals, etc.), but
continues flushing the outgoing messages to the client. Once the
outgoing queue is fully flushed, the daemon closes its *write*-side.
This will cause a full shutdown of the channel.
Note that whenever a client closes its *read*-side, it basically
discards any pending messages, so we treat this as a lossy disconnect,
just like a direct close(2).

With this in place, we know that both sides (client and server) always
get all messages that were queued on either side. There is no way a
message gets lost, unless one side explicitly decided to close its
*read*-side.
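
As a rough illustration of the half-close sequence described above,
here is a minimal sketch using a plain socketpair in place of a real
bus connection. The function name and the payloads are made up for the
example; this is not dbus-broker code, just the same shutdown(SHUT_WR)
pattern on both ends.

```python
import socket


def graceful_client_disconnect():
    # A connected socket pair stands in for the client <-> broker link.
    client, broker = socket.socketpair()

    # The client sends its last request, then closes only its write-side.
    client.sendall(b"last-request")
    client.shutdown(socket.SHUT_WR)

    # The broker drains everything the client sent until it sees EOF.
    received = b""
    while True:
        chunk = broker.recv(4096)
        if not chunk:  # EOF is how the client's SHUT_WR becomes visible
            break
        received += chunk

    # The broker can still flush its outgoing queue to the client and
    # only then closes its own write-side.
    broker.sendall(b"pending-reply")
    broker.shutdown(socket.SHUT_WR)

    # The client reads until EOF, so nothing queued for it is lost.
    flushed = b""
    while True:
        chunk = client.recv(4096)
        if not chunk:
            break
        flushed += chunk

    client.close()
    broker.close()
    return received, flushed
```

Had the client called close() instead of shutdown(SHUT_WR), it would
have had no chance to read "pending-reply", which is the lossy case.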

Completeness does not fully solve exit-on-idle, though.

2) Reliability

While completeness guarantees every message is transmitted, it says
nothing about whether the calls can actually be served. That is, if a
method-call is transmitted right during a graceful-disconnect, there
is no way to answer that method-call. Furthermore, the reply-window
created in the policy is actually closed on disconnect, so there is no
way to answer on reconnect, either. This could be changed, though. We could
assign those reply-windows to a name and make them survive a
reconnect.

Additionally, if a client actually tracks state about a service it
uses, it might not be aware of exit-on-idle and thus treat a
name-owner-change as service-restart, and thus it might invalidate its
state. This is obviously a contract between client and service, and
services should explicitly state how they behave in this scenario. It's
nothing the broker can fix, but clients need to be aware of this.

Lastly, clients must not resolve the names of exit-on-idle services to
unique-ids and then address those directly. The problem here is that
the broker must be aware of the destination name of every message. See
the reply-window problem above: the broker must know which name to
assign a reply-window to if it is to be kept open. Furthermore, a
client must be aware that the unique-id of the service can change at
any time.

3) Ordering

Whenever AcquireName() or ReleaseName() are involved, message ordering
will be screwed! Every activatable name has its own message-queue in
the daemon as long as no-one owns that name. Hence, messages are only
ordered with regard to their literal destination. If you own multiple
names (and yes, unique-names count here as well!), messages will not
be properly ordered across those names.

You can deal with that by making all your clients always use the
well-known-name as destination, and making sure you own only a single
name. Everything else, in my opinion, is broken, unless you avoid
exit-on-idle.
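
To make the reordering concrete, here is a toy model of per-name
queueing. It is only a sketch under my own assumptions: ToyBroker, its
methods, ":1.7" and "org.example.Service" are all hypothetical and say
nothing about how dbus-broker is actually implemented.

```python
from collections import defaultdict


class ToyBroker:
    """Toy model only: one pending queue per unowned well-known name."""

    def __init__(self):
        self.inboxes = {}                 # unique name -> delivered messages
        self.owners = {}                  # any name -> unique name of owner
        self.pending = defaultdict(list)  # unowned name -> queued messages

    def connect(self, unique_name):
        self.inboxes[unique_name] = []
        self.owners[unique_name] = unique_name  # unique names are always owned

    def send(self, destination, message):
        owner = self.owners.get(destination)
        if owner is None:
            # Nobody owns the name yet, so the message waits in that
            # name's own queue.
            self.pending[destination].append(message)
        else:
            self.inboxes[owner].append(message)

    def acquire_name(self, unique_name, well_known):
        self.owners[well_known] = unique_name
        # The per-name queue is flushed only now, behind anything that
        # was already delivered through another name.
        self.inboxes[unique_name].extend(self.pending.pop(well_known, []))


def demo():
    broker = ToyBroker()
    broker.connect(":1.7")
    broker.send("org.example.Service", "first")  # queued, name is unowned
    broker.send(":1.7", "second")                # delivered immediately
    broker.acquire_name(":1.7", "org.example.Service")
    return broker.inboxes[":1.7"]
```

In this model demo() returns the messages in the order "second",
"first", even though they were sent the other way round.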

4) Summary

If you want exit-on-idle to work without API extensions, you must make
your clients aware of this. Furthermore, the simplest solution is to
call ReleaseName() and wait for the reply. If you received any
method-calls in between, you had better put them on hold and re-acquire
the name. Then serve them all. If you didn't receive any method-call in
between, disconnect.
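
The control flow of that procedure could look roughly like the sketch
below. FakeConnection and try_exit_on_idle are hypothetical names
invented for this example; a real service would use its D-Bus
library's ReleaseName()/RequestName() calls and its own dispatch loop
instead.

```python
class FakeConnection:
    """Hypothetical stand-in for a bus connection, not a real binding."""

    def __init__(self, calls_arriving_during_release):
        self.incoming = list(calls_arriving_during_release)
        self.owns_name = True
        self.connected = True

    def release_name_and_wait(self):
        # Models ReleaseName() plus waiting for its reply. Every
        # method-call ordered before that reply is handed back here.
        self.owns_name = False
        received, self.incoming = self.incoming, []
        return received

    def acquire_name(self):
        self.owns_name = True

    def disconnect(self):
        self.connected = False


def try_exit_on_idle(conn, serve):
    """Return True if the service disconnected, False if it must stay."""
    held = conn.release_name_and_wait()
    if held:
        # Calls slipped in while we were releasing: put them on hold,
        # re-acquire the name, then serve them all.
        conn.acquire_name()
        for call in held:
            serve(call)
        return False
    conn.disconnect()
    return True
```

For example, try_exit_on_idle(FakeConnection([]), print) disconnects,
while passing FakeConnection(["Ping"]) re-acquires the name and serves
the held call instead.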

Maybe there is more to be aware of, I am not sure. I never considered
exit-on-idle a crucial feature of dbus. I would much prefer
keeping your dbus-connection open and just handing it over to your
activator, which activates you as soon as the connection becomes
readable (EPOLLIN).
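
The activator side of that idea boils down to waiting for the
connection to become readable. A minimal sketch, assuming the service
has already passed its connected socket to the activator (how that
hand-over happens is not shown, and wait_for_activation is a made-up
name):

```python
import selectors
import socket


def wait_for_activation(bus_socket, timeout=None):
    """Return True once bus_socket becomes readable, False on timeout."""
    with selectors.DefaultSelector() as selector:
        # EVENT_READ is the portable equivalent of EPOLLIN; on Linux the
        # default selector is typically backed by epoll.
        selector.register(bus_socket, selectors.EVENT_READ)
        return bool(selector.select(timeout))
```

When this returns True, the activator would start the service and give
the socket back to it, so the pending message is read by the service
itself and nothing is dropped.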

Thanks
David