DBus-Daemon Optimizations

Schmottlach, Glenn glenn.schmottlach at harman.com
Fri Feb 27 06:22:49 PST 2009


I've been working on porting DBus to QNX 6.4.0 on the Beagle Board (TI
OMAP 3530 based) development platform. Part of this porting effort has
been to evaluate the performance of DBus on this embedded platform. I
realize DBus was originally intended to be used as an application bus,
but many characteristics make it suitable for the embedded space as
well.

 

Local (Unix) sockets on QNX are not particularly efficient. Passing a
1KB buffer on this target platform takes ~324 usec (point-to-point) from
client to server using raw sockets, not DBus. The same test using QNX's
native message passing mechanism takes ~177 usec from client to server,
so the native mechanism has roughly a 324 - 177 = 147 usec advantage per
transfer.
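
For anyone who wants to reproduce that kind of number, a minimal
point-to-point sketch might look like the following (this is only an
illustration of the approach, not our actual test code; the socket path
and iteration count are made up, and it assumes a server on the other
end that simply echoes the buffer back):

/*
 * Minimal sketch of a point-to-point Unix-socket latency test: send a
 * 1 KB buffer to an echo server and take half the round-trip time as
 * the one-way cost. The socket path and iteration count are made up.
 */
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

#define BUF_SIZE   1024
#define ITERATIONS 1000
#define SOCK_PATH  "/tmp/latency_test"   /* hypothetical path */

int main(void)
{
    char buf[BUF_SIZE];
    struct sockaddr_un addr;
    struct timespec t0, t1;

    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, SOCK_PATH, sizeof(addr.sun_path) - 1);
    if (fd < 0 || connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("connect");
        return 1;
    }

    memset(buf, 0xAB, sizeof(buf));
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < ITERATIONS; i++) {
        if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf))
            return 1;                            /* client -> server */
        size_t got = 0;
        while (got < sizeof(buf)) {              /* server echoes it back */
            ssize_t n = read(fd, buf + got, sizeof(buf) - got);
            if (n <= 0)
                return 1;
            got += (size_t)n;
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double total_us = (t1.tv_sec - t0.tv_sec) * 1e6 +
                      (t1.tv_nsec - t0.tv_nsec) / 1e3;
    printf("one-way latency: ~%.1f usec\n", total_us / ITERATIONS / 2.0);
    close(fd);
    return 0;
}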

 

So for a typical DBus scenario, I would expect roughly a 147 usec
improvement for each leg of the request/reply round trip, e.g.

 

          REQ             REQ
Client --------> Daemon -------> Service
      +147 usec         +147 usec   |
                                    |
         REPLY            REPLY     V
Client <------- Daemon <------- Service
        +147 usec        +147 usec


So if I have a DBus method that passes an array of bytes (1KB to be
exact), I would expect the local sockets transport to incur roughly 4 *
147 = 588 usec of overhead beyond what the native services could
provide.

 

In order to make this comparison, we implemented a native QNX transport
mechanism in addition to the local socket implementation. We then
implemented the following DBus interface:

 

    <method name="GetInteger">
      <arg type="i" name="numInts" direction="in"/>
      <arg type="ai" name="intData" direction="out"/>
    </method>
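
For reference, a client-side invocation of this method with the
low-level libdbus API might look roughly like the sketch below (the bus
name, object path, and interface name are made up for illustration;
only the method name and argument types come from the XML above):

/*
 * Minimal sketch of calling GetInteger with the low-level libdbus API.
 * The bus name, object path, and interface name are hypothetical.
 */
#include <stdio.h>
#include <dbus/dbus.h>

int main(void)
{
    DBusError err;
    dbus_error_init(&err);

    DBusConnection *conn = dbus_bus_get(DBUS_BUS_SESSION, &err);
    if (conn == NULL) {
        fprintf(stderr, "connect failed: %s\n", err.message);
        return 1;
    }

    DBusMessage *msg = dbus_message_new_method_call(
        "com.example.IntService",      /* hypothetical bus name    */
        "/com/example/IntService",     /* hypothetical object path */
        "com.example.IntService",      /* hypothetical interface   */
        "GetInteger");

    dbus_int32_t num_ints = 256;       /* 256 * 4 bytes = 1 KB payload */
    dbus_message_append_args(msg, DBUS_TYPE_INT32, &num_ints,
                             DBUS_TYPE_INVALID);

    /* Blocking round trip through the daemon to the service and back. */
    DBusMessage *reply = dbus_connection_send_with_reply_and_block(
        conn, msg, -1 /* default timeout */, &err);
    dbus_message_unref(msg);
    if (reply == NULL) {
        fprintf(stderr, "call failed: %s\n", err.message);
        return 1;
    }

    dbus_int32_t *data = NULL;
    int len = 0;
    if (dbus_message_get_args(reply, &err,
                              DBUS_TYPE_ARRAY, DBUS_TYPE_INT32, &data, &len,
                              DBUS_TYPE_INVALID))
        printf("received %d integers\n", len);

    dbus_message_unref(reply);
    return 0;
}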

 

It asks the server to return 'N' integers, each 4 bytes, so requesting
256 integers produces the 1KB payload discussed above. Ignoring the
relatively small amount of time the server spends generating this
vector of integers, averaging several invocations gave the following
response times:

 

local (Unix) sockets: 16212 usec
QNX native transport: 15750 usec
================================
          difference:   462 usec

 

So the 462 usec difference is in roughly the same ballpark as the
expected 588 usec difference. No surprises there.

 

 

What is perhaps more surprising is the overhead incurred by the DBus
daemon, library, and the associated binding. Transferring the simulated
1KB array takes a minimum of ~15 msec, whereas the raw transport
overhead should only be on the order of 1.2 msec at worst (roughly four
socket legs at ~300 usec each). Passing a single integer through this
interface over the native transport still required ~12 msec.
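
A measurement loop of this kind might look roughly like the following
(same hypothetical service names as in the earlier sketch; this is not
our actual harness):

/*
 * Sketch of how the averaged round-trip figures could be reproduced:
 * time a loop of blocking GetInteger calls and divide by the count.
 */
#include <stdio.h>
#include <time.h>
#include <dbus/dbus.h>

int main(void)
{
    DBusError err;
    dbus_error_init(&err);
    DBusConnection *conn = dbus_bus_get(DBUS_BUS_SESSION, &err);
    if (conn == NULL)
        return 1;

    const int iterations = 100;
    dbus_int32_t num_ints = 256;       /* 1 KB of integer payload */
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iterations; i++) {
        DBusMessage *msg = dbus_message_new_method_call(
            "com.example.IntService", "/com/example/IntService",
            "com.example.IntService", "GetInteger");
        dbus_message_append_args(msg, DBUS_TYPE_INT32, &num_ints,
                                 DBUS_TYPE_INVALID);
        DBusMessage *reply = dbus_connection_send_with_reply_and_block(
            conn, msg, -1, &err);
        dbus_message_unref(msg);
        if (reply == NULL)
            return 1;
        dbus_message_unref(reply);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double total_us = (t1.tv_sec - t0.tv_sec) * 1e6 +
                      (t1.tv_nsec - t0.tv_nsec) / 1e3;
    printf("average round trip: %.0f usec\n", total_us / iterations);
    return 0;
}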

 

Of course this raises the question: where is the time being spent? I
know there is probably a fair amount of time spent
marshalling/unmarshalling the data to/from arrays, but I was wondering
if anyone could suggest known areas that might benefit from
optimization. We ran these tests on a modified 1.2.4 baseline (modified
to support an additional QNX-specific transport). Does the reference
dbus-daemon actually marshal and unmarshal the messages that pass
through it? Does the daemon attempt to validate each message it
forwards rather than just decoding the message header so it knows where
to route the message? Any insights into easy optimization approaches
would be appreciated.

 

Thanks . . .

 
