<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv=Content-Type content="text/html; charset=us-ascii">
<meta name=Generator content="Microsoft Word 11 (filtered medium)">
<style>
<!--
/* Font Definitions */
@font-face
        {font-family:"MS Mincho";
        panose-1:2 2 6 9 4 2 5 8 3 4;}
@font-face
        {font-family:"\@MS Mincho";
        panose-1:2 2 6 9 4 2 5 8 3 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0in;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Times New Roman";}
a:link, span.MsoHyperlink
        {color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {color:purple;
        text-decoration:underline;}
p.MsoPlainText, li.MsoPlainText, div.MsoPlainText
        {margin:0in;
        margin-bottom:.0001pt;
        font-size:10.0pt;
        font-family:"Courier New";}
span.EmailStyle17
        {mso-style-type:personal-compose;
        font-family:Arial;
        color:windowtext;}
@page Section1
        {size:8.5in 11.0in;
        margin:1.0in 1.25in 1.0in 1.25in;}
div.Section1
        {page:Section1;}
-->
</style>
<!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang=EN-US link=blue vlink=purple>
<div class=Section1>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>I’ve been working on porting DBus to QNX 6.4.0 on the Beagle
Board (TI OMAP 3530 based) development platform. Part of this porting effort
has been to evaluate the performance of DBus on this embedded platform. I realize
DBus was originally intended to be used as an application bus, but many
characteristics make it suitable for the embedded space as well.<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'><o:p> </o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>Local (unix) sockets on QNX are not particularly efficient. To
pass a 1KB buffer on this target platform takes ~324 usec (point-to-point) from
client to server (using raw sockets, not DBus). Alternatively, the same test
using QNX’s native message passing mechanism takes approximately ~177
usec from the client to server. So, this is roughly a 324 – 177 = 147
usec advantage for the native mechanism.<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'><o:p> </o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>So for a typical DBus scenario, I would expect roughly a 147
usec improvement for each leg of a request, e.g.<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'><o:p> </o:p></span></font></p>
<p class=MsoPlainText><font size=2 face="Courier New"><span style='font-size:
10.0pt'>
REQ REQ<o:p></o:p></span></font></p>
<p class=MsoPlainText><font size=2 face="Courier New"><span style='font-size:
10.0pt'>Client --------> Daemon -------> Service<o:p></o:p></span></font></p>
<p class=MsoPlainText><font size=2 face="Courier New"><span style='font-size:
10.0pt'> +147
usec +147 usec |<o:p></o:p></span></font></p>
<p class=MsoPlainText><font size=2 face="Courier New"><span style='font-size:
10.0pt'>
|<o:p></o:p></span></font></p>
<p class=MsoPlainText><font size=2 face="Courier New"><span style='font-size:
10.0pt'>
REPLY
REPLY V<o:p></o:p></span></font></p>
<p class=MsoPlainText><font size=2 face="Courier New"><span style='font-size:
10.0pt'>Client <------- Daemon <------- Service<o:p></o:p></span></font></p>
<p class=MsoPlainText><font size=2 face="Courier New"><span style='font-size:
10.0pt'> +147
usec +147 usec<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'><o:p> </o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'><o:p> </o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>So if I have a DBus method that passes an array of bytes
(1KB to be exact), I would expect the local sockets transport to incur roughly
4 * 147 = 588 usec of overhead beyond what the native services could provide.<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'><o:p> </o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>In order to make this comparison, we implemented a native
QNX transport mechanism in addition to the local socket implementation. We then
implemented the following DBus interface:<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'><o:p> </o:p></span></font></p>
<p class=MsoNormal style='text-autospace:none'><font size=2 face="Courier New"><span
style='font-size:10.0pt;font-family:"Courier New"'>
<method name="GetInteger"><o:p></o:p></span></font></p>
<p class=MsoNormal style='text-autospace:none'><font size=2 face="Courier New"><span
style='font-size:10.0pt;font-family:"Courier New"'>
<arg type="i" name="numInts"
direction="in"/><o:p></o:p></span></font></p>
<p class=MsoNormal style='text-autospace:none'><font size=2 face="Courier New"><span
style='font-size:10.0pt;font-family:"Courier New"'>
<arg type="ai" name="intData"
direction="out"/><o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face="Courier New"><span style='font-size:10.0pt;
font-family:"Courier New"'> </method><o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'><o:p> </o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>It requests the server to provide ‘N’ integers
where each integer is 4 bytes. Ignoring the relatively small amount of time to
generate this vector of integers on the server, several invocations resulted in
the average response time:<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'><o:p> </o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>local (Unix) sockets: 16212 usec<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>QNX native transport: 15750 usec<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>========================<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>
462 usec difference<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'><o:p> </o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>So the 462 usec difference is in roughly the same ballpark
as the expected 588 usec difference. No surprises there.<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'><o:p> </o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'><o:p> </o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>What is perhaps more surprising, is the overhead incurred by
the DBus daemon, library, and the associated binding. A minimum of ~15 msec (at
best) is required to transfer a simulated 1KB array whereas the actual transfer
overhead may only be on the order of 1.2 msec at worst. Passing a single
integer using this interface with the native transport still required ~12 msec.<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'><o:p> </o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>Of course this begs the question, where is the time being
spent? I know there is probably a fair amount of time spent
marshalling/unmarshalling the data to/from arrays. I was wondering if anyone
could suggest known areas that might benefit from optimizations. We did these
tests on a modified 1.2.4 baseline (modified to support an additional QNX
specific transport). I was wondering if the reference DBus Daemon actually marshals
and unmarshals the messages that pass through it. Does the daemon attempt to
validate each message that passes through it rather than just decoding the
message header so it knows where to route the message? Any insights into easy
optimization approaches would be appreciated.<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'><o:p> </o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>Thanks . . .<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'><o:p> </o:p></span></font></p>
</div>
</body>
</html>