[systemd-devel] kdbus vs. pipe based ipc performance

Stefan Westerfeld stefan at space.twc.de
Mon Mar 3 13:35:45 PST 2014


   Hi!

First of all: I'd really like to see kdbus being used as a general-purpose IPC
layer, so that developers working on client/server software no longer need to
create their own homemade IPC from primitives like sockets.

Now kdbus is advertised as a high-performance IPC solution, and compared to the
traditional dbus approach, this may well be true. But are the numbers that

$ test-bus-kernel-benchmark chart

produces impressive? Or to put it another way: will developers working on
client/server software happily accept kdbus because it performs as well as a
homemade IPC solution would? Or does kdbus add overhead to a degree that some
applications can't accept?

To answer this, I wrote a program called "ibench" which passes messages between
a client and a server, but instead of using kdbus it uses traditional pipes. To
simulate main loop integration, it calls poll() wherever a normal client or
server application would enter its main loop and wait to be woken up by file
descriptor activity.
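
For illustration, here is a minimal sketch of that ping-pong pattern over a
pair of pipes. The message size, roundtrip count, names like read_all, and the
omission of timing are simplifications of mine; the real ibench.c (see the
download link below) differs in those details:

  /* Sketch: client/server roundtrips over two pipes, with poll() standing
   * in for main loop wakeups on file descriptor activity. Timing omitted. */
  #include <poll.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <unistd.h>

  #define MSG_SIZE   4096     /* payload per roundtrip (arbitrary here) */
  #define ROUNDTRIPS 100000   /* number of ping-pongs (arbitrary here) */

  /* wait via poll() until fd is readable, then read exactly n bytes */
  static void read_all(int fd, char *buf, size_t n)
  {
      struct pollfd pfd = { .fd = fd, .events = POLLIN };
      size_t done = 0;

      while (done < n) {
          if (poll(&pfd, 1, -1) < 0) { perror("poll"); exit(1); }
          ssize_t r = read(fd, buf + done, n - done);
          if (r <= 0) { perror("read"); exit(1); }
          done += (size_t) r;
      }
  }

  int main(void)
  {
      int c2s[2], s2c[2];   /* client -> server and server -> client pipes */
      char buf[MSG_SIZE];

      memset(buf, 'x', sizeof(buf));
      if (pipe(c2s) < 0 || pipe(s2c) < 0) { perror("pipe"); return 1; }

      if (fork() == 0) {    /* "server": echo every message back */
          for (;;) {
              read_all(c2s[0], buf, sizeof(buf));
              if (write(s2c[1], buf, sizeof(buf)) < 0) _exit(1);
          }
      }

      /* "client": send a message, wait for the echoed reply, repeat */
      for (int i = 0; i < ROUNDTRIPS; i++) {
          if (write(c2s[1], buf, sizeof(buf)) < 0) { perror("write"); return 1; }
          read_all(s2c[0], buf, sizeof(buf));
      }
      return 0;
  }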

Now here are the results I obtained using

- AMD Phenom(tm) 9850 Quad-Core Processor
- running Fedora 20 64-bit with systemd+kdbus from git
- system booted with the "kdbus" and "single" kernel command line arguments

============================================================================
*** single cpu performance:

   SIZE    COPY   MEMFD KDBUS-MAX  IBENCH  SPEEDUP

      1   32580   16390     32580  192007  5.89
      2   40870   16960     40870  191730  4.69
      4   40750   16870     40750  190938  4.69
      8   40930   16950     40930  191234  4.67
     16   40290   17150     40290  192041  4.77
     32   40220   18050     40220  191963  4.77
     64   40280   16930     40280  192183  4.77
    128   40530   17440     40530  191649  4.73
    256   40610   17610     40610  190405  4.69
    512   40770   16690     40770  188671  4.63
   1024   40670   17840     40670  185819  4.57
   2048   40510   17780     40510  181050  4.47
   4096   39610   17330     39610  154303  3.90
   8192   38000   16540     38000  121710  3.20
  16384   35900   15050     35900   80921  2.25
  32768   31300   13020     31300   54062  1.73
  65536   24300    9940     24300   27574  1.13
 131072   16730    6820     16730   14886  0.89
 262144    4420    4080      4420    6888  1.56
 524288    1660    2040      2040    2781  1.36
1048576     800     950       950    1231  1.30
2097152     310     490       490     475  0.97
4194304     150     240       240     227  0.95

*** dual cpu performance:

   SIZE    COPY   MEMFD KDBUS-MAX  IBENCH  SPEEDUP

      1   31680   14000     31680  104664  3.30
      2   34960   14290     34960  104926  3.00
      4   34930   14050     34930  104659  3.00
      8   24610   13300     24610  104058  4.23
     16   33840   14740     33840  103800  3.07
     32   33880   14400     33880  103917  3.07
     64   34180   14220     34180  103349  3.02
    128   34540   14260     34540  102622  2.97
    256   37820   14240     37820  102076  2.70
    512   37570   14270     37570   99105  2.64
   1024   37570   14780     37570   96010  2.56
   2048   21640   13330     21640   89602  4.14
   4096   23430   13120     23430   73682  3.14
   8192   34350   12300     34350   59827  1.74
  16384   25180   10560     25180   43808  1.74
  32768   20210    9700     20210   21112  1.04
  65536   15440    7820     15440   10771  0.70
 131072   11630    5670     11630    5775  0.50
 262144    4080    3730      4080    3012  0.74
 524288    1830    2040      2040    1421  0.70
1048576     810     950       950     631  0.66
2097152     310     490       490     269  0.55
4194304     150     240       240     133  0.55
============================================================================

I ran the tests twice: once with client and server pinned to the same cpu (via
cpu affinity) and once with client and server on different cpus.
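
In case it helps with reproducing the setup, the pinning can be done along
these lines; I don't claim this is exactly how the test runs were set up
(taskset on the command line works just as well), and the cpu numbers are
arbitrary:

  /* Sketch of pinning a process to one cpu via sched_setaffinity(). */
  #define _GNU_SOURCE
  #include <sched.h>
  #include <stdio.h>

  /* pin the calling process to a single cpu */
  static int pin_to_cpu(int cpu)
  {
      cpu_set_t set;

      CPU_ZERO(&set);
      CPU_SET(cpu, &set);
      if (sched_setaffinity(0, sizeof(set), &set) < 0) {  /* 0 = this process */
          perror("sched_setaffinity");
          return -1;
      }
      return 0;
  }

  int main(void)
  {
      /* single cpu case: client and server both call pin_to_cpu(0);
       * dual cpu case: the server calls pin_to_cpu(1) instead */
      return pin_to_cpu(0) < 0;
  }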

The SIZE, COPY and MEMFD columns are produced by "test-bus-kernel-benchmark
chart"; the KDBUS-MAX column is the maximum of the COPY and MEMFD columns, so
it is the effective number of roundtrips that kdbus is able to do at that
SIZE. The IBENCH column is the effective number of roundtrips that ibench can
do at that SIZE.

For many relevant cases, ibench outperforms kdbus by a wide margin. The SPEEDUP
factor (IBENCH divided by KDBUS-MAX) indicates how much faster ibench is than
kdbus. For small to medium message sizes, ibench always wins, sometimes by a
lot. For instance, when passing a 4 KiB array from client to server and back,
ibench is 3.90 times faster if client and server live on the same cpu, and 3.14
times faster if they live on different cpus.
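
Spelled out for that 4096 byte row of the single cpu table, the two derived
numbers are computed like this (a trivial sketch, values copied from the table
above):

  #include <stdio.h>

  int main(void)
  {
      /* values from the SIZE=4096 row of the single cpu table above */
      int copy = 39610, memfd = 17330, ibench = 154303;

      int kdbus_max  = copy > memfd ? copy : memfd;   /* best kdbus transport */
      double speedup = (double) ibench / kdbus_max;   /* ibench vs. best kdbus */

      printf("KDBUS-MAX = %d, SPEEDUP = %.2f\n", kdbus_max, speedup);  /* 39610, 3.90 */
      return 0;
  }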

I'm bringing this up now because it would be sad if kdbus became part of the
kernel and universally available, yet application developers still built their
own protocols for performance reasons. Also, some of the changes needed to make
kdbus run as fast as ibench may be backward incompatible at some level, so it
may be better to make them now rather than later.

The program "ibench" I wrote to provide a performance comparision for the
"test-bus-kernel-benchmark" program can be downloaded at

  http://space.twc.de/~stefan/download/ibench.c

As a final note, ibench also supports using a socketpair() for communication
between client and server via a #define at the top of the file, but pipe()
communication was faster in my test setup.
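
The switch looks roughly like this; the macro name is made up here and the
actual one in ibench.c may differ:

  #include <stdio.h>
  #include <sys/socket.h>
  #include <unistd.h>

  #define USE_SOCKETPAIR 0   /* 0: pipe() (faster here), 1: socketpair() */

  /* create one channel: a unidirectional pipe, or a bidirectional
   * AF_UNIX socket pair (with pipes, two channels are needed) */
  static int make_channel(int fds[2])
  {
  #if USE_SOCKETPAIR
      return socketpair(AF_UNIX, SOCK_STREAM, 0, fds);
  #else
      return pipe(fds);
  #endif
  }

  int main(void)
  {
      int fds[2];

      if (make_channel(fds) < 0) {
          perror("make_channel");
          return 1;
      }
      printf("channel fds: %d %d\n", fds[0], fds[1]);
      close(fds[0]);
      close(fds[1]);
      return 0;
  }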

   Cu... Stefan
-- 
Stefan Westerfeld, http://space.twc.de/~stefan

