[systemd-devel] kdbus vs. pipe based ipc performance

Kay Sievers kay at vrfy.org
Mon Mar 3 20:00:45 PST 2014


On Mon, Mar 3, 2014 at 11:06 PM, Kay Sievers <kay at vrfy.org> wrote:
> On Mon, Mar 3, 2014 at 10:35 PM, Stefan Westerfeld <stefan at space.twc.de> wrote:
>> First of all: I'd really like to see kdbus being used as a general-purpose IPC
>> layer, so that developers working on client/server software no longer need to
>> create their own homemade IPC from primitives like sockets or similar.
>>
>> Now kdbus is advertised as a high-performance IPC solution, and compared to
>> the traditional D-Bus approach this may well be true. But are the numbers that
>>
>> $ test-bus-kernel-benchmark chart
>>
>> produces impressive? Or to put it another way: will developers working on
>> client/server software happily accept kdbus because it performs as well as a
>> homemade IPC solution would? Or does kdbus add overhead to a degree that some
>> applications can't accept?
>>
>> To answer this, I wrote a program called "ibench" which passes messages between
>> a client and a server, but instead of using kdbus to do it, it uses traditional
>> pipes. To simulate main loop integration, it uses poll() in the places where a
>> normal client or server application would go into its main loop and wait to be
>> woken up by file descriptor activity.
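>>
>> As a rough illustration of that model (a simplified sketch, not the exact
>> ibench code; the function name and error handling are mine), the client side
>> of one roundtrip over two pipes looks like this:
>>
>>   #include <poll.h>
>>   #include <stdlib.h>
>>   #include <unistd.h>
>>
>>   /* Write the message to the server, then use poll() as a stand-in for
>>    * main loop integration and read the echoed reply back. Assumes the
>>    * server reads the whole message before it starts replying. */
>>   static void roundtrip(int to_server, int from_server, char *buf, size_t size)
>>   {
>>           struct pollfd pfd = { .fd = from_server, .events = POLLIN };
>>           size_t done = 0;
>>
>>           if (write(to_server, buf, size) != (ssize_t) size)
>>                   exit(1);
>>
>>           while (done < size) {
>>                   ssize_t r;
>>
>>                   poll(&pfd, 1, -1);   /* wait for file descriptor activity */
>>                   r = read(from_server, buf + done, size - done);
>>                   if (r <= 0)
>>                           exit(1);
>>                   done += r;
>>           }
>>   }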
>>
>> Now here are the results I obtained using
>>
>> - AMD Phenom(tm) 9850 Quad-Core Processor
>> - running Fedora 20 64-bit with systemd+kdbus from git
>> - system booted with the "kdbus" and "single" kernel command line arguments
>>
>> ============================================================================
>> *** single cpu performance:
>>
>>    SIZE    COPY   MEMFD KDBUS-MAX  IBENCH  SPEEDUP
>>
>>       1   32580   16390     32580  192007  5.89
>>       2   40870   16960     40870  191730  4.69
>>       4   40750   16870     40750  190938  4.69
>>       8   40930   16950     40930  191234  4.67
>>      16   40290   17150     40290  192041  4.77
>>      32   40220   18050     40220  191963  4.77
>>      64   40280   16930     40280  192183  4.77
>>     128   40530   17440     40530  191649  4.73
>>     256   40610   17610     40610  190405  4.69
>>     512   40770   16690     40770  188671  4.63
>>    1024   40670   17840     40670  185819  4.57
>>    2048   40510   17780     40510  181050  4.47
>>    4096   39610   17330     39610  154303  3.90
>>    8192   38000   16540     38000  121710  3.20
>>   16384   35900   15050     35900   80921  2.25
>>   32768   31300   13020     31300   54062  1.73
>>   65536   24300    9940     24300   27574  1.13
>>  131072   16730    6820     16730   14886  0.89
>>  262144    4420    4080      4420    6888  1.56
>>  524288    1660    2040      2040    2781  1.36
>> 1048576     800     950       950    1231  1.30
>> 2097152     310     490       490     475  0.97
>> 4194304     150     240       240     227  0.95
>>
>> *** dual cpu performance:
>>
>>    SIZE    COPY   MEMFD KDBUS-MAX  IBENCH  SPEEDUP
>>
>>       1   31680   14000     31680  104664  3.30
>>       2   34960   14290     34960  104926  3.00
>>       4   34930   14050     34930  104659  3.00
>>       8   24610   13300     24610  104058  4.23
>>      16   33840   14740     33840  103800  3.07
>>      32   33880   14400     33880  103917  3.07
>>      64   34180   14220     34180  103349  3.02
>>     128   34540   14260     34540  102622  2.97
>>     256   37820   14240     37820  102076  2.70
>>     512   37570   14270     37570   99105  2.64
>>    1024   37570   14780     37570   96010  2.56
>>    2048   21640   13330     21640   89602  4.14
>>    4096   23430   13120     23430   73682  3.14
>>    8192   34350   12300     34350   59827  1.74
>>   16384   25180   10560     25180   43808  1.74
>>   32768   20210    9700     20210   21112  1.04
>>   65536   15440    7820     15440   10771  0.70
>>  131072   11630    5670     11630    5775  0.50
>>  262144    4080    3730      4080    3012  0.74
>>  524288    1830    2040      2040    1421  0.70
>> 1048576     810     950       950     631  0.66
>> 2097152     310     490       490     269  0.55
>> 4194304     150     240       240     133  0.55
>> ============================================================================
>>
>> I ran the tests twice - once using the same cpu for client and server (via cpu
>> affinity) and once using a different cpu for client and server.
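>>
>> (As a minimal sketch of what pinning client and server to a cpu means here;
>> the helper name is mine, the calls are the standard Linux ones:)
>>
>>   #define _GNU_SOURCE
>>   #include <sched.h>
>>
>>   /* Restrict the calling process to a single cpu, so client and server
>>    * can be forced onto the same cpu or onto different cpus. */
>>   static void pin_to_cpu(int cpu)
>>   {
>>           cpu_set_t set;
>>
>>           CPU_ZERO(&set);
>>           CPU_SET(cpu, &set);
>>           sched_setaffinity(0, sizeof(set), &set);   /* 0 = this process */
>>   }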
>>
>> The SIZE, COPY and MEMFD columns are produced by "test-bus-kernel-benchmark
>> chart"; the KDBUS-MAX column is the maximum of the COPY and MEMFD columns, so
>> it is the effective number of roundtrips that kdbus is able to do at that
>> SIZE. The IBENCH column is the effective number of roundtrips that ibench can
>> do at that SIZE.
>>
>> In many relevant cases, ibench outperforms kdbus considerably. The SPEEDUP
>> factor indicates how much faster ibench is than kdbus. For small to medium
>> array sizes, ibench always wins, sometimes by a lot. For instance, when passing
>> a 4 KiB array from client to server and back, ibench is 3.90 times faster if
>> client and server live on the same cpu, and 3.14 times faster if client and
>> server live on different cpus.
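>>
>> (To make the relationship between the columns explicit: in the single-cpu
>> table at SIZE 4096, KDBUS-MAX = max(39610, 17330) = 39610, and SPEEDUP =
>> IBENCH / KDBUS-MAX = 154303 / 39610, which is approximately 3.90.)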
>>
>> I'm bringing this up now because it would be sad if kdbus became part of the
>> kernel and universally available, yet application developers still built their
>> own protocols for performance reasons. Some of the changes needed to make kdbus
>> run as fast as ibench may be backward incompatible at some level, so it would
>> be better to make them now rather than later.
>>
>> The program "ibench" I wrote to provide a performance comparision for the
>> "test-bus-kernel-benchmark" program can be downloaded at
>>
>>   http://space.twc.de/~stefan/download/ibench.c
>>
>> As a final note, ibench also supports using a socketpair() for communication
>> between client and server, selectable via a #define at the top of the source,
>> but pipe() communication was faster in my test setup.
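>>
>> The switch between the two transports is just a compile-time choice; roughly
>> like this (the macro name here is made up, ibench may use a different one):
>>
>>   #include <sys/socket.h>
>>   #include <unistd.h>
>>
>>   /* One channel per direction; with a pipe, fds[0] is the read end and
>>    * fds[1] the write end, while a socketpair gives two bidirectional ends. */
>>   static int make_channel(int fds[2])
>>   {
>>   #ifdef USE_SOCKETPAIR
>>           return socketpair(AF_UNIX, SOCK_STREAM, 0, fds);
>>   #else
>>           return pipe(fds);
>>   #endif
>>   }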
>
> Pipes are not interesting for general-purpose D-Bus IPC; with a pipe
> the memory can "move" from one client to the other, because it is no
> longer needed in the process that fills the pipe.
>
> Pipes are a model that is out of scope for kdbus; using pipes where they
> are the appropriate IPC mechanism is just fine, there is no competition
> here, and being 5 times slower than simple pipes is a very good number
> for kdbus.
>
> Kdbus is a low-level implementation of D-Bus, not much else; it will
> not try to cover all sorts of specialized IPC use cases.

There is also a benchmark in the kdbus repo:
  ./test/test-kdbus-benchmark

It is probably better to compare against that, as it does not include
any of the higher-level D-Bus overhead from the userspace library: it
operates on the raw kernel kdbus interface and is quite a lot faster
than the test in the systemd repo.

Kay

