[systemd-devel] kdbus vs. pipe based ipc performance

Kay Sievers kay at vrfy.org
Mon Mar 3 20:36:16 PST 2014


On Tue, Mar 4, 2014 at 5:00 AM, Kay Sievers <kay at vrfy.org> wrote:
> On Mon, Mar 3, 2014 at 11:06 PM, Kay Sievers <kay at vrfy.org> wrote:
>> On Mon, Mar 3, 2014 at 10:35 PM, Stefan Westerfeld <stefan at space.twc.de> wrote:
>>> First of all: I'd really like to see kdbus being used as a general-purpose
>>> IPC layer, so that developers working on client/server software no longer
>>> need to create their own homemade IPC from primitives like sockets or
>>> similar.
>>>
>>> Now kdbus is advertised as a high-performance IPC solution, and compared to
>>> the traditional dbus approach this may well be true. But are the numbers that
>>>
>>> $ test-bus-kernel-benchmark chart
>>>
>>> produces impressive? Or to put it another way: will developers working on
>>> client/server software happily accept kdbus because it performs as well as a
>>> homemade IPC solution would? Or does kdbus add overhead to a degree that some
>>> applications can't accept?
>>>
>>> To answer this, I wrote a program called "ibench" which passes messages
>>> between a client and a server, but instead of using kdbus it uses
>>> traditional pipes. To simulate main-loop integration, it uses poll() where
>>> a normal client or server application would go into the main loop and wait
>>> to be woken up by file descriptor activity.
>>>
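>>> The core of the benchmark loop looks roughly like this (a simplified
>>> sketch of the pattern, not the actual ibench.c; error handling and
>>> timing are omitted):
>>>
>>>   /* sketch of the ibench pattern; not the actual ibench.c code */
>>>   #include <poll.h>
>>>   #include <unistd.h>
>>>
>>>   #define SIZE 8192
>>>   #define ROUNDTRIPS 100000
>>>
>>>   /* block until fd is readable, like a main loop would */
>>>   static void wait_readable(int fd)
>>>   {
>>>       struct pollfd p = { .fd = fd, .events = POLLIN };
>>>       poll(&p, 1, -1);
>>>   }
>>>
>>>   /* read exactly len bytes; pipe reads may return short counts */
>>>   static void read_all(int fd, char *buf, size_t len)
>>>   {
>>>       while (len > 0) {
>>>           ssize_t n = read(fd, buf, len);
>>>           if (n <= 0)
>>>               _exit(1);
>>>           buf += n;
>>>           len -= (size_t)n;
>>>       }
>>>   }
>>>
>>>   int main(void)
>>>   {
>>>       int up[2], down[2];  /* up: client -> server, down: server -> client */
>>>       static char buf[SIZE];
>>>
>>>       pipe(up);
>>>       pipe(down);
>>>
>>>       if (fork() == 0) {   /* server: echo every message back */
>>>           close(up[1]);
>>>           close(down[0]);
>>>           for (;;) {
>>>               wait_readable(up[0]);
>>>               read_all(up[0], buf, SIZE);
>>>               write(down[1], buf, SIZE);
>>>           }
>>>       }
>>>
>>>       close(up[0]);        /* client side */
>>>       close(down[1]);
>>>       for (int i = 0; i < ROUNDTRIPS; i++) {
>>>           write(up[1], buf, SIZE);
>>>           wait_readable(down[0]);
>>>           read_all(down[0], buf, SIZE);
>>>       }
>>>       return 0;
>>>   }
>>>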
>>> Now here are the results I obtained using
>>>
>>> - AMD Phenom(tm) 9850 Quad-Core Processor
>>> - running Fedora 20 64-bit with systemd+kdbus from git
>>> - system booted with the "kdbus" and "single" kernel command-line arguments
>>>
>>> ============================================================================
>>> *** single cpu performance:
>>>
>>>    SIZE    COPY   MEMFD KDBUS-MAX  IBENCH  SPEEDUP
>>>
>>>       1   32580   16390     32580  192007  5.89
>>>       2   40870   16960     40870  191730  4.69
>>>       4   40750   16870     40750  190938  4.69
>>>       8   40930   16950     40930  191234  4.67
>>>      16   40290   17150     40290  192041  4.77
>>>      32   40220   18050     40220  191963  4.77
>>>      64   40280   16930     40280  192183  4.77
>>>     128   40530   17440     40530  191649  4.73
>>>     256   40610   17610     40610  190405  4.69
>>>     512   40770   16690     40770  188671  4.63
>>>    1024   40670   17840     40670  185819  4.57
>>>    2048   40510   17780     40510  181050  4.47
>>>    4096   39610   17330     39610  154303  3.90
>>>    8192   38000   16540     38000  121710  3.20
>>>   16384   35900   15050     35900   80921  2.25
>>>   32768   31300   13020     31300   54062  1.73
>>>   65536   24300    9940     24300   27574  1.13
>>>  131072   16730    6820     16730   14886  0.89
>>>  262144    4420    4080      4420    6888  1.56
>>>  524288    1660    2040      2040    2781  1.36
>>> 1048576     800     950       950    1231  1.30
>>> 2097152     310     490       490     475  0.97
>>> 4194304     150     240       240     227  0.95
>>>
>>> *** dual cpu performance:
>>>
>>>    SIZE    COPY   MEMFD KDBUS-MAX  IBENCH  SPEEDUP
>>>
>>>       1   31680   14000     31680  104664  3.30
>>>       2   34960   14290     34960  104926  3.00
>>>       4   34930   14050     34930  104659  3.00
>>>       8   24610   13300     24610  104058  4.23
>>>      16   33840   14740     33840  103800  3.07
>>>      32   33880   14400     33880  103917  3.07
>>>      64   34180   14220     34180  103349  3.02
>>>     128   34540   14260     34540  102622  2.97
>>>     256   37820   14240     37820  102076  2.70
>>>     512   37570   14270     37570   99105  2.64
>>>    1024   37570   14780     37570   96010  2.56
>>>    2048   21640   13330     21640   89602  4.14
>>>    4096   23430   13120     23430   73682  3.14
>>>    8192   34350   12300     34350   59827  1.74
>>>   16384   25180   10560     25180   43808  1.74
>>>   32768   20210    9700     20210   21112  1.04
>>>   65536   15440    7820     15440   10771  0.70
>>>  131072   11630    5670     11630    5775  0.50
>>>  262144    4080    3730      4080    3012  0.74
>>>  524288    1830    2040      2040    1421  0.70
>>> 1048576     810     950       950     631  0.66
>>> 2097152     310     490       490     269  0.55
>>> 4194304     150     240       240     133  0.55
>>> ============================================================================
>>>
>>> I ran the tests twice - once using the same cpu for client and server (via cpu
>>> affinity) and once using a different cpu for client and server.
>>>
>>> The SIZE, COPY and MEMFD columns are produced by "test-bus-kernel-benchmark
>>> chart"; the KDBUS-MAX column is the maximum of the COPY and MEMFD columns,
>>> i.e. the effective number of roundtrips that kdbus is able to do at that
>>> SIZE. The IBENCH column is the effective number of roundtrips that ibench
>>> can do at that SIZE, and SPEEDUP is IBENCH divided by KDBUS-MAX.
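>>>
>>> For example, at SIZE 4096 in the single cpu run: KDBUS-MAX =
>>> max(39610, 17330) = 39610, and SPEEDUP = 154303 / 39610 ≈ 3.90.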
>>>
>>> For many relevant cases, ibench outperforms kdbus by a wide margin; the
>>> SPEEDUP factor indicates how much faster ibench is than kdbus. For small to
>>> medium sizes, ibench always wins, sometimes by a lot. For instance, passing
>>> a 4 KiB array from client to server and back, ibench is 3.90 times faster
>>> if client and server live on the same cpu, and 3.14 times faster if they
>>> live on different cpus.
>>>
>>> I'm bringing this up now because it would be sad if kdbus became part of
>>> the kernel and universally available, yet application developers still
>>> built their own protocols for performance reasons. Some of the changes
>>> needed to make kdbus run as fast as ibench may be backward-incompatible at
>>> some level, so it may be better to make them now rather than later.
>>>
>>> The program "ibench", which I wrote to provide a performance comparison
>>> with the "test-bus-kernel-benchmark" program, can be downloaded at
>>>
>>>   http://space.twc.de/~stefan/download/ibench.c
>>>
>>> As a final note, ibench also supports using a socketpair() for
>>> communication between client and server, selectable via a #define at the
>>> top of the file, but pipe() communication was faster in my test setup.
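>>>
>>> The toggle looks roughly like this (a sketch of the pattern, not the
>>> actual ibench.c code):
>>>
>>>   /* sketch; not the actual ibench.c code */
>>>   #include <sys/socket.h>
>>>   #include <unistd.h>
>>>
>>>   #define USE_SOCKETPAIR 0  /* 1 = socketpair(), 0 = pipe() */
>>>
>>>   static int make_channel(int fds[2])
>>>   {
>>>   #if USE_SOCKETPAIR
>>>       /* bidirectional; one pair carries both directions */
>>>       return socketpair(AF_UNIX, SOCK_STREAM, 0, fds);
>>>   #else
>>>       /* unidirectional; a roundtrip needs one channel per direction */
>>>       return pipe(fds);
>>>   #endif
>>>   }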
>>
>> Pipes are not interesting for general-purpose D-Bus IPC; with a pipe,
>> the memory can "move" from one client to the other, because it is no
>> longer needed in the process that fills the pipe.
>>
>> Pipes are a model that is out of focus for kdbus; using pipes where
>> pipes are the appropriate IPC mechanism is just fine, there is no
>> competition, and being 5 times slower than simple pipes is a very good
>> number for kdbus.
>>
>> Kdbus is a low-level implementation for D-Bus, not much else; it will
>> not try to cover all sorts of specialized IPC use cases.
>
> There is also a benchmark in the kdbus repo:
>   ./test/test-kdbus-benchmark
>
> It is probably better to compare against that: it does not include any of
> the higher-level D-Bus overhead from the userspace library, operates on
> the raw kernel kdbus interface, and is quite a lot faster than the test
> in the systemd repo.

With fixed 8k message sizes in all three tools and a concurrent-CPU setup,
an Intel i7 2.90 GHz produces:
  ibench: 55,036 - 128,807 transactions/sec
  test-kdbus-benchmark: 73,356 - 82,654 transactions/sec
  test-bus-kernel-benchmark: 23,290 - 27,580 transactions/sec

The test-kdbus-benchmark runs the full-featured kdbus, including
reliability/integrity checks, header parsing, user accounting,
priority queue handling, and message/connection metadata handling.

Perf output for all three tools is attached; it shows that
test-bus-kernel-benchmark has to do a lot of work not directly related
to raw memory-copy performance, so it should not be compared directly.

Kay
-------------- next part --------------
  2.05%  test-bus-kernel  libc-2.19.90.so            [.] _int_malloc
  1.84%  test-bus-kernel  libc-2.19.90.so            [.] vfprintf
  1.64%  test-bus-kernel  test-bus-kernel-benchmark  [.] bus_message_parse_fields
  1.51%  test-bus-kernel  libc-2.19.90.so            [.] memset
  1.40%  test-bus-kernel  libc-2.19.90.so            [.] _int_free
  1.17%  test-bus-kernel  [kernel.kallsyms]          [k] copy_user_enhanced_fast_string
  1.12%  test-bus-kernel  libc-2.19.90.so            [.] malloc_consolidate
  1.11%  test-bus-kernel  [kernel.kallsyms]          [k] mutex_lock
  1.05%  test-bus-kernel  test-bus-kernel-benchmark  [.] bus_kernel_make_message
  1.04%  test-bus-kernel  [kernel.kallsyms]          [k] kfree
  0.94%  test-bus-kernel  libc-2.19.90.so            [.] free
  0.90%  test-bus-kernel  libc-2.19.90.so            [.] __GI___strcmp_ssse3
  0.88%  test-bus-kernel  test-bus-kernel-benchmark  [.] message_extend_fields
  0.83%  test-bus-kernel  [kdbus]                    [k] kdbus_handle_ioctl
  0.83%  test-bus-kernel  libc-2.19.90.so            [.] malloc
  0.79%  test-bus-kernel  [kernel.kallsyms]          [k] mutex_unlock
  0.76%  test-bus-kernel  test-bus-kernel-benchmark  [.] BUS_MESSAGE_IS_GVARIANT
  0.73%  test-bus-kernel  libc-2.19.90.so            [.] __libc_calloc
  0.72%  test-bus-kernel  libc-2.19.90.so            [.] memchr
  0.71%  test-bus-kernel  [kdbus]                    [k] kdbus_conn_kmsg_send
  0.67%  test-bus-kernel  test-bus-kernel-benchmark  [.] buffer_peek
  0.65%  test-bus-kernel  [kernel.kallsyms]          [k] update_cfs_shares
  0.58%  test-bus-kernel  [kernel.kallsyms]          [k] system_call_after_swapgs
  0.57%  test-bus-kernel  test-bus-kernel-benchmark  [.] service_name_is_valid
  0.56%  test-bus-kernel  test-bus-kernel-benchmark  [.] build_struct_offsets
  0.55%  test-bus-kernel  [kdbus]                    [k] kdbus_pool_copy


  3.95%  test-kdbus-benc  [kernel.kallsyms]      [k] copy_user_enhanced_fast_string
  2.25%  test-kdbus-benc  [kernel.kallsyms]      [k] clear_page_c_e
  2.14%  test-kdbus-benc  [kernel.kallsyms]      [k] _raw_spin_lock
  1.78%  test-kdbus-benc  [kernel.kallsyms]      [k] kfree
  1.65%  test-kdbus-benc  [kernel.kallsyms]      [k] mutex_lock
  1.55%  test-kdbus-benc  [kernel.kallsyms]      [k] get_page_from_freelist
  1.46%  test-kdbus-benc  [kernel.kallsyms]      [k] page_fault
  1.40%  test-kdbus-benc  [kernel.kallsyms]      [k] mutex_unlock
  1.33%  test-kdbus-benc  [kernel.kallsyms]      [k] memset
  1.27%  test-kdbus-benc  [kernel.kallsyms]      [k] shmem_getpage_gfp
  1.16%  test-kdbus-benc  [kernel.kallsyms]      [k] find_get_page
  1.05%  test-kdbus-benc  [kernel.kallsyms]      [k] memcpy
  1.03%  test-kdbus-benc  [kernel.kallsyms]      [k] set_page_dirty
  1.00%  test-kdbus-benc  [kernel.kallsyms]      [k] system_call
  0.94%  test-kdbus-benc  [kernel.kallsyms]      [k] system_call_after_swapgs
  0.93%  test-kdbus-benc  [kernel.kallsyms]      [k] kmem_cache_alloc
  0.93%  test-kdbus-benc  test-kdbus-benchmark   [.] timeval_diff
  0.90%  test-kdbus-benc  [kernel.kallsyms]      [k] page_waitqueue
  0.86%  test-kdbus-benc  libpthread-2.19.90.so  [.] __libc_close
  0.83%  test-kdbus-benc  [kernel.kallsyms]      [k] __call_rcu.constprop.63
  0.81%  test-kdbus-benc  [kdbus]                [k] kdbus_pool_copy
  0.78%  test-kdbus-benc  [kernel.kallsyms]      [k] strlen
  0.77%  test-kdbus-benc  [kernel.kallsyms]      [k] unlock_page
  0.77%  test-kdbus-benc  [kdbus]                [k] kdbus_meta_append
  0.76%  test-kdbus-benc  [kernel.kallsyms]      [k] find_lock_page
  0.71%  test-kdbus-benc  test-kdbus-benchmark   [.] handle_echo_reply
  0.71%  test-kdbus-benc  [kernel.kallsyms]      [k] __kmalloc
  0.67%  test-kdbus-benc  [kernel.kallsyms]      [k] unmap_single_vma
  0.67%  test-kdbus-benc  [kernel.kallsyms]      [k] flush_tlb_mm_range
  0.65%  test-kdbus-benc  [kernel.kallsyms]      [k] __fget_light
  0.63%  test-kdbus-benc  [kernel.kallsyms]      [k] fput
  0.63%  test-kdbus-benc  [kdbus]                [k] kdbus_handle_ioctl

 16.09%  ibench  [kernel.kallsyms]      [k] copy_user_enhanced_fast_string
  4.76%  ibench  ibench                 [.] main
  2.85%  ibench  [kernel.kallsyms]      [k] pipe_read
  2.81%  ibench  [kernel.kallsyms]      [k] _raw_spin_lock_irqsave
  2.31%  ibench  [kernel.kallsyms]      [k] update_cfs_shares
  2.19%  ibench  [kernel.kallsyms]      [k] native_write_msr_safe
  2.03%  ibench  [kernel.kallsyms]      [k] mutex_unlock
  2.02%  ibench  [kernel.kallsyms]      [k] resched_task
  1.74%  ibench  [kernel.kallsyms]      [k] __schedule
  1.67%  ibench  [kernel.kallsyms]      [k] mutex_lock
  1.61%  ibench  [kernel.kallsyms]      [k] do_sys_poll
  1.57%  ibench  [kernel.kallsyms]      [k] get_page_from_freelist
  1.46%  ibench  [kernel.kallsyms]      [k] __fget_light
  1.34%  ibench  [kernel.kallsyms]      [k] update_rq_clock.part.83
  1.28%  ibench  [kernel.kallsyms]      [k] fsnotify
  1.28%  ibench  [kernel.kallsyms]      [k] enqueue_entity
  1.28%  ibench  [kernel.kallsyms]      [k] update_curr
  1.25%  ibench  [kernel.kallsyms]      [k] system_call
  1.25%  ibench  [kernel.kallsyms]      [k] __list_del_entry
  1.22%  ibench  [kernel.kallsyms]      [k] _raw_spin_lock
  1.22%  ibench  [kernel.kallsyms]      [k] system_call_after_swapgs
  1.21%  ibench  [kernel.kallsyms]      [k] task_waking_fair
  1.18%  ibench  [kernel.kallsyms]      [k] poll_schedule_timeout
  1.17%  ibench  [kernel.kallsyms]      [k] _raw_spin_unlock_irqrestore
  1.14%  ibench  [kernel.kallsyms]      [k] __alloc_pages_nodemask
  1.06%  ibench  [kernel.kallsyms]      [k] enqueue_task_fair
  0.95%  ibench  [kernel.kallsyms]      [k] pipe_write
  0.88%  ibench  [kernel.kallsyms]      [k] do_sync_read
  0.87%  ibench  [kernel.kallsyms]      [k] dequeue_entity

