[Pixman] [PATCH 0/6] Better performance for fetch/store r5g6b5 in C code

Siarhei Siamashka siarhei.siamashka at gmail.com
Mon Dec 3 14:55:44 PST 2012


This patch series aims to improve the performance of the C code when
working with 16bpp color depth by implementing optimized iterators
for fetching and writing back r5g6b5 pixel data. It may be useful
for less common CPU architectures that do not have CPU-specific
optimizations yet. It may also be useful for evaluating the quality
of existing and future CPU-specific optimizations (the bar is set
higher when a faster C implementation is used as the reference).
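
For readers unfamiliar with the format, here is a minimal sketch of what
fetching and writing back r5g6b5 data involves. It is not the code from
these patches: the sketch_* names are hypothetical, and it shows only the
plain per-pixel conversion (bit replication on fetch, truncation on
write-back), not the optimizations the series adds.

```c
#include <stdint.h>

/* Expand one r5g6b5 pixel to a8r8g8b8 with opaque alpha. The top bits
 * of each channel are replicated into the low bits, so that 0x1f maps
 * to 0xff rather than 0xf8. */
static inline uint32_t
sketch_0565_to_8888 (uint16_t s)
{
    uint32_t r = (s >> 11) & 0x1f;
    uint32_t g = (s >> 5) & 0x3f;
    uint32_t b = s & 0x1f;

    r = (r << 3) | (r >> 2);
    g = (g << 2) | (g >> 4);
    b = (b << 3) | (b >> 2);

    return 0xff000000u | (r << 16) | (g << 8) | b;
}

/* Pack one a8r8g8b8 pixel into r5g6b5 by keeping the top 5/6/5 bits
 * of the color channels. Alpha is discarded. */
static inline uint16_t
sketch_8888_to_0565 (uint32_t s)
{
    return (uint16_t) (((s >> 8) & 0xf800) |
                       ((s >> 5) & 0x07e0) |
                       ((s >> 3) & 0x001f));
}

/* Fetch: convert a row of r5g6b5 pixels into an a8r8g8b8 buffer. */
static void
sketch_fetch_scanline_0565 (const uint16_t *src, uint32_t *dst, int width)
{
    int i;

    for (i = 0; i < width; i++)
        dst[i] = sketch_0565_to_8888 (src[i]);
}

/* Write-back: convert an a8r8g8b8 buffer back into r5g6b5 pixels. */
static void
sketch_store_scanline_0565 (uint16_t *dst, const uint32_t *src, int width)
{
    int i;

    for (i = 0; i < width; i++)
        dst[i] = sketch_8888_to_0565 (src[i]);
}
```

Because the fetch only fills the low bits with copies of the high bits,
a fetch followed by a write-back returns the original 16-bit value.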

Results of the cairo-perf-trace benchmark for the image16 backend on
an Intel Core i7 processor, with the PIXMAN_DISABLE environment
variable set to "mmx sse2":

Speedups
========
image16  firefox-asteroids  (7197.13 0.06%) ->  (6811.95 0.02%) : 1.06x speedup
image16      midori-zoomed  (3397.77 0.67%) ->  (3226.48 0.73%) : 1.05x speedup
image16  firefox-talos-svg (37677.55 0.05%) -> (36367.96 0.04%) : 1.04x speedup


Profiling logs for a run over all the benchmark traces show that the
overall improvement is rather small. The time spent in the r5g6b5
iterators changes from

    0.88%   6094  libpixman-1.so.0.29.1 [.] fetch_scanline_r5g6b5
    0.77%   5294  libpixman-1.so.0.29.1 [.] store_scanline_r5g6b5
to
    0.59%   4018  libpixman-1.so.0.29.1 [.] fast_fetch_r5g6b5
    0.52%   3550  libpixman-1.so.0.29.1 [.] fast_write_back_r5g6b5
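
As a rough cross-check, the sample counts above can be turned into a
per-function ratio. The helper below is a hypothetical illustration and
assumes that both profiling runs sampled at a comparable rate, so that
sample counts are proportional to time spent.

```c
/* Ratio of profiler sample counts before and after a change. A value
 * above 1.0 means fewer samples (less time) after the change. */
static double
sample_ratio (unsigned long before, unsigned long after)
{
    return (double) before / (double) after;
}
```

With the counts quoted above this gives about 1.52 for the fetch
(6094 / 4018) and about 1.49 for the write-back (5294 / 3550), even
though the effect on the whole benchmark run is small.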


Everything is dominated by bilinear scaling, which is likely a bit
overrepresented in the cairo benchmark traces. The complete profiling
logs, for reference:

=== before ===

27.16% 187079  libpixman-1.so.0.29.1 [.] bits_image_fetch_bilinear_no_repeat_8888
14.55% 100256  libpixman-1.so.0.29.1 [.] bits_image_fetch_bilinear_affine_none_a8r8g8b8
11.90%  81931  libpixman-1.so.0.29.1 [.] combine_over_u
 6.31%  43415  libpixman-1.so.0.29.1 [.] bits_image_fetch_bilinear_affine_none_r5g6b5
 3.76%  25838  libpixman-1.so.0.29.1 [.] radial_compute_color
 3.14%  21605  libpixman-1.so.0.29.1 [.] fetch_scanline_a8
 3.01%  20710  libpixman-1.so.0.29.1 [.] fast_composite_over_8888_0565
 2.88%  19829  libpixman-1.so.0.29.1 [.] _pixman_gradient_walker_pixel
 2.53%  17370  libcairo.so.2.11200.0 [.] _cairo_tor_scan_converter_generate
 2.30%  15806  libpixman-1.so.0.29.1 [.] bits_image_fetch_bilinear_affine_pad_a8r8g8b8
 1.98%  13604  libpixman-1.so.0.29.1 [.] fast_path_fill
 1.68%  11559  libpixman-1.so.0.29.1 [.] fast_composite_over_n_8_8888
 1.52%  10496  libpixman-1.so.0.29.1 [.] fast_composite_over_n_8888_0565_ca
 1.28%   8846  libpixman-1.so.0.29.1 [.] combine_in_reverse_u
 0.91%   6270  libpixman-1.so.0.29.1 [.] bits_image_fetch_general
 0.90%   6203  libc-2.15.so          [.] __memcpy_ssse3_back
 0.88%   6094  libpixman-1.so.0.29.1 [.] fetch_scanline_r5g6b5
 0.77%   5294  libpixman-1.so.0.29.1 [.] store_scanline_r5g6b5
 0.51%   3483  libpixman-1.so.0.29.1 [.] radial_get_scanline_narrow
 0.46%   3163  libpixman-1.so.0.29.1 [.] fast_composite_over_n_8_0565
 0.42%   2924  libcairo.so.2.11200.0 [.] cell_list_render_edge
 0.42%   2889  libpixman-1.so.0.29.1 [.] pixman_transform_point

=== after ===

27.31% 186552  libpixman-1.so.0.29.1 [.] bits_image_fetch_bilinear_no_repeat_8888
14.66% 100225  libpixman-1.so.0.29.1 [.] bits_image_fetch_bilinear_affine_none_a8r8g8b8
11.69%  79916  libpixman-1.so.0.29.1 [.] combine_over_u
 6.35%  43366  libpixman-1.so.0.29.1 [.] bits_image_fetch_bilinear_affine_none_r5g6b5
 3.78%  25829  libpixman-1.so.0.29.1 [.] radial_compute_color
 3.20%  21848  libpixman-1.so.0.29.1 [.] fetch_scanline_a8
 3.04%  20767  libpixman-1.so.0.29.1 [.] fast_composite_over_8888_0565
 2.90%  19812  libpixman-1.so.0.29.1 [.] _pixman_gradient_walker_pixel
 2.56%  17489  libcairo.so.2.11200.0 [.] _cairo_tor_scan_converter_generate
 2.30%  15701  libpixman-1.so.0.29.1 [.] bits_image_fetch_bilinear_affine_pad_a8r8g8b8
 1.98%  13499  libpixman-1.so.0.29.1 [.] fast_path_fill
 1.69%  11537  libpixman-1.so.0.29.1 [.] fast_composite_over_n_8_8888
 1.62%  11089  libpixman-1.so.0.29.1 [.] fast_composite_over_n_8888_0565_ca
 1.31%   8909  libpixman-1.so.0.29.1 [.] combine_in_reverse_u
 0.92%   6303  libpixman-1.so.0.29.1 [.] bits_image_fetch_general
 0.91%   6216  libc-2.15.so          [.] __memcpy_ssse3_back
 0.59%   4018  libpixman-1.so.0.29.1 [.] fast_fetch_r5g6b5
 0.52%   3550  libpixman-1.so.0.29.1 [.] fast_write_back_r5g6b5
 0.51%   3464  libpixman-1.so.0.29.1 [.] radial_get_scanline_narrow
 0.48%   3266  libpixman-1.so.0.29.1 [.] fast_composite_over_n_8_0565
 0.42%   2898  libpixman-1.so.0.29.1 [.] pixman_transform_point


The same patches are also available here:
    http://cgit.freedesktop.org/~siamashka/pixman-g2d/log/?h=iterators-r5g6b5


Siarhei Siamashka (6):
  test: add "src_0565_8888" to lowlevel-blt-bench
  Change CONVERT_XXXX_TO_YYYY macros into inline functions
  Faster conversion from a8r8g8b8 to r5g6b5 in C code
  Added C variants of r5g6b5 fetch/write-back iterators
  Faster write-back for the C variant of r5g6b5 dest iterator
  Faster fetch for the C variant of r5g6b5 src/dest iterator

 pixman/pixman-bits-image.c |    2 +-
 pixman/pixman-fast-path.c  |  258 +++++++++++++++++++++++++++++++++++---------
 pixman/pixman-inlines.h    |   30 +++---
 pixman/pixman-mmx.c        |   20 ++--
 pixman/pixman-private.h    |   53 +++++++--
 pixman/pixman-sse2.c       |    8 +-
 pixman/pixman.c            |    2 +-
 test/lowlevel-blt-bench.c  |    1 +
 8 files changed, 279 insertions(+), 95 deletions(-)

-- 
1.7.8.6


