[Pixman] [PATCH 0/6] Better performance for fetch/store r5g6b5 in C code
Siarhei Siamashka
siarhei.siamashka at gmail.com
Mon Dec 3 14:55:44 PST 2012
This patch series aims to improve the performance of the C code when
working with 16bpp color depth by implementing optimized iterators
for fetching and writing back r5g6b5 pixel data. It may be useful
for less common CPU architectures which do not have CPU-specific
optimizations yet. It may also be useful for evaluating the quality
of existing and future CPU-specific optimizations (the bar is set
higher when a faster C implementation is used as the reference).
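For readers unfamiliar with the format, here is a minimal sketch of what
fetching and writing back r5g6b5 data involves: each 16-bit pixel is
expanded to a8r8g8b8 on fetch and packed back on store. This is not the
code from the patches. All `sketch_*` names are hypothetical, and the
bit-replication rounding is an assumption chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Expand one r5g6b5 pixel to a8r8g8b8 with alpha forced to 0xff.
 * The top bits of each channel are replicated into the low bits,
 * so that a full-intensity 5-bit value (0x1f) maps to 0xff. */
static inline uint32_t
sketch_0565_to_8888 (uint16_t s)
{
    uint32_t r = (s >> 11) & 0x1f;
    uint32_t g = (s >> 5) & 0x3f;
    uint32_t b = s & 0x1f;

    r = (r << 3) | (r >> 2);
    g = (g << 2) | (g >> 4);
    b = (b << 3) | (b >> 2);

    return 0xff000000u | (r << 16) | (g << 8) | b;
}

/* Pack one a8r8g8b8 pixel into r5g6b5 by keeping the top 5/6/5 bits
 * of the color channels. Alpha is discarded. */
static inline uint16_t
sketch_8888_to_0565 (uint32_t s)
{
    return (uint16_t) (((s >> 8) & 0xf800) |
                       ((s >> 5) & 0x07e0) |
                       ((s >> 3) & 0x001f));
}

/* Fetch: convert a scanline of r5g6b5 pixels into an a8r8g8b8 buffer. */
static void
sketch_fetch_scanline_0565 (const uint16_t *src, uint32_t *dst, size_t width)
{
    assert (width == 0 || (src != NULL && dst != NULL));

    for (size_t i = 0; i < width; i++)
        dst[i] = sketch_0565_to_8888 (src[i]);
}

/* Write-back: convert an a8r8g8b8 buffer into a scanline of r5g6b5 pixels. */
static void
sketch_store_scanline_0565 (uint16_t *dst, const uint32_t *src, size_t width)
{
    assert (width == 0 || (src != NULL && dst != NULL));

    for (size_t i = 0; i < width; i++)
        dst[i] = sketch_8888_to_0565 (src[i]);
}
```

Loops of this shape run once per pixel for every 16bpp source or
destination that goes through the general compositing path, which is
why the per-pixel conversion cost is worth optimizing.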
Results of the cairo-perf-trace benchmark run for the image16 backend
on an Intel Core i7 processor, with the PIXMAN_DISABLE environment
variable set to "mmx sse2":
Speedups
========
image16 firefox-asteroids (7197.13 0.06%) -> (6811.95 0.02%) : 1.06x speedup
image16 midori-zoomed (3397.77 0.67%) -> (3226.48 0.73%) : 1.05x speedup
image16 firefox-talos-svg (37677.55 0.05%) -> (36367.96 0.04%) : 1.04x speedup
Profiling logs for the run over all the benchmark traces show that the
overall improvement is rather small and the time spent in the r5g6b5
iterators changes from
0.88% 6094 libpixman-1.so.0.29.1 [.] fetch_scanline_r5g6b5
0.77% 5294 libpixman-1.so.0.29.1 [.] store_scanline_r5g6b5
to
0.59% 4018 libpixman-1.so.0.29.1 [.] fast_fetch_r5g6b5
0.52% 3550 libpixman-1.so.0.29.1 [.] fast_write_back_r5g6b5
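As a rough cross-check of the numbers above, the sample counts can be
turned into per-function ratios. The helper below is a hypothetical
sketch, and it assumes both profiling runs covered the same workload so
that the sample counts are comparable.

```c
#include <assert.h>

/* Ratio of profiler samples before and after a change. A value above
 * 1.0 means fewer samples (less time) were attributed after the change.
 * Assumes both runs executed the same workload. */
static double
sketch_sample_ratio (unsigned long before, unsigned long after)
{
    assert (after > 0);
    return (double) before / (double) after;
}

/* Fetch:      sketch_sample_ratio (6094, 4018)
 * Write-back: sketch_sample_ratio (5294, 3550) */
```

With the counts from the logs this gives roughly 1.52x for the fetch
path and 1.49x for the write-back path. The iterators themselves get
noticeably faster, but they account for under 2% of the samples, which
is consistent with the small overall improvement.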
Everything is dominated by the bilinear scaling, which is likely
a bit overrepresented in the cairo benchmark traces. The complete
profiling logs, for reference:
=== before ===
27.16% 187079 libpixman-1.so.0.29.1 [.] bits_image_fetch_bilinear_no_repeat_8888
14.55% 100256 libpixman-1.so.0.29.1 [.] bits_image_fetch_bilinear_affine_none_a8r8g8b8
11.90% 81931 libpixman-1.so.0.29.1 [.] combine_over_u
6.31% 43415 libpixman-1.so.0.29.1 [.] bits_image_fetch_bilinear_affine_none_r5g6b5
3.76% 25838 libpixman-1.so.0.29.1 [.] radial_compute_color
3.14% 21605 libpixman-1.so.0.29.1 [.] fetch_scanline_a8
3.01% 20710 libpixman-1.so.0.29.1 [.] fast_composite_over_8888_0565
2.88% 19829 libpixman-1.so.0.29.1 [.] _pixman_gradient_walker_pixel
2.53% 17370 libcairo.so.2.11200.0 [.] _cairo_tor_scan_converter_generate
2.30% 15806 libpixman-1.so.0.29.1 [.] bits_image_fetch_bilinear_affine_pad_a8r8g8b8
1.98% 13604 libpixman-1.so.0.29.1 [.] fast_path_fill
1.68% 11559 libpixman-1.so.0.29.1 [.] fast_composite_over_n_8_8888
1.52% 10496 libpixman-1.so.0.29.1 [.] fast_composite_over_n_8888_0565_ca
1.28% 8846 libpixman-1.so.0.29.1 [.] combine_in_reverse_u
0.91% 6270 libpixman-1.so.0.29.1 [.] bits_image_fetch_general
0.90% 6203 libc-2.15.so [.] __memcpy_ssse3_back
0.88% 6094 libpixman-1.so.0.29.1 [.] fetch_scanline_r5g6b5
0.77% 5294 libpixman-1.so.0.29.1 [.] store_scanline_r5g6b5
0.51% 3483 libpixman-1.so.0.29.1 [.] radial_get_scanline_narrow
0.46% 3163 libpixman-1.so.0.29.1 [.] fast_composite_over_n_8_0565
0.42% 2924 libcairo.so.2.11200.0 [.] cell_list_render_edge
0.42% 2889 libpixman-1.so.0.29.1 [.] pixman_transform_point
=== after ===
27.31% 186552 libpixman-1.so.0.29.1 [.] bits_image_fetch_bilinear_no_repeat_8888
14.66% 100225 libpixman-1.so.0.29.1 [.] bits_image_fetch_bilinear_affine_none_a8r8g8b8
11.69% 79916 libpixman-1.so.0.29.1 [.] combine_over_u
6.35% 43366 libpixman-1.so.0.29.1 [.] bits_image_fetch_bilinear_affine_none_r5g6b5
3.78% 25829 libpixman-1.so.0.29.1 [.] radial_compute_color
3.20% 21848 libpixman-1.so.0.29.1 [.] fetch_scanline_a8
3.04% 20767 libpixman-1.so.0.29.1 [.] fast_composite_over_8888_0565
2.90% 19812 libpixman-1.so.0.29.1 [.] _pixman_gradient_walker_pixel
2.56% 17489 libcairo.so.2.11200.0 [.] _cairo_tor_scan_converter_generate
2.30% 15701 libpixman-1.so.0.29.1 [.] bits_image_fetch_bilinear_affine_pad_a8r8g8b8
1.98% 13499 libpixman-1.so.0.29.1 [.] fast_path_fill
1.69% 11537 libpixman-1.so.0.29.1 [.] fast_composite_over_n_8_8888
1.62% 11089 libpixman-1.so.0.29.1 [.] fast_composite_over_n_8888_0565_ca
1.31% 8909 libpixman-1.so.0.29.1 [.] combine_in_reverse_u
0.92% 6303 libpixman-1.so.0.29.1 [.] bits_image_fetch_general
0.91% 6216 libc-2.15.so [.] __memcpy_ssse3_back
0.59% 4018 libpixman-1.so.0.29.1 [.] fast_fetch_r5g6b5
0.52% 3550 libpixman-1.so.0.29.1 [.] fast_write_back_r5g6b5
0.51% 3464 libpixman-1.so.0.29.1 [.] radial_get_scanline_narrow
0.48% 3266 libpixman-1.so.0.29.1 [.] fast_composite_over_n_8_0565
0.42% 2898 libpixman-1.so.0.29.1 [.] pixman_transform_point
The same patches are also available here:
http://cgit.freedesktop.org/~siamashka/pixman-g2d/log/?h=iterators-r5g6b5
Siarhei Siamashka (6):
  test: add "src_0565_8888" to lowlevel-blt-bench
  Change CONVERT_XXXX_TO_YYYY macros into inline functions
  Faster conversion from a8r8g8b8 to r5g6b5 in C code
  Added C variants of r5g6b5 fetch/write-back iterators
  Faster write-back for the C variant of r5g6b5 dest iterator
  Faster fetch for the C variant of r5g6b5 src/dest iterator
pixman/pixman-bits-image.c | 2 +-
pixman/pixman-fast-path.c | 258 +++++++++++++++++++++++++++++++++++---------
pixman/pixman-inlines.h | 30 +++---
pixman/pixman-mmx.c | 20 ++--
pixman/pixman-private.h | 53 +++++++--
pixman/pixman-sse2.c | 8 +-
pixman/pixman.c | 2 +-
test/lowlevel-blt-bench.c | 1 +
8 files changed, 279 insertions(+), 95 deletions(-)
--
1.7.8.6