[Pixman] [PATCH 0/7] SIMD optimizations for bilinear scaling

Tue Feb 22 13:23:41 PST 2011

From: Siarhei Siamashka <siarhei.siamashka at nokia.com>

This patch series adds support for creating specialized bilinear
fast path functions which do the processing in a single pass,
without intermediate temporary buffers, and which can also make
efficient use of SIMD optimizations. The performance critical
code is implemented as scanline processing functions, with the main
loop logic reused via a common macro template. Such scanline
processing functions are simple enough to implement, and at the
same time large enough not to constrain optimization opportunities
such as loop unrolling to process multiple pixels per iteration.
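
To illustrate the split between a scanline function and a shared main
loop template, here is a minimal sketch in plain C. It is not the code
from the patches: all names (sketch_bilinear_pixel,
sketch_scanline_src_8888_8888, SKETCH_BILINEAR_MAINLOOP), the 8-bit
weight precision, and the argument lists are assumptions made for this
example. It also assumes non-negative 16.16 fixed point coordinates and
that the caller guarantees pixel x + 1 and row y + 1 exist in the
source, so repeat and padding handling is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Interpolate one a8r8g8b8 pixel from its four neighbours.
 * distx is the weight (0..255) of the right neighbour; wt and wb are
 * the top and bottom row weights, assumed to sum to 256. */
static uint32_t
sketch_bilinear_pixel (uint32_t tl, uint32_t tr,
                       uint32_t bl, uint32_t br,
                       int distx, int wt, int wb)
{
    uint32_t result = 0;
    int shift;

    for (shift = 0; shift < 32; shift += 8)
    {
        uint32_t c_tl = (tl >> shift) & 0xff, c_tr = (tr >> shift) & 0xff;
        uint32_t c_bl = (bl >> shift) & 0xff, c_br = (br >> shift) & 0xff;
        /* Horizontal pass on both rows, then vertical blend. */
        uint32_t top = c_tl * (256 - distx) + c_tr * distx;
        uint32_t bot = c_bl * (256 - distx) + c_br * distx;

        result |= ((top * wt + bot * wb) >> 16) << shift;
    }
    return result;
}

/* Scanline function: writes 'width' destination pixels in one pass,
 * reading directly from the two source rows (no temporary buffer).
 * This is the part a SIMD variant would replace. */
static void
sketch_scanline_src_8888_8888 (uint32_t *dst,
                               const uint32_t *src_top,
                               const uint32_t *src_bottom,
                               int width, int wt, int wb,
                               int32_t vx, int32_t unit_x)
{
    assert (wt + wb == 256);
    while (width-- > 0)
    {
        int x = vx >> 16;               /* integer part of 16.16 */
        int distx = (vx >> 8) & 0xff;   /* top 8 bits of fraction */

        *dst++ = sketch_bilinear_pixel (src_top[x], src_top[x + 1],
                                        src_bottom[x], src_bottom[x + 1],
                                        distx, wt, wb);
        vx += unit_x;
    }
}

/* Main loop template: the per-row logic is written once and
 * instantiated for any scanline function with the same signature.
 * Strides are in uint32_t units. */
#define SKETCH_BILINEAR_MAINLOOP(name, scanline_func)                  \
static void                                                            \
name (uint32_t *dst, int dst_stride,                                   \
      const uint32_t *src, int src_stride,                             \
      int width, int height,                                           \
      int32_t vx, int32_t unit_x, int32_t vy, int32_t unit_y)          \
{                                                                      \
    while (height-- > 0)                                               \
    {                                                                  \
        int y = vy >> 16;                                              \
        int wb = (vy >> 8) & 0xff;                                     \
                                                                       \
        scanline_func (dst, src + y * src_stride,                      \
                       src + (y + 1) * src_stride,                     \
                       width, 256 - wb, wb, vx, unit_x);               \
        dst += dst_stride;                                             \
        vy += unit_y;                                                  \
    }                                                                  \
}

SKETCH_BILINEAR_MAINLOOP (sketch_composite_src_8888_8888,
                          sketch_scanline_src_8888_8888)
```

With this layout, an SSE2 or NEON variant would only need to supply
its own scanline function and instantiate the same template again.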

As a result, the bilinear scaled 'src_8888_8888' operation (a simple
scaled copy of the image) becomes more than 2 times faster
with SSE2 and more than 6 times faster with ARM NEON when
compared to the general pixman compositing path. Single pass
processing alone provides a modest but measurable speedup even
without SIMD.

I'm mostly interested in ARM NEON and did not spend any extra time
on tuning the SSE2 code, so the SSE2 scaler may not be as good as it
could be. Nevertheless, it is still faster than C.

The disadvantage of this method is its high specialization: each
particular type of compositing operation needs its own fast path
code. But it does not prevent us from also adding universal SIMD
optimized fetchers later. In any case, adding specialized fast paths
is the way to go when targeting the best performance for some of the
most common operations. I'll try to add more SIMD optimized bilinear
fast path functions shortly, based on analyzing cairo-traces and
profiling real use cases.

The same patches are also available in the following branch:
http://cgit.freedesktop.org/~siamashka/pixman/log/?h=sent/bilinear-scaling-simd-20110222

Siarhei Siamashka (7):
  Main loop template for fast single pass bilinear scaling
  test: check correctness of 'bilinear_pad_repeat_get_scanline_bounds'
  C variant of bilinear scaled 'src_8888_8888' fast path
  C variant of bilinear scaled 'src_8888_8_8888' fast path
  C variant of bilinear scaled 'src_8888_n_8888' fast path
  SSE2 optimization for bilinear scaled 'src_8888_8888'
  ARM: NEON optimization for bilinear scaled 'src_8888_8888'

 pixman/pixman-arm-neon-asm.S |  197 +++++++++++++++++++
 pixman/pixman-arm-neon.c     |   45 +++++
 pixman/pixman-fast-path.c    |  304 +++++++++++++++++++++++++++++
 pixman/pixman-fast-path.h    |  432 ++++++++++++++++++++++++++++++++++++++++++
 pixman/pixman-sse2.c         |  112 +++++++++++
 test/Makefile.am             |    2 +
 test/scaling-helpers-test.c  |   93 +++++++++
 7 files changed, 1185 insertions(+), 0 deletions(-)
 create mode 100644 test/scaling-helpers-test.c

-- 
1.7.3.4