[Pixman] [PIXMAN][PATCH 0/6] REPEAT_NORMAL support for bilinear scaled fast paths

Taekyun Kim podain77 at gmail.com
Mon Jun 13 05:18:47 PDT 2011


Hi, all.

I've revised the previous patches based on Siarhei's suggestions.
The patch set consists of the following 6 patches.

1. Replace boolean argument with flags
2. Introduce REPEAT_NORMAL for bilinear template
3. Bilinear REPEAT_NORMAL function declaration for sse2
4. Bilinear REPEAT_NORMAL function declaration for ARM
5. Enable Bilinear REPEAT_NORMAL fast path entries
6. Additional optimization of extending source scanline

Now the solid mask is handled correctly, as it was before.

I measured performance for various values of REPEAT_NORMAL_MIN_WIDTH.
Performance increased as the length became longer. I tuned the length
to give the best performance without increasing stack size too much.
Performance did not increase much for lengths longer than 64.
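
To illustrate the idea behind extending a short source scanline, here is a minimal sketch. It is not the code from the patch: the helper name extend_scanline, its signature, and the value 64 for REPEAT_NORMAL_MIN_WIDTH are assumptions made for illustration. It tiles whole periods of the source line into a temporary buffer so that the inner loop wraps around less often.

```c
#include <stdint.h>

/* Assumed value: the mail only reports that gains level off beyond 64. */
#define REPEAT_NORMAL_MIN_WIDTH 64

/*
 * Hypothetical helper, not the code from the patch.
 * Copies whole periods of a short source scanline into buf until at
 * least REPEAT_NORMAL_MIN_WIDTH pixels are stored, and returns the
 * extended width in pixels.
 *
 * Precondition: 0 < src_width < REPEAT_NORMAL_MIN_WIDTH. Under that
 * precondition the result is below 2 * REPEAT_NORMAL_MIN_WIDTH, so
 * buf must have room for 2 * REPEAT_NORMAL_MIN_WIDTH entries.
 */
static int
extend_scanline (const uint32_t *src, int src_width, uint32_t *buf)
{
    int width = 0;
    int i;

    while (width < REPEAT_NORMAL_MIN_WIDTH)
    {
        for (i = 0; i < src_width; i++)
            buf[width + i] = src[i];
        width += src_width;
    }

    return width;
}
```

Because only whole periods are copied, the extended line repeats with the same pattern as the original one. The buffer can live on the stack, which is why a larger minimum width costs stack space.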

Also, I tried to eliminate the division inside the loop with something like...

while (vx < src_width_fixed)
{
    vx += unit_x;
    num_pixels++;
}

But this did not affect performance, even when source extension is not
used. I couldn't find any other way of replacing those divisions. However,
extending the source scanline reduced the number of divisions per scanline.
Maybe we can try this later if it is really needed.
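
For comparison, here is a minimal sketch of the two ways to count how many destination pixels can be produced before vx passes the end of the source line. The type name fixed16_t and both function names are hypothetical, and the division form is my own formulation under the stated preconditions, not necessarily the expression used in the template.

```c
#include <stdint.h>

/* Hypothetical 16.16 fixed-point type used only for this sketch. */
typedef int32_t fixed16_t;

/*
 * Smallest n such that vx + n * unit_x >= src_width_fixed,
 * computed with one division.
 * Preconditions: unit_x > 0 and 0 <= vx < src_width_fixed.
 */
static int
count_by_division (fixed16_t vx, fixed16_t unit_x, fixed16_t src_width_fixed)
{
    return (int) ((src_width_fixed - vx - 1) / unit_x) + 1;
}

/*
 * Same count, computed by stepping as in the loop quoted above.
 * A 64-bit accumulator avoids overflow when vx + unit_x is large.
 */
static int
count_by_loop (fixed16_t vx, fixed16_t unit_x, fixed16_t src_width_fixed)
{
    int64_t pos = vx;
    int     num_pixels = 0;

    while (pos < src_width_fixed)
    {
        pos += unit_x;
        num_pixels++;
    }

    return num_pixels;
}
```

The loop takes one iteration per destination pixel, so it only avoids the division at the cost of extra additions and branches. That may be one reason it showed no measurable gain.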

I also tried reusing the temporary scanline buffer, but I couldn't get a
noticeable performance gain, so the reuse patch is not included. I will
experiment more with this later.

Anyway, here are the performance numbers:

** sse2 (Core2 duo E5200) **
////////////////////////////////////////////////////////////////////////////
// op=SRC, src=a8r8g8b8, mask=None, dst=a8r8g8b8
///////////////////////////////////////////////////////////////////////////
<<<<< Reference Compositing Performance 2000x2000 to 2000x2000 >>>>>
Non-scaled         : 371.54Mpix/s
Nearest-scaled     : 350.73Mpix/s
Bilinear-scaled    : 122.51Mpix/s

<<<<< src = 1 x 512  dst = 512 x 512 >>>>>
- Bilinear-Scaled (close to 1.x)-
REPEAT_NORMAL  : 105.81Mpix/s

<<<<< src = 15 x 15  dst = 512 x 512 >>>>>
- Bilinear-Scaled (close to 1.x)-
REPEAT_NORMAL  : 110.35Mpix/s

<<<<< src = 63 x 63  dst = 512 x 512 >>>>>
- Bilinear-Scaled (close to 1.x)-
REPEAT_NORMAL  : 109.11Mpix/s

** ARM NEON (S5PC110) **
////////////////////////////////////////////////////////////////////////////
// op=SRC, src=a8r8g8b8, mask=None, dst=a8r8g8b8
///////////////////////////////////////////////////////////////////////////
<<<<< Reference Compositing Performance 2000x2000 to 2000x2000 >>>>>
Non-scaled         : 151.89Mpix/s
Nearest-scaled     : 128.85Mpix/s
Bilinear-scaled    : 97.31Mpix/s

<<<<< src = 1 x 512  dst = 512 x 512 >>>>>
- Bilinear-Scaled (close to 1.x)-
REPEAT_NORMAL  : 78.63Mpix/s

<<<<< src = 15 x 15  dst = 512 x 512 >>>>>
- Bilinear-Scaled (close to 1.x)-
REPEAT_NORMAL  : 78.26Mpix/s

<<<<< src = 63 x 63  dst = 512 x 512 >>>>>
- Bilinear-Scaled (close to 1.x)-
REPEAT_NORMAL  : 83.56Mpix/s

Taekyun Kim (6):
  Replace boolean arguments with flags for bilinear fast path template
  REPEAT_NORMAL support for bilinear fast path template
  sse2: Declare bilinear src_8888_8888 REPEAT_NORMAL composite function
  ARM: Add REPEAT_NORMAL functions to bilinear BIND macros
  Enable REPEAT_NORMAL bilinear fast path entries
  Bilinear REPEAT_NORMAL source line extension for too short src_width

 pixman/pixman-arm-common.h |   24 ++++-
 pixman/pixman-fast-path.h  |  234 +++++++++++++++++++++++++++++++++++++++-----
 pixman/pixman-sse2.c       |   11 ++-
 3 files changed, 236 insertions(+), 33 deletions(-)


