[Pixman] [PIXMAN][PATCH 0/6] REPEAT_NORMAL support for bilinear scaled fast paths
Taekyun Kim
podain77 at gmail.com
Mon Jun 13 05:18:47 PDT 2011
Hi, all.
I've revised the previous patches based on Siarhei's suggestions.
The patch set consists of the following 6 patches.
1. Replace boolean argument with flags
2. Introduce REPEAT_NORMAL for bilinear template
3. Bilinear REPEAT_NORMAL function declaration for sse2
4. Bilinear REPEAT_NORMAL function declaration for ARM
5. Enable Bilinear REPEAT_NORMAL fast path entries
6. Additional optimization of extending source scanline
Now the solid mask is handled correctly, as it was before.
I measured performance for various values of REPEAT_NORMAL_MIN_WIDTH.
Performance increased as the length became longer. I tuned the length
to get the best performance without increasing the stack size too much.
Performance did not increase much for lengths longer than 64.
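To illustrate the source extension idea, here is a minimal sketch of tiling a
short source scanline into a stack buffer. The helper name extend_scanline, the
buffer size, and the value 64 for REPEAT_NORMAL_MIN_WIDTH are assumptions made
for this example, not code taken from the patches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value; the mail only says that lengths beyond 64 gave little gain. */
#define REPEAT_NORMAL_MIN_WIDTH 64

/*
 * Hypothetical helper: tile a short source scanline into dst until it holds
 * at least REPEAT_NORMAL_MIN_WIDTH pixels.  The returned width is always a
 * whole multiple of src_width, so wrapping around the extended line is
 * equivalent to wrapping around the original one.
 *
 * dst must have room for 2 * REPEAT_NORMAL_MIN_WIDTH pixels.
 */
static size_t
extend_scanline (const uint32_t *src, size_t src_width, uint32_t *dst)
{
    size_t extended_width = 0;
    size_t i;

    assert (src_width > 0 && src_width < REPEAT_NORMAL_MIN_WIDTH);

    while (extended_width < REPEAT_NORMAL_MIN_WIDTH)
    {
        /* Append one more full copy of the source line. */
        for (i = 0; i < src_width; i++)
            dst[extended_width + i] = src[i];

        extended_width += src_width;
    }

    return extended_width;
}
```

Since src_width is below the minimum, the extended line never exceeds twice
the minimum width, which keeps the stack buffer size bounded.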
I also tried to eliminate the division inside the loop, with something like:
    while (vx < src_width_fixed)
    {
        vx += unit_x;
        num_pixels++;
    }
But this did not affect performance, even when source extension is not
used. I couldn't find any other way of replacing those divisions. However,
extending the source scanline reduces the number of divisions per scanline.
Maybe we can try this later if it is really needed.
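For reference, the two ways of counting the pixels before the wrap point can
be compared with a small sketch like the one below. It assumes 16.16
fixed-point coordinates, a positive unit_x, and 0 <= vx < src_width_fixed.
The function names and the exact form of the division are illustrative
assumptions, not necessarily what the patches use.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Division-based count (assumed form): the number of steps n for which
 * vx + k * unit_x < src_width_fixed holds for every k in [0, n).
 */
static int
count_by_division (int32_t vx, int32_t unit_x, int32_t src_width_fixed)
{
    assert (unit_x > 0 && vx >= 0 && vx < src_width_fixed);

    return (int) ((src_width_fixed - vx - 1) / unit_x) + 1;
}

/* Loop-based count, following the while loop quoted in the mail. */
static int
count_by_loop (int32_t vx, int32_t unit_x, int32_t src_width_fixed)
{
    int64_t x = vx; /* wider type so the final step cannot overflow */
    int num_pixels = 0;

    assert (unit_x > 0);

    while (x < src_width_fixed)
    {
        x += unit_x;
        num_pixels++;
    }

    return num_pixels;
}
```

Both functions return the same count. The loop trades one division for
roughly num_pixels additions and compares, which may explain why it did not
help for scale factors close to 1.x, where each pass covers many pixels.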
I also tried reusing the temporary scanline buffer, but I couldn't get a
noticeable performance gain, so the reuse patch is not included. I will
experiment more with this later.
Anyway, here are the performance numbers:
** sse2 (Core2 duo E5200) **
////////////////////////////////////////////////////////////////////////////
// op=SRC, src=a8r8g8b8, mask=None, dst=a8r8g8b8
///////////////////////////////////////////////////////////////////////////
<<<<< Reference Compositing Performance 2000x2000 to 2000x2000 >>>>>
Non-scaled : 371.54Mpix/s
Nearest-scaled : 350.73Mpix/s
Bilinear-scaled : 122.51Mpix/s
<<<<< src = 1 x 512 dst = 512 x 512 >>>>>
- Bilinear-Scaled (close to 1.x)-
REPEAT_NORMAL : 105.81Mpix/s
<<<<< src = 15 x 15 dst = 512 x 512 >>>>>
- Bilinear-Scaled (close to 1.x)-
REPEAT_NORMAL : 110.35Mpix/s
<<<<< src = 63 x 63 dst = 512 x 512 >>>>>
- Bilinear-Scaled (close to 1.x)-
REPEAT_NORMAL : 109.11Mpix/s
** ARM NEON (S5PC110) **
////////////////////////////////////////////////////////////////////////////
// op=SRC, src=a8r8g8b8, mask=None, dst=a8r8g8b8
///////////////////////////////////////////////////////////////////////////
<<<<< Reference Compositing Performance 2000x2000 to 2000x2000 >>>>>
Non-scaled : 151.89Mpix/s
Nearest-scaled : 128.85Mpix/s
Bilinear-scaled : 97.31Mpix/s
<<<<< src = 1 x 512 dst = 512 x 512 >>>>>
- Bilinear-Scaled (close to 1.x)-
REPEAT_NORMAL : 78.63Mpix/s
<<<<< src = 15 x 15 dst = 512 x 512 >>>>>
- Bilinear-Scaled (close to 1.x)-
REPEAT_NORMAL : 78.26Mpix/s
<<<<< src = 63 x 63 dst = 512 x 512 >>>>>
- Bilinear-Scaled (close to 1.x)-
REPEAT_NORMAL : 83.56Mpix/s
Taekyun Kim (6):
Replace boolean arguments with flags for bilinear fast path template
REPEAT_NORMAL support for bilinear fast path template
sse2: Declare bilinear src_8888_8888 REPEAT_NORMAL composite function
ARM: Add REPEAT_NORMAL functions to bilinear BIND macros
Enable REPEAT_NORMAL bilinear fast path entries
Bilinear REPEAT_NORMAL source line extension for too short src_width
pixman/pixman-arm-common.h | 24 ++++-
pixman/pixman-fast-path.h | 234 +++++++++++++++++++++++++++++++++++++++-----
pixman/pixman-sse2.c | 11 ++-
3 files changed, 236 insertions(+), 33 deletions(-)