[Pixman] [PATCH 0/2] REPEAT_NORMAL support for scaled bilinear functions

Taekyun Kim podain77 at gmail.com
Tue May 31 06:52:47 PDT 2011


Hi,

I have revised the previous REPEAT_NORMAL patches based on Siarhei's
suggestions.
The previous patches can be found here:
http://lists.freedesktop.org/archives/pixman/2011-May/001249.html

This patch set focuses on bilinear scaling. Handling REPEAT_NORMAL inside
the scanline functions can be more efficient than the stitching approach,
and this has already been done for nearest scaling by Siarhei Siamashka:
http://cgit.freedesktop.org/~siamashka/pixman/log/?h=nearest-normal-repeat


The major changes compared to the previous patches are:

1. Flags to configure the macro template rather than boolean values
2. Possible overflows are eliminated
3. Extended source scanlines are reused when they do not need to be updated


I have added the following flags:
FLAG_NONE
FLAG_SCANLINE_SUPPORT_REPEAT_NORMAL
FLAG_HAVE_SOLID_MASK
FLAG_HAVE_NON_SOLID_MASK

FLAG_SCANLINE_SUPPORT_REPEAT_NORMAL means that the template parameter
'scanline_func' can handle normal repeat by itself. So if this flag is
set, the template bypasses its own repeat handling. Later we can configure
the macro template not to use stitching by turning this flag on.
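
As a rough illustration of the flag-based configuration described above,
here is a minimal sketch. The flag names are the ones listed in this
message; the bit values and the two helper functions are assumptions made
up for the example and are not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag names are from the message; the bit values are assumed here. */
enum
{
    FLAG_NONE                           = 0,
    FLAG_SCANLINE_SUPPORT_REPEAT_NORMAL = 1 << 0,
    FLAG_HAVE_SOLID_MASK                = 1 << 1,
    FLAG_HAVE_NON_SOLID_MASK            = 1 << 2
};

/* Hypothetical helper: true when the main loop template has to handle
 * NORMAL repeat itself because the scanline function cannot. */
static bool
mainloop_handles_repeat (unsigned flags)
{
    return (flags & FLAG_SCANLINE_SUPPORT_REPEAT_NORMAL) == 0;
}

/* Hypothetical helper: true when a solid or non-solid mask is configured.
 * Assumption: at most one of the two mask flags is set at a time. */
static bool
mainloop_has_mask (unsigned flags)
{
    unsigned both = FLAG_HAVE_SOLID_MASK | FLAG_HAVE_NON_SOLID_MASK;

    assert ((flags & both) != both);
    return (flags & both) != 0;
}
```

With constant flags, a compiler can usually fold such checks away inside
an inlined template, which is one reason a single flags argument is
convenient here.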

But I'm a bit concerned that the scanline function may then expect a
non-negative vx and unit_x > 0. Handling normal repeat with
while (vx >= max_vx) vx -= max_vx requires that assumption. This also means
that such scanline functions cannot handle a negative unit_x, and that vx
has to be normalized at least once outside the scanline function (to get a
non-negative vx).
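
To make the assumption concrete, here is a small sketch of the two steps:
a one-time normalization of vx outside the scanline function, and the
subtraction loop inside it. The type fixed_16_16_t and both function names
are placeholders invented for this example (standing in for a 16.16
fixed-point coordinate); this is not the code from the patch.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t fixed_16_16_t;   /* placeholder for a 16.16 fixed-point value */

/* Done once, outside the scanline function: bring any vx (including a
 * negative one) into [0, max_vx). */
static fixed_16_16_t
wrap_once (fixed_16_16_t vx, fixed_16_16_t max_vx)
{
    assert (max_vx > 0);
    vx %= max_vx;               /* C99: remainder has the sign of vx */
    if (vx < 0)
        vx += max_vx;
    return vx;
}

/* Inside the scanline function: the subtraction loop only works when
 * vx starts in [0, max_vx) and unit_x is positive. */
static void
sample_columns (fixed_16_16_t vx, fixed_16_16_t unit_x,
                fixed_16_16_t max_vx, int *out, int n)
{
    assert (vx >= 0 && vx < max_vx);
    assert (unit_x > 0 && unit_x <= INT32_MAX - max_vx);  /* no overflow */

    for (int i = 0; i < n; i++)
    {
        out[i] = vx >> 16;      /* integer source column */
        vx += unit_x;
        while (vx >= max_vx)
            vx -= max_vx;
    }
}
```

For example, with a 3 pixel wide source (max_vx = 3 << 16), a starting vx
of -1.0 wraps to column 2, and stepping by 1.0 then visits columns
2, 0, 1, 2, 0. With a negative unit_x the loop would never fire and vx
would run below zero, which is the limitation discussed above.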

The current FAST_NEAREST_SCANLINE (used by the C fast paths) can handle
only positive unit_x, and currently we guarantee this by setting the
FAST_PATH_X_UNIT_POSITIVE flag. FAST_NEAREST/BILINEAR_MAINLOOP_INT does not
know whether the scanline function can handle a negative unit_x, so
correctness depends on setting the proper fast path flags. I'm a bit
worried about this. Previously I proposed
FLAG_SCANLINE_SUPPORT_NEGATIVE_UNIT_X to control this explicitly, and I
think it is more readable. Could you give me some comments on this?

The mask-related flags have been changed slightly to (NONE, SOLID_MASK,
NON_SOLID_MASK). I prefer this kind of convention because it strictly
identifies the configuration by a single flag.

Below are some benchmark results for the sse2 and NEON fast paths.
The effect of scanline reuse is not as good as I expected (maybe there are
some mistakes in my patch). I think this is because the previously used
scanline is likely still in L1 cache, or because REPEAT_NORMAL_MIN_WIDTH
is too small. When I applied this approach to non-scaled functions,
performance was even better than reference compositing because memory
access was the major bottleneck.
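
For readers unfamiliar with the idea, the following is a simplified sketch
of source line extension with reuse. A narrow source row is replicated
into a temporary buffer until it is at least a minimum width, and the
buffer is rebuilt only when a different source row is requested.
EXT_MIN_WIDTH, extended_row_t and the function names are assumptions made
up for this example (EXT_MIN_WIDTH plays the role of
REPEAT_NORMAL_MIN_WIDTH, with an arbitrary value). It keeps a single row,
whereas bilinear scaling needs a top and a bottom row, and it is not the
code from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define EXT_MIN_WIDTH 32   /* assumed value, for illustration only */

typedef struct
{
    uint32_t pixels[2 * EXT_MIN_WIDTH];  /* enough for any src_width < EXT_MIN_WIDTH */
    int      width;                      /* extended width, a multiple of src_width */
    int      cached_y;                   /* source row currently held, -1 if none */
    int      rebuilds;                   /* how many times the buffer was refilled */
} extended_row_t;

static void
extended_row_init (extended_row_t *ext)
{
    ext->width = 0;
    ext->cached_y = -1;
    ext->rebuilds = 0;
}

/* Return source row y repeated horizontally to at least EXT_MIN_WIDTH
 * pixels. The buffer is refilled only when y differs from the cached row.
 * stride is measured in pixels. */
static const uint32_t *
get_extended_row (extended_row_t *ext, const uint32_t *src,
                  int src_width, int stride, int y)
{
    assert (src_width > 0 && src_width < EXT_MIN_WIDTH && y >= 0);

    if (ext->cached_y != y)
    {
        const uint32_t *row = src + (size_t) y * stride;
        int w = 0;

        while (w < EXT_MIN_WIDTH)
        {
            memcpy (ext->pixels + w, row, src_width * sizeof (uint32_t));
            w += src_width;
        }
        ext->width = w;
        ext->cached_y = y;
        ext->rebuilds++;
    }
    return ext->pixels;
}
```

When the vertical scale is close to 1.x, consecutive destination lines
often map to a different source row each time, so the reuse branch is
rarely taken. That would be consistent with the small difference between
the last two "After" numbers below, though this is only a guess.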

----- sse2 benchmark -----
//////////////////////////////////////////////////////////////////////////////
// op=SRC, src=a8r8g8b8, mask=None, dst=a8r8g8b8
/////////////////////////////////////////////////////////////////////////////
<<<<< Reference Compositing Performance 2000x2000 to 2000x2000 >>>>>
Non-scaled : 372.30Mpix/s
Nearest-scaled : 348.53Mpix/s
Bilinear-scaled : 122.59Mpix/s
<<<<< src = 1 x 512 dst = 512 x 512 >>>>>
- Bilinear-Scaled (close to 1.x)-
Before : REPEAT_NORMAL : 28.16Mpix/s
After : REPEAT_NORMAL : 38.46Mpix/s (without source line extension)
After : REPEAT_NORMAL : 74.70Mpix/s (use source line extension but not reusing)
After : REPEAT_NORMAL : 75.88Mpix/s
<<<<< src = 15 x 15 dst = 512 x 512 >>>>>
- Bilinear-Scaled (close to 1.x)-
Before : REPEAT_NORMAL : 24.56Mpix/s
After : REPEAT_NORMAL : 92.69Mpix/s (without source line extension)
After : REPEAT_NORMAL : 89.70Mpix/s (use source line extension but not reusing)
After : REPEAT_NORMAL : 89.33Mpix/s
<<<<< src = 63 x 63 dst = 512 x 512 >>>>>
- Bilinear-Scaled (close to 1.x)-
Before : REPEAT_NORMAL : 24.78Mpix/s
After : REPEAT_NORMAL : 114.72Mpix/s (without source line extension)
After : REPEAT_NORMAL : 114.53Mpix/s (use source line extension but not reusing)
After : REPEAT_NORMAL : 114.25Mpix/s

----- ARM NEON benchmark -----
//////////////////////////////////////////////////////////////////////////////
// op=SRC, src=a8r8g8b8, mask=None, dst=a8r8g8b8
/////////////////////////////////////////////////////////////////////////////
<<<<< Reference Compositing Performance 2000x2000 to 2000x2000 >>>>>
Non-scaled : 158.33Mpix/s
Nearest-scaled : 144.23Mpix/s
Bilinear-scaled : 99.89Mpix/s
<<<<< src = 1 x 512 dst = 512 x 512 >>>>>
- Bilinear-Scaled (close to 1.x)-
Before : REPEAT_NORMAL : 5.64Mpix/s
After : REPEAT_NORMAL : 11.25Mpix/s (without source line extension)
After : REPEAT_NORMAL : 37.08Mpix/s (use source line extension but not reusing)
After : REPEAT_NORMAL : 36.73Mpix/s
<<<<< src = 15 x 15 dst = 512 x 512 >>>>>
- Bilinear-Scaled (close to 1.x)-
Before : REPEAT_NORMAL : 3.78Mpix/s
After : REPEAT_NORMAL : 50.38Mpix/s (without source line extension)
After : REPEAT_NORMAL : 51.36Mpix/s (use source line extension but not reusing)
After : REPEAT_NORMAL : 50.96Mpix/s
<<<<< src = 63 x 63 dst = 512 x 512 >>>>>
- Bilinear-Scaled (close to 1.x)-
Before : REPEAT_NORMAL : 4.13Mpix/s
After : REPEAT_NORMAL : 82.80Mpix/s (without source line extension)
After : REPEAT_NORMAL : 83.65Mpix/s (use source line extension but not reusing)
After : REPEAT_NORMAL : 83.61Mpix/s

-- 
Best Regards,
Taekyun Kim