[Pixman] [PATCH] Add scaled nearest repeat fast paths

Tue Jul 3 01:10:58 PDT 2012

On Sun, Jul 1, 2012 at 10:51 PM, Søren Sandmann <sandmann at cs.au.dk> wrote:
> Jeff Muizelaar <jmuizelaar at mozilla.com> writes:
>
>> Siarhei wrote this patch and we've been using it in the Mozilla tree since May.
>>
>> Before this patch it was often faster to scale and repeat in two
>> passes because each pass used a fast path vs. the slow path that the
>> single pass approach takes. This makes it so that the single pass
>> approach has competitive performance.
>>
>>
>> commit 8ab69ca521a4d0c47b5fcb2c200d21d661acd195
>> Author: Siarhei Siamashka <siarhei.siamashka at gmail.com>
>> Date:   Mon Jun 25 22:36:52 2012 -0400
>>
>>     Add scaled nearest repeat fast paths
>>
>>     Before this patch it was often faster to scale and repeat
>>     in two passes because each pass used a fast path vs.
>>     the slow path that the single pass approach takes. This
>>     makes it so that the single pass approach has competitive
>>     performance.
>
> I think we need a bit of explanation what the difference between max_vx
> and src_width_fixed is.

This code tweaks NORMAL repeat implementation to run 'vx' variable
from negative up to 0 instead of keeping it always positive and run up
to 'max_vx'. The advantage is that comparison with zero has less
overhead and is faster. So we have
-		while (vx >= max_vx)
-		    vx -= max_vx;
+		while (vx >= 0)
+		    vx -= src_width_fixed;
Which translates to something like this in assembly:
-    add vx, unit_x
-    cmp vx, max_vx
-    js  <branch_over_vx_adjustment>
+    add vx, unit_x
+    js  <branch_over_vx_adjustment>

So after this change, 'max_vx' variable does not exist anymore
('max_vx' is 0). And 'src_width_fixed' is the source image width in
fixed point format.

> Some performance measurements would be useful too.

The primary purpose is to add NORMAL repeat support to NEAREST scaled
ARM NEON fast paths at no cost. It just adds two extra ARM
instructions per pixel, which has negligible impact on performance
because NEON unit dominates the timing of the sequence anyway:
    http://infocenter.arm.com/help/topic/com.arm.doc.ddi0344k/ch16s05s01.html

>> --- a/pixman/pixman-fast-path.c
>> +++ b/pixman/pixman-fast-path.c
>> @@ -1414,13 +1414,13 @@ scaled_nearest_scanline_565_565_SRC (uint16_t *       dst,
>>      uint16_t tmp1, tmp2, tmp3, tmp4;
>>      while ((w -= 4) >= 0)
>>      {
>> -     tmp1 = src[pixman_fixed_to_int (vx)];
>> +     tmp1 = *(src + pixman_fixed_to_int (vx));
>
> What is the purpose of these changes? There are several instances of it.

Just cosmetics. Because 'vx' value becomes negative, using it as an
array index may confuse people reading the code and also possibly be
flagged by static analysis tools or runtime bounds checkers.
    http://stackoverflow.com/questions/3473675/negative-array-indexes-in-c

-- 
Best regards,
Siarhei Siamashka