[Pixman] [PATCH 1/5] REPEAT_NORMAL support for nearest bilinear fast path

Wed May 25 13:25:13 PDT 2011

On Tue, May 24, 2011 at 6:02 PM, Taekyun Kim <podain77 at gmail.com> wrote:
> Basically nearest/bilinear fast path templates take scanline_func as
> an argument. So we handle vertical repeat in macro templates.
> scanline_func can handle repeat inside of it or not. So we give following
> flags to macro templates which means whether the argument scanline_func
> support specific repeat mode or not.
>
>     FAST_PATH_SCANLINE_SUPPORT_REPEAT_NONE
>     FAST_PATH_SCANLINE_SUPPORT_REPEAT_PAD
>     FAST_PATH_SCANLINE_SUPPORT_REPEAT_NORMAL
>     FAST_PATH_SCANLINE_SUPPORT_REPEAT_REFLECT

Currently there are two potential types of scanline functions:
1. simple functions for NONE and PAD repeat
2. functions with automatic source image wrap around and having "while
(vx >= max_vx) vx -= max_vx;" code or its variation for NORMAL repeat

If there is a real need for REFLECT repeat, then it can be also
implemented, albeit somewhat more complex or less efficient. Actually
there is one more implementation of fast paths for nearest scaling
here:
    http://cgit.freedesktop.org/pixman/tree/pixman/pixman-fast-path.c?id=pixman-0.22.0#n1498
with the fast paths generated from it registered here::
    http://cgit.freedesktop.org/pixman/tree/pixman/pixman-fast-path.c?id=pixman-0.22.0#n1976
It is not very fast because the check for repeat type and its handling
is done per each pixel in the inner loop, but it's still useful for
REFLECT repeat.

Have you seen REFLECT repeat used anywhere in the wild?

> If all these flags are not turned on,  scanline_func can process only
> SAMPLES_COVER_CLIP cases. And as you suggested, mask related boolean values
> can also be replaced with flags like
>
>     FAST_PATH_HAVE_MASK
>     FAST_PATH_MASK_IS_SOLID
>
> And additionally following flags can be useful for REPEAT_REFLECT and
> horizontal flipping.
>
>     FAST_PATH_SCANLINE_SUPPORT_NEGATIVE_X_UNIT

I could be missing something, but I think that support for negative
unit_x is something that needs to be done in the main loop template
and should not really affect scanline functions.

> I've already done some REPEAT_NORMAL work for ARM standard fast paths and it
> shows great performance gain. But non-scaled PAD repeat(not
> SAMPLES_COVER_CLIP) still spend huge amount of time when browsing some
> popular web pages.

Yes, non-scaled PAD and NONE repeats are bad for performance. But
hitting them might be just an indication that the affected application
has some bugs in image coordinates calculations which would manifest
themselves as rendering glitches. For example if you have some image
and want to draw it without scaling, then accessing pixels outside the
bounds of this image may indicate the presence of off-by-one (or even
off-by-more-than-one) errors.

And unscaled NORMAL repeat is a real problem now:
    http://comments.gmane.org/gmane.comp.graphics.pixman/586
But if we get really fast nearest scaling covering many operations,
then probably they could be used for unscaled NONE/PAD/NORMAL repeat
too and somehow mitigate the issue.

>> + /* num_pixels = max(n), where  vx + (n - 1)*unit_x < width_fixed */
>>       \
>> + num_pixels = (new_src_width_fixed - vx + unit_x - pixman_fixed_e) /
>> unit_x;   \
>> + if (num_pixels > width_remain)
>>         \
>> +     num_pixels = width_remain;
>>         \
>>
>> I'm also a bit worried about this part. One thing that makes me feel a
>> bit uneasy is the following expression: "(new_src_width_fixed - vx +
>> unit_x - pixman_fixed_e)". Can we be sure that it can never overflow
>> pixman_fixed_t type? If yes, then some descriptive comment about the
>> valid range of values and assumptions would be useful here. There are
>> some checks in pixman code which reject certain range of parameters
>> for the compositing operation and impose some constraints:
>>
>>  http://cgit.freedesktop.org/pixman/tree/pixman/pixman.c?id=pixman-0.22.0#n556
>>
>>  http://cgit.freedesktop.org/pixman/tree/pixman/pixman.c?id=pixman-0.22.0#n572
>>
>>  http://cgit.freedesktop.org/pixman/tree/pixman/pixman.c?id=pixman-0.22.0#n641
>> We have 'scaling-crash-test.c' file in test directory which tries to
>> do some nasty scaling related things and trigger some problems:
>>
>>  http://cgit.freedesktop.org/pixman/tree/test/scaling-crash-test.c?id=pixman-0.22.0
>> But we might need to extend it with some more test cases specifically
>> targeting this your expression, or maybe even do some random fuzzing.
>>
>
> We have to find max(n), where vx + n*unit_x < max_vx.
> This is same problem finding max(n) where n*b < c.
> So n = (b - 1)/c (in integer arithmetic, where b and c is positive)
> This passed scaling-test but I figured out that
> num_pixels = ((max_vx - vx - pixman_fixed_e) / unit_x) + 1
> is correct expression. (new_src_width_fixed is replaced with max_vx)
> max_vx - vx - pixman_fixed_e does not cause any overflows, because vx is
> positive and less than max_vx. I've added some comment about this.

Ok, "((max_vx - vx - pixman_fixed_e) / unit_x) + 1" surely looks safer
than "(max_vx - vx + unit_x - pixman_fixed_e) / unit_x" to me and
indeed should produce identical results when vx < max_vx and unit_x >
0.

>> +           if (src_image->bits.width < PIXMAN_REPEAT_NORMAL_MIN_WIDTH) \
>> +           {                                                           \
>> +               for (i=0; i<PIXMAN_REPEAT_NORMAL_MIN_WIDTH; )           \
>> +               {                                                       \
>> +                   for (j=0; j<src_image->bits.width; j++, i++ )       \
>> +                   {                                                   \
>> +                       expended_top[i] = src_line_top[j];              \
>> +                       expended_bottom[i] = src_line_bottom[j];        \
>> +                   }                                                   \
>> +               }                                                       \
>> +               new_src_width = i;                                      \
>> +               src_line_top = &expended_top[0];                        \
>> +               src_line_bottom = &expended_bottom[0];                  \
>> +           }                                                           \
>>
>> Even though it's likely not a bit issue here, but at least in the case
>> of bilinear scaling these prepared extended source scanlines can be
>> reused across different iterations and not regenerated each times. In
>> the case if we have scale factor close to 1x, each source scanline is
>> going to be accessed approximately twice, for 2x upscaling - accessed
>> 4x times, etc.
>
> I've modified to reuse previous source scanline if it does not need update.

Good. I'm looking forward to seeing the updated patches. Thanks.

-- 
Best regards,
Siarhei Siamashka