[Pixman] [PATCH 4/7] More accurate FAST_PATH_SAMPLES_COVER_CLIP_BILINEAR

Mon Aug 24 13:42:03 PDT 2015

Similar to the nearest-neighbour case from the earlier patch, the
calculation of the cover flag for the bilinear case is overly cautious,
leading to the selection of sub-optimal fast paths and fetchers.

A worked example: consider a source image of size 2 x 1 being plotted into
a 3 x 1 region at the top-left of the destination using the transformation
matrix

    / 0.5 0   0.25 \
T = | 0   1   0    |
    \ 0   0   1    /

In both source and destination, and in both x and y, Pixman considers the
sample at array index i to represent the colour at the centre of the pixel,
i.e. at coordinate i + 0.5. So the top three destination pixels correspond
to coordinates (0.5, 0.5), (1.5, 0.5) and (2.5, 0.5). If we write these as
the columns of a matrix in homogeneous coordinates

    / 0.5 1.5 2.5 \
D = | 0.5 0.5 0.5 |
    \ 1   1   1   /

then under the transformation, we get the equivalent source coordinates

        / 0.5 1   1.5 \
T . D = | 0.5 0.5 0.5 |
        \ 1   1   1   /

Thus output pixel 0 corresponds exactly to input pixel 0, output pixel 2
corresponds exactly to input pixel 1, and output pixel 1 is the average of
the two input pixels.
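The first row of T . D can be reproduced in 16.16 fixed point, the format
Pixman uses for coordinates. The sketch below is illustrative only: fixed_t,
FIXED_1, fixed_mul and dest_to_source_x are hypothetical local stand-ins,
not Pixman's own API. It maps the centre of each destination pixel through
x' = 0.5 x + 0.25.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for pixman_fixed_t: signed 16.16 fixed point. */
typedef int32_t fixed_t;
#define FIXED_1 ((fixed_t) 0x10000)   /* 1.0 in 16.16 */

/* 16.16 multiply, widened to 64 bits so the product cannot overflow. */
static fixed_t fixed_mul (fixed_t a, fixed_t b)
{
    return (fixed_t) (((int64_t) a * b) >> 16);
}

/* Source x coordinate of the centre of destination pixel 'dest_index'
 * under x' = scale * x + offset (the top row of T). */
static fixed_t dest_to_source_x (fixed_t scale, fixed_t offset, int dest_index)
{
    assert (dest_index >= 0);
    fixed_t centre = dest_index * FIXED_1 + FIXED_1 / 2;   /* i + 0.5 */
    return fixed_mul (scale, centre) + offset;
}
```

With scale = FIXED_1 / 2 and offset = FIXED_1 / 4, destination pixels 0, 1
and 2 map to 0x8000, 0x10000 and 0x18000, i.e. 0.5, 1.0 and 1.5, exactly
the first row of T . D with no rounding.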

We have remained within the bounds of the source image for all pixels, but
the old flag definition incorrectly deduced that source index -1 in both x
and y, as well as source index 2 in x and source index 1 in y, were
required, and thus wouldn't choose a "cover" fast path.
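To see the difference concretely, the old and new bilinear cover tests can
be evaluated on the extents of the worked example. This is a standalone
sketch, not Pixman code. extents_t, old_bilinear_cover and
new_bilinear_cover are hypothetical names. The old rule's y2 term is not
visible in the diff context below and is assumed to mirror its x2 term.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int32_t fixed_t;              /* stand-in for pixman_fixed_t */
#define FIXED_1 ((fixed_t) 0x10000)   /* stand-in for pixman_fixed_1 */
#define FIXED_E ((fixed_t) 1)         /* stand-in for pixman_fixed_e */

/* Floor of a 16.16 value, like pixman_fixed_to_int, written without
 * relying on arithmetic right shift of negative numbers. */
static int fixed_to_int (fixed_t f)
{
    int64_t v = f;
    return (int) (v >= 0 ? v / FIXED_1 : -((-v + FIXED_1 - 1) / FIXED_1));
}

/* Hypothetical stand-in for the 'transformed' box in analyze_extent:
 * source-space extents of the sample positions. */
typedef struct { fixed_t x1, y1, x2, y2; } extents_t;

/* Pre-patch rule: half a pixel plus 8 * pixman_fixed_e on every edge.
 * The y2 term is assumed to mirror the x2 term. */
static bool old_bilinear_cover (extents_t t, int width, int height)
{
    assert (width > 0 && height > 0);
    return fixed_to_int (t.x1 - FIXED_1 / 2 - 8 * FIXED_E) >= 0     &&
           fixed_to_int (t.y1 - FIXED_1 / 2 - 8 * FIXED_E) >= 0     &&
           fixed_to_int (t.x2 + FIXED_1 / 2 + 8 * FIXED_E) < width  &&
           fixed_to_int (t.y2 + FIXED_1 / 2 + 8 * FIXED_E) < height;
}

/* Rule added by this patch: exactly half a pixel either side, with the far
 * edge pulled in by one pixman_fixed_e so that a sample landing exactly on
 * the last pixel centre still counts as covered. */
static bool new_bilinear_cover (extents_t t, int width, int height)
{
    assert (width > 0 && height > 0);
    return fixed_to_int (t.x1 - FIXED_1 / 2) >= 0                 &&
           fixed_to_int (t.y1 - FIXED_1 / 2) >= 0                 &&
           fixed_to_int (t.x2 + FIXED_1 / 2 - FIXED_E) < width    &&
           fixed_to_int (t.y2 + FIXED_1 / 2 - FIXED_E) < height;
}
```

For the worked example the box is x1 = y1 = y2 = 0.5 and x2 = 1.5 on a
2 x 1 image. The old rule fails on all four edges, while the new rule
passes. Moving x2 one pixman_fixed_e further right makes the new rule fail,
as it should.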

FAST_PATH_SAMPLES_COVER_CLIP_BILINEAR is always used in combination with
FAST_PATH_SCALE_TRANSFORM, which is a special case of an affine transform,
so, as with nearest-neighbour affine transforms, there can be no rounding
losses when calculating the source coordinate corresponding to a
destination coordinate.

The only explanation that has been put forward for why the
8 * pixman_fixed_e margin was ever thought necessary is that it permits
SIMD loads at the edges of images. The cover-test program can indeed catch
some implementations relying on this margin. The reliance stems from a
common simplification:

Basically, as you iterate along a destination pixel row, you can calculate
the 16.16 fixed-point coordinate of each source pixel by adding the X
increment to a running accumulator. If the bottom 16 bits of this
coordinate are non-zero, you need a contribution from both of the source
pixels nearest to this position (i.e. rounding both up and down after
dividing by 65536). Where the bottom 16 bits are zero, only the
rounded-down pixel is needed. Nevertheless, code frequently loads both the
rounded-down pixel and the one to its immediate right, on the assumption
that the second will probably be needed, and relies on the definition of
the cover flag to avoid reading beyond the array bounds when loading it.
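The simplification can be modelled by walking the accumulator along one row
and recording the largest source index each loading strategy touches. This
sketch assumes the accumulator has already been offset by half a pixel so
that integer values fall on pixel centres. max_index_touched and its
parameters are hypothetical. The "guarded" strategy is just one possible way
to avoid the over-read and is not necessarily one of the methods used in the
following patches.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

typedef int32_t fixed_t;              /* stand-in for pixman_fixed_t */
#define FIXED_1 ((fixed_t) 0x10000)

/* Floor of a 16.16 value without relying on arithmetic right shift. */
static int fixed_to_int (fixed_t f)
{
    int64_t v = f;
    return (int) (v >= 0 ? v / FIXED_1 : -((-v + FIXED_1 - 1) / FIXED_1));
}

/* Largest source index read along one row of 'count' destination pixels.
 * first_src_x: source coordinate of the first destination pixel centre.
 * ux:          per-pixel X increment added to the accumulator.
 * naive:       true  = always load pixel x0 + 1 as well as x0;
 *              false = load x0 + 1 only when the fraction is non-zero. */
static int max_index_touched (fixed_t first_src_x, fixed_t ux, int count,
                              bool naive)
{
    assert (count > 0);
    fixed_t x = first_src_x - FIXED_1 / 2;   /* centre-relative accumulator */
    int max_index = INT_MIN;

    for (int i = 0; i < count; i++, x += ux)
    {
        int x0 = fixed_to_int (x);                /* rounded-down pixel */
        bool has_fraction = (x & 0xFFFF) != 0;    /* bottom 16 bits */
        int right = (naive || has_fraction) ? x0 + 1 : x0;
        if (right > max_index)
            max_index = right;
    }
    return max_index;
}
```

Feeding in the worked example (first sample at source x = 0.5, increment
0.5, three pixels) gives 2 for the naive loader, one past the end of the
2-pixel-wide row, and 1 for the guarded loader. The over-read happens on
the last pixel, whose sample lands exactly on the centre of source pixel 1.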

There are, however, several ways to avoid this without the speed penalty
of using a more complex fast path or fetcher. These will be applied case by
case over the next few patches. Similar fixes to my ARMv6 bilinear scaled
fetchers (which have yet to be merged) will be squashed into the original
commits and reposted at a later date.
---
 pixman/pixman.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/pixman/pixman.c b/pixman/pixman.c
index 22205a3..367ef33 100644
--- a/pixman/pixman.c
+++ b/pixman/pixman.c
@@ -507,6 +507,14 @@ analyze_extent (pixman_image_t       *image,
 	    *flags |= FAST_PATH_SAMPLES_COVER_CLIP_NEAREST;
 	}
 
+        if (pixman_fixed_to_int (transformed.x1 - pixman_fixed_1 / 2) >= 0                                 &&
+            pixman_fixed_to_int (transformed.y1 - pixman_fixed_1 / 2) >= 0                                 &&
+            pixman_fixed_to_int (transformed.x2 + pixman_fixed_1 / 2 - pixman_fixed_e) < image->bits.width &&
+            pixman_fixed_to_int (transformed.y2 + pixman_fixed_1 / 2 - pixman_fixed_e) < image->bits.height)
+        {
+            *flags |= FAST_PATH_SAMPLES_COVER_CLIP_BILINEAR;
+        }
+
 	if (pixman_fixed_to_int (transformed.x1 - pixman_fixed_1 / 2 - 8 * pixman_fixed_e) >= 0                &&
 	    pixman_fixed_to_int (transformed.y1 - pixman_fixed_1 / 2 - 8 * pixman_fixed_e) >= 0                &&
 	    pixman_fixed_to_int (transformed.x2 + pixman_fixed_1 / 2 + 8 * pixman_fixed_e) < image->bits.width &&
-- 
1.7.5.4