[Pixman] [PATCH 5/7] armv7/mips/sse2: Fix bounds violations in bilinear cover scaled fast paths

Thu Aug 27 06:07:24 PDT 2015

On Thu, 27 Aug 2015 04:44:58 +0100, Siarhei Siamashka <siarhei.siamashka at gmail.com> wrote:
> On Mon, 24 Aug 2015 21:42:04 +0100
> Ben Avison <bavison at riscosopen.org> wrote:
>
> The current COVER flag assumes that every pixel is fetched from the
> source image as a 2x2 block (regardless of whether the fractional part
> of the offset is zero or not) and this is explicitly expected to be
> safe.

The situation isn't quite that simple though. With the bilinear fast
paths defined using FAST_BILINEAR_MAINLOOP_INT(), the height of the block
is already set to 1 when the fractional part is 0 - albeit as a side
effect of not being able to express a fixed-point weight of 1.0 in a
16-bit value:

http://cgit.freedesktop.org/pixman/tree/pixman/pixman-inlines.h?id=pixman-0.33.2#n947

My ARMv6 bilinear fetchers (the ones still waiting to be committed)
already contained an optimisation to avoid processing a scanline whose
vertical weight was 0. Patch 6 of this series is a tweak to
fast_fetch_bilinear_cover() so it skips such scanlines, and
ssse3_fetch_bilinear_cover() would benefit from the same optimisation.

In the horizontal direction, it was actually Søren's post (accidentally
or otherwise) which first prompted me to see there's an issue: if you're
going to say that you always fetch two source pixels even when the
coordinate is exactly aligned to a single source pixel, then who's to say
whether the zero-weighted pixel is the one to the right of the true pixel
rather than the one to its left? In practice, FAST_BILINEAR_MAINLOOP_INT
chooses the one on the right, and this happens to match your suggested
change to the calculation of the FAST_PATH_SAMPLES_COVER_CLIP_BILINEAR,
where we simply remove the 8 * pixman_fixed_e border. As a reminder of
what I wrote earlier, this test corresponds to:

transformed.x1 >= pixman_fixed_1 / 2
transformed.y1 >= pixman_fixed_1 / 2
transformed.x2 < image->bits.width * pixman_fixed_1 - pixman_fixed_1 / 2
transformed.y2 < image->bits.height * pixman_fixed_1 - pixman_fixed_1 / 2

If we assume that zero-weighted pixels to the left are always loaded,
these would need to be the tests instead:

transformed.x1 > pixman_fixed_1 / 2
transformed.y1 > pixman_fixed_1 / 2
transformed.x2 <= image->bits.width * pixman_fixed_1 - pixman_fixed_1 / 2
transformed.y2 <= image->bits.height * pixman_fixed_1 - pixman_fixed_1 / 2

Søren talked about samples with index -1 being fetched - in other words,
the zero-weighted pixel being the one to the left of the true one. This
actually turns out to be more efficient, because to avoid out-of-bounds
accesses when transformed.x1 == pixman_fixed_1 / 2, you only need to test
the first pixel on each row, which can be moved outside the loop. Any
later pixels on the same row which might fetch a zero-weighted pixel to
the left, you at least know that the zero-weighted pixel is within the
pixel row. If the zero-weighted pixel were to be to the right, you'd need
to test every single pixel, which would normally be a significant speed
hit.

It turns out to be surprisingly simple to cause the zero-weighted pixel
to be on the left - just negate the X increments, as demonstrated by
patch 6.

Things work even more smoothly with my ARMv6 bilinear fetchers (though I
haven't reposted them yet). I was able to take advantage of ARM
conditional execution to avoid reading zero-weighted pixels with zero
cycles overhead (and in fact some speed gain due to fewer cachelines
being fetched from RAM). The only necessary change to the core code:

    adds   accumulator, accumulator, increment_fractional
    ldrcs  in0, [source_ptr, #increment_integer + 1]!
    ldrcc  in0, [source_ptr, #increment_integer]!
    ldr    in1, [source_ptr, #1]

is to change the last ldr to an ldrne.

This is actually an exception to what I wrote above "If the zero-weighted
pixel were to be to the right, you'd need to test every single pixel,
which would normally be a significant speed hit." But since conditional
execution of arbitrary instructions is pretty unique to ARM, it make
sense to use the negated-increment method for the generic C
implementation.

> However with your new proposed definition of the COVER_CLIP_BILINEAR
> flag, the area is shrinked by "pixman_fixed_e" on the right side in

(enlarged by pixman_fixed_e, but I think that's what you meant)

> order to allow a special corner case to pass through. And this is where
> the bounds violations are likely coming from.
[...]
> Is there actually a real bug in the
> current pixman code? Because the commit summary looks kinda scary and
> may be misinterpreted by the general public.

Don't worry, the existing Pixman code doesn't have any bounds violations.
In fact, there aren't any bounds violations at all, despite what the
commit summaries say.

To explain this: quite late on in development, I added the
FAST_PATH_SAMPLES_COVER_CLIP_BILINEAR_APPROX flag, as an insurance policy
against me not finding a volunteer to adapt the SSSE3 bilinear fetcher.
But then I realised that if I added the flag early on, I could move
individual fast paths from the old flag to the new one a group at a time.
That way, the patch series would pass "make check", including cover-test,
at every stage. Without that flag change, it would have been the case
that there were bounds violations, albeit ones that only occurred if you
had only partially applied the patch series.

You're probably right that the commit summaries would look scary to
someone who didn't read the full commit logs. However, it's tough to
describe what they do in only one row of text and I couldn't think of
anything better...

Ben