[Pixman] [PATCH 1/4] Change conditions for setting FAST_PATH_SAMPLES_COVER_CLIP flags

Wed Sep 9 10:09:04 PDT 2015

On Wed, 09 Sep 2015 11:42:26 +0100, Pekka Paalanen <ppaalanen at gmail.com>
wrote:

> On Wed, 9 Sep 2015 09:39:07 +0300
> Oded Gabbay <oded.gabbay at gmail.com> wrote:
>
>> On Fri, Sep 4, 2015 at 5:09 AM, Ben Avison <bavison at riscosopen.org> wrote:
>> > Many bilinear fast paths currently assume that COVER_CLIP_BILINEAR is
>> > not set when the transformed upper coordinate corresponds exactly
>> > to the last row/pixel in source space. This is due to a detail of
>> > many current implementations - they assume they can always load the
>> > pixel after the one you get by dividing the coordinate by 2^16 with
>> > rounding to -infinity.
>> >
>> > To avoid causing these implementations to exceed array bounds in
>> > these cases, the COVER_CLIP_BILINEAR flag retains this feature in
>> > this patch. Subsequent patches deal with removing this assumption,
>> > to enable cover fast paths to be used in the maximum number of cases.
>
> I'm not sure if these two paragraphs about bilinear are really
> necessary here. The essence is that we remove the extra margins from
> both NEAREST (not 8*eps, but 7*eps and 9*eps, see below) and BILINEAR
> (8*eps), without changing them in any other way.
>
> The subsequent patches are still under discussion, and we have to see
> how they work out.

I felt that since the point of the patch is about getting the thresholds
correct to the exact multiple of pixman_fixed_e, and I'd been asked for a
"strict proof of correctness" in the commit message, then I felt this had
to be said - after all, I don't think this behaviour is documented
anywhere else.

Just witness the confusion about the issue - even Søren posted to say that
when the coordinate was exactly coincident with a source pixel that the
pixel with the next-lowest coordinate was loaded and multiplied by 0,
before correcting himself shortly afterwards to say it was the
next-highest one.

I hope I've demonstrated, via three completely different approaches, that
there's no reason why the upper bound has to be set the way it is - you
don't even have to sacrifice the ability to do SIMD loads. Even with the
existing implementations, most of those that do load pixels from the
next-highest coordinate when we end aligned on a source pixel in the
horizontal direction, don't do the same in the vertical direction.

I'm also playing a longer game - but because people get overfaced with
very long patch series, I'm holding back on some others that build on this
change. Eventually I re-use the same assembly functions in a context where
they need to adhere to my tighter limits, but I expect the pool of people
able to review the assembly code will be pretty small, so the ability to
demonstrate that they obey the limits when used for a COVER operation is
very useful.

Granted, perhaps the last sentence you quoted probably belonged after the
--- separator, especially if the whole series doesn't get pushed at once,
but I really want to see all of them to go through. I did originally post
the whole thing as a single patch, after all (mind you, I've spent a lot
of time working on bilinear scaling for ARMv6 and ARMv7, and this feels
like a really tiny and obvious part of it to me now...)

>> > diff --git a/test/affine-bench.c b/test/affine-bench.c
>> > index 9e0121e..697684b 100644
>> > --- a/test/affine-bench.c
>> > +++ b/test/affine-bench.c
>> > @@ -396,13 +396,17 @@ main (int argc, char *argv[])
>> >      }
>
> I think affine-bench is missing a comment saying that it specifically
> targets the COVER paths.

Maybe. Though in that case, you'd arguably want one in lowlevel-blt-bench
too, which does COVER paths for unscaled and nearest-scaled operations
(and until Siarhei's recent post, I thought that the fact that it didn't
for bilinear scaled operations was due to an oversight).

Since most non-COVER paths are actually fabricated from the same assembly
functions, then to get the "purest" benchmarks (free from the invariant
overheads in that fabrication process) you'd normally want to look at the
COVER case anyway.

>> > +    /* Note: when FAST_PATH_SAMPLES_COVER_CLIP_BILINEAR is
>> > +     * retired, the upper limits can be reduced to:
>
> This comment is a bit confusing, since it does not explain what
> retiring COVER_CLIP_BILINEAR means. I think it would be better to
> rephrase as: "The upper limits can be reduced to the following if
> bilinear fetchers are guaranteed to not access the pixel beyond the row
> end with zero weight:" or something like that.

Hmm, I think I can blame the fact that I was asked to keep the name
"FAST_PATH_SAMPLES_COVER_CLIP_BILINEAR" associated with the old definition
during the transitional period, and I just did a global
search-and-replace. It made more sense before that, because all the
references to the old definition would simply be deleted one by one as
each one was dealt with, rather than needing a big rename at the end.

The reason why the code is there but commented out is that it was in the
original version of my patch, and I didn't want to lose it in the
refactoring process, but equally I recognised that it couldn't be applied
until all uses of the old definition of the cover flag had been removed.

> All the above otherwise looks good to me, but there is one more place
> that has 8*eps, analyze_extent() in pixman.c has:
>
>     if (!compute_transformed_extents (transform, &exp_extents, &transformed))
> 	return FALSE;
>    if (!IS_16_16 (transformed.x1 + x_off - 8 * pixman_fixed_e)	||
> 	!IS_16_16 (transformed.y1 + y_off - 8 * pixman_fixed_e)	||
> 	!IS_16_16 (transformed.x2 + x_off + 8 * pixman_fixed_e + width)	||
> 	!IS_16_16 (transformed.y2 + y_off + 8 * pixman_fixed_e + height))
>     {
> 	return FALSE;
>     }
>
> Should we be removing these 8*eps too?

Given what Siarhei said, I think I understand where these came from now.
I'd agree that the correct thing to do would be to remove the 8*e parts,
though I think that could safely be in a separate patch.

Ben