[Pixman] FAST_PATH_SAMPLES_COVER_CLIP flag & fast_composite_scaled_nearest_*

Thu Jul 29 03:41:33 PDT 2010

Siarhei Siamashka <siarhei.siamashka at gmail.com> writes:

> Overall looks like a good fix, a few comments below.

Thanks for the comments. I'll send a new patch with a long commit log
as a follow-up to this message (provided I can make it work with
git-send-email), but I'll reply to some specifics below.

The main difference in the new patch is that the 16BIT_SAFE flag is
gone entirely, and instead pixman will simply bail out in that case.

> > +    if (!transform)
> > +    {
> > +        box->x1 = pixman_fixed_to_int (pixman_int_to_fixed (box->x1) + pixman_fixed_1 / 2 + x_off);
> > +        box->y1 = pixman_fixed_to_int (pixman_int_to_fixed (box->y1) + pixman_fixed_1 / 2 + y_off);
> > +        box->x2 = pixman_fixed_to_int (pixman_int_to_fixed (box->x2) - pixman_fixed_1 / 2 + x_off) + width + 1;
> > +        box->y2 = pixman_fixed_to_int (pixman_int_to_fixed (box->y2) - pixman_fixed_1 / 2 + y_off) + height + 1;
> > +        return TRUE;
>
> That's an interesting case. If I understand it correctly, without any transform
> at all, both NEAREST and BILINEAR filters should introduce no changes in the
> bounds. This is fine for NEAREST, but not for BILINEAR filter which gets the
> bounds expanded by 1.
>
> The bilinear filter is a bit special, because the sampling of extra rightmost
> pixels may technically happen according to formulas, but they get multiplied by
> zero anyway, so make no difference and are ignored by non-transformed fast
> paths.

The non-transformed fast paths certainly make this assumption, but the
general path will actually read these pixels, so we still have to
account for them. If we start seeing a lot of cases where images have
a bilinear filter, an identity transform, and we end up hitting the
general case, we could look into it, but I don't think this is a very
common case.

> > +    v[0].vector[0] = pixman_int_to_fixed (box->x1) + pixman_fixed_1 / 2;
> > +    v[0].vector[1] = pixman_int_to_fixed (box->y1) + pixman_fixed_1 / 2;
> > +    v[0].vector[2] = pixman_int_to_fixed (1);
> > +
> > +    v[1].vector[0] = pixman_int_to_fixed (box->x2) - pixman_fixed_1 / 2;
> > +    v[1].vector[1] = pixman_int_to_fixed (box->y1) + pixman_fixed_1 / 2;
> > +    v[1].vector[2] = pixman_int_to_fixed (1);
> > +
> > +    v[2].vector[0] = pixman_int_to_fixed (box->x2) - pixman_fixed_1 / 2;
> > +    v[2].vector[1] = pixman_int_to_fixed (box->y2) - pixman_fixed_1 / 2;
> > +    v[2].vector[2] = pixman_int_to_fixed (1);
> > +
> > +    v[3].vector[0] = pixman_int_to_fixed (box->x1) + pixman_fixed_1 / 2;
> > +    v[3].vector[1] = pixman_int_to_fixed (box->y2) - pixman_fixed_1 / 2;
> > +    v[3].vector[2] = pixman_int_to_fixed (1);
> > +
> > +    for (i = 0; i < 4; ++i)
> > +    {
> > +   if (!pixman_transform_point (transform, &v[i]))
> > +       return FALSE;
>
> Still what about the subtle differences between pixman_transform_point() and
> pixman_transform_point_3d()? They are not exactly the same. Transformed
> fetchers and fast path functions are all using pixman_transform_point_3d().

It's true that they are not exactly the same. In the new patch, I have
introduced a bit of slack in the computation of the source area (8 *
pixman_fixed_e). This is hopefully enough to account for any rounding
differences. The common special case where a NEAREST image is being
scaled so that the source area exactly matches the image will still
work because all we need in that case is for the computed bounds to be
within [0, 0.5] and 8 * pixman_fixed_1 is not even close to that.

It does mean of course that if you scale an image very slightly down,
the new code might decide to not set the FAST_PATH_SAMPLES_COVER_CLIP
flag, but I'm not that concerned about this.

I'd like to avoid relying on the two transformation functions being
exactly the same because I think it should be considered legitimate to
write compositing functions that transform in a different way if
that's more efficient, even if it means slightly different results.

> Another (minor) issue is that pixman_transform_point() has division operations,
> which may be not very good for performance.

We can't really avoid the divisions if the flags are to be set
correctly when the transformation is projective. I realize we don't
actually make use of the flags in that case, but it's still unpleasent
to rely on the knowledge that there aren't any projective fast paths,
in a place that should only have knowledge about a particular image.

So basically, I think the flags should always be computed correctly,
even if we know that an incorrect computation won't have any ill
effects.

> > +
> > +   x1 = pixman_fixed_to_int (v[i].vector[0] + x_off);
> > +   y1 = pixman_fixed_to_int (v[i].vector[1] + y_off);
> > +   x2 = x1 + width + 1;
> > +   y2 = y1 + height + 1;
>
> A minor performance improvement is possible. Addition of (width + 1) and
> (height + 1) to x2/y2 is done inside of the loop on each iteration here, 8
> times total. If done after the loop just before returning, it would be 2
> additions only.

The new patch is different is quite different, but it does incorporate
this optimization.

> >  static force_inline uint32_t
> > @@ -522,15 +624,12 @@ compute_src_extents_flags (pixman_image_t *image,
> >     extents16.x2 = extents->x2 - x;
> >     extents16.y2 = extents->y2 - y;
> >
> > -   if (!image->common.transform ||
> > -       pixman_transform_bounds (image->common.transform, &extents16))
> > +   if (compute_sample_extent (image, &extents16) &&
> > +       extents16.x1 >= 0  && extents16.y1 >= 0 &&
> > +       extents16.x2 <= image->bits.width &&
> > +       extents16.y2 <= image->bits.height)
> >     {
> > -       if (extents16.x1 >= 0  && extents16.y1 >= 0 &&
> > -           extents16.x2 <= image->bits.width &&
> > -           extents16.y2 <= image->bits.height)
> > -       {
> > -           flags |= FAST_PATH_SAMPLES_COVER_CLIP;
> > -       }
> > +       flags |= FAST_PATH_SAMPLES_COVER_CLIP;
> >     }
> >      }
>
> There is also one more call to pixman_transform_bounds() a few lines below
> which is used to check whether to set  FAST_PATH_16BIT_SAFE flag. Shouldn't it
> be also converted to use compute_sample_extent()?

In the new patch, the corresponding code does use
compute_sample_extent(), but the FAST_PATH_16BIT_SAFE is gone
entirely, and instead the new code simply bails out if the compositing
operation isn't '16 bit safe'

Thanks,
Soren