[Pixman] FAST_PATH_SAMPLES_COVER_CLIP flag & fast_composite_scaled_nearest_*

Mon Jul 26 10:24:14 PDT 2010

On Thursday 22 July 2010 09:57:11 Soeren Sandmann wrote:
> Soeren Sandmann <sandmann at daimi.au.dk> writes:
> > Instead, I think both issues can be fixed by using a new function:
> >         compute_sample_bounds (transform, box, filter)
> > 
> > that would do the rounded transformation on each of the four corner
> > pixels, adding the filter extents, and then compute the bounding box
> > of that instead. Does that seem like a reasonable plan?
> 
> Here is an attempt at doing this.
> 
> Soren
> 
> 
> From 1c04ce9105522ef68b28ea3a299135081d77a2b4 Mon Sep 17 00:00:00 2001
> From: =?UTF-8?q?S=C3=B8ren=20Sandmann=20Pedersen?= <ssp at redhat.com>
> Date: Thu, 22 Jul 2010 04:27:45 -0400
> Subject: [PATCH] Add new compute_sample_extents() and use it for
> FAST_PATH_SAMPLES_COVER_CLIP
> 
> Previously we were using pixman_transform_bounds() to compute which
> source samples would be used for a composite operation. This is
> incorrect though because (a) pixman_transform_bounds() is transforming
> the bounding box of the destination samples, which means it is too
> pessimistic, (b) it isn't rounding the same way as we do during
> sampling, which could make it too optimistic in some cases, and (c) it
> didn't take the filter of the operation into account.
> 
> The new function takes care of all three issues.

Overall looks like a good fix, a few comments below.

> ---
>  pixman/pixman.c |  115
> +++++++++++++++++++++++++++++++++++++++++++++++++++---- 1 files changed,
> 107 insertions(+), 8 deletions(-)
> 
> diff --git a/pixman/pixman.c b/pixman/pixman.c
> index 80a766a..9d184ed 100644
> --- a/pixman/pixman.c
> +++ b/pixman/pixman.c
> @@ -488,6 +488,108 @@ walk_region_internal (pixman_implementation_t *imp,
>      }
>  }
> 
> +static pixman_bool_t
> +compute_sample_extent (pixman_image_t *image,
> +		       pixman_box16_t *box)
> +{
> +    pixman_transform_t *transform = image->common.transform;
> +    pixman_filter_t filter = image->common.filter;
> +    int x1, y1, x2, y2;
> +    pixman_fixed_t *params;
> +    pixman_fixed_t x_off, y_off;
> +    int width, height;
> +    pixman_vector_t v[4];
> +    int i;
> +
> +    switch (filter)
> +    {
> +    case PIXMAN_FILTER_CONVOLUTION:
> +	params = image->common.filter_params;
> +	x_off = - pixman_fixed_e - ((params[0] - pixman_fixed_1) >> 1);
> +	y_off = - pixman_fixed_e - ((params[1] - pixman_fixed_1) >> 1);
> +	width = pixman_fixed_to_int (params[0]);
> +	height = pixman_fixed_to_int (params[1]);
> +	break;
> +
> +    case PIXMAN_FILTER_GOOD:
> +    case PIXMAN_FILTER_BEST:
> +    case PIXMAN_FILTER_BILINEAR:
> +	x_off = - pixman_fixed_1 / 2;
> +	y_off = - pixman_fixed_1 / 2;
> +	width = 1;
> +	height = 1;
> +	break;
> +
> +    case PIXMAN_FILTER_FAST:
> +    case PIXMAN_FILTER_NEAREST:
> +	x_off = - pixman_fixed_e;
> +	y_off = - pixman_fixed_e;
> +	width = 0;
> +	height = 0;
> +	break;
> +
> +    default:
> +	return FALSE;
> +    }
> +
> +    if (!transform)
> +    {
> +	box->x1 = pixman_fixed_to_int (pixman_int_to_fixed (box->x1) +
>                pixman_fixed_1 / 2 + x_off);
> +	box->y1 = pixman_fixed_to_int
>                (pixman_int_to_fixed (box->y1) + pixman_fixed_1 / 2 + y_off);
> +	box->x2 = pixman_fixed_to_int (pixman_int_to_fixed (box->x2) -    
>                pixman_fixed_1 / 2 + x_off) + width + 1;
> +	box->y2 = pixman_fixed_to_int (pixman_int_to_fixed (box->y2) -
>                pixman_fixed_1 / 2 + y_off) + height + 1;
> +	return TRUE;

That's an interesting case. If I understand it correctly, without any transform 
at all, both NEAREST and BILINEAR filters should introduce no changes in the 
bounds. This is fine for NEAREST, but not for BILINEAR filter which gets the 
bounds expanded by 1.

The bilinear filter is a bit special, because the sampling of extra rightmost
pixels may technically happen according to formulas, but they get multiplied by
zero anyway, so make no difference and are ignored by non-transformed fast
paths.

> +    }
> +
> +    v[0].vector[0] = pixman_int_to_fixed (box->x1) + pixman_fixed_1 / 2;
> +    v[0].vector[1] = pixman_int_to_fixed (box->y1) + pixman_fixed_1 / 2;
> +    v[0].vector[2] = pixman_int_to_fixed (1);
> +
> +    v[1].vector[0] = pixman_int_to_fixed (box->x2) - pixman_fixed_1 / 2;
> +    v[1].vector[1] = pixman_int_to_fixed (box->y1) + pixman_fixed_1 / 2;
> +    v[1].vector[2] = pixman_int_to_fixed (1);
> +
> +    v[2].vector[0] = pixman_int_to_fixed (box->x2) - pixman_fixed_1 / 2;
> +    v[2].vector[1] = pixman_int_to_fixed (box->y2) - pixman_fixed_1 / 2;
> +    v[2].vector[2] = pixman_int_to_fixed (1);
> +
> +    v[3].vector[0] = pixman_int_to_fixed (box->x1) + pixman_fixed_1 / 2;
> +    v[3].vector[1] = pixman_int_to_fixed (box->y2) - pixman_fixed_1 / 2;
> +    v[3].vector[2] = pixman_int_to_fixed (1);
> +
> +    for (i = 0; i < 4; ++i)
> +    {
> +	if (!pixman_transform_point (transform, &v[i]))
> +	    return FALSE;

Still what about the subtle differences between pixman_transform_point() and 
pixman_transform_point_3d()? They are not exactly the same. Transformed 
fetchers and fast path functions are all using pixman_transform_point_3d(). 
Another (minor) issue is that pixman_transform_point() has division operations, 
which may be not very good for performance.

> +
> +	x1 = pixman_fixed_to_int (v[i].vector[0] + x_off);
> +	y1 = pixman_fixed_to_int (v[i].vector[1] + y_off);
> +	x2 = x1 + width + 1;
> +	y2 = y1 + height + 1;

A minor performance improvement is possible. Addition of (width + 1) and
(height + 1) to x2/y2 is done inside of the loop on each iteration here, 8
times total. If done after the loop just before returning, it would be 2
additions only.

> +	if (i == 0)
> +	{
> +	    box->x1 = x1;
> +	    box->y1 = y1;
> +	    box->x2 = x2;
> +	    box->y2 = y2;
> +	}
> +	else
> +	{
> +	    if (x1 < box->x1)
> +		box->x1 = x1;
> +	    if (y1 < box->y1)
> +		box->y1 = y1;
> +	    if (x2 > box->x2)
> +		box->x2 = x2;
> +	    if (y2 > box->y2)
> +		box->y2 = y2;
> +	}
> +    }
> +
> +    return TRUE;
> +}
> +
>  #define IS_16BIT(x) (((x) >= INT16_MIN) && ((x) <= INT16_MAX))
> 
>  static force_inline uint32_t
> @@ -522,15 +624,12 @@ compute_src_extents_flags (pixman_image_t *image,
>  	extents16.x2 = extents->x2 - x;
>  	extents16.y2 = extents->y2 - y;
> 
> -	if (!image->common.transform ||
> -	    pixman_transform_bounds (image->common.transform, &extents16))
> +	if (compute_sample_extent (image, &extents16) &&
> +	    extents16.x1 >= 0  && extents16.y1 >= 0 &&
> +	    extents16.x2 <= image->bits.width &&
> +	    extents16.y2 <= image->bits.height)
>  	{
> -	    if (extents16.x1 >= 0  && extents16.y1 >= 0 &&
> -		extents16.x2 <= image->bits.width &&
> -		extents16.y2 <= image->bits.height)
> -	    {
> -		flags |= FAST_PATH_SAMPLES_COVER_CLIP;
> -	    }
> +	    flags |= FAST_PATH_SAMPLES_COVER_CLIP;
>  	}
>      }

There is also one more call to pixman_transform_bounds() a few lines below 
which is used to check whether to set  FAST_PATH_16BIT_SAFE flag. Shouldn't it 
be also converted to use compute_sample_extent()?

-- 
Best regards,
Siarhei Siamashka