[Pixman] [PATCH 5/5] Simple repeat: Extend too short source scanlines into temporary buffer

Wed Sep 21 09:34:46 PDT 2011

Hi,

> From: Taekyun Kim <tkq.kim at samsung.com>
>
> Too short scanlines can cause repeat handling overhead and optimized
> pixman composite functions usually process a bunch of pixels in a
> single loop iteration it might be beneficial to pre-extend source
> scanlines. The temporary buffers will usually reside in cache, so
> accessing them should be quite efficient.

Generally, this looks good, and it's certainly a big performance
improvement. I do have some comments though.

The main thing I dislike is that it allocates an image on the stack and
then only fills out some of the fields. The problem is if a fast path
some day starts making use of say the flags in the image, then this code
will start to siliently fail.

A solution might be to refactor the image allocation code such that we
can support stack allocated images internally. This could be done by
changing adding _pixman_image_init () and _pixman_bits_image_init()
functions, and then changing _pixman_image_allocate() and
_pixman_bits_image_create() to call those functions. We would need the
corresponding _fini() functions also.

Another possibility might be to call the combiner functions instead of
calling the fast path. Ie., calling

       _pixman_implementation_combine_32 (pixman_implementation_t *imp,
                                          pixman_op_t              op,
                                          uint32_t *               dest,
                                          const uint32_t *         src,
                                          const uint32_t *         mask,
                                          int                      width);

instead of the fast path. If we are going to composite scanline by
scanline anyway, then it doesn't seem like there should be any advantage
to the fast paths over the combiners. The important architectures have
SIMD variations of the combiners already.

The potential issue, as Siarhei has pointed out several times, is that
there is currently a bit of overhead from going through all the
implementation->implementation->implementation indirections. Maybe its
time to fix that. A simple fix would be to change the
_pixman_implementation_combine_* functions into
_pixman_implementation_lookup_combine_* functions that would return the
desired combiner instead of calling it. 

This would speed up the general path as well.

Also, a few formatting nit-picks:

>  static void
>  fast_composite_tiled_repeat (pixman_implementation_t *imp,
>  			     pixman_composite_info_t *info)
> @@ -1223,27 +1225,114 @@ fast_composite_tiled_repeat (pixman_implementation_t *imp,
>  	int32_t sx, sy;
>  	int32_t width_remain;
>  	int32_t num_pixels;
> +	int32_t src_width;
> +	int32_t i, j;
> +	pixman_image_t extended_src_image;
> +	uint32_t extended_src[REPEAT_MIN_WIDTH*2];

We generally put spaces around * when it is used as a binary operator.

> +	    if (need_src_extension)
> +	    {
> +		if (src_bpp == 32)
> +		{
> +		    PIXMAN_IMAGE_GET_LINE (src_image, 0, sy, uint32_t, src_stride, src_line, 1);
> +
> +		    for (i=0; i<src_width; )
> +		    {
> +			for (j=0; j<src_image->bits.width; j++, i++)

There are some missing spaces here as well (i=0, j=0, i<src_width etc.)

> +			{
> +			    extended_src[i] = src_line[j];
> +			}

Generally braces should only be used when the body is more than one
line, or if the other branch of an if statement has them. They should
not be used if the body is only one line.

Thanks,
Soren