[Pixman] [PATCH 04/32] pixman-general: Tighten up calculation of temporary buffer sizes

Tue Apr 21 07:25:37 PDT 2015

On Thu, 07 Aug 2014 17:50:00 +0100, I wrote:
> There's no need to align after the end of the temporary destination
> buffer, and each of the remaining aligns can only add a maximum of 15
> bytes to the space requirement. This permits some edge cases to use the
> stack buffer where previously it would have deduced that a heap buffer
> was required.
> ---
>  pixman/pixman-general.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/pixman/pixman-general.c b/pixman/pixman-general.c
> index 7cdea29..c8f272b 100644
> --- a/pixman/pixman-general.c
> +++ b/pixman/pixman-general.c
> @@ -159,10 +159,10 @@ general_composite_rect  (pixman_implementation_t *imp,
>      mask_buffer = ALIGN (src_buffer + width * Bpp);
>      dest_buffer = ALIGN (mask_buffer + width * Bpp);
>-    if (ALIGN (dest_buffer + width * Bpp) >
> +    if (dest_buffer + width * Bpp >
>  	    scanline_buffer + sizeof (stack_scanline_buffer))
>      {
> -	scanline_buffer = pixman_malloc_ab_plus_c (width, Bpp * 3, 32 * 3);
> +	scanline_buffer = pixman_malloc_ab_plus_c (width, Bpp * 3, 15 * 3);
>	if (!scanline_buffer)
>  	    return;

I must say I never expected this patch to be controversial. I was merely
tinkering with some nearby code and saw this, which to me was an obvious
bug. I'm going to stand behind this patch for now, as explained below...

On Sun, 05 Oct 2014 20:03:42 +0100, Søren Sandmann <soren.sandmann at gmail.com> wrote:
> The idea behind aligning the final buffer is that it allows fetchers
> to assume that they can do a full 16-byte SIMD write to the buffer. We
> may possibly not do that, though.

I assume you mean combiner rather than fetcher here; the fetchers write
to the source and mask line buffers, and the combiner writes to the
destination line buffer. I've been thinking about this, and while perhaps
what you describe was the intention at one point, I reckon we can't
permit a combiner to write beyond its buffer limits because of the dest
iters in pixman-noop.c which force the combiner to write directly into
the destination bitmap when it's a8r8g8b8 or x8r8g8b8 format. This is
already a common use case, and will only become more so.

If the combiner is beholden to write exactly the right number of bytes
then there's no benefit to over-allocating the line buffer.

> The change to use 15 instead of 32 is alright I guess. Is this
> something that actually shows up in practice?

Here's a worked example. For any size of stack buffer that you choose,
there will be a few row lengths that do an unnecessary heap allocate and
free that can be avoided by using this patch.

Assume stack allocations are aligned to 8-byte boundaries.
Assume "Narrow" pixel format (4 bytes per pixel).
Let width = 2045 pixels.
The size of stack_scanline_buffer is determined by a compile-time define,
currently 8 kiB * 3 channels.

Suppose stack_scanline_buffer = [0x100008,0x106008), then:
src_buffer  = [0x100010,0x102004)
mask_buffer = [0x102010,0x104004)
dest_buffer = [0x104010,0x106004)

In this case it is clear that the three small buffers fit within
stack_scanline_buffer, but if the test is performed on the end address of
dest_buffer rounded up to a 16-byte boundary - which would be 0x106010 -
then you would incorrectly deduce that they don't.

Ben