[Pixman] [PATCH 1/4] pixman-fast-path: Add over_n_8888 fast path (disabled)

Oded Gabbay oded.gabbay at gmail.com
Mon Aug 24 07:20:22 PDT 2015


On Thu, Aug 20, 2015 at 4:58 PM, Pekka Paalanen <ppaalanen at gmail.com> wrote:
> From: Ben Avison <bavison at riscosopen.org>
>
> This is a C fast path, useful for reference or for platforms that don't
> have their own fast path for this operation.
>
> This new fast path is initially disabled by putting the entries in the
> lookup table after the sentinel. The compiler cannot tell the new code
> is not used, so it cannot eliminate the code. Also the lookup table size
> will include the new fast path. When the follow-up patch then enables
> the new fast path, the binary layout (alignments, size, etc.) will stay
> the same compared to the disabled case.
>
> Keeping the binary layout identical is important for benchmarking on
> Raspberry Pi 1. The addresses at which functions are loaded will have a
> significant impact on benchmark results, causing unexpected performance
> changes. Keeping all function addresses the same across the patch
> enabling a new fast path improves the reliability of benchmarks.
>
> Benchmark results are included in the patch enabling this fast path.
>
> [Pekka: disabled the fast path, commit message]
> Signed-off-by: Pekka Paalanen <pekka.paalanen at collabora.co.uk>
> ---
>  pixman/pixman-fast-path.c | 36 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 36 insertions(+)
>
> diff --git a/pixman/pixman-fast-path.c b/pixman/pixman-fast-path.c
> index 53d4a1f..6803037 100644
> --- a/pixman/pixman-fast-path.c
> +++ b/pixman/pixman-fast-path.c
> @@ -1122,6 +1122,37 @@ fast_composite_over_n_1_0565 (pixman_implementation_t *imp,
>      }
>  }
>
> +static void
> +fast_composite_over_n_8888 (pixman_implementation_t *imp,
> +                            pixman_composite_info_t *info)
> +{
> +    PIXMAN_COMPOSITE_ARGS (info);
> +    uint32_t  src;
> +    uint32_t *dst_line, *dst;
> +    int       dst_stride;
> +    int32_t   w;
> +
> +    src = _pixman_image_get_solid (imp, src_image, dest_image->bits.format);
> +
> +    if (src == 0)
> +        return;
> +
> +    PIXMAN_IMAGE_GET_LINE (dest_image, dest_x, dest_y, uint32_t, dst_stride, dst_line, 1);
> +
> +    while (height--)
> +    {
> +        dst = dst_line;
> +        dst_line += dst_stride;
> +        w = width;
> +
> +        while (w--)
> +        {
> +            *dst = over (src, *dst);
> +            dst++;
> +        }
> +    }
> +}
> +
>  /*
>   * Simple bitblt
>   */
> @@ -1972,6 +2003,11 @@ static const pixman_fast_path_t c_fast_paths[] =
>      },
>
>      {   PIXMAN_OP_NONE },
> +
> +    PIXMAN_STD_FAST_PATH (OVER, solid, null, a8r8g8b8, fast_composite_over_n_8888),
> +    PIXMAN_STD_FAST_PATH (OVER, solid, null, x8r8g8b8, fast_composite_over_n_8888),
> +    PIXMAN_STD_FAST_PATH (OVER, solid, null, a8b8g8r8, fast_composite_over_n_8888),
> +    PIXMAN_STD_FAST_PATH (OVER, solid, null, x8b8g8r8, fast_composite_over_n_8888),
>  };
>
>  #ifdef WORDS_BIGENDIAN
> --
> 2.4.6
>

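For readers without the pixman sources at hand, here is a minimal sketch of what the inner loop's over (src, *dst) computes, assuming premultiplied 8-bit channels packed into 32 bits. The names mul_un8 and over_sketch are hypothetical; pixman's own over () works on packed channels through macros rather than with a per-channel loop, so this is an illustration of the arithmetic, not the library's implementation.

```c
#include <stdint.h>

/* Multiply two 8-bit values as fractions of 255, with rounding. */
static uint8_t
mul_un8 (uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t) a * b + 0x80;

    return (uint8_t) (((t >> 8) + t) >> 8);
}

/* Hypothetical per-channel OVER: result = src + dst * (255 - src_alpha) / 255 */
static uint32_t
over_sketch (uint32_t src, uint32_t dst)
{
    uint8_t  ia = (uint8_t) (~src >> 24);   /* 255 - source alpha */
    uint32_t result = 0;
    int      shift;

    for (shift = 0; shift < 32; shift += 8)
    {
        uint32_t s = (src >> shift) & 0xff;
        uint32_t d = (dst >> shift) & 0xff;
        uint32_t c = s + mul_un8 ((uint8_t) d, ia);

        if (c > 255)
            c = 255;                        /* saturate the channel */

        result |= c << shift;
    }

    return result;
}
```

With an opaque source the destination is simply replaced, and with a fully transparent source it is left unchanged, which is why the patch can return early when src == 0.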
Hi,

I tested the patch on POWER8 (ppc64le).
make check passes, but when I benchmarked over_n_8888 with
lowlevel-blt-bench, I got far worse results than without this patch
(see the numbers below).
Apparently, the new C fast path takes precedence over the VMX combine
paths that were used before, making performance worse instead of better.
I think we should address that before merging this fast path.
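To make the lookup behaviour discussed here concrete, the following is a minimal sketch, not pixman's actual code. The types fast_path_t and impl_t, the lookup function, and all table contents are hypothetical simplifications. It assumes that each implementation scans its table up to a PIXMAN_OP_NONE-style sentinel and then delegates to the next implementation in the chain, with a generic path as the last resort.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for pixman's types. */
typedef enum { OP_NONE, OP_OVER, OP_SRC } op_t;

typedef struct
{
    op_t        op;
    const char *name;               /* label of the routine that would run */
} fast_path_t;

typedef struct impl
{
    const char        *name;
    const fast_path_t *fast_paths;  /* scanned up to the first OP_NONE entry */
    const struct impl *fallback;    /* next implementation in the chain */
} impl_t;

/* Scan each table up to its sentinel, then delegate down the chain. */
static const char *
lookup (const impl_t *imp, op_t op)
{
    for (; imp != NULL; imp = imp->fallback)
    {
        const fast_path_t *p;

        for (p = imp->fast_paths; p->op != OP_NONE; p++)
        {
            if (p->op == op)
                return p->name;
        }
    }

    return "general path";          /* generic fallback, may use SIMD combiners */
}

/* An entry placed after the sentinel is compiled in but never matched. */
static const fast_path_t c_disabled[] =
    { { OP_NONE, NULL }, { OP_OVER, "C over_n_8888" } };
static const fast_path_t c_enabled[] =
    { { OP_OVER, "C over_n_8888" }, { OP_NONE, NULL } };
static const fast_path_t vmx_paths[] =
    { { OP_SRC, "vmx src" }, { OP_NONE, NULL } };

static const impl_t c_impl_disabled   = { "fast", c_disabled, NULL };
static const impl_t c_impl_enabled    = { "fast", c_enabled,  NULL };
static const impl_t vmx_then_disabled = { "vmx", vmx_paths, &c_impl_disabled };
static const impl_t vmx_then_enabled  = { "vmx", vmx_paths, &c_impl_enabled };
```

In this model, lookup (&vmx_then_disabled, OP_OVER) falls through to the general path, while lookup (&vmx_then_enabled, OP_OVER) stops at the C entry. If the general path would have used faster SIMD combiners, enabling the C entry makes the operation slower, which is one plausible reading of the regression reported here.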

Without the patch:

reference memcpy speed = 25711.7MB/s (6427.9MP/s for 32bpp fills)
---
over_n_8888: PIXMAN_OP_OVER, src a8r8g8b8 solid, mask null, dst a8r8g8b8
---
             over_n_8888 =  L1: 572.29  L2:1038.08  M:1104.10 ( 17.18%)  HT:447.45  VT:520.82  R:407.92  RT:148.90 (1100Kops/s)

With the patch:

reference memcpy speed = 23637.8MB/s (5909.5MP/s for 32bpp fills)
---
over_n_8888: PIXMAN_OP_OVER, src a8r8g8b8 solid, mask null, dst a8r8g8b8
---
             over_n_8888 =  L1: 174.75  L2: 182.93  M:182.01 (  3.08%)  HT:162.27  VT:160.56  R:152.03  RT:114.61 (1005Kops/s)
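As a reading aid, the two result lines can be turned into per-column slowdown factors. The sketch below only restates the figures quoted above; the sample_t type and the slowdown helper are hypothetical names introduced for illustration.

```c
#include <stddef.h>

/* One lowlevel-blt-bench column, copied from the two runs quoted above. */
typedef struct
{
    const char *label;
    double      without_patch;
    double      with_patch;
} sample_t;

static const sample_t samples[] =
{
    { "L1",  572.29, 174.75 },
    { "L2", 1038.08, 182.93 },
    { "M",  1104.10, 182.01 },
    { "HT",  447.45, 162.27 },
    { "VT",  520.82, 160.56 },
    { "R",   407.92, 152.03 },
    { "RT",  148.90, 114.61 },
};

/* Factor by which the patched build is slower (values above 1 mean slower). */
static double
slowdown (const sample_t *s)
{
    return s->without_patch / s->with_patch;
}
```

For the M column this works out to a slowdown of roughly 6x, and for RT about 1.3x. The two runs also report different reference memcpy speeds, so the factors are indicative rather than exact.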


         Oded

