[Pixman] [PATCH 2/3] MIPS: MIPS32r2: Basic infrastructure for MIPS32r2 optimizations

Siarhei Siamashka siarhei.siamashka at gmail.com
Wed Jul 24 06:46:48 PDT 2013


On Tue, 23 Jul 2013 17:42:13 +0200
Nemanja Lukic <nemanja.lukic at rt-rk.com> wrote:

> Some of the optimizations introduced in previous DSPr2 commits were not DSPr2
> specific. Some of the fast-paths didn't used DSPr2 instructions at all, and
> rather utilized more generic MIPS32r2 instruction set for optimizations. Since
> Pixman's run-time CPU detection only added DSPr2 fast-paths on 74K MIPS cores,
> these optimizations couldn't be used on cores that don't support DSPr2, but do
> support MIPS32r2 instructions (these are almost all newer MIPS CPU cores like
> 4K, 24K, 34K, 1004K, etc).
> This patch extracts those MIPS32r2 specific optimizations into new mips32r2 set
> of fast-paths, and adds infrastructure for future MIPS32r2-only optimizations
> with appropriate build and run time support.
> Following is the list of MIPS32r2 optimizations, introduced in previous DSPr2
> patches, tested on MIPS 24Kc core:
> 
> Performance numbers before/after on MIPS-24kc @ 500 MHz
> 
> Referent (before):
> 
> src_x888_8888 =  L1:  62.45  L2:  24.85  M: 27.23 ( 91.72%)  HT: 23.46  VT: 23.95  R: 20.03  RT: 11.17 ( 104Kops/s)
> src_0565_0565 =  L1: 137.77  L2:  53.22  M: 57.36 ( 96.57%)  HT: 36.50  VT: 37.24  R: 31.48  RT: 12.82 ( 112Kops/s)
> src_8888_8888 =  L1: 207.47  L2:  30.04  M: 30.90 (104.03%)  HT: 23.40  VT: 23.10  R: 21.82  RT: 11.28 ( 103Kops/s)
> src_0888_0888 =  L1:  95.01  L2:  35.94  M: 38.17 ( 96.39%)  HT: 27.93  VT: 28.21  R: 23.97  RT: 11.30 ( 103Kops/s)
> src_n_8888    =  L1:  94.84  L2:  56.74  M: 58.20 ( 97.96%)  HT: 48.73  VT: 46.52  R: 42.56  RT: 20.83 ( 143Kops/s)
> 
> Optimized (with these optimizations):
> 
> src_x888_8888 =  L1: 125.05  L2:  43.39  M: 43.87 (147.66%)  HT: 25.80  VT: 26.47  R: 24.02  RT: 11.52 ( 105Kops/s)
> src_0565_0565 =  L1: 206.13  L2:  92.95  M: 98.79 (166.30%)  HT: 37.10  VT: 34.74  R: 31.23  RT: 13.35 ( 115Kops/s)
> src_8888_8888 =  L1: 180.01  L2:  51.29  M: 52.53 (176.84%)  HT: 24.49  VT: 22.59  R: 21.64  RT: 11.50 ( 105Kops/s)
> src_0888_0888 =  L1: 142.24  L2:  65.84  M: 67.28 (169.87%)  HT: 29.68  VT: 26.63  R: 25.40  RT: 11.80 ( 107Kops/s)
> src_n_8888    =  L1: 249.99  L2: 159.84  M:182.03 (306.48%)  HT: 66.76  VT: 60.66  R: 54.42  RT: 20.23 ( 139Kops/s)
> ---
>  configure.ac                    |   41 +++
>  pixman/Makefile.am              |   20 ++-
>  pixman/pixman-mips-common-asm.h |  282 +++++++++++++++++++
>  pixman/pixman-mips-common.h     |  452 ++++++++++++++++++++++++++++++
>  pixman/pixman-mips-dspr2-asm.S  |  361 ++++++++-----------------
>  pixman/pixman-mips-dspr2-asm.h  |  280 +-------------------
>  pixman/pixman-mips-dspr2.c      |  169 ++++++------
>  pixman/pixman-mips-dspr2.h      |  438 -----------------------------
>  pixman/pixman-mips-memcpy-asm.S |  382 --------------------------
>  pixman/pixman-mips.c            |   47 +++-
>  pixman/pixman-mips32r2-asm.S    |  576 +++++++++++++++++++++++++++++++++++++++
>  pixman/pixman-mips32r2-asm.h    |   84 ++++++
>  pixman/pixman-mips32r2.c        |  190 +++++++++++++
>  pixman/pixman-private.h         |    5 +
>  14 files changed, 1888 insertions(+), 1439 deletions(-)
>  create mode 100644 pixman/pixman-mips-common-asm.h
>  create mode 100644 pixman/pixman-mips-common.h
>  delete mode 100644 pixman/pixman-mips-dspr2.h
>  delete mode 100644 pixman/pixman-mips-memcpy-asm.S
>  create mode 100644 pixman/pixman-mips32r2-asm.S
>  create mode 100644 pixman/pixman-mips32r2-asm.h
>  create mode 100644 pixman/pixman-mips32r2.c
> 

[...]

> index 866e93e..8ddde0f 100644
> --- a/pixman/pixman-mips-dspr2-asm.S
> +++ b/pixman/pixman-mips-dspr2-asm.S
> @@ -1,5 +1,5 @@
>  /*
> - * Copyright (c) 2012
> + * Copyright (c) 2013

Maybe 2012-2013?

> @@ -188,10 +184,11 @@ mips_dspr2_fill (pixman_implementation_t *imp,
>          {
>              uint8_t *dst = byte_line;
>              byte_line += stride;
> -            pixman_fill_buff16_mips (dst, byte_width, _xor & 0xffff);
> +            pixman_fill_buff16_mips_dspr2 (dst, byte_width, _xor & 0xffff);
>          }
>          return TRUE;
>      case 32:
> +#ifdef USE_MIPS32R2
>          stride = stride * (int) sizeof (uint32_t) / 4;
>          byte_line = (uint8_t *)(((uint32_t *)bits) + stride * y + x);
>          byte_width = width * 4;
> @@ -201,8 +198,19 @@ mips_dspr2_fill (pixman_implementation_t *imp,
>          {
>              uint8_t *dst = byte_line;
>              byte_line += stride;
> -            pixman_fill_buff32_mips (dst, byte_width, _xor);
> +            pixman_fill_buff32_mips32r2 (dst, byte_width, _xor);
>          }
> +#else
> +        bits = bits + y * stride + x;
> +
> +        while (height--)
> +        {
> +            for (i = 0; i < width; ++i)
> +                bits[i] = _xor;
> +
> +            bits += stride;
> +        }
> +#endif

Fallback to the C implementation further in the delegate chain instead
of this local loop would probably avoid unnecessary code duplication.

>          return TRUE;
>      default:
>          return FALSE;
> @@ -250,7 +258,11 @@ mips_dspr2_blt (pixman_implementation_t *imp,
>              uint8_t *dst = dst_bytes;
>              src_bytes += src_stride;
>              dst_bytes += dst_stride;
> -            pixman_mips_fast_memcpy (dst, src, byte_width);
> +#ifdef USE_MIPS32R2
> +            pixman_fast_memcpy_mips32r2 (dst, src, byte_width);
> +#else
> +            memcpy (dst, src, byte_width);
> +#endif

Do you have this ifdef here (and also in mips_dspr2_fill) to support
64-bit DSPr2 systems? If yes, then probably some cleaner solution
could be used? Such as renaming "mips_dspr2_blt" to "mips32_dspr2_blt"
and not using it on 64-bit systems at all.


Also you are moving a lot of code between source files as-is and
including the introduction of new code into the same patch. Which
makes it a bit difficult to review everything at once. Could
you please split this patch into smaller parts?

-- 
Best regards,
Siarhei Siamashka


More information about the Pixman mailing list