[Pixman] [PATCH 2/3] MIPS: MIPS32r2: Basic infrastructure for MIPS32r2 optimizations
Siarhei Siamashka
siarhei.siamashka at gmail.com
Wed Jul 24 06:46:48 PDT 2013
On Tue, 23 Jul 2013 17:42:13 +0200
Nemanja Lukic <nemanja.lukic at rt-rk.com> wrote:
> Some of the optimizations introduced in previous DSPr2 commits were not DSPr2
> specific. Some of the fast-paths didn't used DSPr2 instructions at all, and
> rather utilized more generic MIPS32r2 instruction set for optimizations. Since
> Pixman's run-time CPU detection only added DSPr2 fast-paths on 74K MIPS cores,
> these optimizations couldn't be used on cores that don't support DSPr2, but do
> support MIPS32r2 instructions (these are almost all newer MIPS CPU cores like
> 4K, 24K, 34K, 1004K, etc).
> This patch extracts those MIPS32r2 specific optimizations into new mips32r2 set
> of fast-paths, and adds infrastructure for future MIPS32r2-only optimizations
> with appropriate build and run time support.
> Following is the list of MIPS32r2 optimizations, introduced in previous DSPr2
> patches, tested on MIPS 24Kc core:
>
> Performance numbers before/after on MIPS-24kc @ 500 MHz
>
> Referent (before):
>
> src_x888_8888 = L1: 62.45 L2: 24.85 M: 27.23 ( 91.72%) HT: 23.46 VT: 23.95 R: 20.03 RT: 11.17 ( 104Kops/s)
> src_0565_0565 = L1: 137.77 L2: 53.22 M: 57.36 ( 96.57%) HT: 36.50 VT: 37.24 R: 31.48 RT: 12.82 ( 112Kops/s)
> src_8888_8888 = L1: 207.47 L2: 30.04 M: 30.90 (104.03%) HT: 23.40 VT: 23.10 R: 21.82 RT: 11.28 ( 103Kops/s)
> src_0888_0888 = L1: 95.01 L2: 35.94 M: 38.17 ( 96.39%) HT: 27.93 VT: 28.21 R: 23.97 RT: 11.30 ( 103Kops/s)
> src_n_8888 = L1: 94.84 L2: 56.74 M: 58.20 ( 97.96%) HT: 48.73 VT: 46.52 R: 42.56 RT: 20.83 ( 143Kops/s)
>
> Optimized (with these optimizations):
>
> src_x888_8888 = L1: 125.05 L2: 43.39 M: 43.87 (147.66%) HT: 25.80 VT: 26.47 R: 24.02 RT: 11.52 ( 105Kops/s)
> src_0565_0565 = L1: 206.13 L2: 92.95 M: 98.79 (166.30%) HT: 37.10 VT: 34.74 R: 31.23 RT: 13.35 ( 115Kops/s)
> src_8888_8888 = L1: 180.01 L2: 51.29 M: 52.53 (176.84%) HT: 24.49 VT: 22.59 R: 21.64 RT: 11.50 ( 105Kops/s)
> src_0888_0888 = L1: 142.24 L2: 65.84 M: 67.28 (169.87%) HT: 29.68 VT: 26.63 R: 25.40 RT: 11.80 ( 107Kops/s)
> src_n_8888 = L1: 249.99 L2: 159.84 M:182.03 (306.48%) HT: 66.76 VT: 60.66 R: 54.42 RT: 20.23 ( 139Kops/s)
> ---
> configure.ac | 41 +++
> pixman/Makefile.am | 20 ++-
> pixman/pixman-mips-common-asm.h | 282 +++++++++++++++++++
> pixman/pixman-mips-common.h | 452 ++++++++++++++++++++++++++++++
> pixman/pixman-mips-dspr2-asm.S | 361 ++++++++-----------------
> pixman/pixman-mips-dspr2-asm.h | 280 +-------------------
> pixman/pixman-mips-dspr2.c | 169 ++++++------
> pixman/pixman-mips-dspr2.h | 438 -----------------------------
> pixman/pixman-mips-memcpy-asm.S | 382 --------------------------
> pixman/pixman-mips.c | 47 +++-
> pixman/pixman-mips32r2-asm.S | 576 +++++++++++++++++++++++++++++++++++++++
> pixman/pixman-mips32r2-asm.h | 84 ++++++
> pixman/pixman-mips32r2.c | 190 +++++++++++++
> pixman/pixman-private.h | 5 +
> 14 files changed, 1888 insertions(+), 1439 deletions(-)
> create mode 100644 pixman/pixman-mips-common-asm.h
> create mode 100644 pixman/pixman-mips-common.h
> delete mode 100644 pixman/pixman-mips-dspr2.h
> delete mode 100644 pixman/pixman-mips-memcpy-asm.S
> create mode 100644 pixman/pixman-mips32r2-asm.S
> create mode 100644 pixman/pixman-mips32r2-asm.h
> create mode 100644 pixman/pixman-mips32r2.c
>
[...]
> index 866e93e..8ddde0f 100644
> --- a/pixman/pixman-mips-dspr2-asm.S
> +++ b/pixman/pixman-mips-dspr2-asm.S
> @@ -1,5 +1,5 @@
> /*
> - * Copyright (c) 2012
> + * Copyright (c) 2013
Maybe 2012-2013?
> @@ -188,10 +184,11 @@ mips_dspr2_fill (pixman_implementation_t *imp,
> {
> uint8_t *dst = byte_line;
> byte_line += stride;
> - pixman_fill_buff16_mips (dst, byte_width, _xor & 0xffff);
> + pixman_fill_buff16_mips_dspr2 (dst, byte_width, _xor & 0xffff);
> }
> return TRUE;
> case 32:
> +#ifdef USE_MIPS32R2
> stride = stride * (int) sizeof (uint32_t) / 4;
> byte_line = (uint8_t *)(((uint32_t *)bits) + stride * y + x);
> byte_width = width * 4;
> @@ -201,8 +198,19 @@ mips_dspr2_fill (pixman_implementation_t *imp,
> {
> uint8_t *dst = byte_line;
> byte_line += stride;
> - pixman_fill_buff32_mips (dst, byte_width, _xor);
> + pixman_fill_buff32_mips32r2 (dst, byte_width, _xor);
> }
> +#else
> + bits = bits + y * stride + x;
> +
> + while (height--)
> + {
> + for (i = 0; i < width; ++i)
> + bits[i] = _xor;
> +
> + bits += stride;
> + }
> +#endif
Fallback to the C implementation further in the delegate chain instead
of this local loop would probably avoid unnecessary code duplication.
> return TRUE;
> default:
> return FALSE;
> @@ -250,7 +258,11 @@ mips_dspr2_blt (pixman_implementation_t *imp,
> uint8_t *dst = dst_bytes;
> src_bytes += src_stride;
> dst_bytes += dst_stride;
> - pixman_mips_fast_memcpy (dst, src, byte_width);
> +#ifdef USE_MIPS32R2
> + pixman_fast_memcpy_mips32r2 (dst, src, byte_width);
> +#else
> + memcpy (dst, src, byte_width);
> +#endif
Do you have this ifdef here (and also in mips_dspr2_fill) to support
64-bit DSPr2 systems? If yes, then probably some cleaner solution
could be used? Such as renaming "mips_dspr2_blt" to "mips32_dspr2_blt"
and not using it on 64-bit systems at all.
Also you are moving a lot of code between source files as-is and
including the introduction of new code into the same patch. Which
makes it a bit difficult to review everything at once. Could
you please split this patch into smaller parts?
--
Best regards,
Siarhei Siamashka
More information about the Pixman
mailing list