[Pixman] [PATCH 2/3] MIPS: MIPS32r2: Basic infrastructure for MIPS32r2 optimizations
Nemanja Lukic
nemanja.lukic at rt-rk.com
Wed Jul 24 07:43:51 PDT 2013
> -----Original Message-----
> From: Siarhei Siamashka [mailto:siarhei.siamashka at gmail.com]
> Sent: Wednesday, July 24, 2013 3:47 PM
> To: Nemanja Lukic
> Cc: pixman at lists.freedesktop.org
> Subject: Re: [Pixman] [PATCH 2/3] MIPS: MIPS32r2: Basic infrastructure for
MIPS32r2 optimizations
>
> On Tue, 23 Jul 2013 17:42:13 +0200
> Nemanja Lukic <nemanja.lukic at rt-rk.com> wrote:
>
> > Some of the optimizations introduced in previous DSPr2 commits were not
DSPr2
> > specific. Some of the fast-paths didn't used DSPr2 instructions at all,
and
> > rather utilized more generic MIPS32r2 instruction set for optimizations.
Since
> > Pixman's run-time CPU detection only added DSPr2 fast-paths on 74K MIPS
cores,
> > these optimizations couldn't be used on cores that don't support DSPr2,
but do
> > support MIPS32r2 instructions (these are almost all newer MIPS CPU cores
like
> > 4K, 24K, 34K, 1004K, etc).
> > This patch extracts those MIPS32r2 specific optimizations into new
mips32r2 set
> > of fast-paths, and adds infrastructure for future MIPS32r2-only
optimizations
> > with appropriate build and run time support.
> > Following is the list of MIPS32r2 optimizations, introduced in previous
DSPr2
> > patches, tested on MIPS 24Kc core:
> >
> > Performance numbers before/after on MIPS-24kc @ 500 MHz
> >
> > Referent (before):
> >
> > src_x888_8888 = L1: 62.45 L2: 24.85 M: 27.23 ( 91.72%) HT: 23.46
VT: 23.95 R: 20.03 RT: 11.17 ( 104Kops/s)
> > src_0565_0565 = L1: 137.77 L2: 53.22 M: 57.36 ( 96.57%) HT: 36.50
VT: 37.24 R: 31.48 RT: 12.82 ( 112Kops/s)
> > src_8888_8888 = L1: 207.47 L2: 30.04 M: 30.90 (104.03%) HT: 23.40
VT: 23.10 R: 21.82 RT: 11.28 ( 103Kops/s)
> > src_0888_0888 = L1: 95.01 L2: 35.94 M: 38.17 ( 96.39%) HT: 27.93
VT: 28.21 R: 23.97 RT: 11.30 ( 103Kops/s)
> > src_n_8888 = L1: 94.84 L2: 56.74 M: 58.20 ( 97.96%) HT: 48.73
VT: 46.52 R: 42.56 RT: 20.83 ( 143Kops/s)
> >
> > Optimized (with these optimizations):
> >
> > src_x888_8888 = L1: 125.05 L2: 43.39 M: 43.87 (147.66%) HT: 25.80
VT: 26.47 R: 24.02 RT: 11.52 ( 105Kops/s)
> > src_0565_0565 = L1: 206.13 L2: 92.95 M: 98.79 (166.30%) HT: 37.10
VT: 34.74 R: 31.23 RT: 13.35 ( 115Kops/s)
> > src_8888_8888 = L1: 180.01 L2: 51.29 M: 52.53 (176.84%) HT: 24.49
VT: 22.59 R: 21.64 RT: 11.50 ( 105Kops/s)
> > src_0888_0888 = L1: 142.24 L2: 65.84 M: 67.28 (169.87%) HT: 29.68
VT: 26.63 R: 25.40 RT: 11.80 ( 107Kops/s)
> > src_n_8888 = L1: 249.99 L2: 159.84 M:182.03 (306.48%) HT: 66.76
VT: 60.66 R: 54.42 RT: 20.23 ( 139Kops/s)
> > ---
> > configure.ac | 41 +++
> > pixman/Makefile.am | 20 ++-
> > pixman/pixman-mips-common-asm.h | 282 +++++++++++++++++++
> > pixman/pixman-mips-common.h | 452 ++++++++++++++++++++++++++++++
> > pixman/pixman-mips-dspr2-asm.S | 361 ++++++++-----------------
> > pixman/pixman-mips-dspr2-asm.h | 280 +-------------------
> > pixman/pixman-mips-dspr2.c | 169 ++++++------
> > pixman/pixman-mips-dspr2.h | 438 -----------------------------
> > pixman/pixman-mips-memcpy-asm.S | 382 --------------------------
> > pixman/pixman-mips.c | 47 +++-
> > pixman/pixman-mips32r2-asm.S | 576
+++++++++++++++++++++++++++++++++++++++
> > pixman/pixman-mips32r2-asm.h | 84 ++++++
> > pixman/pixman-mips32r2.c | 190 +++++++++++++
> > pixman/pixman-private.h | 5 +
> > 14 files changed, 1888 insertions(+), 1439 deletions(-)
> > create mode 100644 pixman/pixman-mips-common-asm.h
> > create mode 100644 pixman/pixman-mips-common.h
> > delete mode 100644 pixman/pixman-mips-dspr2.h
> > delete mode 100644 pixman/pixman-mips-memcpy-asm.S
> > create mode 100644 pixman/pixman-mips32r2-asm.S
> > create mode 100644 pixman/pixman-mips32r2-asm.h
> > create mode 100644 pixman/pixman-mips32r2.c
> >
>
> [...]
>
> > index 866e93e..8ddde0f 100644
> > --- a/pixman/pixman-mips-dspr2-asm.S
> > +++ b/pixman/pixman-mips-dspr2-asm.S
> > @@ -1,5 +1,5 @@
> > /*
> > - * Copyright (c) 2012
> > + * Copyright (c) 2013
>
> Maybe 2012-2013?
>
Yes. I'll change this in all files.
> > @@ -188,10 +184,11 @@ mips_dspr2_fill (pixman_implementation_t *imp,
> > {
> > uint8_t *dst = byte_line;
> > byte_line += stride;
> > - pixman_fill_buff16_mips (dst, byte_width, _xor & 0xffff);
> > + pixman_fill_buff16_mips_dspr2 (dst, byte_width, _xor &
0xffff);
> > }
> > return TRUE;
> > case 32:
> > +#ifdef USE_MIPS32R2
> > stride = stride * (int) sizeof (uint32_t) / 4;
> > byte_line = (uint8_t *)(((uint32_t *)bits) + stride * y + x);
> > byte_width = width * 4;
> > @@ -201,8 +198,19 @@ mips_dspr2_fill (pixman_implementation_t *imp,
> > {
> > uint8_t *dst = byte_line;
> > byte_line += stride;
> > - pixman_fill_buff32_mips (dst, byte_width, _xor);
> > + pixman_fill_buff32_mips32r2 (dst, byte_width, _xor);
> > }
> > +#else
> > + bits = bits + y * stride + x;
> > +
> > + while (height--)
> > + {
> > + for (i = 0; i < width; ++i)
> > + bits[i] = _xor;
> > +
> > + bits += stride;
> > + }
> > +#endif
>
> Fallback to the C implementation further in the delegate chain instead
> of this local loop would probably avoid unnecessary code duplication.
>
Good point. I'll fix this.
> > return TRUE;
> > default:
> > return FALSE;
> > @@ -250,7 +258,11 @@ mips_dspr2_blt (pixman_implementation_t *imp,
> > uint8_t *dst = dst_bytes;
> > src_bytes += src_stride;
> > dst_bytes += dst_stride;
> > - pixman_mips_fast_memcpy (dst, src, byte_width);
> > +#ifdef USE_MIPS32R2
> > + pixman_fast_memcpy_mips32r2 (dst, src, byte_width);
> > +#else
> > + memcpy (dst, src, byte_width);
> > +#endif
>
> Do you have this ifdef here (and also in mips_dspr2_fill) to support
> 64-bit DSPr2 systems? If yes, then probably some cleaner solution
> could be used? Such as renaming "mips_dspr2_blt" to "mips32_dspr2_blt"
> and not using it on 64-bit systems at all.
>
Actually no. This check is there, since DSPr1/2 and MIPS32r2 fast-paths
are separated. So one can disable mips32r2 at configure time, and leave dsp
optimizations. Than pixman_fast_memcpy_mips32r2 wouldn't be defined, so
code needs to fallback to something. Mips32r2 optimizations are separated
from dsp optimizations, since we will add more mips32r2-only optimizations
in future.
>
> Also you are moving a lot of code between source files as-is and
> including the introduction of new code into the same patch. Which
> makes it a bit difficult to review everything at once. Could
> you please split this patch into smaller parts?
>
Only new optimization that is added is pixman_fill_buff16 to
pixman-mips32r2-asm.S. All other are just code moving, from one file
to another. I'll refactor this, and create one patch that only does
moving of the code. Another patch will add this new optimization.
> --
> Best regards,
> Siarhei Siamashka
More information about the Pixman
mailing list