[Pixman] [PATCH 0/3] Pixman MIPS DSPASE1

Thu Feb 24 09:17:38 PST 2011

Hi,

Thanks for picking up the MIPS work. There are some comments from last
time from Siarhei and myself that I don't think have been addressed. See
these mails:

http://lists.freedesktop.org/archives/pixman/2010-December/000773.html
http://lists.freedesktop.org/archives/pixman/2010-September/000496.html

- In Siarhei's testing, the new over_n_8_8888() on MIPS32r2 was slower
  than the C fast path. From
  http://lists.freedesktop.org/archives/pixman/2010-December/000773.html :

  "One of the reasons for such a slowdown in gnome-system-monitor test is
   that it uses 'over_n_8_8888' operation with the mask where 96.5% of
   values are zero.  And your MIPS32R2 optimized code does not handle
   these special cases, always taking the slowest path [1]."

  I.e., the way to make over_n_8_8888() fast is to skip the full
  compositing math whenever the mask is 0x00 (the destination is left
  untouched) or 0xff (no mask multiplication is needed).

  The same is likely also worthwhile even in the SIMD versions since
  memory access is so expensive.
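
A minimal sketch of the mask shortcuts described above, in plain C. It
is not the code from the patch or from pixman itself: the function and
helper names (over_n_8_8888_line, mul_un8x4, over) are hypothetical, and
it assumes premultiplied a8r8g8b8 pixels with an a8 mask.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply each 8-bit channel of x by a, i.e. x * a / 255 with rounding. */
static uint32_t
mul_un8x4 (uint32_t x, uint8_t a)
{
    uint32_t rb = (x & 0x00ff00ff) * a + 0x00800080;
    uint32_t ag = ((x >> 8) & 0x00ff00ff) * a + 0x00800080;

    rb = ((rb + ((rb >> 8) & 0x00ff00ff)) >> 8) & 0x00ff00ff;
    ag = (ag + ((ag >> 8) & 0x00ff00ff)) & 0xff00ff00;

    return rb | ag;
}

/* OVER for premultiplied pixels: src + dst * (255 - src_alpha) / 255.
 * Assumes valid premultiplied input, so the channel sums cannot carry. */
static uint32_t
over (uint32_t src, uint32_t dst)
{
    uint8_t ia = 255 - (src >> 24);

    return src + mul_un8x4 (dst, ia);
}

/* Composite a solid color through an a8 mask onto one scanline. */
void
over_n_8_8888_line (uint32_t      *dst,
                    const uint8_t *mask,
                    uint32_t       src,
                    size_t         width)
{
    uint8_t srca = src >> 24;
    size_t i;

    if (src == 0)
        return;

    for (i = 0; i < width; i++)
    {
        uint8_t m = mask[i];

        if (m == 0x00)
            continue; /* Fully transparent mask: no read or write of dst. */

        if (m == 0xff)
        {
            /* Fully opaque mask: the mask multiplication is skipped. */
            if (srca == 0xff)
                dst[i] = src; /* Opaque source: plain store. */
            else
                dst[i] = over (src, dst[i]);
        }
        else
        {
            /* General case: scale the source by the mask, then OVER. */
            dst[i] = over (mul_un8x4 (src, m), dst[i]);
        }
    }
}
```

With a mask that is mostly zero, as in the gnome-system-monitor case
quoted above, nearly every iteration takes the first branch and never
touches the destination.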

From
http://lists.freedesktop.org/archives/pixman/2010-September/000496.html :

- The patch should be split such that one commit adds the MIPS32r2 part
  and one adds the DSPASE part

- Coding style:
  - Please use /* */ comments
  - Indents are four spaces
  - Put a space before parentheses
  - Don't leave in commented-out code like this:
     //              b = _pixman_implementation_fill(imp->delegate,
     bits, stride, bpp, x, y, width, height, xor);
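
As an illustration of the style points above, here is a small,
hypothetical helper (add_saturate_u8 is not a pixman function). The
comment style, four-space indents and the space before parentheses
follow the list; placing the return type and the braces on their own
lines is an assumption modelled on typical pixman source.

```c
#include <stdint.h>

/* Add two 8-bit values, clamping the result to 0xff. */
static uint8_t
add_saturate_u8 (uint8_t a, uint8_t b)
{
    unsigned int sum = a + b;

    /* Clamp instead of wrapping around. */
    if (sum > 0xff)
        sum = 0xff;

    return (uint8_t) sum;
}
```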

And finally, while the lowlevel-blt benchmarks are convenient to use,
they are also synthetic. It is important to test the performance with
real-world workloads as well, such as those found in the cairo perf
traces.

Thanks,
Soren

Veli-Matti Valtonen <veli-matti.valtonen at movial.com> writes:

> I originally started this work by optimizing the MIPS32R2 code (based on the patch by Beloev), but the performance increases seem to be relatively similar to what over_n_8_8888 shows. The DSPASE code is much more promising in this regard, which leaves me wondering whether the MIPS32R2 part should be included at all.
>
> This might, however, be related to the test system, which has a MIPS 74K core. I assume the original patch was developed on a MIPS 24K.
>
> I used pixman-arm-common.h for the assembler binding macros, which is the reason for the 'ARM' found in the glue.
>
> Compiling the code makes gcc produce warnings about macro expansion. It would be nice not to have these, but "fixing" them would have a (slight) negative effect on readability.
>
> PATCH 1 is the original patch by Georgi Beloev, but modified to apply against pixman head.
>
> Implemented:
> Scanline add, out reverse, over
> fast path:
> over_n_8_8888
> add_8888_8888
> add_n_8888
>
> Test hardware: Broadcom BCM4718, 453MHz, MIPS 74K V4.0 (Inc. DSP Rev2, MIPS16), Little Endian
>
> All the test program builds used CFLAGS="-O2 -mdsp -mips32r2"
>
> reference memcpy speed = 176.0MB/s (44.0MP/s for 32bpp fills)
>
> Optimizations disabled: --disable-mips32r2 --disable-mips-dspase1
> over_n_8_8888 =  L1:   6.16  L2:   5.34  M:  5.35 ( 19.24%)  HT:  4.78  VT:  4.62  R:  4.55  RT:  2.99 (  28Kops/s)
> add_8888_8888 =  L1:  18.11  L2:  10.15  M:  9.98 ( 45.33%)  HT: 14.80  VT: 13.36  R: 13.41  RT:  6.17 (  46Kops/s)
> add_n_8888 =  L1:  14.26  L2:  10.30  M: 10.38 ( 23.59%)  HT:  8.05  VT:  7.64  R:  7.63  RT:  4.05 (  33Kops/s)
>
> MIPS32R2: --disable-mips-dspase1
> over_n_8_8888 =  L1:   6.17  L2:   5.62  M:  5.56 ( 20.33%)  HT:  5.00  VT:  4.83  R:  4.76  RT:  3.33 (  30Kops/s)
>
> MIPS DSPASE:
> over_n_8_8888 =  L1:   9.76  L2:   7.89  M:  7.93 ( 27.11%)  HT:  7.04  VT:  6.84  R:  6.63  RT:  4.06 (  34Kops/s)
> add_8888_8888 =  L1: 117.36  L2:  20.67  M: 23.22 (105.50%)  HT: 17.40  VT: 15.96  R: 13.81  RT:  6.48 (  47Kops/s)
> add_n_8888 =  L1: 145.84  L2:  28.23  M: 31.11 ( 70.66%)  HT: 22.95  VT: 18.54  R: 19.99  RT:  8.93 (  50Kops/s)
>
> Scanline ops benchmarked using lowlevel-blt-bench:
>
> I selected these ops by adding a printf to the scanline ops and finding an operation that triggers each one. If there is a more convenient way to benchmark these ops, I failed to find it.
>
> Optimizations disabled:
> add_8_8_8 =  L1:   3.31  L2:   5.25  M:  5.16 ( 11.73%)  HT:  3.61  VT:  3.60  R:  3.53  RT:  1.77 (  18Kops/s)
> add_8888_1555 =  L1:   6.51  L2:   5.32  M:  5.34 ( 18.20%)  HT:  4.05  VT:  3.96  R:  3.94  RT:  2.21 (  22Kops/s)
> outrev_n_8_8888 =  L1:   6.33  L2:   5.25  M:  5.16 ( 17.60%)  HT:  4.11  VT:  4.02  R:  3.97  RT:  2.23 (  22Kops/s)
> over_8888_n_0565 =  L1:   2.83  L2:   3.33  M:  3.21 ( 11.54%)  HT:  2.73  VT:  2.69  R:  2.68  RT:  1.67 (  17Kops/s)
> over_n_8888 =  L1:   7.45  L2:   6.65  M:  6.66 ( 15.14%)  HT:  5.65  VT:  5.43  R:  5.43  RT:  3.35 (  30Kops/s)
>
> MIPS DSPASE:
> add_8_8_8 =  L1:   8.81  L2:   7.67  M:  7.53 ( 17.11%)  HT:  4.62  VT:  4.68  R:  4.50  RT:  1.97 (  19Kops/s)
> add_8888_1555 =  L1:   9.07  L2:   7.27  M:  7.29 ( 24.87%)  HT:  5.09  VT:  4.95  R:  4.93  RT:  2.50 (  23Kops/s)
> outrev_n_8_8888 =  L1:   8.48  L2:   6.82  M:  6.88 ( 23.45%)  HT:  5.04  VT:  4.90  R:  4.85  RT:  2.48 (  23Kops/s)
> over_8888_n_0565 =  L1:   5.13  L2:   4.38  M:  4.16 ( 14.24%)  HT:  3.41  VT:  3.30  R:  3.34  RT:  1.93 (  19Kops/s)
> over_n_8888 =  L1:  18.58  L2:  12.91  M: 13.12 ( 29.85%)  HT:  9.75  VT:  9.06  R:  9.10  RT:  4.55 (  33Kops/s)