[Pixman] [PATCH 0/3] Pixman MIPS DSPASE1

Tue Feb 22 01:05:54 PST 2011

I originally started this work by optimizing for MIPS32R2 (based on the patch by Beloev), but the performance increases from that code seem to be about what over_n_8_8888 shows in the MIPS32R2 numbers below. The DSP ASE is much more promising in this regard, which leaves me wondering whether the MIPS32R2 code should be included at all.

This might, however, be related to the test system, which has a MIPS 74K core. I assume the original patch was developed on a MIPS 24K.

I used pixman-arm-common.h for the assembler binding macros, which is the reason for the 'ARM' found in the glue.
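To illustrate what such a binding macro does, here is a minimal sketch. It is not the macro from pixman-arm-common.h: the macro name SKETCH_ARM_BIND_FAST_PATH_SRC_DST, the struct sketch_composite_info_t and the sketch_composite_* symbol names are hypothetical, and the real macros take more parameters. The "assembly" routine is replaced by a plain-C stand-in so that the sketch builds on any host.

```c
#include <stdint.h>

/* Hypothetical, cut-down composite request; strides are counted in pixels. */
typedef struct
{
    uint32_t       *dst;
    int32_t         dst_stride;
    const uint32_t *src;
    int32_t         src_stride;
    int32_t         width, height;
} sketch_composite_info_t;

/* Hypothetical binding macro in the style of the ARM glue: declares the
 * low-level routine and emits a C wrapper named <cputype>_composite_<name>. */
#define SKETCH_ARM_BIND_FAST_PATH_SRC_DST(cputype, name)                     \
    void sketch_composite_##name##_asm_##cputype (                           \
        int32_t w, int32_t h, uint32_t *dst, int32_t dst_stride,             \
        const uint32_t *src, int32_t src_stride);                            \
                                                                             \
    static void cputype##_composite_##name (                                 \
        const sketch_composite_info_t *info)                                 \
    {                                                                        \
        sketch_composite_##name##_asm_##cputype (                            \
            info->width, info->height, info->dst, info->dst_stride,          \
            info->src, info->src_stride);                                    \
    }

/* Expands to mips_dspase1_composite_add_8888_8888 (). */
SKETCH_ARM_BIND_FAST_PATH_SRC_DST (mips_dspase1, add_8888_8888)

/* Plain-C stand-in for the assembly routine so the sketch links anywhere. */
void sketch_composite_add_8888_8888_asm_mips_dspase1 (
    int32_t w, int32_t h, uint32_t *dst, int32_t dst_stride,
    const uint32_t *src, int32_t src_stride)
{
    for (int32_t y = 0; y < h; y++)
        for (int32_t x = 0; x < w; x++)
        {
            uint32_t s = src[y * src_stride + x];
            uint32_t d = dst[y * dst_stride + x];
            uint32_t out = 0;

            /* Per-channel add, clamped at 0xff. */
            for (int shift = 0; shift < 32; shift += 8)
            {
                uint32_t sum = ((s >> shift) & 0xff) + ((d >> shift) & 0xff);
                out |= (sum > 0xff ? 0xffu : sum) << shift;
            }
            dst[y * dst_stride + x] = out;
        }
}
```

The wrapper is generated purely by token pasting, so only the macro name carries the 'ARM' prefix. The generated function is named after whatever cputype is passed in.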

Compiling the code makes gcc produce warnings about macro expansion. It would be nice not to have these, but "fixing" them would have a (slight) negative effect on readability.

PATCH 1 is the original patch by Georgi Beloev, but modified to apply against pixman head.

Implemented:
Scanline ops: add, out reverse, over
Fast paths:
over_n_8_8888
add_8888_8888
add_n_8888
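
For reference, the ADD operator behind add_8888_8888 and add_n_8888 is a per-channel saturating add. The sketch below is a plain-C reference of that arithmetic, not code from the patch; the function names are hypothetical. The DSP ASE has an unsigned saturating quad-byte add (addu_s.qb) that does the same four-lane operation in one instruction, which is presumably why these paths benefit so much, though nothing here claims the patch is written exactly that way.

```c
#include <stddef.h>
#include <stdint.h>

/* Saturating add of one 8-bit channel: clamps at 0xff instead of wrapping. */
static inline uint8_t sat_add_u8 (uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned) a + b;
    return (uint8_t) (sum > 0xff ? 0xff : sum);
}

/* Reference ADD for one 32bpp pixel: each byte lane is added independently. */
static inline uint32_t add_pixel_8888 (uint32_t src, uint32_t dst)
{
    uint32_t out = 0;

    for (int shift = 0; shift < 32; shift += 8)
        out |= (uint32_t) sat_add_u8 ((uint8_t) (src >> shift),
                                      (uint8_t) (dst >> shift)) << shift;
    return out;
}

/* Hypothetical scanline helper: dst[i] = src[i] ADD dst[i]. */
static void ref_add_8888_8888_scanline (uint32_t *dst, const uint32_t *src,
                                        size_t width)
{
    for (size_t i = 0; i < width; i++)
        dst[i] = add_pixel_8888 (src[i], dst[i]);
}
```

For add_n_8888 the same per-pixel function applies, with src held constant across the scanline.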

Test hardware: Broadcom BCM4718, 453MHz, MIPS 74K V4.0 (Inc. DSP Rev2, MIPS16), Little Endian

All the test program builds used CFLAGS="-O2 -mdsp -mips32r2"

reference memcpy speed = 176.0MB/s (44.0MP/s for 32bpp fills)

Optimizations disabled: --disable-mips32r2 --disable-mips-dspase1
over_n_8_8888 =  L1:   6.16  L2:   5.34  M:  5.35 ( 19.24%)  HT:  4.78  VT:  4.62  R:  4.55  RT:  2.99 (  28Kops/s)
add_8888_8888 =  L1:  18.11  L2:  10.15  M:  9.98 ( 45.33%)  HT: 14.80  VT: 13.36  R: 13.41  RT:  6.17 (  46Kops/s)
add_n_8888 =  L1:  14.26  L2:  10.30  M: 10.38 ( 23.59%)  HT:  8.05  VT:  7.64  R:  7.63  RT:  4.05 (  33Kops/s)

MIPS32R2: --disable-mips-dspase1
over_n_8_8888 =  L1:   6.17  L2:   5.62  M:  5.56 ( 20.33%)  HT:  5.00  VT:  4.83  R:  4.76  RT:  3.33 (  30Kops/s)

MIPS DSPASE:
over_n_8_8888 =  L1:   9.76  L2:   7.89  M:  7.93 ( 27.11%)  HT:  7.04  VT:  6.84  R:  6.63  RT:  4.06 (  34Kops/s)
add_8888_8888 =  L1: 117.36  L2:  20.67  M: 23.22 (105.50%)  HT: 17.40  VT: 15.96  R: 13.81  RT:  6.48 (  47Kops/s)
add_n_8888 =  L1: 145.84  L2:  28.23  M: 31.11 ( 70.66%)  HT: 22.95  VT: 18.54  R: 19.99  RT:  8.93 (  50Kops/s)

Scanline ops benchmarked using lowlevel-blt-bench:

I selected these ops by adding a printf to the scanline ops and then looking for a benchmark that triggers it. If there is a more convenient way to benchmark these ops, I failed to find it.
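
The printf approach can be sketched roughly as follows. This is only an illustration: TRACE_ONCE, trace_calls and sketch_combine_scanline are hypothetical names, and the combiner body is a placeholder copy rather than a real pixman scanline op. Printing only on the first hit keeps the benchmark output readable.

```c
#include <stdint.h>
#include <stdio.h>

/* Total number of traced calls, handy for a sanity check after a run. */
static unsigned long trace_calls;

/* Print a message the first time a given call site is reached. */
#define TRACE_ONCE(label)                                             \
    do {                                                              \
        static int seen_;                                             \
        if (!seen_)                                                   \
        {                                                             \
            seen_ = 1;                                                \
            fprintf (stderr, "scanline op used: %s\n", (label));      \
        }                                                             \
        trace_calls++;                                                \
    } while (0)

/* Hypothetical scanline combiner (not from the patch); the body is a
 * placeholder copy, only the tracing line matters here. */
static void sketch_combine_scanline (uint32_t *dest, const uint32_t *src,
                                     int width)
{
    TRACE_ONCE (__func__);
    for (int i = 0; i < width; i++)
        dest[i] = src[i];
}
```

Running a candidate benchmark with such a line in each scanline op shows on stderr which ops that benchmark actually exercises.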

Optimizations disabled:
add_8_8_8 =  L1:   3.31  L2:   5.25  M:  5.16 ( 11.73%)  HT:  3.61  VT:  3.60  R:  3.53  RT:  1.77 (  18Kops/s)
add_8888_1555 =  L1:   6.51  L2:   5.32  M:  5.34 ( 18.20%)  HT:  4.05  VT:  3.96  R:  3.94  RT:  2.21 (  22Kops/s)
outrev_n_8_8888 =  L1:   6.33  L2:   5.25  M:  5.16 ( 17.60%)  HT:  4.11  VT:  4.02  R:  3.97  RT:  2.23 (  22Kops/s)
over_8888_n_0565 =  L1:   2.83  L2:   3.33  M:  3.21 ( 11.54%)  HT:  2.73  VT:  2.69  R:  2.68  RT:  1.67 (  17Kops/s)
over_n_8888 =  L1:   7.45  L2:   6.65  M:  6.66 ( 15.14%)  HT:  5.65  VT:  5.43  R:  5.43  RT:  3.35 (  30Kops/s)

MIPS DSPASE:
add_8_8_8 =  L1:   8.81  L2:   7.67  M:  7.53 ( 17.11%)  HT:  4.62  VT:  4.68  R:  4.50  RT:  1.97 (  19Kops/s)
add_8888_1555 =  L1:   9.07  L2:   7.27  M:  7.29 ( 24.87%)  HT:  5.09  VT:  4.95  R:  4.93  RT:  2.50 (  23Kops/s)
outrev_n_8_8888 =  L1:   8.48  L2:   6.82  M:  6.88 ( 23.45%)  HT:  5.04  VT:  4.90  R:  4.85  RT:  2.48 (  23Kops/s)
over_8888_n_0565 =  L1:   5.13  L2:   4.38  M:  4.16 ( 14.24%)  HT:  3.41  VT:  3.30  R:  3.34  RT:  1.93 (  19Kops/s)
over_n_8888 =  L1:  18.58  L2:  12.91  M: 13.12 ( 29.85%)  HT:  9.75  VT:  9.06  R:  9.10  RT:  4.55 (  33Kops/s)