[Pixman] [PATCH 0/3] Pixman MIPS DSPASE1
sandmann at cs.au.dk
Thu Feb 24 09:17:38 PST 2011
Thanks for picking up the MIPS work. There are some comments from last
time from Siarhei and myself that I don't think have been addressed. See
- In Siarhei's testing, the new over_n_8_8888() on MIPS32r2 was slower
than the C fast path. From
"One of the reasons for such a slowdown in gnome-system-monitor test is
that it uses 'over_n_8_8888' operation with the mask where 96.5% of
values are zero. And your MIPS32R2 optimized code does not handle
these special cases, always taking the slowest path."
I.e., the way to make over_n_8_8888() fast is to skip the full
compositing path whenever the mask is 0x00 or 0xff.
The same is likely also worthwhile even in the SIMD versions since
memory access is so expensive.
- The patch should be split such that one commit adds the MIPS32r2 part
and one adds the DSPASE part.
- Coding style:
    - Please use /* */ comments
    - Indents are four spaces
    - Put a space before parentheses
    - Don't leave in commented-out code like this:
        // b = _pixman_implementation_fill(imp->delegate,
        bits, stride, bpp, x, y, width, height, xor);
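To illustrate the first point above, here is a minimal sketch in plain C of the mask special-casing. It is not the actual pixman code: the names sketch_over_n_8_8888, mul_un8, in_8888 and over_8888 are hypothetical, and it assumes premultiplied a8r8g8b8 pixels and a single scanline. It is written in the style requested above.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply two 8-bit values as fractions of 255, rounding to nearest. */
uint8_t
mul_un8 (uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t) a * b + 0x80;

    return (uint8_t) ((t + (t >> 8)) >> 8);
}

/* Scale all four channels of an a8r8g8b8 pixel by m / 255. */
uint32_t
in_8888 (uint32_t p, uint8_t m)
{
    return ((uint32_t) mul_un8 ((uint8_t) (p >> 24), m) << 24) |
           ((uint32_t) mul_un8 ((uint8_t) (p >> 16), m) << 16) |
           ((uint32_t) mul_un8 ((uint8_t) (p >> 8), m) << 8) |
           (uint32_t) mul_un8 ((uint8_t) p, m);
}

/* OVER for premultiplied pixels: src + dst * (1 - src alpha).
 * With a premultiplied source no channel can overflow.
 */
uint32_t
over_8888 (uint32_t src, uint32_t dst)
{
    uint8_t ia = (uint8_t) (255 - (src >> 24));

    return src + in_8888 (dst, ia);
}

/* One scanline of solid source OVER destination through an a8 mask. */
void
sketch_over_n_8_8888 (uint32_t       src,
                      const uint8_t *mask,
                      uint32_t      *dst,
                      size_t         width)
{
    uint8_t srca = (uint8_t) (src >> 24);
    size_t i;

    if (src == 0)
        return;

    for (i = 0; i < width; i++)
    {
        uint8_t m = mask[i];

        if (m == 0x00)
        {
            /* Fully masked out: dst is neither read nor written. */
            continue;
        }
        else if (m == 0xff)
        {
            /* Full coverage: no need to scale the source by the mask. */
            if (srca == 0xff)
                dst[i] = src;
            else
                dst[i] = over_8888 (src, dst[i]);
        }
        else
        {
            /* General case: the slow path. */
            dst[i] = over_8888 (in_8888 (src, m), dst[i]);
        }
    }
}
```

With a mask that is mostly zero, as in the gnome-system-monitor case quoted above, nearly every iteration takes the first branch and never touches the destination.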
And finally, while the lowlevel-blt benchmarks are convenient to use,
they are also synthetic. It is important to also test the performance
with real-world workloads such as those found in the cairo perf traces.
Veli-Matti Valtonen <veli-matti.valtonen at movial.com> writes:
> I originally started by optimizing the MIPS32R2 code (based on the patch by Beloev), but the performance increases seem to be similar to what over_n_8_8888 shows. The DSPASE code is much more promising in this regard. It rather leaves me wondering whether the MIPS32R2 part should be left out.
> It might however be related to the test system, which has a MIPS 74K core. The original I assume was worked on with a MIPS 24K.
> I used pixman-arm-common.h for the assembler binding macros, which is the reason for the 'ARM' found in the glue.
> Compiling the code results in gcc producing warnings about macro expansion. It'd be nice not to have these, but "fixing" them would have a (slight) negative effect on readability.
> PATCH 1 is the original patch by Georgi Beloev, but modified to apply against pixman head.
> Scanline add, out reverse, over
> fast path:
> Test hardware: Broadcom BCM4718, 453MHz, MIPS 74K V4.0 (Inc. DSP Rev2, MIPS16), Little Endian
> All the test program builds used CFLAGS="-O2 -mdsp -mips32r2"
> reference memcpy speed = 176.0MB/s (44.0MP/s for 32bpp fills)
> Optimizations disabled: --disable-mips32r2 --disable-mips-dspase1
> over_n_8_8888 = L1: 6.16 L2: 5.34 M: 5.35 ( 19.24%) HT: 4.78 VT: 4.62 R: 4.55 RT: 2.99 ( 28Kops/s)
> add_8888_8888 = L1: 18.11 L2: 10.15 M: 9.98 ( 45.33%) HT: 14.80 VT: 13.36 R: 13.41 RT: 6.17 ( 46Kops/s)
> add_n_8888 = L1: 14.26 L2: 10.30 M: 10.38 ( 23.59%) HT: 8.05 VT: 7.64 R: 7.63 RT: 4.05 ( 33Kops/s)
> MIPS32R2: --disable-mips-dspase1
> over_n_8_8888 = L1: 6.17 L2: 5.62 M: 5.56 ( 20.33%) HT: 5.00 VT: 4.83 R: 4.76 RT: 3.33 ( 30Kops/s)
> MIPS DSPASE:
> over_n_8_8888 = L1: 9.76 L2: 7.89 M: 7.93 ( 27.11%) HT: 7.04 VT: 6.84 R: 6.63 RT: 4.06 ( 34Kops/s)
> add_8888_8888 = L1: 117.36 L2: 20.67 M: 23.22 (105.50%) HT: 17.40 VT: 15.96 R: 13.81 RT: 6.48 ( 47Kops/s)
> add_n_8888 = L1: 145.84 L2: 28.23 M: 31.11 ( 70.66%) HT: 22.95 VT: 18.54 R: 19.99 RT: 8.93 ( 50Kops/s)
> Scanline ops benchmarked using lowlevel-blt-bench:
> I selected these ops by adding a printf to the scanline ops and finding one that triggers it. If there is a more convenient way to benchmark these ops, I failed to find it.
> Optimizations disabled:
> add_8_8_8 = L1: 3.31 L2: 5.25 M: 5.16 ( 11.73%) HT: 3.61 VT: 3.60 R: 3.53 RT: 1.77 ( 18Kops/s)
> add_8888_1555 = L1: 6.51 L2: 5.32 M: 5.34 ( 18.20%) HT: 4.05 VT: 3.96 R: 3.94 RT: 2.21 ( 22Kops/s)
> outrev_n_8_8888 = L1: 6.33 L2: 5.25 M: 5.16 ( 17.60%) HT: 4.11 VT: 4.02 R: 3.97 RT: 2.23 ( 22Kops/s)
> over_8888_n_0565 = L1: 2.83 L2: 3.33 M: 3.21 ( 11.54%) HT: 2.73 VT: 2.69 R: 2.68 RT: 1.67 ( 17Kops/s)
> over_n_8888 = L1: 7.45 L2: 6.65 M: 6.66 ( 15.14%) HT: 5.65 VT: 5.43 R: 5.43 RT: 3.35 ( 30Kops/s)
> MIPS DSPASE:
> add_8_8_8 = L1: 8.81 L2: 7.67 M: 7.53 ( 17.11%) HT: 4.62 VT: 4.68 R: 4.50 RT: 1.97 ( 19Kops/s)
> add_8888_1555 = L1: 9.07 L2: 7.27 M: 7.29 ( 24.87%) HT: 5.09 VT: 4.95 R: 4.93 RT: 2.50 ( 23Kops/s)
> outrev_n_8_8888 = L1: 8.48 L2: 6.82 M: 6.88 ( 23.45%) HT: 5.04 VT: 4.90 R: 4.85 RT: 2.48 ( 23Kops/s)
> over_8888_n_0565 = L1: 5.13 L2: 4.38 M: 4.16 ( 14.24%) HT: 3.41 VT: 3.30 R: 3.34 RT: 1.93 ( 19Kops/s)
> over_n_8888 = L1: 18.58 L2: 12.91 M: 13.12 ( 29.85%) HT: 9.75 VT: 9.06 R: 9.10 RT: 4.55 ( 33Kops/s)
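For reading the numbers above, a small sketch that turns the quoted "M:" figures into speedup ratios (DSPASE over optimizations disabled). The values are copied from the benchmark output above. The struct and function names are hypothetical, and taking "M:" as the memory-bound case is my reading of the lowlevel-blt-bench output.

```c
#include <stddef.h>
#include <stdio.h>

struct bench_row
{
    const char *op;
    double      baseline_m;    /* "M:" with optimizations disabled */
    double      dspase_m;      /* "M:" with MIPS DSPASE enabled */
};

/* Figures copied from the quoted benchmark output. */
const struct bench_row rows[] =
{
    { "over_n_8_8888",     5.35,  7.93 },
    { "add_8888_8888",     9.98, 23.22 },
    { "add_n_8888",       10.38, 31.11 },
    { "add_8_8_8",         5.16,  7.53 },
    { "add_8888_1555",     5.34,  7.29 },
    { "outrev_n_8_8888",   5.16,  6.88 },
    { "over_8888_n_0565",  3.21,  4.16 },
    { "over_n_8888",       6.66, 13.12 },
};

const size_t n_rows = sizeof (rows) / sizeof (rows[0]);

/* Ratio of optimized to baseline throughput; above 1.0 is faster. */
double
speedup (const struct bench_row *row)
{
    return row->dspase_m / row->baseline_m;
}

/* Print one line per operation with its speedup ratio. */
void
print_speedups (void)
{
    size_t i;

    for (i = 0; i < n_rows; i++)
        printf ("%-18s %.2fx\n", rows[i].op, speedup (&rows[i]));
}
```

Calling print_speedups () from a main () lists the ratio per operation, which makes it easy to see which ops gain the most from DSPASE.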