[Pixman] [PATCH 0/3] Pixman MIPS DSPASE1
Soeren Sandmann
sandmann at cs.au.dk
Thu Feb 24 09:17:38 PST 2011
Hi,
Thanks for picking up the MIPS work. There are some comments from last
time from Siarhei and myself that I don't think have been addressed. See
these mails:
http://lists.freedesktop.org/archives/pixman/2010-December/000773.html
http://lists.freedesktop.org/archives/pixman/2010-September/000496.html
- In Siarhei's testing, the new over_n_8_8888() on MIPS32r2 was slower
than the C fast path. From
http://lists.freedesktop.org/archives/pixman/2010-December/000773.html :
"One of the reasons for such a slowdown in gnome-system-monitor test is
that it uses 'over_n_8_8888' operation with the mask where 96.5% of
values are zero. And your MIPS32R2 optimized code does not handle
these special cases, always taking the slowest path [1]."
I.e., the way to make over_n_8_8888() fast is to special-case the mask
values 0x00 (nothing to composite) and 0xff (no need to multiply the
source by the mask).
The same is likely also worthwhile even in the SIMD versions since
memory access is so expensive.
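To make the special-casing concrete, here is a rough sketch in plain C
of one scanline of over_n_8_8888. It is not the actual pixman code: the
helper names (mul_un8, in_un8, over, sketch_over_n_8_8888_line) are made
up for illustration, and it assumes premultiplied a8r8g8b8 pixels. The
point is only that a 0x00 mask touches no destination memory and a 0xff
mask avoids the source-times-mask multiply.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply two 8-bit values treated as fractions of 255, with rounding. */
static uint8_t
mul_un8 (uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t) a * b + 0x80;

    return (uint8_t) ((t + (t >> 8)) >> 8);
}

/* Scale all four channels of a premultiplied pixel by the mask value m. */
static uint32_t
in_un8 (uint32_t x, uint8_t m)
{
    uint32_t r = 0;
    int shift;

    for (shift = 0; shift < 32; shift += 8)
        r |= (uint32_t) mul_un8 ((uint8_t) (x >> shift), m) << shift;

    return r;
}

/* Porter-Duff OVER for premultiplied pixels: src + dst * (1 - src alpha). */
static uint32_t
over (uint32_t src, uint32_t dst)
{
    uint8_t ia = (uint8_t) (255 - (src >> 24));
    uint32_t r = 0;
    int shift;

    for (shift = 0; shift < 32; shift += 8)
    {
        uint32_t c = ((src >> shift) & 0xff)
            + mul_un8 ((uint8_t) (dst >> shift), ia);

        r |= (c > 255 ? 255 : c) << shift;
    }

    return r;
}

/* One scanline: solid source, a8 mask, 8888 destination (hypothetical name). */
static void
sketch_over_n_8_8888_line (uint32_t *dst, const uint8_t *mask,
                           uint32_t src, size_t width)
{
    uint8_t srca = (uint8_t) (src >> 24);
    size_t i;

    for (i = 0; i < width; i++)
    {
        uint8_t m = mask[i];

        if (m == 0x00)
            continue;                     /* nothing to do, dst untouched */
        else if (m == 0xff && srca == 0xff)
            dst[i] = src;                 /* opaque: plain store, no dst read */
        else if (m == 0xff)
            dst[i] = over (src, dst[i]);  /* no src * mask multiply needed */
        else
            dst[i] = over (in_un8 (src, m), dst[i]);
    }
}
```

With a mask that is mostly zero, as in the gnome-system-monitor case
quoted above, nearly every iteration of this loop takes the first
branch.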
From
http://lists.freedesktop.org/archives/pixman/2010-September/000496.html :
- The patch should be split such that one commit adds the MIPS32r2 part
and one adds the DSPASE part
- Coding style:
- Please use /* */ comments
- Indents are four spaces
- Put a space before parentheses
- Don't leave in commented-out code like this:
// b = _pixman_implementation_fill(imp->delegate,
bits, stride, bpp, x, y, width, height, xor);
And finally, while the lowlevel-blt benchmarks are convenient to use,
they are also synthetic. It is important to also test the performance
with real-world workloads such as those found in the cairo perf traces.
Thanks,
Soren
Veli-Matti Valtonen <veli-matti.valtonen at movial.com> writes:
> I originally started working on optimizing the MIPS32R2 code (based on the patch by Beloev), but the performance increases seem to be relatively similar to what over_n_8_8888 shows. The DSPASE code is much more promising in this regard. It rather leaves me wondering whether the MIPS32R2 code should be included at all.
>
> It might, however, be related to the test system, which has a MIPS 74K core. I assume the original was developed on a MIPS 24K.
>
> I used pixman-arm-common.h for the assembler binding macros, which is the reason for the 'ARM' found in the glue.
>
> Compiling the code makes gcc produce warnings about macro expansion. It would be nice not to have these, but "fixing" them would have a (slight) negative effect on readability.
>
> PATCH 1 is the original patch by Georgi Beloev, but modified to apply against pixman head.
>
> Implemented:
> Scanline add, out reverse, over
> fast path:
> over_n_8_8888
> add_8888_8888
> add_n_8888
>
> Test hardware: Broadcom BCM4718, 453MHz, MIPS 74K V4.0 (Inc. DSP Rev2, MIPS16), Little Endian
>
> All the test program builds used CFLAGS="-O2 -mdsp -mips32r2"
>
> reference memcpy speed = 176.0MB/s (44.0MP/s for 32bpp fills)
>
> Optimizations disabled: --disable-mips32r2 --disable-mips-dspase1
> over_n_8_8888 = L1: 6.16 L2: 5.34 M: 5.35 ( 19.24%) HT: 4.78 VT: 4.62 R: 4.55 RT: 2.99 ( 28Kops/s)
> add_8888_8888 = L1: 18.11 L2: 10.15 M: 9.98 ( 45.33%) HT: 14.80 VT: 13.36 R: 13.41 RT: 6.17 ( 46Kops/s)
> add_n_8888 = L1: 14.26 L2: 10.30 M: 10.38 ( 23.59%) HT: 8.05 VT: 7.64 R: 7.63 RT: 4.05 ( 33Kops/s)
>
> MIPS32R2: --disable-mips-dspase1
> over_n_8_8888 = L1: 6.17 L2: 5.62 M: 5.56 ( 20.33%) HT: 5.00 VT: 4.83 R: 4.76 RT: 3.33 ( 30Kops/s)
>
> MIPS DSPASE:
> over_n_8_8888 = L1: 9.76 L2: 7.89 M: 7.93 ( 27.11%) HT: 7.04 VT: 6.84 R: 6.63 RT: 4.06 ( 34Kops/s)
> add_8888_8888 = L1: 117.36 L2: 20.67 M: 23.22 (105.50%) HT: 17.40 VT: 15.96 R: 13.81 RT: 6.48 ( 47Kops/s)
> add_n_8888 = L1: 145.84 L2: 28.23 M: 31.11 ( 70.66%) HT: 22.95 VT: 18.54 R: 19.99 RT: 8.93 ( 50Kops/s)
>
> Scanline ops benchmarked using low-level-blit:
>
> I selected these ops by adding a printf to the scanline ops and finding one that triggers it. If there is a more convenient way to benchmark these ops, I failed to find it.
>
> Optimizations disabled:
> add_8_8_8 = L1: 3.31 L2: 5.25 M: 5.16 ( 11.73%) HT: 3.61 VT: 3.60 R: 3.53 RT: 1.77 ( 18Kops/s)
> add_8888_1555 = L1: 6.51 L2: 5.32 M: 5.34 ( 18.20%) HT: 4.05 VT: 3.96 R: 3.94 RT: 2.21 ( 22Kops/s)
> outrev_n_8_8888 = L1: 6.33 L2: 5.25 M: 5.16 ( 17.60%) HT: 4.11 VT: 4.02 R: 3.97 RT: 2.23 ( 22Kops/s)
> over_8888_n_0565 = L1: 2.83 L2: 3.33 M: 3.21 ( 11.54%) HT: 2.73 VT: 2.69 R: 2.68 RT: 1.67 ( 17Kops/s)
> over_n_8888 = L1: 7.45 L2: 6.65 M: 6.66 ( 15.14%) HT: 5.65 VT: 5.43 R: 5.43 RT: 3.35 ( 30Kops/s)
>
> MIPS DSPASE:
> add_8_8_8 = L1: 8.81 L2: 7.67 M: 7.53 ( 17.11%) HT: 4.62 VT: 4.68 R: 4.50 RT: 1.97 ( 19Kops/s)
> add_8888_1555 = L1: 9.07 L2: 7.27 M: 7.29 ( 24.87%) HT: 5.09 VT: 4.95 R: 4.93 RT: 2.50 ( 23Kops/s)
> outrev_n_8_8888 = L1: 8.48 L2: 6.82 M: 6.88 ( 23.45%) HT: 5.04 VT: 4.90 R: 4.85 RT: 2.48 ( 23Kops/s)
> over_8888_n_0565 = L1: 5.13 L2: 4.38 M: 4.16 ( 14.24%) HT: 3.41 VT: 3.30 R: 3.34 RT: 1.93 ( 19Kops/s)
> over_n_8888 = L1: 18.58 L2: 12.91 M: 13.12 ( 29.85%) HT: 9.75 VT: 9.06 R: 9.10 RT: 4.55 ( 33Kops/s)