[Pixman] [PATCH 0/8]: ARM/iwmmxt optimizations
mattst88 at gmail.com
Fri Sep 23 11:54:09 PDT 2011
Following this email is a series of eight patches which add
the ability to compile pixman's pixman-mmx.c on ARM in order
to use the iwMMXt SIMD instruction set. The purpose of this
work is to improve the compositing performance of the OLPC XO 1.75.

Care has been taken to ensure that each commit passes the test
suite on both ARM/iwmmxt and x86/MMX. On x86/MMX, there should
be no changes at all.
The patches (and the three I sent yesterday and today) are in
the iwmmxt-optimizations5 branch of my FreeDesktop repo:
I benchmarked the generic, ARMv6 SIMD, and iwmmxt paths. For iwmmxt,
I tested four variants: with the fill inline assembly, with the blit
inline assembly, with both, and with neither.
Here are the summarized benchmark results from cairo-perf-trace
(times in seconds):

benchmark            |   image | iwmmxt imp | blit imp | image16 | iwmmxt imp | blit imp |
evolution            |  32.775 |         9% |          |  29.931 |        12% |          |
firefox-planet-gnome | 181.878 |        10% |       9% | 232.297 |         8% |       4% |
gnome-system-monitor |  59.029 |            |          |  59.126 |            |          |
gnome-terminal-vim   |  53.702 |        10% |      12% |  56.900 |         5% |       8% |
grads-heat-map       |   8.012 |            |          |   8.149 |            |          |
gvim                 |  39.232 |         5% |          |  42.528 |         3% |          |
midori-zoomed        |  25.413 |            |          |  27.420 |         9% |          |
poppler              |  40.505 |         7% |       8% |  40.798 |         7% |       9% |
swfdec-giant-steps   |  24.510 |        18% |          |  23.778 |        20% |       6% |
xfce4-terminal-a1    |  54.433 |        17% |          |  55.751 |        19% |          |

(Full results are available at http://people.freedesktop.org/~mattst88/iwmmxt-benchmarks/)

- iwmmxt improved midori-zoomed's `image16' time by 9% but didn't
  significantly change the time for `image'.
- iwmmxt/blit improved swfdec-giant-steps' `image16' time by 6% but
  didn't significantly change the time for `image'.
- firefox-planet-gnome is the only benchmark whose times for `image'
  and `image16' were significantly different: 181.878s vs 232.297s.
- armv6 is significantly slower than the generic paths for
    - gnome-system-monitor (image, image16)
    - swfdec-giant-steps (image)

Fill never helps or hurts. (Probably not worth applying the fill patch.)
Blit never hurts.
The only thing left to do that I know of is related to the blit inline
assembly: it currently doesn't check co-alignment of src and dst. The
kernel is configured to handle unaligned accesses on the XO 1.75, but
it's still something worth investigating and fixing. Perhaps there's a
bit more performance to squeeze out of the blit code.
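
To illustrate what a co-alignment check could look like, here is a
minimal sketch in C. It is not the code from the patches: the names
is_coaligned_8 and blit_row are hypothetical, and the 8-byte width is
an assumption based on iwMMXt's 64-bit registers. Two pointers are
co-aligned when they sit at the same offset within an 8-byte word, so
the same number of leading bytes brings both to an aligned address.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nonzero when src and dst have the same offset
 * within an 8-byte word, i.e. their low three address bits match. */
static int
is_coaligned_8 (const void *src, const void *dst)
{
    return (((uintptr_t) src ^ (uintptr_t) dst) & 7) == 0;
}

/* Hypothetical single-row copy: the wide loop runs only when src and
 * dst are co-aligned; everything else is copied byte by byte. */
static void
blit_row (uint8_t *dst, const uint8_t *src, size_t n)
{
    if (is_coaligned_8 (src, dst))
    {
        /* Leading bytes until dst (and therefore src) is 8-byte aligned. */
        while (n > 0 && ((uintptr_t) dst & 7) != 0)
        {
            *dst++ = *src++;
            n--;
        }

        /* Aligned 8-byte chunks; the fixed-size memcpy stands in for a
         * 64-bit load/store pair. */
        while (n >= 8)
        {
            memcpy (dst, src, 8);
            dst += 8;
            src += 8;
            n -= 8;
        }
    }

    /* Trailing bytes, or the whole row when not co-aligned. */
    while (n > 0)
    {
        *dst++ = *src++;
        n--;
    }
}
```

Falling back to a byte loop in the non-co-aligned case is only the
simplest option for the sketch; a real implementation would more
likely align one side and realign the other in registers.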
I think gcc-4.7 will receive some seriously needed iwmmxt work, so I'm
hopeful that gcc-4.7 will improve these benchmark results further.
Please review and apply.
Thanks a lot for your help!