[Pixman] ARM iwmmxt patches

Matt Turner mattst88 at gmail.com
Thu Aug 25 10:42:54 PDT 2011

On Wed, Jul 27, 2011 at 1:03 PM, Matt Turner <mattst88 at gmail.com> wrote:
> On Wed, Jul 27, 2011 at 12:52 PM, Soeren Sandmann <sandmann at cs.au.dk> wrote:
>> Matt Turner <mattst88 at gmail.com> writes:
>>> This 3-patch series adds support for compiling pixman's pixman-mmx.c
>>> for ARM/iwmmxt for some performance improvements on iwmmxt-enabled ARM
>>> CPUs. This is done by taking advantage of the fact that gcc provides
>>> MMX-compatible _mm_*-style intrinsics for iwmmxt on ARM.
>>> On my OLPC XO 1.75 (with a Marvell CPU), they pass the pixman test
>>> suite (verified that test suite passes on x86/MMX as well) and improve
>>> performance of most cairo-traces 7% or more. (See attached)
>>> For lowlevel-blit-bench, iwmmxt paths are not always faster, at times
>>> losing to ARMv6 or generic paths (but even ARMv6 is sometimes slower
>>> than generic...) but providing some massive speed-ups at times:
>> A few overall comments:
>> - It would make sense to rename USE_MMX to USE_X86_MMX for symmetry, and
>>  also add a comment at the top of pixman-mmx.c to indicate that it
>>  is being used on both x86 and ARM.
> OK, I can do that.
>> - We need more details in the commit messages.
> Indeed. Will do.
>> Thanks for generating the detailed data. I have formatted it here:
>>  low-level-blit:   http://people.freedesktop.org/~sandmann/bench-data/all-llblit.txt
>>  traces:           http://people.freedesktop.org/~sandmann/bench-data/all-traces.txt
>> to more clearly show the differences between the various
>> implementations. As Siarhei already commented on, the most surprising
>> result is that the armv6 assembly is generally slower than the generic C
>> code, in some cases a lot slower.
>>> gcc's current support for iwmmxt code generation is atrocious (See gcc
>>> bugs 35294, 36798, 36966), so I have patched gcc to add missing shift
>>> and logical iwmmxt instructions. I have seen patches posted improving
>>> gcc's iwmmxt support, so I hope that gcc-4.7 will be able to use
>>> pixman's iwmmxt code without trouble. (Reminds me as I write this that
>>> I need to modify the configure.ac test to use instructions that cause
>>> current gcc to crash.)
>> Are you saying that current versions of GCC basically don't work with
>> iwmmxt? If so, we should probably just check for GCC 4.7 in
>> configure.
> Yes, patches have been sent to gcc-patches@ but I don't think they're
> in gcc-4.7 yet. gcc-4.6 and older, unless there have been some
> startling regressions, certainly cannot use basic shift and logical
> instruction intrinsics.
> I will modify the configure.ac hunk to check for gcc-4.7 and also
> modify the test code to use an intrinsic that is used in pixman-mmx.c
> and known to not work with gcc-4.6.1.
> Thanks,
> Matt

I've been trying to figure out if the ARM iwmmxt inline assembly makes
any difference at all. I think the conclusion is that it does not.
Updated code is here:

See http://people.freedesktop.org/~mattst88/pixman-iwmmxt-benchdata.txt

Using inline assembly never seems to make any meaningful difference
over simply compiling pixman-mmx.c for ARM/iwmmxt. In the 'wip' commit
I tried checking the alignment in the blt function to avoid a lot of
unnecessary walign instructions, but as you can see from the benchmark
results, it doesn't help.
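
For anyone who hasn't looked at the 'wip' commit, the idea behind the
alignment check is roughly the following. This is only a portable C
sketch, not the code from the commit: is_aligned8() and copy_qwords()
are hypothetical names, and the memcpy() fallback stands in for the
unaligned path that uses walign on iwmmxt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nonzero when p sits on an 8-byte boundary. */
static int is_aligned8(const void *p)
{
    return ((uintptr_t)p & 7u) == 0;
}

/*
 * Hypothetical stand-in for a blt inner loop: copies n_qwords 64-bit
 * words from src to dst.  Returns 1 if the aligned fast path ran and
 * 0 if the unaligned fallback ran.
 */
static int copy_qwords(void *dst, const void *src, size_t n_qwords)
{
    size_t i;

    assert(dst != NULL && src != NULL);

    if (is_aligned8(src) && is_aligned8(dst)) {
        /* Both pointers are 8-byte aligned: plain 64-bit loads/stores. */
        const uint64_t *s = src;
        uint64_t *d = dst;

        for (i = 0; i < n_qwords; i++)
            d[i] = s[i];
        return 1;
    } else {
        /* At least one pointer is unaligned: go through memcpy(), which
         * has no alignment requirement (the real code would realign). */
        const unsigned char *s = src;
        unsigned char *d = dst;

        for (i = 0; i < n_qwords; i++) {
            uint64_t tmp;

            memcpy(&tmp, s + 8 * i, sizeof tmp);
            memcpy(d + 8 * i, &tmp, sizeof tmp);
        }
        return 0;
    }
}
```

The check itself is one mask and one branch per call, so any gain has
to come from the realignment work skipped on the aligned path.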

Should I just drop the inline assembly pieces? It would definitely
make the code simpler.

