[Pixman] [PATCH] SIMD: Try without any CFLAGS before forcing -mcpu=
Siarhei Siamashka
siarhei.siamashka at gmail.com
Tue Mar 16 04:25:11 PDT 2010
On Sunday 14 March 2010, Loïc Minier wrote:
> On Wed, Mar 10, 2010, Siarhei Siamashka wrote:
> > I would prefer a bit more descriptive comment (with the details copied
> > from that launchpad page).
>
> I see you pushed this now; thanks! Yeah, it's not obvious why one
> needs to try with the toolchain defaults first. I'm attaching a patch
> to update the comments.
Well, it's actually obvious enough, and the current comments in configure.ac
seem to be sufficient. I was only nitpicking about the commit summary and
comment. Thanks anyway.
> > In my opinion the best solution overall would be to move all the assembly
> > optimizations into separate .S files also for legacy ARM processors and
> > get rid of these compiler option hacks. I think that bringing support for
> > legacy ARM processors into a better shape is quite realistic even for
> > 0.18.0 stable release, which is due to be released this month. But it can
> > only happen if enough people are interested in this, and more
> > importantly, are ready to actively participate in testing.
>
> I think this should be kept as an open bug against pixman that the
> inline asm()s would better be written as separate .S files (thanks for
> the idea).
For the ARM NEON optimizations in pixman 0.16.x it was even messier and uglier
because of the '-mfloat-abi' option. Switching to .S files solves this problem
and automatically gives more control over register allocation. Inline
assembly is more fragile in this respect: if it tries to use as many
registers as possible (having more registers available is better for
optimization), gcc may fail to compile the code depending on the optimization
level and other options, giving a rather annoying error: "can't find a
register in class 'GENERAL_REGS' while reloading 'asm'"
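As a rough illustration of where this comes from (a sketch, not pixman code):
with inline assembly, gcc chooses the register for every operand from the
constraints. A small block like the one below needs only three registers and
compiles reliably; a block with many more operands and clobbers leaves gcc
fewer registers to work with, which is where the reload error shows up. The
function name and the plain C fallback are made up for this example, and the
guard assumes the compiler defines the ACLE macro __ARM_FEATURE_SIMD32 when
the instruction is available.

```c
#include <stdint.h>

/* Hypothetical example, not taken from pixman: per-byte saturating add
 * of two packed 8888 pixels. */
static inline uint32_t
add_saturate_8888 (uint32_t a, uint32_t b)
{
#if defined(__arm__) && defined(__ARM_FEATURE_SIMD32)
    uint32_t result;

    /* gcc picks a general purpose register for each of %0, %1 and %2 */
    __asm__ ("uqadd8 %0, %1, %2" : "=r" (result) : "r" (a), "r" (b));
    return result;
#else
    /* Portable fallback for targets without the instruction */
    uint32_t result = 0;
    int shift;

    for (shift = 0; shift < 32; shift += 8)
    {
        uint32_t sum = ((a >> shift) & 0xff) + ((b >> shift) & 0xff);

        if (sum > 0xff)
            sum = 0xff;
        result |= sum << shift;
    }
    return result;
#endif
}
```

In a separate .S file the same instruction would simply name its registers
explicitly, so nothing depends on what the compiler has left over.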
The downside of using assembly directly is the need to take care of the ABI,
stack alignment, handling of the r9 register, etc. Targeting the ARM EABI on
Linux is not a big problem. But other platforms running on ARM (Windows,
Apple, ...) may potentially have trouble or have assembly optimizations
disabled.
That's why I'm hesitant to touch support for older ARM processors.
Another issue is whether or not to use unaligned memory accesses on
armv6+ systems. Currently, even in the NEON code in 'pixman-arm-neon-asm.S',
the configuration variable RESPECT_STRICT_ALIGNMENT is set to 1, so pixman
should never use unaligned memory accesses. Setting this option to 0 would
give slightly better performance when dealing with the leading and trailing
pixels in each scanline.
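To make the leading/trailing pixels point concrete, here is a minimal C
sketch of how a scanline splits up when strict alignment is respected. It is
only an illustration: the actual NEON code is written as assembly macros, and
the type and function names below are hypothetical, not pixman API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of pixman. All sizes are in bytes. */
typedef struct
{
    size_t head; /* leading bytes before the first aligned address */
    size_t body; /* bytes that can be covered by aligned wide accesses */
    size_t tail; /* trailing bytes after the last full aligned chunk */
} scanline_split;

static scanline_split
split_scanline (const void *dst, size_t width, size_t align)
{
    scanline_split s;
    size_t misalign = (size_t) ((uintptr_t) dst % align);

    /* Bytes needed to reach the next aligned address, capped by width */
    s.head = misalign ? align - misalign : 0;
    if (s.head > width)
        s.head = width;

    /* Largest multiple of 'align' that fits into the remainder */
    s.body = (width - s.head) / align * align;
    s.tail = width - s.head - s.body;
    return s;
}
```

With strict alignment, the head and tail parts have to be handled with
narrower accesses, while the body uses full-width aligned ones. If unaligned
accesses were allowed, the head and tail could be covered by wide accesses as
well, which is where the small gain would come from.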
I created a branch to collect some older ARMv6 optimizations for pixman
(coincidentally they already use "naked" functions, which is practically
equivalent to implementing the code in external .S files):
http://cgit.freedesktop.org/~siamashka/pixman/log/?h=arm-optimizations-from-xomap
Cleaning up this branch and splitting out the armv4 assembly (which should
also provide a good performance improvement) may be a good idea for really
old ARM systems. As part of this activity, an old bug/feature request can be
resolved too: https://bugs.freedesktop.org/show_bug.cgi?id=13445
--
Best regards,
Siarhei Siamashka