[Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8
siarhei.siamashka at gmail.com
Mon Aug 16 16:00:00 PDT 2010
On Monday 16 August 2010 11:24:44 Xu, Samuel wrote:
> Thanks for kindly comments! It is very nice that bug#20709 also
> similar performance issue.
Well, having bugs unresolved for such a long time is not so nice. Also see some
more comments below.
> So, let's discuss how to make this patch
> better. Here is some answers from me, and Ma Ling will update comments on
> 32/64 bit question and assemble details :
> 1) For MOVAPS+PALIGNR from SSSE3, referring chapter 12 of <Intel(r) 64 and
> IA-32 Architectures Optimization Reference Manual> at
> http://www.intel.com/Assets/PDF/manual/248966.pdf, we can know
> MOVAPS+PALIGNR is preferred in ATOM: "Assembly /Compiler Coding Rule 13.
> (MH impact, M generality) For Intel Atom processors, prefer a sequence
> MOVAPS+PALIGN over MOVUPS. Similarly, MOVDQA+PALIGNR is preferred over
> MOVDQU." As far as I know, this code pattern is friendly for ATOM and
> won't harm Core-i7.
I did not question the fact that PALIGNR instruction must be useful for
something. There must have been some reason why it was introduced :) But real
benchmark numbers are always interesting, like the ones you provided for SSSE3
against original C code comparison. Just to be sure that these more than 1000
lines of code are really justified (at least for this particular use).
As I already mentioned, the code in 'pixman-access-ssse3.S' is highly redundant
just for 'fetch_scanline_x8r8g8b8' implementation. Due to processing
32-bit data, there are only 4 possible cases of relative source/destination
alignment, but the whole set of 16 alignment cases is implemented. In addition
to other benefits (like improved sources readability), reducing code size is
good for saving space in the CPU instruction cache and this typically improves
> 2) For #ifdefs vs. dynamically checking SSSE3, the
> patch exactly mimics "SSE2" support in currently pixman code(#ifdefs),
> which checks the build system when makefile generating and applies
> USE_SSSE3 into later building.
Maybe the intention was good, but the end result just fails to build on my
64-bit Intel Atom netbook. And pixman "make check" dies with "Illegal
instruction" on a 32-bit Intel Pentium-M laptop. So it's a clear indication
that you did something wrong and the patch is unacceptable as is.
> Of course prefix of CPUID can achieve
> better compatibility. It is ok for us to add this kind of CPUID prefix. I
> will appreciate directly comments on shape of this patch, using SSSE3,
> e.g. some code sample of expected pixman-cpu.c and pixman-access.c.
SSSE3 is not much different from MMX and SSE2 and can be added in a similar way
Regarding SIMD optimized fetchers. Pixman has none at the moment, so there is
no code to be used as a reference to see how it should be done "right" (and
that's partially the reason why bug#20709 is still unresolved).
There was an older discussion about SIMD fetchers:
> 3) For executable stack caused by assemble file, it should be able to solved
> by adding ".section .note.GNU-stack .previous", according
> . In fact, same solution already exist in current pixman-arm-neon-asm.S
> inside pixman code.
Yes sure. I did not say that it can't be solved :) It's just better to address
this particular problem in the next revision of ssse3 patch.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Size: 198 bytes
Desc: This is a digitally signed message part.
More information about the Pixman