[Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

Mon Aug 16 16:00:00 PDT 2010

On Monday 16 August 2010 11:24:44 Xu, Samuel wrote:
> 	Thanks for kindly comments! It is very nice that bug#20709 also 
emphasize
> similar performance issue.

Well, having bugs unresolved for such a long time is not so nice. Also see some 
more comments below.

> So, let's discuss how to make this patch
> better. Here is some answers from me, and Ma Ling will update comments on
> 32/64 bit question and assemble details :
> 
> 1) For MOVAPS+PALIGNR from SSSE3, referring chapter 12 of <Intel(r) 64 and
> IA-32 Architectures Optimization Reference Manual> at
> http://www.intel.com/Assets/PDF/manual/248966.pdf, we can know
> MOVAPS+PALIGNR is preferred in ATOM: "Assembly /Compiler Coding Rule 13.
> (MH impact, M generality) For Intel Atom processors, prefer a sequence
> MOVAPS+PALIGN over MOVUPS. Similarly, MOVDQA+PALIGNR is preferred over
> MOVDQU." As far as I know, this code pattern is friendly for ATOM and
> won't harm Core-i7.

I did not question the fact that PALIGNR instruction must be useful for 
something. There must have been some reason why it was introduced :) But real 
benchmark numbers are always interesting, like the ones you provided for SSSE3 
against original C code comparison. Just to be sure that these more than 1000 
lines of code are really justified (at least for this particular use).

As I already mentioned, the code in 'pixman-access-ssse3.S' is highly redundant 
just for 'fetch_scanline_x8r8g8b8' implementation. Due to processing
32-bit data, there are only 4 possible cases of relative source/destination 
alignment, but the whole set of 16 alignment cases is implemented. In addition 
to other benefits (like improved sources readability), reducing code size is 
good for saving space in the CPU instruction cache and this typically improves 
performance.

> 2) For #ifdefs vs. dynamically checking SSSE3, the
> patch exactly mimics "SSE2" support in currently pixman code(#ifdefs),
> which checks the build system when makefile generating and applies
> USE_SSSE3 into later building.

Maybe the intention was good, but the end result just fails to build on my
64-bit Intel Atom netbook. And pixman "make check" dies with "Illegal 
instruction" on a 32-bit Intel Pentium-M laptop. So it's a clear indication 
that you did something wrong and the patch is unacceptable as is.

> Of course prefix of CPUID can achieve
> better compatibility. It is ok for us to add this kind of CPUID prefix. I
> will appreciate directly comments on shape of this patch, using SSSE3,
> e.g. some code sample of expected pixman-cpu.c and pixman-access.c.

SSSE3 is not much different from MMX and SSE2 and can be added in a similar way 
to pixman-cpu.c

Regarding SIMD optimized fetchers. Pixman has none at the moment, so there is 
no code to be used as a reference to see how it should be done "right" (and 
that's partially the reason why bug#20709 is still unresolved).
There was an older discussion about SIMD fetchers: 
http://comments.gmane.org/gmane.comp.lib.cairo/18342 

> 3) For executable stack caused by assemble file, it should be able to solved
> by adding ".section .note.GNU-stack .previous", according
> https://www.redhat.com/archives/fedora-devel-list/2005-March/msg00460.html
> . In fact, same solution already exist in current pixman-arm-neon-asm.S
> inside pixman code.

Yes sure. I did not say that it can't be solved :) It's just better to address 
this particular problem in the next revision of ssse3 patch.

-- 
Best regards,
Siarhei Siamashka
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 198 bytes
Desc: This is a digitally signed message part.
URL: <http://lists.freedesktop.org/archives/pixman/attachments/20100817/b4101d11/attachment.pgp>