[Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

Xu, Samuel samuel.xu at intel.com
Tue Aug 17 00:52:43 PDT 2010


Hi Siarhei Siamashka:
We'd like to provide a new patch with following enhancement soon:
1) Add 64 bit asm code specifically for 64 bit, which will co-exist with 32 bit version
2) CPUID dynamic check in pixman-cpu.c and pixman-access.c
3) Makefile fixing
4) Stack executable fixing in both 32/64 asm files.
5) follow your comments to make asm code shorter
Any missing?

Two more questions:
----"Maybe the intention was good, but the end result just fails to build on my 64-bit Intel Atom netbook. And pixman "make check" dies with "Illegal instruction" on a 32-bit Intel Pentium-M laptop. So it's a clear indication that you did something wrong and the patch is unacceptable as is."

Q1: Could you past the 64-bit ATOM build failure? Is it caused by ./configure to generate make file, or build error, or link error?
Q2: Could you show us how to do "make check"?

Samuel


-----Original Message-----
From: Ma, Ling 
Sent: Tuesday, August 17, 2010 3:22 PM
To: Siarhei Siamashka; Xu, Samuel
Cc: pixman at lists.freedesktop.org; Liu, Xinyun
Subject: RE: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

Hi Siarhei Siamashka

> I did not question the fact that PALIGNR instruction must be useful 
> for something. There must have been some reason why it was introduced 
> :) But real benchmark numbers are always interesting, like the ones 
> you provided for
> SSSE3 against original C code comparison. Just to be sure that these 
> more than 1000 lines of code are really justified (at least for this particular use).
We attached data report about movups + movaps Vs movaps + palignr. 

The report shows movaps + palignr is better because crossing-cacheline will cause big latency on Atom, Core2.

> As I already mentioned, the code in 'pixman-access-ssse3.S' is highly 
> redundant just for 'fetch_scanline_x8r8g8b8' implementation. Due to 
> processing 32-bit data, there are only 4 possible cases of relative 
> source/destination alignment, but the whole set of 16 alignment cases 
> is implemented. In addition to other benefits (like improved sources 
> readability), reducing code size is good for saving space in the CPU 
> instruction cache and this typically improves performance.

Original patch can tolerate any type of offset from src and dest. 
Yes, I agree if offset of src and dest must be 4 bytes aligned, Your suggestion will reduce code size significantly.

Other questions, Samuel or Xinyun will provide further answers.

Best Regards
Ling


> > 2) For #ifdefs vs. dynamically checking SSSE3, the patch exactly 
> > mimics "SSE2" support in currently pixman code(#ifdefs), which 
> > checks the build system when makefile generating and applies
> > USE_SSSE3 into later building.
> 
> Maybe the intention was good, but the end result just fails to build 
> on my 64-bit Intel Atom netbook. And pixman "make check" dies with 
> "Illegal instruction" on a 32-bit Intel Pentium-M laptop. So it's a 
> clear indication that you did something wrong and the patch is unacceptable as is.
> 
> > Of course prefix of CPUID can achieve better compatibility. It is ok 
> > for us to add this kind of CPUID prefix. I will appreciate directly 
> > comments on shape of this patch, using SSSE3, e.g. some code sample 
> > of expected pixman-cpu.c and
> pixman-access.c.
> 
> SSSE3 is not much different from MMX and SSE2 and can be added in a 
> similar way to pixman-cpu.c
> 
> Regarding SIMD optimized fetchers. Pixman has none at the moment, so 
> there is no code to be used as a reference to see how it should be 
> done "right" (and that's partially the reason why bug#20709 is still unresolved).
> There was an older discussion about SIMD fetchers:
> http://comments.gmane.org/gmane.comp.lib.cairo/18342
> 
> > 3) For executable stack caused by assemble file, it should be able 
> > to solved by adding ".section .note.GNU-stack .previous", according 
> > https://www.redhat.com/archives/fedora-devel-list/2005-March/msg00460.
> > html . In fact, same solution already exist in current 
> > pixman-arm-neon-asm.S inside pixman code.
> 
> Yes sure. I did not say that it can't be solved :) It's just better to 
> address this particular problem in the next revision of ssse3 patch.
> 
> --
> Best regards,
> Siarhei Siamashka


More information about the Pixman mailing list