[Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8
samuel.xu at intel.com
Tue Aug 17 00:52:43 PDT 2010
Hi Siarhei Siamashka:
We'd like to provide a new patch with following enhancement soon:
1) Add 64 bit asm code specifically for 64 bit, which will co-exist with 32 bit version
2) CPUID dynamic check in pixman-cpu.c and pixman-access.c
3) Makefile fixing
4) Stack executable fixing in both 32/64 asm files.
5) follow your comments to make asm code shorter
Two more questions:
----"Maybe the intention was good, but the end result just fails to build on my 64-bit Intel Atom netbook. And pixman "make check" dies with "Illegal instruction" on a 32-bit Intel Pentium-M laptop. So it's a clear indication that you did something wrong and the patch is unacceptable as is."
Q1: Could you past the 64-bit ATOM build failure? Is it caused by ./configure to generate make file, or build error, or link error?
Q2: Could you show us how to do "make check"?
From: Ma, Ling
Sent: Tuesday, August 17, 2010 3:22 PM
To: Siarhei Siamashka; Xu, Samuel
Cc: pixman at lists.freedesktop.org; Liu, Xinyun
Subject: RE: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8
Hi Siarhei Siamashka
> I did not question the fact that PALIGNR instruction must be useful
> for something. There must have been some reason why it was introduced
> :) But real benchmark numbers are always interesting, like the ones
> you provided for
> SSSE3 against original C code comparison. Just to be sure that these
> more than 1000 lines of code are really justified (at least for this particular use).
We attached data report about movups + movaps Vs movaps + palignr.
The report shows movaps + palignr is better because crossing-cacheline will cause big latency on Atom, Core2.
> As I already mentioned, the code in 'pixman-access-ssse3.S' is highly
> redundant just for 'fetch_scanline_x8r8g8b8' implementation. Due to
> processing 32-bit data, there are only 4 possible cases of relative
> source/destination alignment, but the whole set of 16 alignment cases
> is implemented. In addition to other benefits (like improved sources
> readability), reducing code size is good for saving space in the CPU
> instruction cache and this typically improves performance.
Original patch can tolerate any type of offset from src and dest.
Yes, I agree if offset of src and dest must be 4 bytes aligned, Your suggestion will reduce code size significantly.
Other questions, Samuel or Xinyun will provide further answers.
> > 2) For #ifdefs vs. dynamically checking SSSE3, the patch exactly
> > mimics "SSE2" support in currently pixman code(#ifdefs), which
> > checks the build system when makefile generating and applies
> > USE_SSSE3 into later building.
> Maybe the intention was good, but the end result just fails to build
> on my 64-bit Intel Atom netbook. And pixman "make check" dies with
> "Illegal instruction" on a 32-bit Intel Pentium-M laptop. So it's a
> clear indication that you did something wrong and the patch is unacceptable as is.
> > Of course prefix of CPUID can achieve better compatibility. It is ok
> > for us to add this kind of CPUID prefix. I will appreciate directly
> > comments on shape of this patch, using SSSE3, e.g. some code sample
> > of expected pixman-cpu.c and
> SSSE3 is not much different from MMX and SSE2 and can be added in a
> similar way to pixman-cpu.c
> Regarding SIMD optimized fetchers. Pixman has none at the moment, so
> there is no code to be used as a reference to see how it should be
> done "right" (and that's partially the reason why bug#20709 is still unresolved).
> There was an older discussion about SIMD fetchers:
> > 3) For executable stack caused by assemble file, it should be able
> > to solved by adding ".section .note.GNU-stack .previous", according
> > https://www.redhat.com/archives/fedora-devel-list/2005-March/msg00460.
> > html . In fact, same solution already exist in current
> > pixman-arm-neon-asm.S inside pixman code.
> Yes sure. I did not say that it can't be solved :) It's just better to
> address this particular problem in the next revision of ssse3 patch.
> Best regards,
> Siarhei Siamashka
More information about the Pixman