[Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

Tue Aug 17 00:21:43 PDT 2010

Hi Siarhei Siamashka

> I did not question the fact that PALIGNR instruction must be useful for
> something. There must have been some reason why it was introduced :) But
> real benchmark numbers are always interesting, like the ones you provided for
> SSSE3 against original C code comparison. Just to be sure that these more than
> 1000 lines of code are really justified (at least for this particular use).
We have attached a data report comparing movups + movaps against movaps + palignr.

The report shows that movaps + palignr is faster, because an unaligned load
that crosses a cache line incurs a large latency penalty on Atom and Core 2.
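
To illustrate the idea behind movaps + palignr, here is a minimal scalar
sketch in C. It is not code from the patch: the names palignr_model and
load_unaligned_via_aligned are hypothetical, and the byte arrays only model
what the XMM registers would hold. It assumes base is 16-byte aligned and that
at least 32 bytes are readable from the start of the containing block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of PALIGNR: concatenate hi:lo into 32 bytes, shift right
 * by `shift` bytes, and keep the low 16 bytes of the result. */
static void palignr_model(const uint8_t hi[16], const uint8_t lo[16],
                          unsigned shift, uint8_t out[16])
{
    uint8_t cat[32];

    assert(shift <= 16);
    memcpy(cat, lo, 16);          /* low half comes from the first block  */
    memcpy(cat + 16, hi, 16);     /* high half comes from the next block  */
    memcpy(out, cat + shift, 16); /* the 16-byte window starting at shift */
}

/* Fetch the 16 bytes at base + offset while touching memory only at
 * 16-byte block boundaries (the loads movaps would perform), so no single
 * load can straddle a cache line. */
static void load_unaligned_via_aligned(const uint8_t *base, size_t offset,
                                       uint8_t out[16])
{
    size_t   block = offset & ~(size_t)15;  /* start of containing block */
    unsigned shift = (unsigned)(offset & 15);

    /* Real code would skip the second load when shift == 0. */
    palignr_model(base + block + 16, base + block, shift, out);
}
```

With a buffer holding the bytes 0, 1, 2, ..., a call with offset 5 yields the
bytes 5 through 20, the same result an unaligned load would give.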

> As I already mentioned, the code in 'pixman-access-ssse3.S' is highly redundant
> just for 'fetch_scanline_x8r8g8b8' implementation. Due to processing 32-bit
> data, there are only 4 possible cases of relative source/destination alignment,
> but the whole set of 16 alignment cases is implemented. In addition to other
> benefits (like improved sources readability), reducing code size is good for
> saving space in the CPU instruction cache and this typically improves
> performance.

The original patch tolerates any byte offset between src and dest.
Yes, I agree: if the src and dest offsets are always 4-byte aligned,
your suggestion will reduce the code size significantly.
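
A small C sketch of why only 4 of the 16 cases are reachable for 32-bit
pixels. The function names are hypothetical, and the scalar fetcher is only my
reading of what fetch_scanline_x8r8g8b8 does (force the unused x8 byte to an
opaque alpha), not the code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar reference: convert x8r8g8b8 to a8r8g8b8 by setting alpha to 0xff. */
static void fetch_x8r8g8b8_ref(const uint32_t *src, uint32_t *dst, size_t n)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] | 0xff000000u;
}

/* Byte shift PALIGNR would need once dst has been aligned to 16 bytes:
 * the distance between the two addresses modulo 16. */
static unsigned relative_shift(uintptr_t src_addr, uintptr_t dst_addr)
{
    return (unsigned)((src_addr - dst_addr) & 15);
}

/* Count the distinct shifts that occur when both addresses are multiples
 * of `pixel_bytes` (4 for 32-bit pixels, 1 for arbitrary byte offsets). */
static unsigned count_reachable_shifts(unsigned pixel_bytes)
{
    unsigned seen[16] = {0};
    unsigned count = 0;

    for (uintptr_t s = 0; s < 64; s += pixel_bytes)
        for (uintptr_t d = 0; d < 64; d += pixel_bytes)
            seen[relative_shift(s, d)] = 1;

    for (unsigned i = 0; i < 16; i++)
        count += seen[i];
    return count;
}
```

With pixel_bytes set to 4 only the shifts 0, 4, 8 and 12 occur, while
arbitrary byte offsets reach all 16.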

For the other questions, Samuel or Xinyun will provide further answers.

Best Regards
Ling

> > 2) For #ifdefs vs. dynamically checking SSSE3, the patch exactly
> > mimics "SSE2" support in currently pixman code(#ifdefs), which checks
> > the build system when makefile generating and applies
> > USE_SSSE3 into later building.
> 
> Maybe the intention was good, but the end result just fails to build on my 64-bit
> Intel Atom netbook. And pixman "make check" dies with "Illegal instruction" on
> a 32-bit Intel Pentium-M laptop. So it's a clear indication that you did something
> wrong and the patch is unacceptable as is.
> 
> > Of course prefix of CPUID can achieve
> > better compatibility. It is ok for us to add this kind of CPUID
> > prefix. I will appreciate directly comments on shape of this patch,
> > using SSSE3, e.g. some code sample of expected pixman-cpu.c and
> > pixman-access.c.
> 
> SSSE3 is not much different from MMX and SSE2 and can be added in a similar
> way to pixman-cpu.c
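
A rough C sketch of a runtime check combined with the build-time switch. This
is not the pixman-cpu.c code: have_ssse3, choose_fetch_impl and the enum are
hypothetical names, and the check relies on the GCC/Clang builtin
__builtin_cpu_supports, which is only assumed to be available on x86 builds
with those compilers.

```c
#include <stddef.h>

/* Returns 1 when the running CPU reports SSSE3, 0 otherwise.
 * On other compilers or architectures the sketch conservatively says no. */
static int have_ssse3(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("ssse3") ? 1 : 0;
#else
    return 0;
#endif
}

enum fetch_impl { FETCH_GENERIC = 0, FETCH_SSSE3 = 1 };

/* Pick the SSSE3 path only if it was compiled in (the USE_SSSE3 style
 * build-time switch) AND the CPU supports it at run time; otherwise fall
 * back to the generic C path instead of hitting an illegal instruction. */
static enum fetch_impl choose_fetch_impl(int built_with_ssse3)
{
    if (built_with_ssse3 && have_ssse3())
        return FETCH_SSSE3;
    return FETCH_GENERIC;
}
```

The point of the second function is that the #ifdef alone answers "was the
code built?", while the runtime check answers "can this machine run it?".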
> 
> Regarding SIMD optimized fetchers. Pixman has none at the moment, so there
> is no code to be used as a reference to see how it should be done "right" (and
> that's partially the reason why bug#20709 is still unresolved).
> There was an older discussion about SIMD fetchers:
> http://comments.gmane.org/gmane.comp.lib.cairo/18342
> 
> > 3) For executable stack caused by assemble file, it should be able to
> > solved by adding ".section .note.GNU-stack .previous", according
> > https://www.redhat.com/archives/fedora-devel-list/2005-March/msg00460.html .
> > In fact, same solution already exist in current
> > pixman-arm-neon-asm.S inside pixman code.
> 
> Yes sure. I did not say that it can't be solved :) It's just better to address this
> particular problem in the next revision of ssse3 patch.
> 
> --
> Best regards,
> Siarhei Siamashka
-------------- next part --------------
A non-text attachment was scrubbed...
Name: memcpy-pixman.xls
Type: application/vnd.ms-excel
Size: 77824 bytes
Desc: memcpy-pixman.xls
URL: <http://lists.freedesktop.org/archives/pixman/attachments/20100817/d91bb5c7/attachment-0001.xls>