[Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8
Xu, Samuel
samuel.xu at intel.com
Sun Aug 29 02:25:47 PDT 2010
Hi, Siarhei Siamashka:
Q: "What problems do you have without "merge" mechanism?"
A: Of course there is no correctness issue without "merge".
Currently, sse2_fast_paths, mmx_fast_paths, c_fast_paths and so on are mutually exclusive, although some checks form a delegate chain. Once the delegate chain is formed, only one fast path table takes effect. Is my understanding correct?
So after ssse3_fast_paths is added, only the SSSE3 table will take effect once SSSE3 is detected via CPUID. Of course we can keep just the 2 new entries with SSSE3 asm optimizations; the issue is that we lose the optimizations which already exist in the current SSE2 table. So, without merging or without fully duplicating the SSE2 file, performance will suffer.
Sure, the "correctness first, then performance in the next wave" philosophy is OK for us. Is a new SSSE3 file with only 2 fast path entries OK?
To simplify CPU detection, I think the same "correctness first, then performance in the next wave" philosophy can be followed. Full CPU detection for 64-bit can be added first as a baseline, and those who can test on win32 and Solaris can help us make it shorter.
To wrap up, the new patch will:
1) keep most of the 64-bit CPU detection as in the current patch (the GNU C part can drop some edx checks)
2) add a new SSSE3 file with only 2 fast path entries for the newly added asm optimizations
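For point 1), here is a minimal sketch of what a GNU C-only SSSE3 check could look like. It is not the code from the patch: the helper names ssse3_bit_set() and cpu_has_ssse3() and the CPUID1_ECX_SSSE3 macro are made up for illustration. It assumes GCC or a compatible compiler that ships <cpuid.h>. SSSE3 is reported in CPUID leaf 1, ECX bit 9. Since every x86-64 CPU has MMX and SSE2, the edx bits for those features do not need to be tested on 64-bit, which is presumably where the edx checking can be reduced.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define HAVE_GNUC_CPUID 1
#endif

/* CPUID leaf 1 reports SSSE3 in ECX bit 9. */
#define CPUID1_ECX_SSSE3 (1u << 9)

/* Hypothetical helper: decode the SSSE3 bit from a raw ECX value. */
static bool
ssse3_bit_set (uint32_t ecx)
{
    return (ecx & CPUID1_ECX_SSSE3) != 0;
}

/* Hypothetical helper: returns true only if CPUID reports SSSE3.
 * On non-x86 targets or non-GNU compilers it conservatively
 * reports false, so callers fall back to the generic paths. */
static bool
cpu_has_ssse3 (void)
{
#ifdef HAVE_GNUC_CPUID
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid() returns 0 if the requested leaf is unsupported. */
    if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx))
        return false;

    return ssse3_bit_set (ecx);
#else
    return false;
#endif
}
```

The MSVC and Solaris (HAVE_GETISAX) branches are deliberately left out here, since they are the untested parts under discussion.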
Comments?
Samuel
-----Original Message-----
From: Siarhei Siamashka [mailto:siarhei.siamashka at gmail.com]
Sent: Friday, August 27, 2010 10:57 PM
To: Xu, Samuel
Cc: pixman at lists.freedesktop.org; Ma, Ling; Liu, Xinyun
Subject: Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8
On Friday 27 August 2010 15:00:49 Xu, Samuel wrote:
> Hi, Siarhei Siamashka:
> Thanks for quick response!
> For the 64-bit detect_cpu_features(), if we ignore HAVE_GETISAX and _MSC_VER,
> it is OK for us to simplify it as in your example in the next update.
If you can ensure MSVC compatibility and make it work with your optimizations, then it would be really great. But if it is totally untested, I don't feel comfortable about having it just blindly replicated from 32 to 64 bits with the hope that it will work.
It's just my opinion, the others may disagree. And the others may also try to test your patch on win32 or solaris systems, providing a lot more useful feedback than me.
> For pixman-ssse3.c, maybe we have 2 options:
> 1) duplicate 6562 lines from pixman-sse2.c into a new pixman-ssse3.c
> in the 1st patch (of course replacing 2 entries with the newly added
> SSSE3 asm optimizations), and then add the "merge" mechanism in a later patch.
No, there is no need to duplicate anything.
> 2) first add the "merge" mechanism patch, and then add the new pixman-ssse3.c
> in a later patch, which might be shorter (111 lines). Does this mean
> option 1) is preferred?
What problems do you have without the "merge" mechanism? The pixman-sse2.c works fine without it, and it properly falls back to MMX code if SSE2 does not support some operations. Similarly, SSSE3 can fall back to SSE2 in the very same way.
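
The fallback described above can be sketched roughly as follows. This is a simplified model, not pixman's actual code: the types fast_path_t and implementation_t, the lookup_handler() function, and the operation and handler strings are all hypothetical. Each implementation has its own fast path table plus a pointer to a delegate. A lookup that misses in the SSSE3 table continues in the SSE2 table, so a two-entry SSSE3 table does not hide the existing SSE2 fast paths.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified types (not pixman's real structures). */
typedef struct fast_path
{
    const char *op;      /* operation name; NULL terminates the table */
    const char *handler; /* name of the routine that would be called */
} fast_path_t;

typedef struct implementation
{
    const char                  *name;
    const fast_path_t           *fast_paths;
    const struct implementation *delegate; /* next implementation to try */
} implementation_t;

/* Illustrative tables: SSE2 has many entries, SSSE3 only overrides one. */
static const fast_path_t sse2_paths[] = {
    { "over_8888_8888", "sse2_over_8888_8888" },
    { "fetch_x8r8g8b8", "sse2_fetch_x8r8g8b8" },
    { NULL, NULL }
};

static const fast_path_t ssse3_paths[] = {
    { "fetch_x8r8g8b8", "ssse3_fetch_x8r8g8b8" },
    { NULL, NULL }
};

static const implementation_t sse2_impl  = { "sse2",  sse2_paths,  NULL };
static const implementation_t ssse3_impl = { "ssse3", ssse3_paths, &sse2_impl };

/* Walk the delegate chain and return the first matching handler,
 * or NULL if no implementation in the chain provides the operation. */
static const char *
lookup_handler (const implementation_t *impl, const char *op)
{
    assert (op != NULL);

    for (; impl != NULL; impl = impl->delegate)
    {
        const fast_path_t *p;

        for (p = impl->fast_paths; p->op != NULL; p++)
        {
            if (strcmp (p->op, op) == 0)
                return p->handler;
        }
    }

    return NULL;
}
```

With these tables, looking up "fetch_x8r8g8b8" starting at ssse3_impl finds the SSSE3 entry, while "over_8888_8888" misses in the SSSE3 table and is resolved by the SSE2 delegate.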
--
Best regards,
Siarhei Siamashka