[Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8
samuel.xu at intel.com
Mon Aug 30 02:09:53 PDT 2010
Hi, Siarhei Siamashka:
New patch, which contains
1)Simplified 64 bit detect_cpu_features(), only check SSSE3 bit. _MSC_VER path switched to __cpuid to avoid inline asm. _MSC_VER path CPUID code snatch is extracted and tested on standalone C file in a 64 bit windows system, using VS2010.
2) removed "merging", pixman-ssse3.c is shaped as our discussion.
Could you review it?
From: Siarhei Siamashka [mailto:siarhei.siamashka at gmail.com]
Sent: Sunday, August 29, 2010 6:27 PM
To: Xu, Samuel
Cc: pixman at lists.freedesktop.org; Ma, Ling; Liu, Xinyun
Subject: Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8
On Sunday 29 August 2010 12:25:47 Xu, Samuel wrote:
> Hi, Siarhei Siamashka:
> Q:---" What problems do you have without "merge" mechanism?"
> A: Of course there isn't correctness issue w/o "merge".
> Currently, sse2_fast_paths/mmx_fast_paths/c_fast_paths...are excluded
> each other, although some checking forms delegate chain. While after
> delegate chain formed, only one fast table effect. Is my understanding
> correct? So after ssse3_fast_paths is newly added, only SSSE3 table
> will be effect after SSSE3 CPUID is detected. Of course we can just
> keep 2 new entries with SSSE3 asm optimizations, the issue is: we lost
> the optimizations which already exist in current SSE2 table. So, w/o
> merging or w/o totally duplication from SSE2 file, there will be performance unfriendly.
No, all the fast path tables are tried one after another, so there should be no problem using SSE2 and MMX fast paths as fallbacks. That's the whole point of having delegates. If you see any issue here, that's likely a bug.
Still merging fast path tables may make the search for the needed fast path function a bit faster in the case of a fast path cache miss. And there might be a bit more motivation to pursue such optimizations once SSE2 is not at a top of delegate chain anymore ;) This is especially important for the combiner functions which are called for each scanline, so walking the delegate chain is highly undesirable there. The simple solution at least for the combiner functions is to just copy all the pointers for unimplemented functions from the previous implementation at setup time. And thus make sure that 'delegate_combine_32' is never ever called. I tried to propose something like this 'pixman_blt' delegates earlier (for example, see "imp->blt = fast->blt;"
for the changes in pixman-vmx.c):
But as I mentioned before, any merging of fast path tables or improvement for the delegates chain handling is an additional optimization. And it belongs to a separate patch.
> Sure, It is ok for us for this "correctness firstly, then performance
> in next wave" philosophy. A new SSSE3 file with only 2 fast path
> entries is ok?
Yes, this looks right.
> For simplifying on CPU detection, I think same " correctness firstly,
> then performance in next wave" philosophy might be followed. A full
> CPU detection for 64 bit can be added firstly as a baseline, the one
> who can test win32 and Solaris can help us to make it shorter.
What "correctness" you are talking about? As already noticed by Makoto Kato earlier , MSVC does not support inline assembly in 64-bit mode So your patch was clearly broken and untested in this configuration.
IMHO it is always easier to add missing functionality in the future (support for SSSE3 optimizations on the systems other than linux) than removing broken untested system-dependent garbage from the source code. Especially considering that these other systems still need to do something about the GNU assembler part of your patch. But again, a comment from somebody else would be very much welcome here. I don't have a strong opinion about this part.
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 41196 bytes
More information about the Pixman