[Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

Siarhei Siamashka siarhei.siamashka at gmail.com
Sun Aug 29 03:27:08 PDT 2010

On Sunday 29 August 2010 12:25:47 Xu, Samuel wrote:
> Hi, Siarhei Siamashka:
> Q:---" What problems do you have without "merge" mechanism?"
> A: Of course there isn't correctness issue w/o "merge".
> Currently, sse2_fast_paths/mmx_fast_paths/c_fast_paths...are excluded each
> other, although some checking forms delegate chain. While after delegate
> chain formed, only one fast table effect. Is my understanding correct? So
> after ssse3_fast_paths is newly added, only SSSE3 table will be effect
> after SSSE3 CPUID is detected. Of course we can just keep 2 new entries
> with SSSE3 asm optimizations, the issue is: we lost the optimizations
> which already exist in current SSE2 table. So, w/o merging or w/o totally
> duplication from SSE2 file, there will be performance unfriendly.

No, all the fast path tables are tried one after another, so there should be no 
problem using SSE2 and MMX fast paths as fallbacks. That's the whole point of
having delegates. If you see any issue here, that's likely a bug.

Still merging fast path tables may make the search for the needed fast path 
function a bit faster in the case of a fast path cache miss. And there might be 
a bit more motivation to pursue such optimizations once SSE2 is not at a top of 
delegate chain anymore ;) This is especially important for the combiner 
functions which are called for each scanline, so walking the delegate chain is 
highly undesirable there. The simple solution at least for the combiner 
functions is to just copy all the pointers for unimplemented functions from the 
previous implementation at setup time. And thus make sure that 
'delegate_combine_32' is never ever called. I tried to propose something like 
this 'pixman_blt' delegates earlier (for example, see "imp->blt = fast->blt;" 
for the changes in pixman-vmx.c):

But as I mentioned before, any merging of fast path tables or improvement for
the delegates chain handling is an additional optimization. And it belongs to a
separate patch.

> Sure, It is ok for us for this "correctness firstly, then performance in
> next wave" philosophy. A new SSSE3 file with only 2 fast path entries is
> ok?

Yes, this looks right.

> For simplifying on CPU detection, I think same " correctness firstly, then
> performance in next wave" philosophy might be followed. A full CPU
> detection for 64 bit can be added firstly as a baseline, the one who can
> test win32 and Solaris can help us to make it shorter.

What "correctness" you are talking about? As already noticed by  
Makoto Kato earlier [1], MSVC does not support inline assembly in 64-bit
mode  So your patch was clearly broken and untested in this configuration.

IMHO it is always easier to add missing functionality in the future (support
for SSSE3 optimizations on the systems other than linux) than removing broken
untested system-dependent garbage from the source code. Especially considering
that these other systems still need to do something about the GNU assembler
part of your patch. But again, a comment from somebody else would be very much
welcome here. I don't have a strong opinion about this part.

1. http://lists.freedesktop.org/archives/pixman/2010-August/000424.html

Best regards,
Siarhei Siamashka
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 198 bytes
Desc: This is a digitally signed message part.
URL: <http://lists.freedesktop.org/archives/pixman/attachments/20100829/dcd3fd83/attachment-0001.pgp>

More information about the Pixman mailing list