[Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

Sat Aug 21 08:37:26 PDT 2010

On Friday 20 August 2010 19:36:07 Xu, Samuel wrote:
> We measured performance, and compared with original SSE2 intrinsic enabled
> version(0.19.4), on ATOM, and get following findings using 480P flash
> H.264 video playing workload:
> 1) sse2_composite_src_x888_8888()'s cycle reduced 67%. This function's total
> cycle ratio over whole system reduced from 5.6% to 1.9%

This is not directly related to your pixman patch, but it looks like the right
place to fix the performance problem in your flash use case is the YUV->RGB
conversion. Setting the alpha channel to 0xFF there would be the most efficient
fix, and the src_x888_8888 operation could then be eliminated entirely.
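
To illustrate the point, here is a minimal sketch of a YUV->RGB conversion
that writes an opaque alpha byte directly. The function names and the BT.601
limited-range integer coefficients are assumptions chosen for illustration;
this is not the code the flash player or pixman actually uses.

```c
#include <stdint.h>

/* Clamp an intermediate value to the 0..255 range of one colour channel. */
static uint8_t clamp_u8(int v)
{
    if (v < 0)
        return 0;
    if (v > 255)
        return 255;
    return (uint8_t)v;
}

/*
 * Hypothetical converter (assumed BT.601 limited-range coefficients).
 * The result is already a8r8g8b8 with alpha = 0xFF, so no later pass
 * that only fills in the alpha byte is needed.
 */
static uint32_t yuv_to_a8r8g8b8(uint8_t y, uint8_t u, uint8_t v)
{
    int c = (int)y - 16;
    int d = (int)u - 128;
    int e = (int)v - 128;

    int r = (298 * c + 409 * e + 128) / 256;
    int g = (298 * c - 100 * d - 208 * e + 128) / 256;
    int b = (298 * c + 516 * d + 128) / 256;

    return 0xFF000000u                     /* opaque alpha, set here */
         | ((uint32_t)clamp_u8(r) << 16)
         | ((uint32_t)clamp_u8(g) << 8)
         |  (uint32_t)clamp_u8(b);
}
```

With output like this the buffer can be treated as a8r8g8b8 from the start,
so a separate x888 -> 8888 copy should no longer be needed.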

That said, improving pixman performance in general is still welcome and is a 
nice thing to have (for the other common use cases at least).

I see that you dropped the ssse3 x8r8g8b8 fetcher optimization in your last
patch and added an ssse3 src_x888_8888 fast path instead. It's a bit sad,
because other operations, for example over_8888_x888 with bilinear scaling,
could also benefit from the optimized fetcher. On the other hand, this way you
avoid the open question of how SIMD optimized fetchers should be added ;)
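
For readers unfamiliar with the term, the following is a rough sketch of what
an x8r8g8b8 scanline fetcher does: it reads a row of pixels and forces the
unused top byte to 0xFF so that later stages can treat the data as a8r8g8b8.
The function name is hypothetical and this is not pixman's actual fetcher; it
uses plain SSE2 (not SSSE3) when available and a scalar loop otherwise.

```c
#include <stddef.h>
#include <stdint.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* Hypothetical helper: fetch `width` x8r8g8b8 pixels as a8r8g8b8. */
static void fetch_x8r8g8b8_sketch(uint32_t *out, const uint32_t *in,
                                  size_t width)
{
    size_t i = 0;

#ifdef __SSE2__
    /* 0xFF000000 in each of the four 32-bit lanes. */
    const __m128i alpha = _mm_slli_epi32(_mm_set1_epi32(0xFF), 24);

    /* Four pixels per iteration; unaligned loads and stores for simplicity. */
    for (; i + 4 <= width; i += 4) {
        __m128i px = _mm_loadu_si128((const __m128i *)(in + i));
        _mm_storeu_si128((__m128i *)(out + i), _mm_or_si128(px, alpha));
    }
#endif

    /* Scalar tail (or the whole scanline when SSE2 is not available). */
    for (; i < width; i++)
        out[i] = in[i] | 0xFF000000u;
}
```

Because a fetcher of this kind produces ordinary a8r8g8b8 pixels, any operation
that consumes fetched scanlines could in principle reuse it, whereas a fast
path only helps the one operation it implements.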

> 2) whole system's C0 percentage reduced from 68.0% to 62.6%
> Maybe it is not " dramatically", while we are glad to see those gain on
> both perf and power.

A performance gain in the 4-5% ballpark looks like a major improvement to me.

-- 
Best regards,
Siarhei Siamashka