[Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8
siarhei.siamashka at gmail.com
Sat Aug 21 08:37:26 PDT 2010
On Friday 20 August 2010 19:36:07 Xu, Samuel wrote:
> We measured performance, and compared with original SSE2 intrinsic enabled
> version(0.19.4), on ATOM, and get following findings using 480P flash
> H.264 video playing workload:
> 1) sse2_composite_src_x888_8888()'s cycle reduced 67%. This function's total
> cycle ratio over whole system reduced from 5.6% to 1.9%
This is not directly related to your pixman patch, but it looks like the right
place to fix the performance problem in your flash use case is the YUV->RGB
conversion. Setting the alpha channel to 0xFF there would be the most efficient
approach, and the src_x888_8888 operation could be eliminated entirely.
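
To illustrate what I mean, here is a minimal sketch of a YUV->RGB conversion
that writes alpha as 0xFF directly. The function names and the BT.601
video-range integer coefficients are my assumptions for illustration only; I
have no idea what the actual converter in the flash plugin looks like.

```c
#include <stdint.h>

/* Clamp an intermediate value to the 0..255 range of one colour channel. */
static uint8_t clamp_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/*
 * Hypothetical helper (not part of pixman or flash): convert one YUV sample
 * to a packed a8r8g8b8 pixel, assuming BT.601 video-range input.
 * The alpha byte is set to 0xFF here, so the result is already a valid
 * opaque a8r8g8b8 pixel and needs no later x888 -> 8888 pass.
 */
static uint32_t yuv_to_a8r8g8b8(uint8_t y, uint8_t u, uint8_t v)
{
    int c = (int)y - 16;
    int d = (int)u - 128;
    int e = (int)v - 128;

    /* Fixed-point arithmetic scaled by 256; +128 rounds to nearest. */
    uint8_t r = clamp_u8((298 * c + 409 * e + 128) / 256);
    uint8_t g = clamp_u8((298 * c - 100 * d - 208 * e + 128) / 256);
    uint8_t b = clamp_u8((298 * c + 516 * d + 128) / 256);

    return 0xFF000000u | ((uint32_t)r << 16) | ((uint32_t)g << 8) | b;
}
```

The extra OR with 0xFF000000 is practically free inside the conversion loop,
whereas a separate src_x888_8888 pass has to read and write the whole frame
once more.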
That said, improving pixman performance in general is still welcome and is a
nice thing to have (for the other common use cases at least).
I see that you dropped the ssse3 x8r8g8b8 fetcher optimization in your last
patch and added an ssse3 src_x888_8888 fast path instead. It's a bit sad,
because other operations, like over_8888_x888 with bilinear scaling for
example, could also benefit from the optimized fetcher. On the other hand, the
lack of clarity regarding how to add SIMD optimized fetchers is avoided this
way ;)
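
For reference, the work an x8r8g8b8 fetcher has to do per scanline boils down
to something like the scalar sketch below. The function name and signature
are hypothetical and simplified (the real pixman fetchers take an image
pointer and coordinates); it is only meant to show the operation that a SIMD
version would vectorize.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical scalar sketch of an x8r8g8b8 scanline fetcher: copy `width`
 * pixels from `src` to `dst`, forcing the undefined top byte to 0xFF so the
 * output is valid a8r8g8b8 that any generic compositing path can consume.
 */
static void fetch_x8r8g8b8_sketch(const uint32_t *src, uint32_t *dst,
                                  size_t width)
{
    for (size_t i = 0; i < width; i++)
        dst[i] = src[i] | 0xFF000000u;
}
```

Because the result lands in a generic a8r8g8b8 buffer, speeding up this one
loop helps every operation that has no dedicated fast path for an x8r8g8b8
source, not just plain SRC.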
> 2) whole system's C0 percentage reduced from 68.0% to 62.6%
> Maybe it is not " dramatically", while we are glad to see those gain on
> both perf and power.
A performance gain in the 4-5% ballpark looks like a major improvement to me.