[Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8
Liu Xinyun
xinyun.liu at intel.com
Thu Dec 2 18:49:18 PST 2010
I got a similar performance result about one month ago.
The gvim test shows a large performance increase, about 360%. The other tests
show no regression and no improvement either.
I tested it on a 32-bit userland. I will test it again against the current git
code and post the data later.
Regards,
Xinyun
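
For readers unfamiliar with the operation named in the subject line: fetching
an x8r8g8b8 scanline as a8r8g8b8 only has to force the unused top byte of each
32-bit pixel to 0xFF (fully opaque). The sketch below illustrates that in plain
Python. The function name is hypothetical, and this is neither pixman's C code
nor the SSE2/SSSE3 code under discussion, only a model of the per-pixel work.

```python
def fetch_scanline_x8r8g8b8_sketch(pixels):
    """Return a8r8g8b8 pixels: keep R, G, B and force alpha to 0xFF.

    `pixels` is a list of 32-bit integers in x8r8g8b8 layout, where the
    top byte is undefined padding. Hypothetical helper for illustration.
    """
    return [(p & 0x00FFFFFF) | 0xFF000000 for p in pixels]


# Three sample pixels with different garbage in the unused top byte.
scanline = [0x00112233, 0x7FABCDEF, 0xFFFFFFFF]
converted = fetch_scanline_x8r8g8b8_sketch(scanline)
print([f"{p:08x}" for p in converted])  # ['ff112233', 'ffabcdef', 'ffffffff']
```

With so little arithmetic per pixel, the time is likely dominated by reading
and writing memory, which is consistent with the quoted remark that different
implementations are hard to tell apart.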
On Fri, Dec 03, 2010 at 12:00:22AM +0800, Siarhei Siamashka wrote:
> Just did some benchmarks with the 'gnome-system-monitor' cairo perf trace
> which happens to use 'src_x888_8888' operation. It looks like SSSE3 has
> about the same performance as SSE2, providing no measurable benefits on
> this use case. The low complexity of this operation and the hardware
> prefetcher make any differences between implementations much less
> noticeable in general.
>
> Intel Atom N450, 64-bit userland, gcc 4.5.1
>
> $ CAIRO_TEST_TARGET=image ./cairo-perf-trace gnome-system-monitor.trace
>
> ======= C slow path =========
>
> [ # ]  backend                         test   min(s) median(s) stddev. count
> [ # ]    image: pixman 0.21.3
> [  0]    image         gnome-system-monitor   15.011   15.034   0.08%   6/6
>
> ======= C fast path [1] =========
>
> [ # ]  backend                         test   min(s) median(s) stddev. count
> [ # ]    image: pixman 0.21.3
> [  0]    image         gnome-system-monitor   14.659   14.697   0.20%   6/6
>
> ======= SSE2 fast path [2] =====
>
> [ # ]  backend                         test   min(s) median(s) stddev. count
> [ # ]    image: pixman 0.21.3
> [  0]    image         gnome-system-monitor   14.431   14.496   0.19%   6/6
>
> ======= SSSE3 fast path [3] ======
>
> [ # ]  backend                         test   min(s) median(s) stddev. count
> [ # ]    image: pixman 0.21.3
> [  0]    image         gnome-system-monitor   14.455   14.496   0.17%   6/6
>
> ====== artificial test with just an empty stub for src_x888_8888 ========
>
> [ # ]  backend                         test   min(s) median(s) stddev. count
> [ # ]    image: pixman 0.21.3
> [  0]    image         gnome-system-monitor   12.215   12.241   0.11%   6/6
>
> ---
>
> So I'm not sure if this SSSE3 code is very useful as is. But of course, if some
> practical use case makes heavy use of this function so that it works on
> data residing in L1/L2 caches, then it could show a big improvement.
>
>
> 1. http://cgit.freedesktop.org/pixman/commit/?id=16bae834
> 2. http://cgit.freedesktop.org/pixman/commit/?id=e0b430a1
> 3. http://lists.freedesktop.org/archives/pixman/2010-November/000742.html
>
> --
> Best regards,
> Siarhei Siamashka
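
The timings quoted above can be put in perspective with a little arithmetic.
The sketch below uses the min(s) column from each run and treats the empty-stub
run as a floor, so that the difference approximates the time spent inside
src_x888_8888. That subtraction is an assumption: it ignores any cache effects
the stub itself changes. The variable and function names are illustrative only.

```python
# min(s) values copied from the quoted cairo-perf-trace runs.
min_seconds = {
    "C slow path": 15.011,
    "C fast path": 14.659,
    "SSE2 fast path": 14.431,
    "SSSE3 fast path": 14.455,
}
stub_seconds = 12.215  # run with an empty stub for src_x888_8888


def time_in_operation(total, floor):
    """Approximate seconds spent in the operation (assumes the stub costs nothing)."""
    return total - floor


def percent_change(new, old):
    """Relative change of `new` versus `old`, in percent."""
    return (new / old - 1.0) * 100.0


for name, total in min_seconds.items():
    in_op = time_in_operation(total, stub_seconds)
    share = in_op / total * 100.0
    print(f"{name:16s} total {total:7.3f}s  in-op {in_op:5.3f}s  ({share:4.1f}% of run)")

ssse3_vs_sse2 = percent_change(min_seconds["SSSE3 fast path"],
                               min_seconds["SSE2 fast path"])
print(f"SSSE3 vs SSE2 total time: {ssse3_vs_sse2:+.2f}%")
```

On these numbers the SSSE3 and SSE2 totals differ by well under one percent,
which is about the size of the reported run-to-run deviation. Even a perfect
implementation could not beat the stub figure, so that figure bounds what any
further tuning of this one operation can gain on this trace.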