[Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

Siarhei Siamashka siarhei.siamashka at gmail.com
Thu Dec 2 08:00:22 PST 2010


On Monday 29 November 2010 20:59:52 Siarhei Siamashka wrote:
> On Wednesday 17 November 2010 07:47:39 Xu, Samuel wrote:
> > For MOVD, we simplified the backward copy code since previous code is too
> > long and not gain obvious performance,
> 
> And this is what I'm worried about. First you proposed this part of code
> (backwards copy) without providing any initial explanation or benchmark
> numbers. And now you abandon it, also without providing any benchmark
> numbers or description of your test. But because you still have x86
> instructions there instead of MOVD, store forwarding aliasing surely may
> affect performance, as demonstrated by my simple test program posted
> earlier [3] as part of the discussion.
> 
> Can we be sure that this SSSE3 patch actually provides any practical
> performance improvement over the current pixman code? Considering that
> compared to your initial profiling data, now SSE2 optimized code is used
> for this operation and also problematic software prefetch which was almost
> halving [4] effective memory bandwidth for this operation is now
> eliminated [5].
>
> Surely, using 'test/lowlevel-blt-bench' microbenchmark it showed good
> improvement for L1 cached data. But on the other hand, performance of small
> random operations has dropped (shown as 'R' statistics). Maybe that's
> happening because of the scanline function call overhead and this fwd/bwd
> special handling for small buffer sizes? So a final verification with some
> real practical use case would be interesting.
>
> [...]
>
> 1. http://www.mail-archive.com/pixman@lists.freedesktop.org/msg00357.html
> 2. http://www.mail-archive.com/pixman@lists.freedesktop.org/msg00378.html
> 3. http://www.mail-archive.com/pixman@lists.freedesktop.org/msg00415.html
> 4. http://www.mail-archive.com/pixman@lists.freedesktop.org/msg00404.html
> 5. http://www.mail-archive.com/pixman@lists.freedesktop.org/msg00563.html

I just ran some benchmarks with the 'gnome-system-monitor' cairo perf trace,
which happens to use the 'src_x888_8888' operation. SSSE3 shows about the
same performance as SSE2, with no measurable benefit for this use case.
The low complexity of this operation and the hardware prefetcher make any
differences between implementations much less noticeable in general.
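
For reference, the per-pixel work in 'src_x888_8888' amounts to copying a
32-bit pixel and forcing the unused top byte to an opaque alpha value. A
minimal scalar sketch of that idea follows. The function name is made up for
illustration and this is not the actual pixman code, which also deals with
strides and multiple scanlines:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a pixman function): convert one scanline of
 * x8r8g8b8 pixels to a8r8g8b8 by setting the alpha byte to 0xff.
 */
static void
copy_x888_to_8888 (uint32_t *dst, const uint32_t *src, size_t width)
{
    size_t i;

    for (i = 0; i < width; i++)
        dst[i] = src[i] | 0xff000000u;  /* keep RGB, force opaque alpha */
}
```

With one load, one OR and one store per pixel, there is very little
arithmetic to speed up, so memory access is likely to dominate once the data
no longer fits in cache.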

Intel Atom N450, 64-bit userland, gcc 4.5.1

$ CAIRO_TEST_TARGET=image ./cairo-perf-trace gnome-system-monitor.trace 

======= C slow path =========

[ # ]  backend                         test   min(s) median(s) stddev. count
[ # ]    image: pixman 0.21.3
[  0]    image         gnome-system-monitor   15.011   15.034   0.08%    6/6

======= C fast path [1] =========

[ # ]  backend                         test   min(s) median(s) stddev. count
[ # ]    image: pixman 0.21.3
[  0]    image         gnome-system-monitor   14.659   14.697   0.20%    6/6

======= SSE2 fast path [2] =====

[ # ]  backend                         test   min(s) median(s) stddev. count
[ # ]    image: pixman 0.21.3
[  0]    image         gnome-system-monitor   14.431   14.496   0.19%    6/6

======= SSSE3 fast path [3] ======

[ # ]  backend                         test   min(s) median(s) stddev. count
[ # ]    image: pixman 0.21.3
[  0]    image         gnome-system-monitor   14.455   14.496   0.17%    6/6

====== artificial test with just an empty stub for src_x888_8888 ========

[ # ]  backend                         test   min(s) median(s) stddev. count
[ # ]    image: pixman 0.21.3
[  0]    image         gnome-system-monitor   12.215   12.241   0.11%    6/6

---

So I'm not sure if this SSSE3 code is very useful as is. But of course, if some
practical use case makes heavy use of this function on data residing in the
L1/L2 caches, then it could show a big improvement.
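
To put the numbers above in perspective, one rough way to estimate the time
spent in 'src_x888_8888' itself is to subtract the empty-stub run from each
total. This is only a sketch: it assumes the rest of the trace costs the same
with and without the stub, which may not hold exactly because of cache
effects. The helper names are made up for illustration:

```c
/*
 * Estimated seconds spent in the operation: total run time minus the
 * run time measured with an empty stub in place of the operation.
 */
static double
op_seconds (double total_s, double stub_s)
{
    return total_s - stub_s;
}

/*
 * Estimated speedup of the operation alone between two implementations,
 * using the same stub run as the baseline for both.
 */
static double
op_speedup (double old_total_s, double new_total_s, double stub_s)
{
    return op_seconds (old_total_s, stub_s) / op_seconds (new_total_s, stub_s);
}

/*
 * With the min(s) column from the runs above (stub = 12.215):
 *   C slow path: 15.011 - 12.215 = 2.796 s
 *   C fast path: 14.659 - 12.215 = 2.444 s
 *   SSE2:        14.431 - 12.215 = 2.216 s
 *   SSSE3:       14.455 - 12.215 = 2.240 s
 */
```

By this estimate the operation accounts for roughly 2.2 to 2.8 seconds of a
run of about 14.5 to 15 seconds, and the SSE2 and SSSE3 results differ by
only about 0.02 seconds.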


1. http://cgit.freedesktop.org/pixman/commit/?id=16bae834
2. http://cgit.freedesktop.org/pixman/commit/?id=e0b430a1
3. http://lists.freedesktop.org/archives/pixman/2010-November/000742.html

-- 
Best regards,
Siarhei Siamashka