[Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8
Siarhei Siamashka
siarhei.siamashka at gmail.com
Thu Dec 2 08:00:22 PST 2010
On Monday 29 November 2010 20:59:52 Siarhei Siamashka wrote:
> On Wednesday 17 November 2010 07:47:39 Xu, Samuel wrote:
> > For MOVD, we simplified the backward copy code since the previous code was
> > too long and did not gain obvious performance,
>
> And this is what I'm worried about. First you proposed this part of the code
> (backward copy) without providing any initial explanation or benchmark
> numbers. And now you abandon it, also without providing any benchmark
> numbers or a description of your test. But because you still have x86
> instructions there instead of MOVD, store forwarding aliasing may well
> affect performance, as demonstrated by my simple test program posted
> earlier [3] as part of the discussion.
>
> Can we be sure that this SSSE3 patch actually provides any practical
> performance improvement over the current pixman code? Compared to your
> initial profiling data, SSE2-optimized code is now used for this operation,
> and the problematic software prefetch, which was almost halving [4] the
> effective memory bandwidth for this operation, has been eliminated [5].
> Admittedly, the 'test/lowlevel-blt-bench' microbenchmark showed a good
> improvement for L1-cached data. But on the other hand, the performance of
> small random operations has dropped (shown as 'R' statistics). Maybe that's
> happening because of the scanline function call overhead and this fwd/bwd
> special handling for small buffer sizes? So a final verification with some
> real practical use case would be interesting.
>
> [...]
>
> 1. http://www.mail-archive.com/pixman@lists.freedesktop.org/msg00357.html
> 2. http://www.mail-archive.com/pixman@lists.freedesktop.org/msg00378.html
> 3. http://www.mail-archive.com/pixman@lists.freedesktop.org/msg00415.html
> 4. http://www.mail-archive.com/pixman@lists.freedesktop.org/msg00404.html
> 5. http://www.mail-archive.com/pixman@lists.freedesktop.org/msg00563.html
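The store forwarding aliasing concern in the quoted discussion can be made a bit more concrete. The following is a minimal sketch, assuming a CPU that checks loads against pending stores using only the low 12 address bits (commonly called "4K aliasing") and assuming non-overlapping source and destination buffers. In a forward copy, a load from `src + a` issued after a store to `dst + b` (with b < a) can falsely match that store when `(dst - src) mod 4096` equals `a - b`, i.e. is small and positive; a backward copy has the mirror-image problem. The function name `copy_should_go_backward`, the 64-byte alias window and the 64-byte small-copy cutoff are hypothetical illustrative values, not taken from the SSSE3 patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tuning values for illustration only; real values would
 * have to be measured on the target CPU. */
#define ALIAS_WINDOW_BYTES 64 /* how far loads may run ahead of stores */
#define SMALL_COPY_BYTES   64 /* below this, skip the direction check */

/* Returns 1 if a copy of n_bytes from src to dst (non-overlapping
 * buffers assumed) should run backward to avoid false store forwarding
 * dependencies, assuming the CPU compares only the low 12 address bits
 * of a load against pending stores ("4K aliasing"). */
static int copy_should_go_backward(const void *src, const void *dst,
                                   size_t n_bytes)
{
    assert(src != NULL && dst != NULL);

    /* Short copies: the extra check costs more than it can save. */
    if (n_bytes < SMALL_COPY_BYTES)
        return 0;

    /* Distance from src to dst modulo 4096. A forward copy is at risk
     * when this is small and positive: later loads from src then share
     * their low 12 address bits with recent stores to dst. */
    uintptr_t d = ((uintptr_t)dst - (uintptr_t)src) & 4095u;

    return d != 0 && d <= ALIAS_WINDOW_BYTES;
}
```

The small-copy cutoff reflects the concern about per-call overhead on short scanlines: any direction logic is pure overhead when the copy is only a few pixels long.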
I just ran some benchmarks with the 'gnome-system-monitor' cairo perf trace,
which happens to use the 'src_x888_8888' operation. It looks like SSSE3 has
about the same performance as SSE2, providing no measurable benefit for
this use case. The low complexity of this operation and the hardware
prefetcher make any differences between implementations much less
noticeable in general.
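To show why the operation is so cheap, here is a minimal sketch of what 'src_x888_8888' computes: each x8r8g8b8 pixel is copied with its unused top byte forced to 0xff, producing an opaque a8r8g8b8 pixel. The function names are hypothetical and this is not pixman's actual code; the SSE2 loop is a straightforward illustration (unaligned 16-byte loads and stores, scalar tail) and does not reproduce the SSSE3 patch under discussion. With a single OR per pixel, the work is dominated by memory traffic.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics */
#endif

/* Scalar reference: copy each x8r8g8b8 pixel and force its unused top
 * byte to 0xff, producing an opaque a8r8g8b8 pixel. */
static void x888_to_8888_scalar(uint32_t *dst, const uint32_t *src, int width)
{
    assert(width >= 0);
    for (int i = 0; i < width; i++)
        dst[i] = src[i] | 0xff000000u;
}

/* SSE2 variant: four pixels per iteration with unaligned loads/stores,
 * then a scalar loop for the remaining 0..3 pixels. When compiled
 * without SSE2, the scalar loop handles every pixel. */
static void x888_to_8888_sse2(uint32_t *dst, const uint32_t *src, int width)
{
    int i = 0;

    assert(width >= 0);
#if defined(__SSE2__)
    const __m128i alpha = _mm_set1_epi32((int)0xff000000u);
    for (; i + 4 <= width; i += 4) {
        __m128i p = _mm_loadu_si128((const __m128i *)(src + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_or_si128(p, alpha));
    }
#endif
    for (; i < width; i++)
        dst[i] = src[i] | 0xff000000u;
}
```

Both variants must produce identical output; the SIMD one only changes how many pixels are processed per instruction, which matters little once the loop is waiting on memory.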
Intel Atom N450, 64-bit userland, gcc 4.5.1
$ CAIRO_TEST_TARGET=image ./cairo-perf-trace gnome-system-monitor.trace
======= C slow path =========
[ # ] backend test min(s) median(s) stddev. count
[ # ] image: pixman 0.21.3
[ 0] image gnome-system-monitor 15.011 15.034 0.08% 6/6
======= C fast path [1] =========
[ # ] backend test min(s) median(s) stddev. count
[ # ] image: pixman 0.21.3
[ 0] image gnome-system-monitor 14.659 14.697 0.20% 6/6
======= SSE2 fast path [2] =====
[ # ] backend test min(s) median(s) stddev. count
[ # ] image: pixman 0.21.3
[ 0] image gnome-system-monitor 14.431 14.496 0.19% 6/6
======= SSSE3 fast path [3] ======
[ # ] backend test min(s) median(s) stddev. count
[ # ] image: pixman 0.21.3
[ 0] image gnome-system-monitor 14.455 14.496 0.17% 6/6
====== artificial test with just an empty stub for src_x888_8888 ========
[ # ] backend test min(s) median(s) stddev. count
[ # ] image: pixman 0.21.3
[ 0] image gnome-system-monitor 12.215 12.241 0.11% 6/6
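The empty-stub run gives a rough way to isolate the cost of 'src_x888_8888' itself: subtracting its median (12.241 s) from each of the other medians estimates the time spent in the operation. This is a back-of-the-envelope calculation that assumes the rest of the trace costs the same in every run; the helper below simply redoes that arithmetic on the medians reported above.

```c
#include <assert.h>
#include <stdio.h>

/* Median from the run with an empty src_x888_8888 stub (seconds). */
static const double stub_median = 12.241;

/* Estimated time spent in src_x888_8888 itself, assuming the rest of
 * the trace costs the same in every run. */
static double op_time(double median)
{
    assert(median >= stub_median);
    return median - stub_median;
}

/* Print the estimate and the speedup relative to the C slow path. */
static void print_estimates(void)
{
    const char *names[] = { "C slow path", "C fast path", "SSE2", "SSSE3" };
    const double medians[] = { 15.034, 14.697, 14.496, 14.496 };
    const double base = op_time(medians[0]);

    for (int i = 0; i < 4; i++) {
        double t = op_time(medians[i]);
        printf("%-12s %.3f s in operation, %.2fx vs C slow path\n",
               names[i], t, base / t);
    }
}
```

By this estimate the SSE2 and SSSE3 runs both spend about 2.255 s in the operation versus about 2.793 s for the C slow path (roughly 1.24x), and the two SIMD versions are indistinguishable.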
---
So I'm not sure this SSSE3 code is very useful as is. But of course, if some
practical use case makes heavy use of this function on data residing in the
L1/L2 caches, then it could show a big improvement.
1. http://cgit.freedesktop.org/pixman/commit/?id=16bae834
2. http://cgit.freedesktop.org/pixman/commit/?id=e0b430a1
3. http://lists.freedesktop.org/archives/pixman/2010-November/000742.html
--
Best regards,
Siarhei Siamashka