[Pixman] [PATCH 3/3] ARMv6: Add fast path for over_n_8888_8888_ca
Siarhei Siamashka
siarhei.siamashka at gmail.com
Thu Apr 3 22:24:18 PDT 2014
On Mon, 31 Mar 2014 15:03:45 +0300
Pekka Paalanen <ppaalanen at gmail.com> wrote:
> From: Ben Avison <bavison at riscosopen.org>
>
> Benchmark results, "before" is the patch
> - ARMv6: Add fast path for over_reverse_n_8888,
> "after" contains the additional patches:
> - ARM: share pixman_asm_function definition
> - ARMv6: Support for very variable-hungry composite operations
> - ARMv6: Add fast path for over_n_8888_8888_ca (this patch)
>
> lowlevel-blt-bench, over_n_8888_8888_ca, 100 iterations:
>
>         Before          After
>       Mean  StdDev    Mean  StdDev  Confidence    Change
> L1     2.7     0.0    16.0     0.0     100.00%   +495.0%
> L2     2.4     0.0    14.3     0.2     100.00%   +497.7%
> M      2.3     0.0    14.8     0.0     100.00%   +528.6%
> HT     2.2     0.0     9.6     0.0     100.00%   +341.4%
> VT     2.2     0.0     9.4     0.0     100.00%   +331.7%
> R      2.2     0.0     9.4     0.0     100.00%   +327.3%
> RT     1.9     0.0     5.3     0.1     100.00%   +181.5%
>
> At most 3 outliers rejected per case per set.
>
> cairo-perf-trace with trimmed traces, 30 iterations:
>
>                                     Before          After
>                                   Mean  StdDev    Mean  StdDev  Confidence    Change
> t-firefox-talos-gfx.trace         32.9     0.4    25.4     0.4     100.00%    +29.6%
> t-firefox-scrolling.trace         31.2     0.1    24.6     0.1     100.00%    +26.7%
> t-gnome-terminal-vim.trace        22.2     0.1    19.8     0.2     100.00%    +11.7%
> t-firefox-planet-gnome.trace      11.5     0.0    10.9     0.0     100.00%     +6.4%
> t-evolution.trace                 13.8     0.1    13.0     0.1     100.00%     +5.9%
> t-gvim.trace                      33.5     0.2    33.0     0.2     100.00%     +1.3%
> t-xfce4-terminal-a1.trace          4.8     0.0     4.8     0.0     100.00%     +1.1%
> t-poppler-reseau.trace            22.4     0.1    22.1     0.1     100.00%     +1.0%
> t-firefox-talos-svg.trace         20.5     0.1    20.4     0.0     100.00%     +0.7%
> t-gnome-system-monitor.trace      17.2     0.0    17.1     0.0     100.00%     +0.6%
> t-swfdec-giant-steps.trace        14.9     0.0    14.8     0.0     100.00%     +0.6%
> t-midori-zoomed.trace              8.0     0.0     8.0     0.0     100.00%     +0.5%
> t-firefox-paintball.trace         18.0     0.0    17.9     0.0     100.00%     +0.5%
> t-firefox-canvas.trace            18.0     0.0    17.9     0.0     100.00%     +0.3%
> t-firefox-asteroids.trace         11.1     0.0    11.1     0.0     100.00%     +0.3%
> t-firefox-fishbowl.trace          21.2     0.0    21.1     0.0     100.00%     +0.3%
> t-chromium-tabs.trace              4.9     0.0     4.9     0.0      95.59%     +0.3% (insignificant)
> t-poppler.trace                    9.7     0.0     9.7     0.1      92.48%     +0.2% (insignificant)
> t-firefox-canvas-swscroll.trace   32.1     0.1    32.1     0.1      76.28%     +0.1% (insignificant)
> t-firefox-fishtank.trace          13.2     0.0    13.2     0.0      82.91%     +0.0% (insignificant)
> t-swfdec-youtube.trace             7.8     0.0     7.8     0.0      16.82%     +0.0% (insignificant)
> t-firefox-chalkboard.trace        36.6     0.0    36.6     0.0     100.00%     -0.1%
> t-grads-heat-map.trace             4.4     0.0     4.4     0.0      99.95%     -0.6%
> t-firefox-particles.trace         27.3     0.2    27.5     0.1     100.00%     -0.6%
> t-firefox-canvas-alpha.trace      20.5     0.3    20.7     0.3      97.72%     -0.8% (insignificant)
>
> At most 6 outliers rejected per case per set.
>
> Cairo perf reports the running time, but the change is computed for
> operations per second instead (inverse of running time).
>
> Confidence is based on Welch's t-test. Absolute changes less than 1%
> can be accounted as measurement errors, even if statistically
> significant.
>
> v4, Pekka Paalanen <pekka.paalanen at collabora.co.uk> :
> Use pixman_asm_function instead of startfunc.
> Rebased. Re-benchmarked on Raspberry Pi.
> Commit message.
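For anyone who wants to sanity-check the "Change" and "Confidence" columns
quoted above, here is a minimal Python sketch. It is not the script that
produced the tables. The function names are made up for illustration, the
inputs are the rounded values from the t-firefox-talos-gfx.trace row, and
the sample size of 30 per set ignores any rejected outliers, which is an
assumption. It only computes Welch's t statistic and the degrees of
freedom; turning those into a confidence figure needs a t-distribution
CDF, which is left out here.

```python
import math

def ops_per_second_change(time_before, time_after):
    """Percent change in operations per second, given two running times."""
    # Throughput is the inverse of running time, so the ratio flips:
    # (1/after) / (1/before) == before / after.
    return (time_before / time_after - 1.0) * 100.0

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    v1 = sd1 ** 2 / n1  # squared standard error of the first mean
    v2 = sd2 ** 2 / n2  # squared standard error of the second mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Rounded table values; n = 30 is assumed (no outliers rejected).
change = ops_per_second_change(32.9, 25.4)
t, df = welch_t(32.9, 0.4, 30, 25.4, 0.4, 30)
print(f"change: {change:+.1f}%  t: {t:.1f}  df: {df:.0f}")
```

Because the means in the table are rounded to one decimal, the change
recomputed this way only approximately matches the reported +29.6%.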
It appears that this code fails 'blitters-test' when the test is run for
somewhat longer than it is by default in 'make check':
./fuzzer-find-diff.pl ./blitters-test.generic ./blitters-test.armv6
[...]
op=PIXMAN_OP_OVER
src_fmt=r5g6b5, dst_fmt=a8r8g8b8, mask_fmt=a8r8g8b8
src_width=1, src_height=1, dst_width=235, dst_height=12
src_x=0, src_y=0, dst_x=49, dst_y=7
src_stride=12, dst_stride=940
w=185, h=1
[...]
The problematic conditions can be reproduced by running:
./blitters-test 4928372
This is not the first time we have missed a bug because blitters-test
ran for slightly fewer iterations than were needed to detect it:
http://lists.freedesktop.org/archives/pixman/2013-March/002700.html
We have also had similar near misses in a couple of other cases. So let's
increase its loop counter from 2000000 to 10000000, this time for real.
The downside is that 'make check' is going to run longer (especially on
the Raspberry Pi or MIPS32). On my desktop PC it changes the time spent
on running this particular test from ~3.6s to ~23s.
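To illustrate why the iteration count matters, here is a toy model in
Python. It is not pixman code: the two functions are hypothetical
stand-ins for a reference implementation and a fast path, and the only
assumption carried over from blitters-test is that each test case is
derived entirely from its test number. A case numbered 4928372 is simply
never reached by a run that stops at 2000000.

```python
import random

BAD_SEED = 4928372  # the test number quoted above

def reference_case(seed):
    """Hypothetical reference result; every input derives from the seed."""
    rng = random.Random(seed)
    width = rng.randint(1, 256)
    pixel = rng.getrandbits(32)
    return (pixel * width) & 0xFFFFFFFF

def fast_path_case(seed):
    """Hypothetical fast path that is wrong for exactly one seed."""
    result = reference_case(seed)
    return result ^ 1 if seed == BAD_SEED else result

def first_mismatch(first, last):
    """Return the first seed in [first, last] where the two disagree."""
    for seed in range(first, last + 1):
        if reference_case(seed) != fast_path_case(seed):
            return seed
    return None

print(first_mismatch(1, 20000))                    # short run: nothing found
print(first_mismatch(BAD_SEED - 5, BAD_SEED + 5))  # window around the bad seed
```

The first call prints None and the second prints 4928372. Since every
case depends only on its number, a failure found by a long run can then
be replayed on its own, which is what the single-number invocation above
relies on.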
As this patch can't go in yet, we are also putting "ARMv6: Support for
very variable-hungry composite operations" on hold for now.
--
Best regards,
Siarhei Siamashka