[Pixman] [PATCH 3/3] ARMv6: Add fast path for over_n_8888_8888_ca

Thu Apr 3 22:24:18 PDT 2014

On Mon, 31 Mar 2014 15:03:45 +0300
Pekka Paalanen <ppaalanen at gmail.com> wrote:

> From: Ben Avison <bavison at riscosopen.org>
> 
> Benchmark results, "before" is the patch
> - ARMv6: Add fast path for over_reverse_n_8888,
> "after" contains the additional patches:
> - ARM: share pixman_asm_function definition
> - ARMv6: Support for very variable-hungry composite operations
> - ARMv6: Add fast path for over_n_8888_8888_ca (this patch)
> 
> lowlevel-blt-bench, over_n_8888_8888_ca, 100 iterations:
> 
>        Before          After
>       Mean StdDev     Mean StdDev   Confidence   Change
> L1     2.7    0.0     16.0    0.0    100.00%    +495.0%
> L2     2.4    0.0     14.3    0.2    100.00%    +497.7%
> M      2.3    0.0     14.8    0.0    100.00%    +528.6%
> HT     2.2    0.0      9.6    0.0    100.00%    +341.4%
> VT     2.2    0.0      9.4    0.0    100.00%    +331.7%
> R      2.2    0.0      9.4    0.0    100.00%    +327.3%
> RT     1.9    0.0      5.3    0.1    100.00%    +181.5%
> 
> At most 3 outliers rejected per case per set.
> 
> cairo-perf-trace with trimmed traces, 30 iterations:
> 
>                                     Before          After
>                                    Mean StdDev     Mean StdDev   Confidence   Change
> t-firefox-talos-gfx.trace          32.9    0.4     25.4    0.4    100.00%     +29.6%
> t-firefox-scrolling.trace          31.2    0.1     24.6    0.1    100.00%     +26.7%
> t-gnome-terminal-vim.trace         22.2    0.1     19.8    0.2    100.00%     +11.7%
> t-firefox-planet-gnome.trace       11.5    0.0     10.9    0.0    100.00%      +6.4%
> t-evolution.trace                  13.8    0.1     13.0    0.1    100.00%      +5.9%
> t-gvim.trace                       33.5    0.2     33.0    0.2    100.00%      +1.3%
> t-xfce4-terminal-a1.trace           4.8    0.0      4.8    0.0    100.00%      +1.1%
> t-poppler-reseau.trace             22.4    0.1     22.1    0.1    100.00%      +1.0%
> t-firefox-talos-svg.trace          20.5    0.1     20.4    0.0    100.00%      +0.7%
> t-gnome-system-monitor.trace       17.2    0.0     17.1    0.0    100.00%      +0.6%
> t-swfdec-giant-steps.trace         14.9    0.0     14.8    0.0    100.00%      +0.6%
> t-midori-zoomed.trace               8.0    0.0      8.0    0.0    100.00%      +0.5%
> t-firefox-paintball.trace          18.0    0.0     17.9    0.0    100.00%      +0.5%
> t-firefox-canvas.trace             18.0    0.0     17.9    0.0    100.00%      +0.3%
> t-firefox-asteroids.trace          11.1    0.0     11.1    0.0    100.00%      +0.3%
> t-firefox-fishbowl.trace           21.2    0.0     21.1    0.0    100.00%      +0.3%
> t-chromium-tabs.trace               4.9    0.0      4.9    0.0     95.59%      +0.3%  (insignificant)
> t-poppler.trace                     9.7    0.0      9.7    0.1     92.48%      +0.2%  (insignificant)
> t-firefox-canvas-swscroll.trace    32.1    0.1     32.1    0.1     76.28%      +0.1%  (insignificant)
> t-firefox-fishtank.trace           13.2    0.0     13.2    0.0     82.91%      +0.0%  (insignificant)
> t-swfdec-youtube.trace              7.8    0.0      7.8    0.0     16.82%      +0.0%  (insignificant)
> t-firefox-chalkboard.trace         36.6    0.0     36.6    0.0    100.00%      -0.1%
> t-grads-heat-map.trace              4.4    0.0      4.4    0.0     99.95%      -0.6%
> t-firefox-particles.trace          27.3    0.2     27.5    0.1    100.00%      -0.6%
> t-firefox-canvas-alpha.trace       20.5    0.3     20.7    0.3     97.72%      -0.8%  (insignificant)
> 
> At most 6 outliers rejected per case per set.
> 
> Cairo perf reports the running time, but the change is computed for
> operations per second instead (inverse of running time).
> 
> Confidence is based on Welch's t-test. Absolute changes less than 1%
> can be accounted as measurement errors, even if statistically
> significant.
> 
> v4, Pekka Paalanen <pekka.paalanen at collabora.co.uk> :
> 	Use pixman_asm_function instead of startfunc.
> 	Rebased. Re-benchmarked on Raspberry Pi.
> 	Commit message.
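
For reference, the "Change" column in the cairo-perf-trace table can be
approximated from the printed means. This is only a sketch: the function
name is made up for illustration, and it assumes the change is the ratio
of operations per second (the inverse of running time), as the commit
message describes. The table was presumably computed from unrounded
means, so the result can differ from the printed value in the last digit.

```python
def throughput_change_percent(time_before, time_after):
    """Percent change in operations per second, given two running times."""
    ops_before = 1.0 / time_before   # throughput is the inverse of running time
    ops_after = 1.0 / time_after
    return (ops_after / ops_before - 1.0) * 100.0


# Rounded means from the t-firefox-talos-gfx.trace row (Before 32.9, After 25.4).
change = throughput_change_percent(32.9, 25.4)
print(f"{change:+.1f}%")  # close to the +29.6% in the table
```

A shorter running time therefore shows up as a positive change, which is
why the sign in the table is opposite to the change in the means.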

It appears that this code fails 'blitters-test' when the test is run
for longer than the default used by 'make check':

./fuzzer-find-diff.pl ./blitters-test.generic ./blitters-test.armv6

[...]

op=PIXMAN_OP_OVER
src_fmt=r5g6b5, dst_fmt=a8r8g8b8, mask_fmt=a8r8g8b8
src_width=1, src_height=1, dst_width=235, dst_height=12
src_x=0, src_y=0, dst_x=49, dst_y=7
src_stride=12, dst_stride=940
w=185, h=1

[...]

The problematic conditions can be reproduced by running:
./blitters-test 4928372
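
To illustrate why the iteration count matters (the failing test number
above, 4928372, is beyond the current default of 2000000 iterations),
here is a toy sketch of seed-indexed differential testing. None of it is
pixman code: make_case, reference_add and fast_add are hypothetical
stand-ins for a test-case generator, a reference path and an optimized
path with a deliberately planted bug.

```python
# Toy illustration only: none of this is pixman code.

def make_case(seed):
    """Derive a deterministic test case (two 8-bit values) from a seed."""
    return seed % 256, (seed // 256) % 256


def reference_add(a, b):
    """Reference path: saturating 8-bit addition."""
    return min(a + b, 255)


def fast_add(a, b):
    """Hypothetical 'optimized' path with a planted bug: wraps instead of saturating."""
    return (a + b) & 0xFF


def first_difference(iterations):
    """Return the first seed in 1..iterations where the two paths disagree, or None."""
    for seed in range(1, iterations + 1):
        a, b = make_case(seed)
        if reference_add(a, b) != fast_add(a, b):
            return seed
    return None


print(first_difference(500))   # None: the short run never reaches the bad case
print(first_difference(2500))  # 511: the longer run finds the first failing seed
```

In this toy the first failing seed is 511, so a run of 500 iterations
passes even though the bug is there. The same thing can happen with any
fixed loop count if the first failing case lies just past it.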

This is not the first time we have missed a bug because blitters-test
was run for slightly fewer iterations than needed to detect it:
    http://lists.freedesktop.org/archives/pixman/2013-March/002700.html
We have also had similar near misses in a couple of other cases. So
let's increase its loop counter from 2000000 to 10000000, this time for
real. The downside is that 'make check' is going to run longer
(especially on the Raspberry Pi or MIPS32). On my desktop PC, it
changes the time spent running this particular test from ~3.6s
to ~23s.

As this patch can't go in yet, we also put "ARMv6: Support for very
variable-hungry composite operations" on hold for a bit.

-- 
Best regards,
Siarhei Siamashka