[Pixman] Testing (Re: [PATCH 3/3] ARMv6: Add fast path for over_n_8888_8888_ca)

Fri Apr 4 00:28:05 PDT 2014

On Fri, 4 Apr 2014 08:24:18 +0300
Siarhei Siamashka <siarhei.siamashka at gmail.com> wrote:

> On Mon, 31 Mar 2014 15:03:45 +0300
> Pekka Paalanen <ppaalanen at gmail.com> wrote:
> 
> > From: Ben Avison <bavison at riscosopen.org>
> > 
> > Benchmark results, "before" is the patch
> > - ARMv6: Add fast path for over_reverse_n_8888,
> > "after" contains the additional patches:
> > - ARM: share pixman_asm_function definition
> > - ARMv6: Support for very variable-hungry composite operations
> > - ARMv6: Add fast path for over_n_8888_8888_ca (this patch)
> > 
> > lowlevel-blt-bench, over_n_8888_8888_ca, 100 iterations:
> > 
> >        Before          After
> >       Mean StdDev     Mean StdDev   Confidence   Change
> > L1     2.7    0.0     16.0    0.0    100.00%    +495.0%
> > L2     2.4    0.0     14.3    0.2    100.00%    +497.7%
> > M      2.3    0.0     14.8    0.0    100.00%    +528.6%
> > HT     2.2    0.0      9.6    0.0    100.00%    +341.4%
> > VT     2.2    0.0      9.4    0.0    100.00%    +331.7%
> > R      2.2    0.0      9.4    0.0    100.00%    +327.3%
> > RT     1.9    0.0      5.3    0.1    100.00%    +181.5%
> > 
> > At most 3 outliers rejected per case per set.
> > 
> > cairo-perf-trace with trimmed traces, 30 iterations:
> > 
> >                                     Before          After
> >                                    Mean StdDev     Mean StdDev   Confidence   Change
> > t-firefox-talos-gfx.trace          32.9    0.4     25.4    0.4    100.00%     +29.6%
> > t-firefox-scrolling.trace          31.2    0.1     24.6    0.1    100.00%     +26.7%
> > t-gnome-terminal-vim.trace         22.2    0.1     19.8    0.2    100.00%     +11.7%
> > t-firefox-planet-gnome.trace       11.5    0.0     10.9    0.0    100.00%      +6.4%
> > t-evolution.trace                  13.8    0.1     13.0    0.1    100.00%      +5.9%
> > t-gvim.trace                       33.5    0.2     33.0    0.2    100.00%      +1.3%
> > t-xfce4-terminal-a1.trace           4.8    0.0      4.8    0.0    100.00%      +1.1%
> > t-poppler-reseau.trace             22.4    0.1     22.1    0.1    100.00%      +1.0%
> > t-firefox-talos-svg.trace          20.5    0.1     20.4    0.0    100.00%      +0.7%
> > t-gnome-system-monitor.trace       17.2    0.0     17.1    0.0    100.00%      +0.6%
> > t-swfdec-giant-steps.trace         14.9    0.0     14.8    0.0    100.00%      +0.6%
> > t-midori-zoomed.trace               8.0    0.0      8.0    0.0    100.00%      +0.5%
> > t-firefox-paintball.trace          18.0    0.0     17.9    0.0    100.00%      +0.5%
> > t-firefox-canvas.trace             18.0    0.0     17.9    0.0    100.00%      +0.3%
> > t-firefox-asteroids.trace          11.1    0.0     11.1    0.0    100.00%      +0.3%
> > t-firefox-fishbowl.trace           21.2    0.0     21.1    0.0    100.00%      +0.3%
> > t-chromium-tabs.trace               4.9    0.0      4.9    0.0     95.59%      +0.3%  (insignificant)
> > t-poppler.trace                     9.7    0.0      9.7    0.1     92.48%      +0.2%  (insignificant)
> > t-firefox-canvas-swscroll.trace    32.1    0.1     32.1    0.1     76.28%      +0.1%  (insignificant)
> > t-firefox-fishtank.trace           13.2    0.0     13.2    0.0     82.91%      +0.0%  (insignificant)
> > t-swfdec-youtube.trace              7.8    0.0      7.8    0.0     16.82%      +0.0%  (insignificant)
> > t-firefox-chalkboard.trace         36.6    0.0     36.6    0.0    100.00%      -0.1%
> > t-grads-heat-map.trace              4.4    0.0      4.4    0.0     99.95%      -0.6%
> > t-firefox-particles.trace          27.3    0.2     27.5    0.1    100.00%      -0.6%
> > t-firefox-canvas-alpha.trace       20.5    0.3     20.7    0.3     97.72%      -0.8%  (insignificant)
> > 
> > At most 6 outliers rejected per case per set.
> > 
> > Cairo perf reports the running time, but the change is computed for
> > operations per second instead (inverse of running time).
> > 
> > Confidence is based on Welch's t-test. Absolute changes less than 1%
> > can be accounted as measurement errors, even if statistically
> > significant.
> > 
> > v4, Pekka Paalanen <pekka.paalanen at collabora.co.uk> :
> > 	Use pixman_asm_function instead of startfunc.
> > 	Rebased. Re-benchmarked on Raspberry Pi.
> > 	Commit message.
> 
> Appears that this code fails the 'blitters-test' if it is run a bit
> longer than the default use of it in 'make check':
> 
> 
> ./fuzzer-find-diff.pl ./blitters-test.generic ./blitters-test.armv6
> 
> [...]
> 
> op=PIXMAN_OP_OVER
> src_fmt=r5g6b5, dst_fmt=a8r8g8b8, mask_fmt=a8r8g8b8
> src_width=1, src_height=1, dst_width=235, dst_height=12
> src_x=0, src_y=0, dst_x=49, dst_y=7
> src_stride=12, dst_stride=940
> w=185, h=1
> 
> [...]
> 
> The problematic conditions can be reproduced by running:
> ./blitters-test 4928372
> 
> 
> And this is not the first time we miss a bug because of running
> blitters-test just a little bit shorter than would be necessary
> to detect it:
>     http://lists.freedesktop.org/archives/pixman/2013-March/002700.html
> And we had a similar near miss in a couple of other cases. So let's
> increase the loop counter in it from 2000000 -> 10000000 this time for
> real. The downside is that 'make check' is going to run longer
> (especially on the Raspberry Pi or MIPS32). But on my desktop PC it
> changes the time spent on running this particular test from ~3.6s
> to ~23s. 
> 
> As this patch can't go in yet, we also put "ARMv6: Support for very
> variable-hungry composite operations" on hold for a bit.
> 

Hi,

Thank you for pushing the two patches and for running the extended
tests. I will check with Ben on what to do here.

Could someone point me to a document describing how one uses these
testing tools properly? Hopefully it would answer all my questions
below.

In the pixman test directory on the rpi, I issued
$ ./fuzzer-find-diff.pl ./blitters-test.generic ./blitters-test.armv6 10000000
Success: 10000000 tests finished

I also ran it without the '10000000' argument and waited a lot longer
(a few minutes), and it never indicated an error.

Should I somehow manually create the binaries blitters-test.generic and
blitters-test.armv6 before running that command? Right now, there are
no files with those names anywhere, so I was a bit surprised it ran
just fine.

I also couldn't figure out how 'make check' does this comparison, if
it is supposed to have those two binaries built separately somehow.
OTOH, I see that fuzzer_test_main() takes a checksum as an argument.
How do you determine what the correct checksum should be?

After reading the big comment on fuzzer_test_main() and the usage of
fuzzer-find-diff.pl, I have a hunch that the procedure would be
something like this:
- compile pixman without optimizations producing statically linked
  blitters-test (how?), rename it to blitters-test.generic
- compile pixman with optimizations producing statically linked
  blitters-test (how?), rename it to blitters-test.armv6 (on rpi)
- run with fuzzer-find-diff.pl for the predetermined number of rounds
- if no differences found, take the final checksum (from where?) and
  hardcode it in the fuzzer_test_main() call.
And normally that would be done only by the maintainers, or when
someone adds a new fuzzer test. Is that right?
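
To make the question concrete, here is a toy Python model of what I
imagine is going on. It is not pixman code: run_one_test(), the planted
bug at seed 1234, and the CRC chaining are all made up for illustration,
and the real fuzzer_test_main() and fuzzer-find-diff.pl may well work
differently.

```python
import zlib

BUGGY_SEED = 1234  # hypothetical seed at which the "optimized" build misbehaves


def run_one_test(seed, optimized=False):
    """Hypothetical stand-in for one fuzzer test; returns a per-seed checksum."""
    crc = zlib.crc32(seed.to_bytes(4, "little"))
    if optimized and seed == BUGGY_SEED:
        crc ^= 1  # simulate one wrong pixel in the optimized code path
    return crc


def run_range(first, last, optimized=False):
    """Chain the per-seed checksums of seeds first..last into one value."""
    total = 0
    for seed in range(first, last + 1):
        total = zlib.crc32(run_one_test(seed, optimized).to_bytes(4, "little"), total)
    return total


def find_first_diff(first, last):
    """Bisect the seed range for the first seed where the two builds disagree."""
    if run_range(first, last) == run_range(first, last, optimized=True):
        return None  # both builds agree over the whole range
    while first < last:
        mid = (first + last) // 2
        if run_range(first, mid) != run_range(first, mid, optimized=True):
            last = mid       # the difference is in the lower half
        else:
            first = mid + 1  # the difference is in the upper half
    return first


# The value a maintainer would hardcode, taken from the reference build.
EXPECTED_CHECKSUM = run_range(1, 2000)


def make_check(optimized):
    """Model of a 'make check' run: a single build against the hardcoded value."""
    return run_range(1, 2000, optimized) == EXPECTED_CHECKSUM
```

In this model a short run (seeds 1..1000) passes on both builds, and
only a run that reaches the bad seed notices the mismatch, which is how
I understand the "ran a bit too short" problem described above.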

For building the generic version, or whatever version is the gold
standard, should I use all the --disable switches mentioned
in ./configure --help?

And 'make check' only runs whatever was built and checks it against
the hardcoded checksum?

Thanks,
pq