[Pixman] [PATCH 0/4] New fast paths and Raspberry Pi 1 benchmarking

Thu Aug 20 06:58:44 PDT 2015

From: Pekka Paalanen <pekka.paalanen at collabora.co.uk>

Hi,

Ben and I have been fighting with benchmarking on the Raspberry Pi 1 for a long
time. The problem is unexpected performance differences: insignificant changes
to the code can cause significant changes to lowlevel-blt-bench performance
results. The change can be a speed-up or a slowdown, and it can happen in
portions of the code we never modified.

Along the way, we have developed our own benchmarking scripts that execute
lowlevel-blt-bench many times and then do a statistical analysis (Student's
t-test) on the significance of the change. When I say "significant changes", I
mean it in the t-test sense.
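
As a rough illustration of the kind of comparison involved, here is a minimal
sketch of a pooled-variance two-sample t statistic in Python. It is not the
perfcmp script: the function name and the sample values are made up for the
example, and turning t into a confidence percentage additionally needs the CDF
of the t distribution, which is left out here.

```python
import math
import statistics


def student_t(before, after):
    """Return (t, degrees_of_freedom) for two independent samples.

    Uses the classic pooled-variance form of Student's t-test.
    """
    n1, n2 = len(before), len(after)
    m1, m2 = statistics.mean(before), statistics.mean(after)
    v1, v2 = statistics.variance(before), statistics.variance(after)
    # Pooled sample variance, weighted by each sample's degrees of freedom
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t = (m2 - m1) / math.sqrt(pooled * (1.0 / n1 + 1.0 / n2))
    return t, n1 + n2 - 2


# Made-up Mpx/s readings, for illustration only
before = [12.4, 12.5, 12.5, 12.6, 12.5]
after = [21.1, 21.2, 21.2, 21.3, 21.2]
t, dof = student_t(before, after)
print(f"t = {t:.1f}, dof = {dof}")
```

A large |t| relative to the spread means the difference in means is unlikely
to be noise; a t near zero means the two revisions are indistinguishable.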

I have stabilized the running environment as far as I could: the RPi 1 is using
a USB disk, swap is not on the SD card, the script syncs the disks between
executions, the init level is changed to 1 to eliminate as many background
processes as possible, the network is unplugged, and the monitor is forced to
stay on (scanout consumes memory bandwidth). None of this really helped with
the unexpected differences: they are not random, but very repeatable *if* one
takes great care to keep iterating on the same program binaries.

We also ruled out drifting CPU clock speeds. The scripts run the code revisions
interleaved: each iteration consists of running every revision once. I have done
30 to 100 iterations, with the total runtime usually somewhere between 4 and 20
hours. Because the runs are interleaved, any drift in clock speed or the like
would affect all revisions alike and be removed in the statistical analysis. We
also asked the hardware people, and such drift should just not happen. CPU
scaling should not be happening on the RPi 1 either.
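
The interleaving can be sketched as follows. This is only an illustration of
the ordering, not our actual scripts; the function name and the revision labels
are hypothetical.

```python
def interleaved_schedule(revisions, iterations):
    """Yield (iteration, revision) pairs.

    Every revision is run once per iteration, so slow drift in the
    environment is spread evenly over all revisions.
    """
    for i in range(iterations):
        for rev in revisions:
            yield i, rev


# Hypothetical revision labels
revisions = ["master", "patch1", "patch1+2"]
for i, rev in interleaved_schedule(revisions, 2):
    print(i, rev)
```

The alternative, running all iterations of one revision before moving on to
the next, would let any slow drift show up as a difference between revisions.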

The magnitude of the unexpected changes is sometimes fairly big, on the order
of 5%, but we have seen anomalies of up to 20%.

Something that explains a great deal of these anomalies, but not all of them,
has to do with function addresses. One hypothesis is that it is related to the
branch predictor and its cache. We made a test targeting exactly that idea:
pick a fast path function that seems to be most susceptible to unexpected
changes, and pad it with x nops before the function start and N-x nops after
the function end. We never execute those nops, but changing x changes the
function start address while keeping everything else in the whole binary in
the same place.
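
To make the padding idea concrete, here is a small Python sketch that builds
the instruction list for a given x. The helper and the labels are hypothetical
and are not taken from the interbench branch; it also assumes each nop
assembles to one 4-byte ARM instruction, so the start address moves by 4*x
bytes.

```python
def padded_function(body, x, total):
    """Return assembly lines: x nops, the function body, then total - x nops.

    The overall length is constant, so only the function start moves.
    """
    if not 0 <= x <= total:
        raise ValueError("x must be between 0 and total")
    return ["nop"] * x + list(body) + ["nop"] * (total - x)


# Hypothetical two-line body; real fast paths are much longer
body = ["func_start:", "bx lr"]
for x in (0, 3):
    lines = padded_function(body, x, total=8)
    # Print x, the offset of the function start, and the total line count
    print(x, lines.index("func_start:"), len(lines))
```

Because the total number of nops is fixed, everything placed after the padded
function keeps its address no matter which x is chosen.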

The results were mind-boggling: depending on the function starting address, the
src_8888_8888 L1 test of lowlevel-blt-bench ran at either 355 Mpx/s or 470
Mpx/s. There does not seem to be any predictable pattern to which addresses are
"fast" and which are "slow". Obviously this will screw up our benchmarks,
because a change in an unrelated function may cause another function's address
to shift, and therefore change its performance. See [1] for the plot.

How should we benchmark on Raspberry Pi 1 then? When an arbitrary application
loads libpixman-1, we have no guarantees about the loading addresses, even if
we could make the Pixman build "stable" in that respect.

We ended up with two major points:

- A patch adding a new fast path needs to be split in two: the first patch adds
  all the code, but the code is disabled. The second patch enables the new code
  without changing any function addresses. This should eliminate most of the
  unexpected performance differences.

- To get an average of what performance applications might be seeing, we must
  make sure the libpixman-1 loading address is randomized, and execute the
  benchmark program many times. This can be achieved by letting Pixman be a
  shared library that the benchmark, e.g. lowlevel-blt-bench, links
  dynamically, and ensuring ASLR does the randomization.
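
One way to check the ASLR part on Linux is to read the randomize_va_space
sysctl before starting a benchmark run. The sketch below is an illustration
only and is not part of our scripts; the helper name is hypothetical. Shared
libraries are mapped with mmap, so a value of 1 or 2 should randomize where
libpixman-1 ends up.

```python
from pathlib import Path

ASLR_SYSCTL = Path("/proc/sys/kernel/randomize_va_space")


def describe_aslr(value):
    """Map the sysctl value (as text) to a short description."""
    modes = {
        "0": "disabled",
        "1": "stack, mmap base and VDSO randomized",
        "2": "as 1, plus heap (brk) randomized",
    }
    return modes.get(value.strip(), "unknown")


if ASLR_SYSCTL.exists():
    print(describe_aslr(ASLR_SYSCTL.read_text()))
else:
    print("randomize_va_space not available on this system")
```

If this reports "disabled", repeated executions would keep hitting the same
load address and the averaging described above would not happen.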

Of course, it's always good to minimize any possible interference, so I will
stick to my "stabilized" running environment.

With these insights, here are the patches for two new fast paths for
over_n_8888: a generic C version and an ARMv6 assembly version.

The patches that enable fast paths contain the essential benchmark results. At
the end of this email, you can find the complete performance comparisons not
only for the patches that enable fast paths but also for the patches that are
not supposed to affect performance. In the future I do not intend to include
these reports.

The benchmarks were run with 30 iterations (executions of lowlevel-blt-bench
per code revision).

References:

The statistical analysis script
https://github.com/bavison/perfcmp

My test scripts and data
https://git.collabora.com/cgit/user/pq/pixman-benchmarking.git/

[1] The plot of alignment vs. performance
https://git.collabora.com/cgit/user/pq/pixman-benchmarking.git/plain/octave/figures/fig-src-8888-8888-L1.pdf

The Pixman source used to run the alignment vs. performance test:
https://git.collabora.com/cgit/user/pq/pixman.git/log/?h=interbench-20150702

Thanks,
pq

Ben Avison (2):
  pixman-fast-path: Add over_n_8888 fast path (disabled)
  armv6: Add over_n_8888 fast path (disabled)

Pekka Paalanen (2):
  pixman-fast-path: enable over_n_8888
  armv6: enable over_n_8888

 pixman/pixman-arm-simd-asm.S | 41 +++++++++++++++++++++++++++++++++++++++++
 pixman/pixman-arm-simd.c     |  6 ++++++
 pixman/pixman-fast-path.c    | 35 +++++++++++++++++++++++++++++++++++
 3 files changed, 82 insertions(+)

--
2.4.6

Benchmark results

******************

Before: upstream master, no patches applied
After: patch 1
Expectation: no performance changes

Test of feature 'in_8888_8'

       Before          After
      Mean StdDev     Mean StdDev   Confidence   Change
L1    13.2   0.03     13.1   0.03    100.00%      -0.6%
L2     9.5   0.16      9.5   0.19     35.51%      -0.2%  (insignificant)
M      9.7   0.00      9.7   0.01     17.39%      +0.0%  (insignificant)
HT     7.8   0.02      7.8   0.02     92.52%      +0.1%  (insignificant)
VT     7.7   0.02      7.7   0.02     99.74%      +0.2%
R      7.3   0.01      7.3   0.01    100.00%      +0.2%
RT     4.0   0.04      4.1   0.05    100.00%      +1.5%

At most 4 outliers rejected per test per set.

Test of feature 'in_n_8888'

       Before          After
      Mean StdDev     Mean StdDev   Confidence   Change
L1    19.1   0.06     19.2   0.07    100.00%      +0.4%
L2    15.3   0.44     15.5   0.50     83.84%      +1.1%  (insignificant)
M     12.8   0.00     12.8   0.00     71.74%      +0.0%  (insignificant)
HT    11.3   0.03     11.3   0.03     99.97%      +0.2%
VT    11.0   0.02     11.0   0.03     99.81%      +0.2%
R     10.7   0.02     10.7   0.02     99.85%      +0.2%
RT     6.5   0.08      6.6   0.09     92.59%      +0.6%  (insignificant)

At most 0 outliers rejected per test per set.

Test of feature 'in_reverse_8888_8888'

       Before          After
      Mean StdDev     Mean StdDev   Confidence   Change
L1    32.2   0.06     32.3   0.09    100.00%      +0.6%
L2    17.8   0.61     17.9   0.59     63.65%      +0.8%  (insignificant)
M     16.9   0.02     16.9   0.02     50.08%      -0.0%  (insignificant)
HT    12.1   0.03     12.0   0.03     83.85%      -0.1%  (insignificant)
VT    11.8   0.04     11.8   0.03     71.18%      +0.1%  (insignificant)
R     11.3   0.03     11.3   0.02     97.65%      -0.2%  (insignificant)
RT     6.1   0.11      6.1   0.07     85.29%      -0.6%  (insignificant)

At most 1 outliers rejected per test per set.

Test of feature 'over_8888_8888'

       Before          After
      Mean StdDev     Mean StdDev   Confidence   Change
L1    37.8   0.12     38.0   0.07    100.00%      +0.7%
L2    27.5   0.74     27.7   0.59     65.53%      +0.6%  (insignificant)
M     26.9   0.03     26.9   0.03     31.38%      -0.0%  (insignificant)
HT    14.9   0.04     14.9   0.05     40.27%      -0.0%  (insignificant)
VT    14.1   0.04     14.0   0.04     99.37%      -0.2%
R     15.1   0.05     15.1   0.06      7.05%      -0.0%  (insignificant)
RT     7.3   0.13      7.3   0.14      7.64%      +0.0%  (insignificant)

At most 2 outliers rejected per test per set.

Test of feature 'over_8888_8_0565'

       Before          After
      Mean StdDev     Mean StdDev   Confidence   Change
L1     5.3   0.01      5.3   0.01    100.00%      +0.3%
L2     4.6   0.04      4.6   0.04     36.58%      -0.1%  (insignificant)
M      4.5   0.00      4.5   0.00     43.81%      +0.0%  (insignificant)
HT     3.9   0.01      3.9   0.01     67.48%      +0.0%  (insignificant)
VT     3.9   0.01      3.9   0.00     91.95%      +0.1%  (insignificant)
R      3.8   0.01      3.8   0.01     94.80%      -0.1%  (insignificant)
RT     2.3   0.01      2.3   0.02     55.99%      -0.1%  (insignificant)

At most 5 outliers rejected per test per set.

Test of feature 'over_8888_n_0565'

       Before          After
      Mean StdDev     Mean StdDev   Confidence   Change
L1     5.7   0.01      5.7   0.01     66.51%      +0.0%  (insignificant)
L2     5.0   0.03      5.0   0.05     35.72%      +0.1%  (insignificant)
M      4.8   0.00      4.8   0.00     99.95%      +0.0%
HT     4.5   0.01      4.5   0.01    100.00%      +0.2%
VT     4.5   0.01      4.5   0.01    100.00%      +0.2%
R      4.3   0.01      4.4   0.01    100.00%      +0.2%
RT     2.9   0.02      2.9   0.03     55.17%      +0.2%  (insignificant)

At most 2 outliers rejected per test per set.

Test of feature 'over_n_0565'

       Before          After
      Mean StdDev     Mean StdDev   Confidence   Change
L1     8.3   0.02      8.4   0.03     82.45%      +0.1%  (insignificant)
L2     8.1   0.02      8.1   0.01     72.48%      -0.1%  (insignificant)
M      7.4   0.00      7.4   0.00     96.90%      +0.0%  (insignificant)
HT     7.0   0.01      7.0   0.01     99.71%      +0.2%
VT     6.9   0.01      7.0   0.01     99.94%      +0.2%
R      6.8   0.01      6.8   0.01     43.44%      +0.0%  (insignificant)
RT     4.6   0.05      4.5   0.05     90.51%      -0.5%  (insignificant)

At most 5 outliers rejected per test per set.

Test of feature 'over_n_8888'

       Before          After
      Mean StdDev     Mean StdDev   Confidence   Change
L1    12.4   0.04     12.5   0.03    100.00%      +0.6%
L2    11.1   0.02     11.1   0.02     47.25%      -0.0%  (insignificant)
M      9.4   0.00      9.4   0.00     95.77%      +0.0%  (insignificant)
HT     8.5   0.02      8.5   0.02     74.90%      +0.1%  (insignificant)
VT     8.4   0.02      8.4   0.02     64.04%      +0.0%  (insignificant)
R      8.2   0.02      8.2   0.01     98.26%      +0.1%  (insignificant)
RT     5.5   0.08      5.5   0.05     11.15%      +0.0%  (insignificant)

At most 2 outliers rejected per test per set.

Test of feature 'src_0565_0565'

       Before          After
      Mean StdDev     Mean StdDev   Confidence   Change
L1   480.8   4.67    514.0   6.57    100.00%      +6.9%
L2    84.1   4.83     82.1   3.41     93.68%      -2.4%  (insignificant)
M    128.2   0.15    128.2   0.14     38.20%      +0.0%  (insignificant)
HT    49.7   0.46     49.9   0.37     92.51%      +0.4%  (insignificant)
VT    44.1   0.34     44.2   0.29     87.47%      +0.3%  (insignificant)
R     39.1   0.32     39.2   0.33     75.09%      +0.2%  (insignificant)
RT    12.4   0.34     12.5   0.31     75.39%      +0.8%  (insignificant)

At most 2 outliers rejected per test per set.

Test of feature 'src_1555_8888'

       Before          After
      Mean StdDev     Mean StdDev   Confidence   Change
L1    25.0   0.10     24.1   0.13    100.00%      -3.5%
L2    19.0   0.26     18.9   0.39     87.17%      -0.7%  (insignificant)
M     20.0   0.02     20.0   0.02     18.97%      +0.0%  (insignificant)
HT    12.5   0.04     12.4   0.06    100.00%      -0.9%
VT    12.4   0.05     12.3   0.06    100.00%      -0.8%
R     11.8   0.05     11.7   0.05    100.00%      -0.9%
RT     5.4   0.11      5.4   0.09     99.82%      -1.5%

At most 1 outliers rejected per test per set.

Test of feature 'src_8888_8888'

       Before          After
      Mean StdDev     Mean StdDev   Confidence   Change
L1   468.1   4.43    478.2   9.26    100.00%      +2.2%
L2    58.0   4.34     58.5   4.14     31.74%      +0.8%  (insignificant)
M     87.4   0.13     87.4   0.13     24.42%      -0.0%  (insignificant)
HT    36.5   0.15     36.7   0.22     99.80%      +0.4%
VT    32.7   0.17     32.8   0.19     76.34%      +0.2%  (insignificant)
R     31.0   0.14     31.2   0.19     99.97%      +0.6%
RT    11.5   0.18     11.7   0.23    100.00%      +2.2%

At most 1 outliers rejected per test per set.

******************

Before: patch 1
After: patch 1+2
Expectation: performance improvement on over_n_8888

Test of feature 'in_8888_8'

       Before          After
      Mean StdDev     Mean StdDev   Confidence   Change
L1    13.1   0.03     13.1   0.03     99.09%      -0.2%
L2     9.5   0.19      9.5   0.14      2.46%      +0.0%  (insignificant)
M      9.7   0.01      9.6   0.01     98.57%      -0.1%  (insignificant)
HT     7.8   0.02      7.8   0.02     17.57%      +0.0%  (insignificant)
VT     7.7   0.02      7.7   0.02      1.25%      +0.0%  (insignificant)
R      7.3   0.01      7.3   0.01     99.41%      -0.1%
RT     4.1   0.05      4.0   0.04    100.00%      -1.3%

At most 2 outliers rejected per test per set.

Test of feature 'in_n_8888'

       Before          After
      Mean StdDev     Mean StdDev   Confidence   Change
L1    19.2   0.07     19.2   0.08     67.38%      +0.1%  (insignificant)
L2    15.5   0.50     15.4   0.52     47.56%      -0.5%  (insignificant)
M     12.8   0.00     12.8   0.00     43.93%      -0.0%  (insignificant)
HT    11.3   0.03     11.3   0.03     10.26%      +0.0%  (insignificant)
VT    11.0   0.03     11.0   0.03     53.19%      +0.0%  (insignificant)
R     10.7   0.02     10.7   0.03     13.89%      -0.0%  (insignificant)
RT     6.6   0.09      6.6   0.09     16.12%      -0.1%  (insignificant)

At most 0 outliers rejected per test per set.

Test of feature 'in_reverse_8888_8888'

       Before          After
      Mean StdDev     Mean StdDev   Confidence   Change
L1    32.3   0.09     32.3   0.07      1.81%      -0.0%  (insignificant)
L2    17.9   0.59     17.9   0.62      6.42%      +0.1%  (insignificant)
M     16.9   0.02     16.9   0.02     48.34%      +0.0%  (insignificant)
HT    12.0   0.03     12.1   0.04     97.10%      +0.2%  (insignificant)
VT    11.8   0.03     11.8   0.04     97.84%      +0.2%  (insignificant)
R     11.3   0.02     11.3   0.03     91.88%      +0.1%  (insignificant)
RT     6.1   0.07      6.1   0.10     96.81%      +0.8%  (insignificant)

At most 1 outliers rejected per test per set.

Test of feature 'over_8888_8888'

       Before          After
      Mean StdDev     Mean StdDev   Confidence   Change
L1    38.0   0.07     38.0   0.13     34.92%      +0.0%  (insignificant)
L2    27.7   0.59     27.7   0.61      9.76%      +0.1%  (insignificant)
M     26.9   0.03     26.9   0.03     15.34%      +0.0%  (insignificant)
HT    14.9   0.05     14.9   0.05     79.64%      +0.1%  (insignificant)
VT    14.0   0.04     14.1   0.04     96.23%      +0.2%  (insignificant)
R     15.1   0.06     15.1   0.06     31.84%      +0.0%  (insignificant)
RT     7.3   0.14      7.3   0.14     12.11%      -0.1%  (insignificant)

At most 1 outliers rejected per test per set.

Test of feature 'over_8888_8_0565'

       Before          After
      Mean StdDev     Mean StdDev   Confidence   Change
L1     5.3   0.01      5.3   0.01     60.93%      -0.0%  (insignificant)
L2     4.6   0.04      4.6   0.04     60.95%      +0.2%  (insignificant)
M      4.5   0.00      4.5   0.00     68.24%      -0.0%  (insignificant)
HT     3.9   0.01      3.9   0.01     62.31%      +0.0%  (insignificant)
VT     3.9   0.00      3.9   0.01     61.17%      +0.0%  (insignificant)
R      3.8   0.01      3.8   0.00     87.38%      +0.1%  (insignificant)
RT     2.3   0.02      2.3   0.01     35.89%      -0.1%  (insignificant)

At most 2 outliers rejected per test per set.

Test of feature 'over_8888_n_0565'

       Before          After
      Mean StdDev     Mean StdDev   Confidence   Change
L1     5.7   0.01      5.7   0.01     35.37%      +0.0%  (insignificant)
L2     5.0   0.05      5.0   0.04     55.24%      +0.2%  (insignificant)
M      4.8   0.00      4.8   0.00     94.55%      -0.0%  (insignificant)
HT     4.5   0.01      4.5   0.01     50.62%      +0.0%  (insignificant)
VT     4.5   0.01      4.5   0.01     80.60%      +0.1%  (insignificant)
R      4.4   0.01      4.4   0.01     65.97%      +0.0%  (insignificant)
RT     2.9   0.03      2.9   0.02     49.54%      +0.2%  (insignificant)

At most 2 outliers rejected per test per set.

Test of feature 'over_n_0565'

       Before          After
      Mean StdDev     Mean StdDev   Confidence   Change
L1     8.4   0.03      8.4   0.03     20.63%      +0.0%  (insignificant)
L2     8.1   0.01      8.1   0.01     19.00%      +0.0%  (insignificant)
M      7.4   0.00      7.4   0.00      7.43%      -0.0%  (insignificant)
HT     7.0   0.01      7.0   0.01     33.74%      +0.0%  (insignificant)
VT     7.0   0.01      7.0   0.01     46.73%      +0.0%  (insignificant)
R      6.8   0.01      6.8   0.01     53.98%      +0.0%  (insignificant)
RT     4.5   0.05      4.5   0.04     14.83%      +0.1%  (insignificant)

At most 5 outliers rejected per test per set.

Test of feature 'over_n_8888'

       Before          After
      Mean StdDev     Mean StdDev   Confidence   Change
L1    12.5   0.03     21.2   0.05    100.00%     +69.5%
L2    11.1   0.02     17.4   0.01    100.00%     +57.3%
M      9.4   0.00     13.6   0.00    100.00%     +45.1%
HT     8.5   0.02     12.2   0.02    100.00%     +43.0%
VT     8.4   0.02     11.9   0.02    100.00%     +41.7%
R      8.2   0.01     11.5   0.02    100.00%     +40.5%
RT     5.5   0.05      7.6   0.08    100.00%     +39.1%

At most 2 outliers rejected per test per set.

Test of feature 'src_0565_0565'

       Before          After
      Mean StdDev     Mean StdDev   Confidence   Change
L1   514.0   6.57    507.0   5.16     99.99%      -1.4%
L2    82.1   3.41     83.1   4.83     66.63%      +1.3%  (insignificant)
M    128.2   0.14    128.2   0.13     47.76%      -0.0%  (insignificant)
HT    49.9   0.37     49.7   0.36     95.44%      -0.4%  (insignificant)
VT    44.2   0.29     44.2   0.26      0.41%      -0.0%  (insignificant)
R     39.2   0.33     39.1   0.29     63.60%      -0.2%  (insignificant)
RT    12.5   0.31     12.4   0.26     67.37%      -0.6%  (insignificant)

At most 6 outliers rejected per test per set.

Test of feature 'src_1555_8888'

       Before          After
      Mean StdDev     Mean StdDev   Confidence   Change
L1    24.1   0.13     24.2   0.13     61.78%      +0.1%  (insignificant)
L2    18.9   0.39     18.7   0.19     96.50%      -0.9%  (insignificant)
M     20.0   0.02     20.0   0.03     60.83%      -0.0%  (insignificant)
HT    12.4   0.06     12.4   0.05     93.43%      +0.2%  (insignificant)
VT    12.3   0.06     12.3   0.06     77.64%      +0.2%  (insignificant)
R     11.7   0.05     11.7   0.05     29.04%      +0.0%  (insignificant)
RT     5.4   0.09      5.4   0.07     83.90%      +0.6%  (insignificant)

At most 3 outliers rejected per test per set.

Test of feature 'src_8888_8888'

       Before          After
      Mean StdDev     Mean StdDev   Confidence   Change
L1   478.2   9.26    479.5   5.48     46.79%      +0.3%  (insignificant)
L2    58.5   4.14     58.2   4.11     21.77%      -0.5%  (insignificant)
M     87.4   0.13     87.4   0.12     62.93%      +0.0%  (insignificant)
HT    36.7   0.22     36.7   0.20     73.41%      +0.2%  (insignificant)
VT    32.8   0.19     32.8   0.21     35.67%      +0.1%  (insignificant)
R     31.2   0.19     31.2   0.18     68.76%      +0.2%  (insignificant)
RT    11.7   0.23     11.8   0.26     89.51%      +0.9%  (insignificant)

At most 2 outliers rejected per test per set.

******************

Before: patch 1+2
After: patch 1+2+3
Expectation: no performance changes

Test of feature 'in_8888_8'

       Before          After
      Mean StdDev     Mean StdDev   Confidence   Change
L1    13.1   0.03     13.3   0.03    100.00%      +1.9%
L2     9.5   0.14      9.5   0.17     20.38%      +0.1%  (insignificant)
M      9.6   0.01      9.7   0.00     99.52%      +0.1%
HT     7.8   0.02      7.8   0.01     97.10%      -0.1%  (insignificant)
VT     7.7   0.02      7.7   0.01     99.05%      -0.1%
R      7.3   0.01      7.3   0.01      2.45%      -0.0%  (insignificant)
RT     4.0   0.04      4.0   0.05      1.26%      -0.0%  (insignificant)

At most 2 outliers rejected per test per set.

Test of feature 'in_n_8888'

       Before          After
      Mean StdDev     Mean StdDev   Confidence   Change
L1    19.2   0.08     19.1   0.06    100.00%      -0.6%
L2    15.4   0.52     15.4   0.45     38.75%      -0.4%  (insignificant)
M     12.8   0.00     12.8   0.00     98.05%      +0.0%  (insignificant)
HT    11.3   0.03     11.3   0.03     99.99%      -0.3%
VT    11.0   0.03     11.0   0.03     99.23%      -0.2%
R     10.7   0.03     10.7   0.02     97.90%      -0.1%  (insignificant)
RT     6.6   0.09      6.6   0.09      7.24%      +0.0%  (insignificant)

At most 1 outliers rejected per test per set.

Test of feature 'in_reverse_8888_8888'

       Before          After
      Mean StdDev     Mean StdDev   Confidence   Change
L1    32.3   0.07     32.4   0.09     79.24%      +0.1%  (insignificant)
L2    17.9   0.62     17.9   0.48     18.75%      -0.2%  (insignificant)
M     16.9   0.02     16.9   0.02     49.64%      +0.0%  (insignificant)
HT    12.1   0.04     12.1   0.03     71.54%      -0.1%  (insignificant)
VT    11.8   0.04     11.8   0.02     99.96%      -0.3%
R     11.3   0.03     11.3   0.02     22.90%      -0.0%  (insignificant)
RT     6.1   0.10      6.1   0.07     77.15%      -0.4%  (insignificant)

At most 2 outliers rejected per test per set.

Test of feature 'over_8888_8888'

       Before          After
      Mean StdDev     Mean StdDev   Confidence   Change
L1    38.0   0.13     38.0   0.14      8.75%      -0.0%  (insignificant)
L2    27.7   0.61     27.5   0.83     73.75%      -0.8%  (insignificant)
M     26.9   0.03     26.9   0.03      3.72%      -0.0%  (insignificant)
HT    14.9   0.05     14.9   0.06     15.08%      -0.0%  (insignificant)
VT    14.1   0.04     14.0   0.04     99.80%      -0.3%
R     15.1   0.06     15.1   0.06     46.46%      -0.1%  (insignificant)
RT     7.3   0.14      7.3   0.13     15.09%      +0.1%  (insignificant)

At most 1 outliers rejected per test per set.

Test of feature 'over_8888_8_0565'

       Before          After
      Mean StdDev     Mean StdDev   Confidence   Change
L1     5.3   0.01      5.3   0.01    100.00%      -0.2%
L2     4.6   0.04      4.6   0.04     21.09%      -0.1%  (insignificant)
M      4.5   0.00      4.5   0.00     60.21%      +0.0%  (insignificant)
HT     3.9   0.01      3.9   0.01      0.83%      -0.0%  (insignificant)
VT     3.9   0.01      3.9   0.01     30.43%      +0.0%  (insignificant)
R      3.8   0.00      3.8   0.01     94.02%      +0.1%  (insignificant)
RT     2.3   0.01      2.3   0.02     99.87%      +0.6%

At most 1 outliers rejected per test per set.

Test of feature 'over_8888_n_0565'

       Before          After
      Mean StdDev     Mean StdDev   Confidence   Change
L1     5.7   0.01      5.7   0.01     81.08%      +0.1%  (insignificant)
L2     5.0   0.04      5.0   0.04     64.46%      -0.2%  (insignificant)
M      4.8   0.00      4.8   0.00      6.04%      -0.0%  (insignificant)
HT     4.5   0.01      4.5   0.01    100.00%      -0.2%
VT     4.5   0.01      4.5   0.01    100.00%      -0.2%
R      4.4   0.01      4.4   0.01     97.94%      -0.1%  (insignificant)
RT     2.9   0.02      2.9   0.02     99.98%      -0.9%

At most 1 outliers rejected per test per set.

Test of feature 'over_n_0565'

       Before          After
      Mean StdDev     Mean StdDev   Confidence   Change
L1     8.4   0.03      8.4   0.02     31.01%      +0.0%  (insignificant)
L2     8.1   0.01      8.1   0.01     14.41%      +0.0%  (insignificant)
M      7.4   0.00      7.4   0.00     22.23%      +0.0%  (insignificant)
HT     7.0   0.01      7.0   0.01     97.96%      -0.1%  (insignificant)
VT     7.0   0.01      6.9   0.01     96.55%      -0.1%  (insignificant)
R      6.8   0.01      6.8   0.01     93.76%      -0.1%  (insignificant)
RT     4.5   0.04      4.6   0.05     94.47%      +0.5%  (insignificant)

At most 3 outliers rejected per test per set.

Test of feature 'over_n_8888'

       Before          After
      Mean StdDev     Mean StdDev   Confidence   Change
L1    21.2   0.05     21.2   0.05     41.70%      +0.0%  (insignificant)
L2    17.4   0.01     17.4   0.01    100.00%      -0.1%
M     13.6   0.00     13.6   0.00    100.00%      -0.0%
HT    12.2   0.02     12.2   0.04     73.85%      -0.1%  (insignificant)
VT    11.9   0.02     11.8   0.03     44.38%      -0.0%  (insignificant)
R     11.5   0.02     11.5   0.03     96.20%      -0.1%  (insignificant)
RT     7.6   0.08      7.6   0.11     62.74%      -0.3%  (insignificant)

At most 1 outliers rejected per test per set.

Test of feature 'src_0565_0565'

       Before          After
      Mean StdDev     Mean StdDev   Confidence   Change
L1   507.0   5.16    478.3   8.23    100.00%      -5.7%
L2    83.1   4.83     82.6   4.39     32.21%      -0.6%  (insignificant)
M    128.2   0.13    128.3   0.13     76.78%      +0.0%  (insignificant)
HT    49.7   0.36     49.6   0.34     76.47%      -0.2%  (insignificant)
VT    44.2   0.26     43.9   0.18    100.00%      -0.8%
R     39.1   0.29     39.1   0.30      4.50%      -0.0%  (insignificant)
RT    12.4   0.26     12.3   0.28     69.77%      -0.6%  (insignificant)

At most 6 outliers rejected per test per set.

Test of feature 'src_1555_8888'

       Before          After
      Mean StdDev     Mean StdDev   Confidence   Change
L1    24.2   0.13     25.4   0.12    100.00%      +5.1%
L2    18.7   0.19     18.9   0.40     99.19%      +1.2%
M     20.0   0.03     20.0   0.02     96.94%      +0.1%  (insignificant)
HT    12.4   0.05     12.4   0.04     96.56%      +0.2%  (insignificant)
VT    12.3   0.06     12.3   0.05     84.07%      +0.2%  (insignificant)
R     11.7   0.05     11.7   0.05     99.71%      +0.3%
RT     5.4   0.07      5.4   0.08      7.45%      +0.0%  (insignificant)

At most 3 outliers rejected per test per set.

Test of feature 'src_8888_8888'

       Before          After
      Mean StdDev     Mean StdDev   Confidence   Change
L1   479.5   5.48    476.8   7.58     85.00%      -0.6%  (insignificant)
L2    58.2   4.11     58.7   4.27     39.42%      +1.0%  (insignificant)
M     87.4   0.12     87.4   0.13     15.61%      -0.0%  (insignificant)
HT    36.7   0.20     36.7   0.23     69.17%      -0.2%  (insignificant)
VT    32.8   0.21     32.7   0.22     98.61%      -0.4%  (insignificant)
R     31.2   0.18     31.2   0.16     77.16%      -0.2%  (insignificant)
RT    11.8   0.26     11.7   0.13     99.78%      -1.4%

At most 4 outliers rejected per test per set.

******************

Before: patch 1+2+3
After: patch 1+2+3+4
Expectation: performance improvement on over_n_8888

Test of feature 'in_8888_8'

       Before          After
      Mean StdDev     Mean StdDev   Confidence   Change
L1    13.3   0.03     13.3   0.05     99.85%      -0.2%
L2     9.5   0.17      9.4   0.14     84.59%      -0.6%  (insignificant)
M      9.7   0.00      9.7   0.00     12.02%      -0.0%  (insignificant)
HT     7.8   0.01      7.8   0.02     29.74%      -0.0%  (insignificant)
VT     7.7   0.01      7.7   0.02     24.23%      -0.0%  (insignificant)
R      7.3   0.01      7.3   0.02     44.23%      +0.0%  (insignificant)
RT     4.0   0.05      4.0   0.05     79.26%      -0.4%  (insignificant)

At most 4 outliers rejected per test per set.

Test of feature 'in_n_8888'

       Before          After
      Mean StdDev     Mean StdDev   Confidence   Change
L1    19.1   0.06     19.1   0.07     57.49%      +0.1%  (insignificant)
L2    15.4   0.45     15.5   0.51     61.13%      +0.7%  (insignificant)
M     12.8   0.00     12.8   0.00     31.36%      -0.0%  (insignificant)
HT    11.3   0.03     11.3   0.03     36.66%      +0.0%  (insignificant)
VT    11.0   0.03     11.0   0.03     49.05%      +0.0%  (insignificant)
R     10.7   0.02     10.7   0.03     36.66%      +0.0%  (insignificant)
RT     6.6   0.09      6.6   0.11     31.99%      +0.2%  (insignificant)

At most 1 outliers rejected per test per set.

Test of feature 'in_reverse_8888_8888'

       Before          After
      Mean StdDev     Mean StdDev   Confidence   Change
L1    32.4   0.09     32.4   0.10     80.57%      +0.1%  (insignificant)
L2    17.9   0.48     17.9   0.44     32.45%      +0.3%  (insignificant)
M     16.9   0.02     17.0   0.02     18.84%      +0.0%  (insignificant)
HT    12.1   0.03     12.0   0.03     49.37%      -0.0%  (insignificant)
VT    11.8   0.02     11.8   0.03      5.08%      +0.0%  (insignificant)
R     11.3   0.02     11.3   0.02     56.36%      +0.0%  (insignificant)
RT     6.1   0.07      6.1   0.10     44.32%      +0.2%  (insignificant)

At most 2 outliers rejected per test per set.

Test of feature 'over_8888_8888'

       Before          After
      Mean StdDev     Mean StdDev   Confidence   Change
L1    38.0   0.14     38.1   0.12     49.85%      +0.1%  (insignificant)
L2    27.5   0.83     27.6   0.83     34.98%      +0.4%  (insignificant)
M     26.9   0.03     26.9   0.03     23.94%      +0.0%  (insignificant)
HT    14.9   0.06     14.9   0.06     18.90%      -0.0%  (insignificant)
VT    14.0   0.04     14.0   0.06     10.01%      -0.0%  (insignificant)
R     15.1   0.06     15.1   0.07     16.20%      +0.0%  (insignificant)
RT     7.3   0.13      7.3   0.17     40.59%      +0.3%  (insignificant)

At most 3 outliers rejected per test per set.

Test of feature 'over_8888_8_0565'

       Before          After
      Mean StdDev     Mean StdDev   Confidence   Change
L1     5.3   0.01      5.3   0.01     47.95%      -0.0%  (insignificant)
L2     4.6   0.04      4.6   0.03     35.24%      -0.1%  (insignificant)
M      4.5   0.00      4.5   0.00      5.89%      -0.0%  (insignificant)
HT     3.9   0.01      3.9   0.01     42.81%      -0.0%  (insignificant)
VT     3.9   0.01      3.9   0.01     80.10%      -0.1%  (insignificant)
R      3.8   0.01      3.8   0.01     68.22%      -0.0%  (insignificant)
RT     2.3   0.02      2.3   0.02     55.04%      -0.2%  (insignificant)

At most 1 outliers rejected per test per set.

Test of feature 'over_8888_n_0565'

       Before          After
      Mean StdDev     Mean StdDev   Confidence   Change
L1     5.7   0.01      5.7   0.01     91.56%      -0.1%  (insignificant)
L2     5.0   0.04      5.0   0.03     88.04%      +0.3%  (insignificant)
M      4.8   0.00      4.8   0.00     89.77%      +0.0%  (insignificant)
HT     4.5   0.01      4.5   0.01     87.14%      +0.1%  (insignificant)
VT     4.5   0.01      4.5   0.01     91.46%      +0.1%  (insignificant)
R      4.4   0.01      4.4   0.01     92.73%      +0.1%  (insignificant)
RT     2.9   0.02      2.9   0.03      4.10%      -0.0%  (insignificant)

At most 1 outliers rejected per test per set.

Test of feature 'over_n_0565'

       Before          After
      Mean StdDev     Mean StdDev   Confidence   Change
L1     8.4   0.02      8.3   0.03     98.07%      -0.2%  (insignificant)
L2     8.1   0.01      8.1   0.01     97.97%      +0.1%  (insignificant)
M      7.4   0.00      7.4   0.00     27.60%      +0.0%  (insignificant)
HT     7.0   0.01      7.0   0.01     52.87%      -0.0%  (insignificant)
VT     6.9   0.01      6.9   0.01      1.96%      +0.0%  (insignificant)
R      6.8   0.01      6.8   0.01     71.27%      -0.0%  (insignificant)
RT     4.6   0.05      4.5   0.04    100.00%      -1.2%

At most 4 outliers rejected per test per set.

Test of feature 'over_n_8888'

       Before          After
      Mean StdDev     Mean StdDev   Confidence   Change
L1    21.2   0.05     45.4   0.15    100.00%    +113.8%
L2    17.4   0.01     43.2   0.03    100.00%    +148.0%
M     13.6   0.00     42.4   0.02    100.00%    +211.4%
HT    12.2   0.04     25.4   0.13    100.00%    +109.1%
VT    11.8   0.03     22.3   0.09    100.00%     +88.5%
R     11.5   0.03     23.2   0.10    100.00%    +102.2%
RT     7.6   0.11     11.5   0.19    100.00%     +51.2%

At most 1 outliers rejected per test per set.

Test of feature 'src_0565_0565'

       Before          After
      Mean StdDev     Mean StdDev   Confidence   Change
L1   478.3   8.23    479.2  12.82     23.88%      +0.2%  (insignificant)
L2    82.6   4.39     82.8   4.44     10.21%      +0.2%  (insignificant)
M    128.3   0.13    128.2   0.14     55.57%      -0.0%  (insignificant)
HT    49.6   0.34     49.7   0.27     81.52%      +0.2%  (insignificant)
VT    43.9   0.18     44.1   0.26     99.34%      +0.4%
R     39.1   0.30     39.1   0.31      1.40%      -0.0%  (insignificant)
RT    12.3   0.28     12.4   0.27     40.16%      +0.3%  (insignificant)

At most 3 outliers rejected per test per set.

Test of feature 'src_1555_8888'

       Before          After
      Mean StdDev     Mean StdDev   Confidence   Change
L1    25.4   0.12     25.4   0.13     10.64%      +0.0%  (insignificant)
L2    18.9   0.40     19.0   0.32     15.56%      +0.1%  (insignificant)
M     20.0   0.02     20.0   0.02     18.25%      -0.0%  (insignificant)
HT    12.4   0.04     12.4   0.03      6.74%      -0.0%  (insignificant)
VT    12.3   0.05     12.3   0.04     55.63%      +0.1%  (insignificant)
R     11.7   0.05     11.7   0.03     47.13%      +0.1%  (insignificant)
RT     5.4   0.08      5.4   0.07     14.83%      -0.1%  (insignificant)

At most 3 outliers rejected per test per set.

Test of feature 'src_8888_8888'

       Before          After
      Mean StdDev     Mean StdDev   Confidence   Change
L1   476.8   7.58    477.9   5.47     45.12%      +0.2%  (insignificant)
L2    58.7   4.27     57.6   4.91     65.35%      -1.9%  (insignificant)
M     87.4   0.13     87.4   0.12     58.01%      -0.0%  (insignificant)
HT    36.7   0.23     36.6   0.16     63.26%      -0.1%  (insignificant)
VT    32.7   0.22     32.7   0.16     53.57%      -0.1%  (insignificant)
R     31.2   0.16     31.1   0.15     78.92%      -0.2%  (insignificant)
RT    11.7   0.13     11.7   0.22     31.55%      +0.2%  (insignificant)

At most 6 outliers rejected per test per set.