[Pixman] Benchmarked: [PATCH 1/4] Change conditions for setting FAST_PATH_SAMPLES_COVER_CLIP flags

Sun Sep 20 03:22:29 PDT 2015

On Wed, Sep 16, 2015 at 2:25 PM, Pekka Paalanen <ppaalanen at gmail.com> wrote:
> On Fri,  4 Sep 2015 03:09:20 +0100
> Ben Avison <bavison at riscosopen.org> wrote:
>
>> As discussed in
>> http://lists.freedesktop.org/archives/pixman/2015-August/003905.html
>>
>> the 8 * pixman_fixed_e adjustment which was applied to the transformed
>> coordinates is a legacy of rounding errors which used to occur in old
>> versions of Pixman, but which no longer apply. For any affine transform,
>> you are now guaranteed to get the same result by transforming the upper
>> coordinate as though you transform the lower coordinate and add (size-1)
>> steps of the increment in source coordinate space. No projective
>> transform routines use the COVER_CLIP flags, so they cannot be affected.
>
> Hi all,
>
> as we are doing these things not just for cleaning up but with the premise
> that there are missed optimization opportunities, I have benchmarked
> this patch series.
>
> The series as benchmarked is available at:
> https://git.collabora.com/cgit/user/pq/pixman.git/log/?h=cover-benchmark-1
>
> The benchmark points are:
>
> - baseline: "test: Add cover-test v5"
>
> - cleanup: "affine-bench: remove 8e margin from COVER area"
>         Includes the 8e extra safety margin removal.
>
> - tight: "pixman-fast-path: Make bilinear cover fetcher use
>         COVER_CLIP_TIGHT flag"
>         Includes all the COVER_CLIP_BILINEAR related patches from
>         Ben.
>
> Note that ssse3_iters[] in pixman-ssse3.c still contains
> FAST_PATH_SAMPLES_COVER_CLIP_BILINEAR.
>
> Cairo version is 1.14.2 for the benchmarks, which are run like:
> $ CAIRO_TEST_TARGET=image cairo-perf-trace -r -v -i8 > baseline-image-2.txt
>
> I tried both "image" and "image16" on an x86_64 (Sandybridge), and got
> no performance differences in the trimmed-cairo-traces set in either
> baseline/cleanup or cleanup/tight.
>
> I also tried with PIXMAN_DISABLE=ssse3 and still got no difference. I
> verified I am really running what I think I am by editing Pixman and
> seeing the effect in the benchmark.
>
> Am I missing something?
>
> I thought we would see at least some improvements also on x86_64 when
> comparing cleanup/tight.
>
> Should I run the same on rpi2? Or is the best effect on the fast paths
> we haven't merged yet?
>
> I'd rather not run this on rpi1 due to the function address /
> performance quirk; doing the required iterations there would probably
> take too long and I'd need to rearrange the result files too.
>
> Or maybe our test set is not enough? I recall having some problems with
> that in the past.
>
> So, I patched Pixman to yell whenever TIGHT is set but
> COVER_CLIP_BILINEAR is not set. Only t-firefox-canvas-swscroll and
> t-firefox-fishtank hit it with source image, each twice per iteration.
> Definitely seems like this test set is not hitting the cases we are
> interested in. I think I need to dig up our old performance profiles
> and see if we could record a trace from a real app that would hit these
> cases, now that Cairo's trace recording is supposedly fixed.
>
> The removal of the 8e extra safety margins shouldn't need performance
> profiles as justification, but for the tightening patches they'd be
> nice to have, especially since the usefulness of them has been
> questioned.
>
>
> Thanks,
> pq

Hi Pekka, Ben

I decided to also run the cairo trimmed benchmarks on my POWER8
ppc64le and POWER7 ppc64.
To make things clearer, I used the same definitions for "baseline",
"cleanup" and "tight".

I used Cairo version 1.14.3, built from git with HEAD at 6f7a9b4.
I ran the benchmarks with the following command (taken from inside a script):
"cairo-perf-trace benchmark -r -i8 > ../${__output}.perf"

First of all, the diff between baseline and cleanup showed no change on
either platform, so that's good :)

Now, for cleanup/tight:

With POWER8 ppc64le, I got the following very modest boost:

image        t-firefox-asteroids  483.10 (523.85 3.49%) -> 452.84 (480.34 3.16%):  1.07x speedup
image       t-firefox-chalkboard  691.38 (692.09 0.06%) -> 653.07 (654.60 0.26%):  1.06x speedup

However, with POWER7 ppc64, I got the following regressions, which are quite bad:

image        t-firefox-asteroids  545.55 (559.64 1.79%) -> 781.07 (791.83 2.33%):  1.43x slowdown
image        t-firefox-scrolling  1185.45 (1186.02 0.05%) -> 1748.76 (1754.85 0.20%):  1.48x slowdown
image       t-firefox-chalkboard  1444.76 (1464.55 0.88%) -> 2315.76 (2333.10 0.34%):  1.60x slowdown
image        t-firefox-paintball  681.43 (682.28 0.10%) -> 1138.15 (1140.19 0.08%):  1.67x slowdown
image           t-firefox-canvas  890.14 (890.90 0.10%) -> 1492.83 (1493.51 0.20%):  1.68x slowdown
image  t-firefox-canvas-swscroll  1369.94 (1371.66 0.05%) -> 2297.53 (2305.70 0.18%):  1.68x slowdown
image        t-xfce4-terminal-a1  829.35 (832.39 0.16%) -> 1392.50 (1414.69 1.08%):  1.68x slowdown
image         t-firefox-fishbowl  3112.93 (3114.13 0.02%) -> 5227.18 (5229.05 0.03%):  1.68x slowdown
image                  t-poppler  404.14 (407.43 0.52%) -> 680.27 (685.01 0.45%):  1.68x slowdown
image        t-firefox-particles  3555.75 (3570.29 0.18%) -> 5990.93 (5995.00 0.05%):  1.68x slowdown
image            t-midori-zoomed  555.84 (557.29 0.24%) -> 936.56 (937.69 0.08%):  1.68x slowdown
image     t-gnome-system-monitor  844.70 (849.98 0.52%) -> 1426.26 (1427.60 0.12%):  1.69x slowdown
image     t-firefox-planet-gnome  904.60 (908.31 0.18%) -> 1527.90 (1530.03 0.08%):  1.69x slowdown
image            t-chromium-tabs  221.74 (221.87 0.04%) -> 374.75 (376.72 0.26%):  1.69x slowdown
image           t-swfdec-youtube  929.86 (930.31 0.12%) -> 1571.61 (1572.76 0.09%):  1.69x slowdown
image         t-firefox-fishtank  1787.33 (1787.36 0.00%) -> 3022.38 (3023.47 0.09%):  1.69x slowdown
image     t-firefox-canvas-alpha  1026.19 (1030.55 0.24%) -> 1735.63 (1740.84 0.28%):  1.69x slowdown
image                t-evolution  431.94 (433.98 0.36%) -> 731.76 (732.26 0.08%):  1.69x slowdown
image        t-firefox-talos-svg  1381.38 (1388.40 0.26%) -> 2342.68 (2345.83 0.10%):  1.70x slowdown
image                     t-gvim  803.40 (806.02 0.29%) -> 1363.80 (1366.63 0.27%):  1.70x slowdown
image           t-poppler-reseau  1416.96 (1443.14 0.74%) -> 2408.39 (2412.49 0.16%):  1.70x slowdown
image       t-swfdec-giant-steps  827.47 (829.87 0.17%) -> 1407.90 (1410.93 0.18%):  1.70x slowdown
image       t-gnome-terminal-vim  663.55 (669.39 0.71%) -> 1132.85 (1139.02 0.29%):  1.71x slowdown
image           t-grads-heat-map  225.85 (225.92 0.02%) -> 386.23 (386.78 0.49%):  1.71x slowdown

btw, out of curiosity, I checked cleanup/tight on my Haswell laptop
and I got mixed/bad results:

image           t-firefox-canvas  705.79 (869.04 11.16%) -> 563.55 (594.35 2.52%):  1.25x speedup

image           t-poppler-reseau  619.46 (881.17 16.35%) -> 657.98 (679.11 7.95%):  1.06x slowdown
image     t-firefox-planet-gnome  582.52 (605.63 1.82%) -> 627.80 (634.95 3.31%):  1.08x slowdown
image                t-evolution  264.55 (271.81 3.30%) -> 288.95 (336.86 11.37%):  1.09x slowdown
image       t-gnome-terminal-vim  264.74 (270.65 0.92%) -> 312.25 (516.79 20.96%):  1.18x slowdown
image           t-grads-heat-map   93.61 (93.92 0.23%) -> 115.32 (136.32 10.96%):  1.23x slowdown
image            t-chromium-tabs  115.36 (115.94 0.45%) -> 200.87 (254.77 11.90%):  1.74x slowdown
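
For anyone who wants to double-check the factors in the results above,
here is a minimal Python sketch that parses one comparison line and
recomputes the ratio. The line layout in the regular expression is an
assumption inferred from the results quoted in this mail, not taken from
the tool's documentation, and the names LINE_RE, recompute and parse_line
are made up for illustration. It also assumes the factor is the ratio of
the first number on each side of the arrow, which agrees with the figures
above. Wrapped lines need to be joined into one line before parsing.

```python
import re

# Assumed layout of one comparison line (inferred from the results above):
#   backend  trace  old (old2 dev%) -> new (new2 dev%):  N.NNx speedup|slowdown
LINE_RE = re.compile(
    r"^(?P<backend>\S+)\s+(?P<trace>\S+)\s+"
    r"(?P<old>[\d.]+)\s+\([\d.]+\s+[\d.]+%\)\s+->\s+"
    r"(?P<new>[\d.]+)\s+\([\d.]+\s+[\d.]+%\):\s+"
    r"(?P<factor>[\d.]+)x\s+(?P<kind>speedup|slowdown)$"
)


def recompute(old, new):
    """Return (factor, kind) from the first number on each side of '->'."""
    if new < old:
        return old / new, "speedup"
    return new / old, "slowdown"


def parse_line(line):
    """Parse one joined result line and attach the recomputed factor."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError("unrecognised result line: %r" % line)
    old = float(m.group("old"))
    new = float(m.group("new"))
    factor, kind = recompute(old, new)
    return {
        "backend": m.group("backend"),
        "trace": m.group("trace"),
        "old": old,
        "new": new,
        "reported": float(m.group("factor")),
        "recomputed": factor,
        "kind": kind,
        # True when the reported label and two-decimal factor both match.
        "consistent": kind == m.group("kind")
        and round(factor, 2) == float(m.group("factor")),
    }


if __name__ == "__main__":
    sample = (
        "image        t-firefox-asteroids  545.55 (559.64 1.79%) -> "
        "781.07 (791.83 2.33%):  1.43x slowdown"
    )
    print(parse_line(sample))
```

Feeding every line of a results file through parse_line and looking for
entries where "consistent" is False is a cheap way to catch copy/paste
damage in the tables.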

Opinions?

           Oded