[Pixman] Benchmarked: [PATCH 1/4] Change conditions for setting FAST_PATH_SAMPLES_COVER_CLIP flags

Sun Sep 20 05:47:01 PDT 2015

On Sun, Sep 20, 2015 at 1:22 PM, Oded Gabbay <oded.gabbay at gmail.com> wrote:
> On Wed, Sep 16, 2015 at 2:25 PM, Pekka Paalanen <ppaalanen at gmail.com> wrote:
>> On Fri,  4 Sep 2015 03:09:20 +0100
>> Ben Avison <bavison at riscosopen.org> wrote:
>>
>>> As discussed in
>>> http://lists.freedesktop.org/archives/pixman/2015-August/003905.html
>>>
>>> the 8 * pixman_fixed_e adjustment which was applied to the transformed
>>> coordinates is a legacy of rounding errors which used to occur in old
>>> versions of Pixman, but which no longer apply. For any affine transform,
>>> you are now guaranteed to get the same result by transforming the upper
>>> coordinate as though you transform the lower coordinate and add (size-1)
>>> steps of the increment in source coordinate space. No projective
>>> transform routines use the COVER_CLIP flags, so they cannot be affected.
>>
>> Hi all,
>>
>> as we are doing these things not just for cleaning up but with the premise
>> that there are missed optimization opportunities, I have benchmarked
>> this patch series.
>>
>> The series as benchmarked is available at:
>> https://git.collabora.com/cgit/user/pq/pixman.git/log/?h=cover-benchmark-1
>>
>> The benchmark points are:
>>
>> - baseline: "test: Add cover-test v5"
>>
>> - cleanup: "affine-bench: remove 8e margin from COVER area"
>>         Includes the 8e extra safety margin removal.
>>
>> - tight: "pixman-fast-path: Make bilinear cover fetcher use
>>         COVER_CLIP_TIGHT flag"
>>         Includes all the COVER_CLIP_BILINEAR related patches from
>>         Ben.
>>
>> Note that ssse3_iters[] in pixman-ssse3.c still contains
>> FAST_PATH_SAMPLES_COVER_CLIP_BILINEAR.
>>
>> Cairo version is 1.14.2 for the benchmarks, which are run like:
>> $ CAIRO_TEST_TARGET=image cairo-perf-trace -r -v -i8 > baseline-image-2.txt
>>
>> I tried both "image" and "image16" on an x86_64 (Sandybridge), and got
>> no performance differences in the trimmed-cairo-traces set in either
>> baseline/cleanup or cleanup/tight.
>>
>> I also tried with PIXMAN_DISABLE=ssse3 and still got no difference. I
>> verified I am really running what I think I am by editing Pixman and
>> seeing the effect in the benchmark.
>>
>> Am I missing something?
>>
>> I thought we would see at least some improvements also on x86_64 when
>> comparing cleanup/tight.
>>
>> Should I run the same on rpi2? Or is the best effect on the fast paths
>> we haven't merged yet?
>>
>> I'd rather not run this on rpi1 due to the function address /
>> performance quirk, doing the required iterations there would probably
>> take too long and I'd need to rearrange the result files too.
>>
>> Or maybe our test set is not enough? I recall having some problems with
>> that in the past.
>>
>> So, I patched Pixman to yell whenever TIGHT is set but
>> COVER_CLIP_BILINEAR is not set. Only t-firefox-canvas-swscroll and
>> t-firefox-fishtank hit it with source image, each twice per iteration.
>> Definitely seems like this test set is not hitting the cases we are
>> interested in. I think I need to dig up our old performance profiles
>> and see if we could record a trace from a real app that would hit these
>> cases, now that Cairo's trace recording is supposedly fixed.
>>
>> The removal of the 8e extra safety margins shouldn't need performance
>> profiles as justification, but for the tightening patches they'd be
>> nice to have, especially since the usefulness of them has been
>> questioned.
>>
>>
>> Thanks,
>> pq
>
> Hi Pekka, Ben
>
> I decided to also run the cairo trimmed benchmarks on my POWER8
> ppc64le and POWER7 ppc64.
> To make things clearer, I used the same definitions for "baseline",
> "cleanup" and "tight".
>
> I used Cairo version 1.14.3, actually from git with HEAD set to 6f7a9b4.
> I ran the benchmarks with this command (it's from inside a script):
> "cairo-perf-trace benchmark -r -i8 > ../${__output}.perf"
>
> First of all, the diff between baseline/cleanup showed no change on
> either platform, so that's good :)
>
> Now, for cleanup/tight:
>
> With POWER8 ppc64le, I got the following very modest boost:
>
> image        t-firefox-asteroids  483.10 (523.85 3.49%) -> 452.84 (480.34 3.16%):  1.07x speedup
> image       t-firefox-chalkboard  691.38 (692.09 0.06%) -> 653.07 (654.60 0.26%):  1.06x speedup
>
> However, with POWER7 ppc64, I got the following regressions, which are quite bad:
>
> image        t-firefox-asteroids  545.55 (559.64 1.79%) -> 781.07 (791.83 2.33%):  1.43x slowdown
> image        t-firefox-scrolling  1185.45 (1186.02 0.05%) -> 1748.76 (1754.85 0.20%):  1.48x slowdown
> image       t-firefox-chalkboard  1444.76 (1464.55 0.88%) -> 2315.76 (2333.10 0.34%):  1.60x slowdown
> image        t-firefox-paintball  681.43 (682.28 0.10%) -> 1138.15 (1140.19 0.08%):  1.67x slowdown
> image           t-firefox-canvas  890.14 (890.90 0.10%) -> 1492.83 (1493.51 0.20%):  1.68x slowdown
> image  t-firefox-canvas-swscroll  1369.94 (1371.66 0.05%) -> 2297.53 (2305.70 0.18%):  1.68x slowdown
> image        t-xfce4-terminal-a1  829.35 (832.39 0.16%) -> 1392.50 (1414.69 1.08%):  1.68x slowdown
> image         t-firefox-fishbowl  3112.93 (3114.13 0.02%) -> 5227.18 (5229.05 0.03%):  1.68x slowdown
> image                  t-poppler  404.14 (407.43 0.52%) -> 680.27 (685.01 0.45%):  1.68x slowdown
> image        t-firefox-particles  3555.75 (3570.29 0.18%) -> 5990.93 (5995.00 0.05%):  1.68x slowdown
> image            t-midori-zoomed  555.84 (557.29 0.24%) -> 936.56 (937.69 0.08%):  1.68x slowdown
> image     t-gnome-system-monitor  844.70 (849.98 0.52%) -> 1426.26 (1427.60 0.12%):  1.69x slowdown
> image     t-firefox-planet-gnome  904.60 (908.31 0.18%) -> 1527.90 (1530.03 0.08%):  1.69x slowdown
> image            t-chromium-tabs  221.74 (221.87 0.04%) -> 374.75 (376.72 0.26%):  1.69x slowdown
> image           t-swfdec-youtube  929.86 (930.31 0.12%) -> 1571.61 (1572.76 0.09%):  1.69x slowdown
> image         t-firefox-fishtank  1787.33 (1787.36 0.00%) -> 3022.38 (3023.47 0.09%):  1.69x slowdown
> image     t-firefox-canvas-alpha  1026.19 (1030.55 0.24%) -> 1735.63 (1740.84 0.28%):  1.69x slowdown
> image                t-evolution  431.94 (433.98 0.36%) -> 731.76 (732.26 0.08%):  1.69x slowdown
> image        t-firefox-talos-svg  1381.38 (1388.40 0.26%) -> 2342.68 (2345.83 0.10%):  1.70x slowdown
> image                     t-gvim  803.40 (806.02 0.29%) -> 1363.80 (1366.63 0.27%):  1.70x slowdown
> image           t-poppler-reseau  1416.96 (1443.14 0.74%) -> 2408.39 (2412.49 0.16%):  1.70x slowdown
> image       t-swfdec-giant-steps  827.47 (829.87 0.17%) -> 1407.90 (1410.93 0.18%):  1.70x slowdown
> image       t-gnome-terminal-vim  663.55 (669.39 0.71%) -> 1132.85 (1139.02 0.29%):  1.71x slowdown
> image           t-grads-heat-map  225.85 (225.92 0.02%) -> 386.23 (386.78 0.49%):  1.71x slowdown
>
> btw, out of curiosity, I checked cleanup/tight on my Haswell laptop
> and I got mixed/bad results:
>
> image           t-firefox-canvas  705.79 (869.04 11.16%) -> 563.55 (594.35 2.52%):  1.25x speedup
>
> image           t-poppler-reseau  619.46 (881.17 16.35%) -> 657.98 (679.11 7.95%):  1.06x slowdown
> image     t-firefox-planet-gnome  582.52 (605.63 1.82%) -> 627.80 (634.95 3.31%):  1.08x slowdown
> image                t-evolution  264.55 (271.81 3.30%) -> 288.95 (336.86 11.37%):  1.09x slowdown
> image       t-gnome-terminal-vim  264.74 (270.65 0.92%) -> 312.25 (516.79 20.96%):  1.18x slowdown
> image           t-grads-heat-map   93.61 (93.92 0.23%) -> 115.32 (136.32 10.96%):  1.23x slowdown
> image            t-chromium-tabs  115.36 (115.94 0.45%) -> 200.87 (254.77 11.90%):  1.74x slowdown
>
> Opinions?
>
>            Oded

Please disregard the email above - the results there are bogus because
my server is inside a VM!!!

After I sent my email, I ran the cleanup version 5 times in a row. The
first 4 runs were identical, but the 5th run showed a major slowdown.

I also ran the tight version 5 times in a row. The 2nd run showed a
major improvement over the 1st run, the 3rd run showed an additional
improvement on top of that, and the 4th and 5th runs were identical to
the 3rd run.

On the one hand, I'm not running anything else on this server. On the
other hand, this is a VM, so maybe the host machine is
over-subscribed.

I then went to test it on a physical server without a VM (the ppc64
version). I ran cleanup and tight 5 times, and all results were
identical. So I think the issue is definitely with the VM.

And as for the results, I'm happy to say that there is no change
between cleanup and tight :)

        Oded
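
For anyone who wants to sanity-check comparison lines like the ones quoted
above, here is a minimal sketch in Python. It is not part of cairo's
tooling. It assumes each result has been put on a single line (the two
mail-wrapped halves joined with a space), and it assumes the factor is
simply the ratio of the first number on each side of the "->"; the
parenthesized values are ignored because their exact meaning is not stated
in this thread. The names parse_result and relative_spread are made up for
this illustration.

```python
import re

# Assumed layout of one comparison line, optionally behind "> " quote marks:
#   backend  trace  OLD (...) -> NEW (...):  N.NNx speedup|slowdown
LINE_RE = re.compile(
    r"^[>\s]*(?P<backend>\S+)\s+(?P<trace>\S+)\s+"
    r"(?P<old>\d+(?:\.\d+)?)\s+\([^)]*\)\s+->\s+"
    r"(?P<new>\d+(?:\.\d+)?)\s+\([^)]*\):\s+"
    r"(?P<factor>\d+(?:\.\d+)?)x\s+(?P<kind>speedup|slowdown)\s*$"
)


def parse_result(line):
    """Parse one joined comparison line and recompute its factor."""
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError("unrecognized result line: %r" % line)
    old = float(m.group("old"))
    new = float(m.group("new"))
    # Assumption: a speedup is old/new, a slowdown is new/old.
    if m.group("kind") == "speedup":
        recomputed = old / new
    else:
        recomputed = new / old
    return {
        "backend": m.group("backend"),
        "trace": m.group("trace"),
        "old": old,
        "new": new,
        "kind": m.group("kind"),
        "reported": float(m.group("factor")),
        "recomputed": recomputed,
    }


def relative_spread(times):
    """Return (max - min) / min for repeated runs of the same build.

    A large value means the runs disagree with each other, so a
    comparison against another build cannot be trusted yet.
    """
    if not times:
        raise ValueError("need at least one timing")
    lowest = min(times)
    return (max(times) - lowest) / lowest
```

Calling parse_result() on a joined line and comparing "reported" with
"recomputed" catches copy or wrapping mistakes. Feeding the timings of
repeated runs of one build to relative_spread() before comparing two
builds would flag the kind of run-to-run drift described above.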