[Pixman] Benchmarked: [PATCH 1/4] Change conditions for setting FAST_PATH_SAMPLES_COVER_CLIP flags
Oded Gabbay
oded.gabbay at gmail.com
Sun Sep 20 05:47:01 PDT 2015
On Sun, Sep 20, 2015 at 1:22 PM, Oded Gabbay <oded.gabbay at gmail.com> wrote:
> On Wed, Sep 16, 2015 at 2:25 PM, Pekka Paalanen <ppaalanen at gmail.com> wrote:
>> On Fri, 4 Sep 2015 03:09:20 +0100
>> Ben Avison <bavison at riscosopen.org> wrote:
>>
>>> As discussed in
>>> http://lists.freedesktop.org/archives/pixman/2015-August/003905.html
>>>
>>> the 8 * pixman_fixed_e adjustment which was applied to the transformed
>>> coordinates is a legacy of rounding errors which used to occur in old
>>> versions of Pixman, but which no longer apply. For any affine transform,
>>> you are now guaranteed to get the same result by transforming the upper
>>> coordinate as though you transform the lower coordinate and add (size-1)
>>> steps of the increment in source coordinate space. No projective
>>> transform routines use the COVER_CLIP flags, so they cannot be affected.
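
A minimal sketch of why the claim above holds for the x coordinate, assuming 16.16 fixed point with a 64-bit-style intermediate that is rounded to nearest. The helper names `transform_x` and `upper_matches_stepping` are made up for illustration and are not Pixman functions; this only mirrors the idea, not Pixman's actual code path:

```python
FIXED_1 = 1 << 16  # 1.0 in 16.16 fixed point (illustrative constant)


def transform_x(m00, m01, m02, x, y):
    """Affine x' = m00*x + m01*y + m02, all values in 16.16 fixed point.

    The products are 32.32 values; adding 0x8000 and shifting right by 16
    rounds the sum to the nearest 16.16 value.
    """
    acc = m00 * x + m01 * y + m02 * FIXED_1
    return (acc + 0x8000) >> 16


def upper_matches_stepping(m00, m01, m02, x0, y, size):
    """Compare transforming the upper coordinate directly against
    transforming the lower coordinate and adding (size-1) increments."""
    unit_x = m00  # source-space step for one whole destination pixel
    lower = transform_x(m00, m01, m02, x0, y)
    upper = transform_x(m00, m01, m02, x0 + (size - 1) * FIXED_1, y)
    return upper == lower + (size - 1) * unit_x


# Destination x steps are whole pixels, so the extra term in the
# intermediate is an exact multiple of 1 << 16 and cannot change rounding.
print(upper_matches_stepping(43690, 12345, -7, FIXED_1 // 2, FIXED_1 // 2, 1000))
```

Because the destination coordinate advances by whole pixels, the added term passes through the rounding shift unchanged, which is why no safety margin is needed in this simplified model.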
>>
>> Hi all,
>>
>> as we are doing these things not just for cleaning up but on the premise
>> that there are missed optimization opportunities, I have benchmarked
>> this patch series.
>>
>> The series as benchmarked is available at:
>> https://git.collabora.com/cgit/user/pq/pixman.git/log/?h=cover-benchmark-1
>>
>> The benchmark points are:
>>
>> - baseline: "test: Add cover-test v5"
>>
>> - cleanup: "affine-bench: remove 8e margin from COVER area"
>> Includes the 8e extra safety margin removal.
>>
>> - tight: "pixman-fast-path: Make bilinear cover fetcher use
>> COVER_CLIP_TIGHT flag"
>> Includes all the COVER_CLIP_BILINEAR related patches from
>> Ben.
>>
>> Note that ssse3_iters[] in pixman-ssse3.c still contains
>> FAST_PATH_SAMPLES_COVER_CLIP_BILINEAR.
>>
>> Cairo version is 1.14.2 for the benchmarks, which are run like:
>> $ CAIRO_TEST_TARGET=image cairo-perf-trace -r -v -i8 > baseline-image-2.txt
>>
>> I tried both "image" and "image16" on an x86_64 (Sandybridge), and got
>> no performance differences in the trimmed-cairo-traces set in either
>> baseline/cleanup or cleanup/tight.
>>
>> I also tried with PIXMAN_DISABLE=ssse3 and still got no difference. I
>> verified I am really running what I think I am by editing Pixman and
>> seeing the effect in the benchmark.
>>
>> Am I missing something?
>>
>> I thought we would see at least some improvements also on x86_64 when
>> comparing cleanup/tight.
>>
>> Should I run the same on rpi2? Or is the best effect on the fast paths
>> we haven't merged yet?
>>
>> I'd rather not run this on rpi1 due to the function address /
>> performance quirk, doing the required iterations there would probably
>> take too long and I'd need to rearrange the result files too.
>>
>> Or maybe our test set is not enough? I recall having some problems with
>> that in the past.
>>
>> So, I patched Pixman to yell whenever TIGHT is set but
>> COVER_CLIP_BILINEAR is not set. Only t-firefox-canvas-swscroll and
>> t-firefox-fishtank hit it with source image, each twice per iteration.
>> Definitely seems like this test set is not hitting the cases we are
>> interested in. I think I need to dig up our old performance profiles
>> and see if we could record a trace from a real app that would hit these
>> cases, now that Cairo's trace recording is supposedly fixed.
>>
>> The removal of the 8e extra safety margins shouldn't need performance
>> profiles as justification, but for the tightening patches they'd be
>> nice to have, especially since the usefulness of them has been
>> questioned.
>>
>>
>> Thanks,
>> pq
>
> Hi Pekka, Ben
>
> I decided to also run the cairo trimmed benchmarks on my POWER8
> ppc64le and POWER7 ppc64.
> To make things clearer, I used the same definitions for "baseline",
> "cleanup" and "tight".
>
> I used Cairo version 1.14.3, actually built from git with HEAD at 6f7a9b4.
> I ran the benchmarks with (this is from inside a script):
> "cairo-perf-trace benchmark -r -i8 > ../${__output}.perf"
>
> First of all, the diff between baseline/cleanup showed no change on
> either platform, so that's good :)
>
> Now, for cleanup/tight:
>
> With POWER8 ppc64le, I got the following very modest boost:
>
> image t-firefox-asteroids 483.10 (523.85 3.49%) -> 452.84 (480.34 3.16%): 1.07x speedup
> image t-firefox-chalkboard 691.38 (692.09 0.06%) -> 653.07 (654.60 0.26%): 1.06x speedup
>
> However, with POWER7 ppc64, I got the following regressions, which is quite bad:
>
> image t-firefox-asteroids 545.55 (559.64 1.79%) -> 781.07 (791.83 2.33%): 1.43x slowdown
> image t-firefox-scrolling 1185.45 (1186.02 0.05%) -> 1748.76 (1754.85 0.20%): 1.48x slowdown
> image t-firefox-chalkboard 1444.76 (1464.55 0.88%) -> 2315.76 (2333.10 0.34%): 1.60x slowdown
> image t-firefox-paintball 681.43 (682.28 0.10%) -> 1138.15 (1140.19 0.08%): 1.67x slowdown
> image t-firefox-canvas 890.14 (890.90 0.10%) -> 1492.83 (1493.51 0.20%): 1.68x slowdown
> image t-firefox-canvas-swscroll 1369.94 (1371.66 0.05%) -> 2297.53 (2305.70 0.18%): 1.68x slowdown
> image t-xfce4-terminal-a1 829.35 (832.39 0.16%) -> 1392.50 (1414.69 1.08%): 1.68x slowdown
> image t-firefox-fishbowl 3112.93 (3114.13 0.02%) -> 5227.18 (5229.05 0.03%): 1.68x slowdown
> image t-poppler 404.14 (407.43 0.52%) -> 680.27 (685.01 0.45%): 1.68x slowdown
> image t-firefox-particles 3555.75 (3570.29 0.18%) -> 5990.93 (5995.00 0.05%): 1.68x slowdown
> image t-midori-zoomed 555.84 (557.29 0.24%) -> 936.56 (937.69 0.08%): 1.68x slowdown
> image t-gnome-system-monitor 844.70 (849.98 0.52%) -> 1426.26 (1427.60 0.12%): 1.69x slowdown
> image t-firefox-planet-gnome 904.60 (908.31 0.18%) -> 1527.90 (1530.03 0.08%): 1.69x slowdown
> image t-chromium-tabs 221.74 (221.87 0.04%) -> 374.75 (376.72 0.26%): 1.69x slowdown
> image t-swfdec-youtube 929.86 (930.31 0.12%) -> 1571.61 (1572.76 0.09%): 1.69x slowdown
> image t-firefox-fishtank 1787.33 (1787.36 0.00%) -> 3022.38 (3023.47 0.09%): 1.69x slowdown
> image t-firefox-canvas-alpha 1026.19 (1030.55 0.24%) -> 1735.63 (1740.84 0.28%): 1.69x slowdown
> image t-evolution 431.94 (433.98 0.36%) -> 731.76 (732.26 0.08%): 1.69x slowdown
> image t-firefox-talos-svg 1381.38 (1388.40 0.26%) -> 2342.68 (2345.83 0.10%): 1.70x slowdown
> image t-gvim 803.40 (806.02 0.29%) -> 1363.80 (1366.63 0.27%): 1.70x slowdown
> image t-poppler-reseau 1416.96 (1443.14 0.74%) -> 2408.39 (2412.49 0.16%): 1.70x slowdown
> image t-swfdec-giant-steps 827.47 (829.87 0.17%) -> 1407.90 (1410.93 0.18%): 1.70x slowdown
> image t-gnome-terminal-vim 663.55 (669.39 0.71%) -> 1132.85 (1139.02 0.29%): 1.71x slowdown
> image t-grads-heat-map 225.85 (225.92 0.02%) -> 386.23 (386.78 0.49%): 1.71x slowdown
>
> btw, out of curiosity, I checked cleanup/tight on my Haswell laptop
> and I got mixed/bad results:
>
> image t-firefox-canvas 705.79 (869.04 11.16%) -> 563.55 (594.35 2.52%): 1.25x speedup
>
> image t-poppler-reseau 619.46 (881.17 16.35%) -> 657.98 (679.11 7.95%): 1.06x slowdown
> image t-firefox-planet-gnome 582.52 (605.63 1.82%) -> 627.80 (634.95 3.31%): 1.08x slowdown
> image t-evolution 264.55 (271.81 3.30%) -> 288.95 (336.86 11.37%): 1.09x slowdown
> image t-gnome-terminal-vim 264.74 (270.65 0.92%) -> 312.25 (516.79 20.96%): 1.18x slowdown
> image t-grads-heat-map 93.61 (93.92 0.23%) -> 115.32 (136.32 10.96%): 1.23x slowdown
> image t-chromium-tabs 115.36 (115.94 0.45%) -> 200.87 (254.77 11.90%): 1.74x slowdown
>
> Opinions ?
>
> Oded
Please disregard the email above - the results there are bogus because
my server is inside a VM!!!
After I sent my email, I ran the cleanup version 5 times in a row. The
first 4 runs were identical, but the 5th showed a major slowdown.
I also ran the tight version 5 times in a row. The 2nd run showed a
major improvement over the 1st run, the 3rd run showed an additional
improvement on top of that, and the 4th and 5th runs were identical to
the 3rd run.
On the one hand, I'm not running anything else on this server. On the
other hand, this is a VM, so maybe the host machine is
over-subscribed.
I then went to test it on a physical server without a VM (the ppc64
one). I ran cleanup and tight 5 times, and all results were
identical. So I think the issue is definitely with the VM.
And as for the results, I'm happy to say that there is no change
between cleanup and tight :)
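
For anyone repeating this kind of sanity check, here is a minimal sketch of flagging unstable repeated runs before trusting a comparison. It assumes the timing for one trace has already been extracted from each run's output into a list of floats; the function names and the 2% tolerance are illustrative choices, not part of cairo-perf-trace, and the sample numbers are made up:

```python
def max_relative_spread(timings):
    """Return (max - min) / min over the per-run timings of one trace."""
    lo, hi = min(timings), max(timings)
    if lo <= 0:
        raise ValueError("timings must be positive")
    return (hi - lo) / lo


def runs_are_stable(timings, tolerance=0.02):
    """True if repeated runs of the same build agree within `tolerance`."""
    return max_relative_spread(timings) <= tolerance


# Made-up numbers, for illustration only: five runs of the same build.
steady_runs = [100.0, 100.4, 100.1, 100.3, 100.2]
noisy_runs = [100.0, 100.0, 100.0, 100.0, 143.0]

print(runs_are_stable(steady_runs))
print(runs_are_stable(noisy_runs))
```

If the same build does not agree with itself across runs, any speedup or slowdown reported between two builds on that machine should not be trusted.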
Oded