[Pixman] [PATCH 1/3] Add CLEAR and SRC linear interpolation operators

Fri Sep 30 02:23:07 PDT 2011

On Tue, 27 Sep 2011 12:51:51 +0200, sandmann at cs.au.dk (Søren Sandmann) wrote:
> Chris Wilson <chris at chris-wilson.co.uk> writes:
> 
> > Cairo, for instance, has a subtly different interpretation of how to use
> > the mask in combination with the Porter-Duff operators. In particular,
> > it has the notion of a clip mask, for which pixman has no parallel.
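A per-channel sketch of the difference. It assumes the LERP operators are defined the way a clipped SRC and CLEAR behave in cairo, i.e. d' = s*m + d*(1-m) for LERP_SRC and d' = d*(1-m) for LERP_CLEAR. The function names are hypothetical, and mul_un8 only imitates the rounding used by pixman's MUL_UN8 macro:

```c
#include <stdint.h>

/* Multiply two 8-bit values treated as fractions of 255, with rounding,
 * in the style of pixman's MUL_UN8 (a * b / 255, rounded). */
static uint8_t mul_un8(uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t)a * b + 0x80;
    return (uint8_t)(((t >> 8) + t) >> 8);
}

/* Porter-Duff SRC with a mask: result is (src IN mask); the old
 * destination is discarded even where the mask is zero. */
static uint8_t pd_src(uint8_t s, uint8_t m, uint8_t d)
{
    (void)d;
    return mul_un8(s, m);
}

/* Porter-Duff CLEAR: result is zero whatever the mask says. */
static uint8_t pd_clear(uint8_t s, uint8_t m, uint8_t d)
{
    (void)s; (void)m; (void)d;
    return 0;
}

/* Hypothetical LERP_SRC: interpolate from dest towards src by the mask,
 * d' = s*m + d*(1-m), so the mask behaves like a cairo clip. */
static uint8_t lerp_src(uint8_t s, uint8_t m, uint8_t d)
{
    return (uint8_t)(mul_un8(s, m) + mul_un8(d, (uint8_t)(255 - m)));
}

/* Hypothetical LERP_CLEAR: d' = d*(1-m); the source is ignored. */
static uint8_t lerp_clear(uint8_t s, uint8_t m, uint8_t d)
{
    (void)s;
    return mul_un8(d, (uint8_t)(255 - m));
}
```

With a zero mask, pd_src wipes the destination while lerp_src leaves it untouched, which is the clip behaviour cairo wants.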
> 
> A question I have is, how much of a speedup is this really?
> 
> As mentioned earlier,
> 
>     http://lists.freedesktop.org/archives/cairo/2011-February/021686.html
> 
> I have some concerns about adding two special operators that operate in
> a totally different way than all the other operators. There are various
> places in pixman where we assume that source and mask are never used
> independently, for example in the operator optimization table, and in
> the general code, where this comment:
> 
>      "If it doesn't matter what the source is, then it doesn't matter
>       what the mask is",
> 
> would no longer really be true if the LERP operators are added. Nothing
> in the patches actually causes these places to malfunction, but they do
> make the code base less regular and therefore more difficult to
> maintain. If the speedup on cairo traces is big enough, maybe that is
> enough to justify it though.
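To make the concern concrete, here is a hypothetical table of which inputs each operator's result depends on (the struct, flag and function names are illustrative, not pixman's). For the Porter-Duff operators the mask only ever scales the source, so an operator that ignores the source also ignores the mask. Assuming LERP_CLEAR computes d*(1-m), it ignores the source but still depends on the mask, so it is the one entry that breaks the rule:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-operator properties. */
typedef struct {
    const char *name;
    bool reads_src;   /* result depends on the source pixel */
    bool reads_mask;  /* result depends on the mask pixel */
    bool reads_dest;  /* result depends on the destination pixel */
} op_info;

static const op_info ops[] = {
    { "CLEAR",      false, false, false },
    { "DST",        false, false, true  },
    { "SRC",        true,  true,  false },
    { "OVER",       true,  true,  true  },
    { "LERP_CLEAR", false, true,  true  },  /* assumed: d * (1 - m) */
    { "LERP_SRC",   true,  true,  true  },  /* assumed: s*m + d*(1-m) */
};

/* "If it doesn't matter what the source is, then it doesn't matter
 * what the mask is": true if the operator respects that invariant. */
static bool respects_src_implies_mask(const op_info *op)
{
    return op->reads_src || !op->reads_mask;
}

/* Count the operators in the table that violate the invariant. */
static int count_violations(void)
{
    int n = 0;
    for (size_t i = 0; i < sizeof ops / sizeof ops[0]; i++)
        if (!respects_src_implies_mask(&ops[i]))
            n++;
    return n;
}
```

Any shortcut that skips fetching the mask whenever the source is irrelevant would be wrong for LERP_CLEAR under this assumed definition.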
> 
> So basically, I'd like to see some performance measurements. With the
> patches as posted, the C fast paths will be selected ahead of the
> general path, which is not necessarily the fastest:
> 
>    http://www.mail-archive.com/pixman@lists.freedesktop.org/msg00887.html
> 
> Interesting benchmarks include:
> 
> - Performance with just the C fast paths
> 
> - Performance with just the SSE2 combiner (a fixed one, see below)
> 
> - Performance with both
> 
> - How much faster is the fastest of the above than with no-LERP?
> 
> The SSE2 combiner looks to me like it is missing an expand_alpha(), so
> if you are going to do these measurements, it would be useful to add
> support for the LERP operators to the blitters test as a commit before
> adding the fast paths, to verify that they actually work. It may be
> useful to add them to some of the other tests as well.
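A scalar analogue of what the missing expand_alpha() is for (this is not the SSE2 code itself, and all names are hypothetical): for a non-component-alpha mask, the mask pixel has to be reduced to its alpha and replicated across all four channels before it multiplies anything. An a8 mask is presumably fetched as 0xAA000000, with zero colour channels, so skipping the expansion scales the colour channels by zero instead of by the mask alpha:

```c
#include <stdint.h>

/* Replicate the alpha byte of an a8r8g8b8 pixel into all four channels. */
static uint32_t expand_alpha(uint32_t p)
{
    return (p >> 24) * 0x01010101u;
}

/* Multiply each 8-bit channel of x by the matching channel of m,
 * rounding as in a * b / 255. */
static uint32_t mul_channels(uint32_t x, uint32_t m)
{
    uint32_t r = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t t = ((x >> shift) & 0xff) * ((m >> shift) & 0xff) + 0x80;
        r |= (((t >> 8) + t) >> 8) << shift;
    }
    return r;
}

/* Hypothetical LERP_SRC for one pixel: d' = s*m + d*(1-m).
 * 'expand' selects whether the mask is first reduced to its alpha. */
static uint32_t lerp_src_pixel(uint32_t s, uint32_t m, uint32_t d, int expand)
{
    uint32_t ma = expand ? expand_alpha(m) : m;
    uint32_t inv = ~ma;  /* 255 - ma in every channel */
    /* Per channel the two products sum to at most 255, so no carries. */
    return mul_channels(s, ma) + mul_channels(d, inv);
}
```

With a fully opaque a8 mask the expanded version returns the source unchanged, while the unexpanded one keeps the destination's colour channels, which is the kind of error a pixel-exact test should catch.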

I did add support to the composite test in the first commit. It failed
to highlight the error in the SSE2 code, and it wasn't until I dug through
cairo-test-suite failures that I discovered the missing expand_alpha. I
did miss adding support to the blitters test, though.

Anyway, I left my little Atom running all week, working through the traces
with each of the commits. In summary: these paths are only exercised under
certain conditions, and even then it is eliminating a pass from Cairo's
fallback path that accounts for the gain.
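Why saving a pass matters, as a sketch: assume the fallback has to express d = s*c + d*(1-c) with stock operators, first OUT_REVERSE with the clip (d = d*(1-c)) and then ADD of the source through the clip (d = d + s*c). That reads and writes the destination twice, whereas a LERP_SRC-style combiner touches it once. Whether cairo uses exactly this sequence is an assumption; the function names are hypothetical and a single 8-bit channel is used for brevity:

```c
#include <stddef.h>
#include <stdint.h>

/* a * b / 255 with rounding, 8-bit channels. */
static uint8_t mul_un8(uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t)a * b + 0x80;
    return (uint8_t)(((t >> 8) + t) >> 8);
}

/* Assumed two-pass fallback: OUT_REVERSE by the clip, then saturating ADD. */
static void two_pass(const uint8_t *s, const uint8_t *c, uint8_t *d, size_t n)
{
    for (size_t i = 0; i < n; i++)              /* pass 1: d = d*(1-c) */
        d[i] = mul_un8(d[i], (uint8_t)(255 - c[i]));
    for (size_t i = 0; i < n; i++) {            /* pass 2: d = d + s*c */
        unsigned sum = d[i] + mul_un8(s[i], c[i]);
        d[i] = (uint8_t)(sum > 255 ? 255 : sum);
    }
}

/* Single pass: the same arithmetic, destination read and written once. */
static void one_pass(const uint8_t *s, const uint8_t *c, uint8_t *d, size_t n)
{
    for (size_t i = 0; i < n; i++)
        d[i] = (uint8_t)(mul_un8(d[i], (uint8_t)(255 - c[i])) +
                         mul_un8(s[i], c[i]));
}
```

Both produce identical pixels here; the difference is purely in memory traffic over the destination.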

general:
old: no-lerp-20110927
new: general-lerp-20110928
Speedups
========
image       www.cracked.com.1.75  551.69 (552.50 0.08%) -> 477.76 (478.76 0.11%):  1.15x speedup
image           www.tmz.com.1.75  622.16 (622.52 0.03%) -> 541.00 (541.95 0.08%):  1.15x speedup
image         www.wretch.cc.1.75  694.50 (694.72 0.06%) -> 606.51 (606.91 0.10%):  1.15x speedup
image   www.sueddeutsche.de.1.75  484.25 (484.68 0.08%) -> 425.60 (426.41 0.08%):  1.14x speedup
image www.washingtonpost.com.1.75  467.34 (467.63 0.06%) -> 416.52 (416.89 0.13%):  1.12x speedup
image       www.gumtree.com.1.75  745.98 (746.84 0.07%) -> 668.40 (668.68 0.06%):  1.12x speedup
image     www.over-blog.com.1.75  707.45 (708.55 0.10%) -> 652.95 (653.26 0.10%):  1.08x speedup
image        www.ustream.tv.1.75  1844.09 (1846.40 0.07%) -> 1711.53 (1711.73 0.02%):  1.08x speedup
image      www.suite101.com.1.75  320.08 (320.63 0.10%) -> 297.45 (297.95 0.10%):  1.08x speedup
image           www.ign.com.1.75  925.74 (926.14 0.04%) -> 861.37 (861.68 0.03%):  1.07x speedup
image       www.cbsnews.com.1.75  585.72 (586.45 0.12%) -> 545.63 (545.98 0.04%):  1.07x speedup
image     www.chinanews.com.1.75  1249.95 (1250.34 0.02%) -> 1176.69 (1178.58 0.22%):  1.06x speedup
image        webkit-canvas-alpha  85958.07 (86140.35 0.17%) -> 81177.93 (81253.00 0.07%):  1.06x speedup
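The rows above look like cairo-perf-diff style output. Assuming each group reads "time (second statistic, spread%)", the reported speedup is consistent with the ratio of the first old figure to the first new figure, e.g. 551.69 / 477.76 ≈ 1.15. A small parser sketch under that assumption (the struct and function names are hypothetical):

```c
#include <stdio.h>

/* One row of the comparison, fields named by assumption. */
typedef struct {
    char backend[16];
    char trace[128];
    double old_t, old_alt, old_dev;
    double new_t, new_alt, new_dev;
} perf_row;

/* Parse "backend trace old (alt dev%) -> new (alt dev%): ..."; 1 on success. */
static int parse_row(const char *line, perf_row *r)
{
    int n = sscanf(line, "%15s %127s %lf (%lf %lf%%) -> %lf (%lf %lf%%)",
                   r->backend, r->trace,
                   &r->old_t, &r->old_alt, &r->old_dev,
                   &r->new_t, &r->new_alt, &r->new_dev);
    return n == 8;
}

/* Speedup as old time divided by new time. */
static double speedup(const perf_row *r)
{
    return r->old_t / r->new_t;
}
```

Applied to the webkit-canvas-alpha row, this gives 85958.07 / 81177.93, which rounds to the 1.06x shown.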

old: general-lerp-20110928
new: fast-lerp-20110928
No speedups

old: fast-lerp-20110928
new: sse2-lerp-20110929
No speedups

I redid the baseline no-lerp measurement and confirmed that result.
Apart from using the corrected SSE2 combiner, I did not apply your
suggested enhancements; they may be enough to make an improvement from
the fast paths detectable in the traces. More likely, profiling will turn
up something else to worry about. However, removing a pass from Cairo is
clearly beneficial.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre