[cairo] Some performance investigations

Carl Worth cworth at cworth.org
Wed Nov 8 18:52:25 PST 2006


So if you've been watching the cairo-commit list, (doesn't
everybody?), then you've seen that some new test cases have recently
been added to the cairo test suite. One is a test called
"zrusin_another" which has the smallest polygon from the set of three
polygons that Zack Rusin posted here:

	http://ktown.kde.org/~zrusin/examples/cairorender.tar.bz2

He described this as "text" but it's not text as in data from a font,
but is instead a fairly complex polygon making the shape of the word
"another" as if drawn by hand, (with a variable width pen), and then
outlined.

This was the most interesting of his 3 polygons since it actually adds
something new compared to our current random polygons in the
tessellate test.

I've also added a test which was originally posted in the following
mozilla bug report:

	A very slow SVG file with <path>s
	https://bugzilla.mozilla.org/show_bug.cgi?id=332413

It's a map of the world drawn at 800x400 with 165 stroked and filled
outlines of countries. The countries range from 16 to 2834 vertices
each (median 104, mean 189). Some of those countries consist of
multiple polygons drawn as a single path. This is a very interesting
example for being a very-much real-world case, and also being very
slow in current cairo, (about 600ms on my laptop).

The new tessellator does even worse on this test case, (about 2000ms),
so that's definitely something more interesting than the random
256-vertex polygon I had in the test suite before for which the new
tessellator is 4x faster.

The system-wide statistical sampling profilers, (oprofile and
sysprof), aren't providing a lot of help at looking at what's going
on. The first interesting question is whether the new algorithm is
exhibiting the logarithmic behavior that's desired. So what's really
wanted there is function call counts, and these profilers don't give
that data.

I was about to embark down a painful road with gprof, but people
steered me toward "valgrind --tool=callgrind" along with kcachegrind,
and that's working quite well for me, (after wrestling KDE fonts into
submission for a bit).

On the tessellate-256 case I can see that the new tessellator is
indeed scaling much better. The old tessellator is doing about 3
million intersection computations while the new one is doing only 16
thousand. So we definitely have the algorithmic scalability fix we
were hoping for. Recall that the new tessellator is expected to be
O((n+k)log n) where n is edges and k is intersections. The
tessellate-256 case is random edges that intersect a _lot_ so it's
apparent that the old algorithm is not scaling well as k gets large.

In the case of our world map, k is in fact quite small, (country
borders are very rarely self-intersecting), so I think we're seeing
the scale factors come into play. And here, the new tessellator has
some nasty scale factors. The old one is doing intersections with a
couple of floating-point divisions while the new one is using pairs of
64-bit integers to represent 128-bit fixed-point values and doing
1-bit-at-a-time division calculations. This is a huge part of the
slowdown, and I'll work to eliminate that now, (I had mentioned once
before this 128-bit arithmetic would be causing problems).

Anyway, that's some of what's happening now. Other stuff coming along:

 * Daniel Amelang has some floating-point transformation improvements,
   (short-circuiting identity transforms), an improvement based on
   previous work.

 * Daniel also promised another "magic" fix to improve the text
   performance problems seen in text rendering on embedded systems.

 * I've got an idea for a lookup-table based rasterization approach
   that should help a lot once we get the tessellator off of the top
   of the profile

 * There's a nasty X performance bug with too-large of a mask when
   rendering trapezoids with XRenderCompositeTrapezoids, (it doesn't
   clip the mask size to the destination's size or clip region). We
   could use a test case for this, (constant size destination with
   an ever larger triangle being drawn). We should fix that in the
   server and also work around it with cairo-side mask allocation and
   XRenderAddTraps.

 * There's a performance bug when stroking rectilinear elements, (need
   test case: use cairo_rectangle;cairo_stroke to draw a nicely
   snapped single-pixel-wide line, then use cairo_rectangle twice and
   cairo_fill to get the same result but _much_ faster).

 * After all that stuff is in place, we need to explore generating
   better trapezoids from the tessellator, not just doing it
   faster. Or skip tessellation all together.

So lots of fun stuff coming, and I hope people will all pitch in where
they can. I also plan to make regular snapshots showcasing what we
have in place as we make progress on all of this.

-Carl
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
Url : http://lists.freedesktop.org/archives/cairo/attachments/20061108/c4e08c5a/attachment.pgp


More information about the cairo mailing list