[cairo] Some performance investigations

Carl Worth cworth at cworth.org
Thu Nov 9 11:17:18 PST 2006


On Thu, 9 Nov 2006 11:28:47 -0700, "chris nuernberger" wrote:
>
> Are you sticking with the 128 fixed point math?

Not for common cases, no. But thanks for the suggestions.

I implemented 64-bit intersection when possible, (input coordinates
have to fit within 20 bits for that, since the intersection requires
3x+2 the number of bits for accuracy). To make it common that
coordinates fit in 20 bits I've switched cairo from using 16.16 to
using 24.8 fixed-point values.

With those two changes the world map case with the new tessellator got
twice as fast, (intersection completely disappeared from the
profiles), but it was still twice as slow as with the old tessellator.

The callgraphs had some interesting numbers in it[*] with the
tessellator and skip lists really doing a lot of work, (61k polygons
to tessellate resulting in 1M skip list insertions and over 3.5M calls
to the comparison function backing the event queue skip list).

That all seemed really excessive, so I looked closer at what these
61k polygons looked like, (recall that we're working with only a
couple of hundred country polygons originally). The median vertex
count of the polygons was telling: 4.

So it turns out the stroke algorithm was calling into the tessellator
for every rectangle (or parallelogram if sheared) formed by stroking a
line segment. It was also calling into the tessellator for the
quadrilateral formed by every miter join, (never mind the fact that
with a 0.2 pixel line width the joins aren't at visible anyway).

We already had custom code for tessellating a rectangle sitting there,
so I switched the line segment stroking to just do that, and also just
turned off all joins to see what would happen. With those changes on
top of the other stuff I'm now getting the map drawn in 397ms compared
to the original 600ms (or so) for a speedup of about 1.5x.

That's not a totally fair comparison though, as these last couple of
optimizations could help the old code too, so I'll still try that.

At this point, rasterization is starting to show up on the profile,
(about 30% according to callgrind), so the easy idea I have for a
fast, lookup-table based rasterizer will probably start to be
interesting.

But even if that rasterization time went to zero, 287 milliseconds is
way too long to take to draw something like this map. Oh, did I even
show the list yet what the image looks like? Here is the SVG file I
started with and a png of cairo's rendering of it:

	http://cairographics.org/~cworth/images/worldmap.svg
	http://cairographics.org/~cworth/images/worldmap.png

Anyway, I think we really should be able to render that a heck of a
lot faster still. I'll keep thinking about this and whacking on it,
(and I'll get the code mentioned above cleaned up and pushed out as
soon as possible).

-Carl

[*] I'm really loving "valgrind --tool=callgrind" now---there's so
much information that's good for algorithmic analysis that the
statistical sampling profilers just don't capture. (And there's a lot
of stuff that callgrind can't capture like system-wide, multi-process
effects.)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
Url : http://lists.freedesktop.org/archives/cairo/attachments/20061109/2797973c/attachment.pgp


More information about the cairo mailing list