[cairo] Survey of polygon rasterization techniques
david at freetype.org
Tue Aug 7 12:06:34 PDT 2007
I've written a small program to compare the performance of various anti-aliased
rasterization algorithms this week-end. My goal was to *directly* compare the qualities
of each algorithm when it comes to generating 8-bit alpha/gray values, when they are
fed exactly the same data.
I have thus written several such rasterizers, plus a small program that would time
and compare their relative performance when fed with the scaled glyph outlines of a
given font file (yes, this is skewed, but easy to obtain for me)
the rasterizers that I've implemented so far are:
- a "dummy" rasterizer that doesn't do anything, used to time the setup/bezier work
- a simple edge-based "monochrome" rasterizer, used as a reference point
- an "analytical" rasterizer, equivalent to the FreeType "smooth renderer"
- Kiia Kallio's smart multi-sampling algorithm
- Tor Anderson's super-sampling algorithm
There are multiple variants of Kiia's and Tor's algorithms, corresponding to increasing
levels of optimization, and different grid dimensions.
I have only implemented the non-zero (a.k.a. winding) fill rule for three reasons:
- I'm interested in finding a better replacement for FreeType as well, and this must
deal with "non-zero" shapes all the time
- even/odd is seldom used in practice.
- the non-zero rule is required if you want to implement proper stroking without
prior tesselation anyway
technically speaking, the rasterizers are variant of the FreeType renderer. this is
intentional and was done for two reasons:
- code reuse for the setup / transformation / bezier segmentation / interface parts.
- allows me to hack the new rasterizers in the "ftview" program more or less easily
to check rendering results (pretty important for debugging !!)
the program is now available at the following address:
this archive contains all source files (rasterizers + comparison program), plus a Makefile to build it on Unix.
you'll need a FreeType 2 installation (with headers). Have a look at the README file to see how to build and run it.
Here are a few numbers, followed by some explanations:
(this was run on a 3.4 GHz Pentium D)
* loading all glyphs in "Times New Roman", at size 600, repeated 5 times:
"ftrasters -r 5 -s 600 /usr/share/fonts/truetype/msttcorefonts/Times_New_Roman.ttf"
loading 1320 glyphs = 0.040000 s
rasterizer time throughput global relative
with dummy : 0.030 s ( 220000 glyphs/s)
with analytical : 0.490 s ( 13469 glyphs/s) x 1.00 x 1.00
with mono1 : 0.200 s ( 33000 glyphs/s) x 0.41 x 0.37
with kiia32_1 : 3.650 s ( 1808 glyphs/s) x 7.45 x 7.87
with kiia32_2 : 3.180 s ( 2075 glyphs/s) x 6.49 x 6.85
with kiia32_3 : 3.050 s ( 2163 glyphs/s) x 6.22 x 6.57
with kiia32_4 : 3.700 s ( 1783 glyphs/s) x 7.55 x 7.98
with kiia32_5 : 3.520 s ( 1875 glyphs/s) x 7.18 x 7.59
with kiia32_6 : 3.360 s ( 1964 glyphs/s) x 6.86 x 7.24
with kiia32_7 : 1.310 s ( 5038 glyphs/s) x 2.67 x 2.78
with kiia16_7 : 0.930 s ( 7096 glyphs/s) x 1.90 x 1.96
with kiia8_7 : 0.640 s ( 10312 glyphs/s) x 1.31 x 1.33
with tor15x17_1 : 1.690 s ( 3905 glyphs/s) x 3.45 x 3.61
with tor15x17_2 : 1.460 s ( 4520 glyphs/s) x 2.98 x 3.11
with tor15x17_3 : 1.400 s ( 4714 glyphs/s) x 2.86 x 2.98
with tor15x17_4 : 1.430 s ( 4615 glyphs/s) x 2.92 x 3.04
with tor15x17_5 : 1.290 s ( 5116 glyphs/s) x 2.63 x 2.74
with tor15x17_6 : 1.290 s ( 5116 glyphs/s) x 2.63 x 2.74
with tor15x17_7 : 1.020 s ( 6470 glyphs/s) x 2.08 x 2.15
with tor16x16_7 : 0.990 s ( 6666 glyphs/s) x 2.02 x 2.09
with tor8x32_7 : 0.700 s ( 9428 glyphs/s) x 1.43 x 1.46
with tor8x8_7 : 0.660 s ( 10000 glyphs/s) x 1.35 x 1.37
* loading all glyphs in "Arial", at size 20, repeated 50 times:
"ftrasters -r 50 -s 20 /usr/share/fonts/truetype/msttcorefonts/Arial.ttf"
loading 1320 glyphs = 0.030000 s
rasterizer time throughput global relative
with dummy : 0.160 s ( 412500 glyphs/s)
with analytical : 0.690 s ( 95652 glyphs/s) x 1.00 x 1.00
with mono1 : 0.380 s ( 173684 glyphs/s) x 0.55 x 0.42
with kiia32_1 : 2.260 s ( 29203 glyphs/s) x 3.28 x 3.96
with kiia32_2 : 2.030 s ( 32512 glyphs/s) x 2.94 x 3.53
with kiia32_3 : 1.850 s ( 35675 glyphs/s) x 2.68 x 3.19
with kiia32_4 : 1.570 s ( 42038 glyphs/s) x 2.28 x 2.66
with kiia32_5 : 1.430 s ( 46153 glyphs/s) x 2.07 x 2.40
with kiia32_6 : 1.380 s ( 47826 glyphs/s) x 2.00 x 2.30
with kiia32_7 : 1.520 s ( 43421 glyphs/s) x 2.20 x 2.57
with kiia16_7 : 1.130 s ( 58407 glyphs/s) x 1.64 x 1.83
with kiia8_7 : 0.800 s ( 82500 glyphs/s) x 1.16 x 1.21
with tor15x17_1 : 2.370 s ( 27848 glyphs/s) x 3.43 x 4.17
with tor15x17_2 : 2.180 s ( 30275 glyphs/s) x 3.16 x 3.81
with tor15x17_3 : 2.070 s ( 31884 glyphs/s) x 3.00 x 3.60
with tor15x17_4 : 2.040 s ( 32352 glyphs/s) x 2.96 x 3.55
with tor15x17_5 : 1.510 s ( 43708 glyphs/s) x 2.19 x 2.55
with tor15x17_6 : 1.460 s ( 45205 glyphs/s) x 2.12 x 2.45
with tor15x17_7 : 1.080 s ( 61111 glyphs/s) x 1.57 x 1.74
with tor16x16_7 : 1.000 s ( 66000 glyphs/s) x 1.45 x 1.58
with tor8x32_7 : 0.810 s ( 81481 glyphs/s) x 1.17 x 1.23
with tor8x8_7 : 0.790 s ( 83544 glyphs/s) x 1.14 x 1.19
the "global" column corresponds to the slowdown relative to the reference rasterizer
("analytical", in this case). the "relative" column corresponds to slowdowns computed
by substracting the "dummy" timing to each global one, and is more indicative of the
relative performance of each algorithm, since it doesn't include setup/bezier time.
some rasterizers have several variants (_1, _2, etc... up to _7) which correspond to
increasing levels of optimization. For example, "tor15x7_1" is equivalent to the code
written by Tor Anderson for the Apparition library, except that the grid sample sizes
are hard-coded as macro constants. "tor15x7_7" is a bad-ass version that works a lot
faster by getting rid of most slow sorting operations. See comments in source code
for details :-)
kiia32_1 is the 32-samples version of Kiia's algorithm, implemented naively.
(corresponds to "PolygonA" in Kiia's sources)
kiia32_7 is the 32-samples version of Kiia's most-optimized "PolygonF" implementation
(as found on http://mlab.uiah.fi/~kkallio/antialiasing/)
kiia16_7 and kiia8_7 are respecitively 16- and 8-samples optimized version of Kiia's
the intermediate optimization levels are only provided for informational purposes.
note that some of the less optimized algorithm might eat memory like candy :-)
tor16x16 is a variant of Tor's algorithm that uses, surprisingly, 16x16 super-sampling
tor8x32 uses 8 vertical samples, and 32 horizontal ones, still giving us 256 gray levels
but at nearly twice the performance.
tor8x8, well, provides only 64 distinct gray levels and is even speedier.
there still may be bugs in there, though I've tried hard to get rid of them, it's also
probably possible to further optimize some of the code, but I think the returns are going
to be pretty small. So I'll try to draw some early conclusions out of this:
- first of all, I'm not going to replace FreeType's smooth renderer with something else yet.
it still leads the pack in terms of raw performance despite having some known issues with
self-intersecting paths. Fortunately these cases are a non-problem in the context of
a font engine.
on the other hand, for general graphics rendering, I'm pretty impressed by the performance
that can be achieved with multi- and super-sampling schemes. this is clearly better than
what my last experiments showed.
- "tor16x16_7" seems to lead if you do not want to sacrifice quality for performance. for
something that is capable of producing 256 truly distinct gray levels, this is pretty
I find this a very good thing because this algorithm is much simpler to implement and
understand than kiia32_7 (the heavily optimized multi-sampler), which is only capable
of producing 32 distinct gray levels.
- if one priviledges raw performance over quality (e.g. to generate animations), both
"kiia8_7" and "tor8x8_7" seem good choices also. my personal preference goes to the
super-sampling algorithm due to its simplicity, and the fact that it can produce
8 times more gray levels (64 versus 8)
- keep in mind though that this test is skewed towards rendering glyph shapes, which
do not necessarily correspond to general Cairo paths as seen by its rasterizer.
more work is needed to generalize these conclusions !
- and finally, I would *dearly* appreciate if Tor and Kiia could review the code
corresponding to their respective algorithms in order to spot any deficiency or
possible missing optimization.
I'm going to continue working on this next week-end. I'd like to get rid of the FreeType
dependency entirely, and be able to get Zack's polygon data (or any other sort of thing
like that) to better the comparison. A program being able to generate pictures to compare
the rendering results of each algorithm would be a good thing too.
I've started writing a pixman-style rasterizer but this takes time. However, from what
I've seen now, I hardly see any hope of decent performance out of it :-)
if you feel like it, feel free to look at the code, run it, and improve it, or ask
questions in this thread.
Hope this helps,
- David Turner
I have strictly no idea why these numbers do not reflect Kiia's benchmarks. But I've
always found fishy the fact that he is comparing whole frameworks rather than specific
algorithm. Isn't it strange that GDI+ in monochrome mode is significantly slower than
his multi-sampler, even though it's supposed to do a lot less computations ?
On Fri, 03 Aug 2007 02:18:24 +0300, "Kiia Kallio" <kkallio at uiah.fi> said:
> I added Zack's polygon data to my tests as well, here are the results
> in 640x480 window on 1.8 GHz P4). No scaling is applied to the data, it's
> in the same size as specified in the complex.data file.
> As with all the tests, these measure only the actual rendering call (not
> initalization times, buffer conversions etc.), i.e. path objects are
> created first,
> and then these are used in the rendering. Performance counters are used
> for measuring the timings.
> Milliseconds per frame (average of 200 rendered frames)
> 8x AA 16x AA 32x AA GDI+ GDI+ no AA
> Polygon 1 e/o fill 0.48246 0.65462 0.89985 1.99712 0.68161
> Polygon 1 n/z fill 0.73342 1.12507 1.61857 1.67469 0.60785
> Polygon 2 e/o fill 0.69489 0.82366 0.92642 1.62445 1.38746
> Polygon 2 n/z fill 0.72300 0.86064 1.00957 1.65394 1.33512
> Polygon 3 e/o fill 6.53302 7.59778 9.07006 14.13767 10.13539
> Polygon 3 n/z fill 7.10002 8.78048 10.86452 13.40872 10.20099
> Frames per second
> 8x AA 16x AA 32x AA GDI+ GDI+ no AA
> Polygon 1 e/o fill 2072.7 1527.6 1111.3 500.7 1467.1
> Polygon 1 n/z fill 1363.5 888.8 617.8 597.1 1645.2
> Polygon 2 e/o fill 1439.1 1214.1 1079.4 615.6 720.7
> Polygon 2 n/z fill 1383.1 1161.9 990.5 604.6 749.0
> Polygon 3 e/o fill 153.1 131.6 110.3 70.7 98.7
> Polygon 3 n/z fill 140.8 113.9 92.0 74.6 98.0
> Note that fps is calculated from time spent in the rendering function. In
> the FPS is clamped by the page flip speed.
> There is a new version available from
> that includes these datasets. The test binaries are also available as a
> package (compiled for windows). I also noticed some debug functionality
> was left enabled by accident in the previous release, so this version is
> somewhat faster...
> > Some quick unscientific benchmarks
> > ----------------------------
> > 1 2 3
> > --------------------------
> > libart 60 1 4
> > cairo 63 22 1
> > agg 94 76 17
> > qt(x11) 63 63 6
> > qt(sw) 34 166 2
> > -Jeff
> > _______________________________________________
> > cairo mailing list
> > cairo at cairographics.org
> > http://lists.cairographics.org/mailman/listinfo/cairo
> cairo mailing list
> cairo at cairographics.org
More information about the cairo