[cairo] Survey of polygon rasterization techniques

Wed Aug 1 03:45:55 PDT 2007

> >>> Zack Rusin <zack at kde.org> 08/01/07 10:32 AM >>>
> On Wednesday 01 August 2007 02:51:21 am Kiia Kallio wrote:
> > - I create all polygons as polygon objects (or whatever concept the
> >   underlying library uses) first and render with those (instead of feeding
> >   my polygon coordinates to the API over and over again in every frame).
> >   This way I'm really measuring the rendering performance, not just
> >   API overhead (like the Zack Rusin's QT tests do...)
> 
> My tests weren't testing rasterization performance so saying "my tests were a 
> lot better at testing rasterization performance than tests that weren't 
> testing rasterization performance" isn't the killer argument it could have 
> been ;) 
> 
> On a sidenote, there's a very good reason why I was feeding the coordinates - 
> if you feed data in the native structure used to hold geometrical data, then 
> a lot of geometrical properties can be cached (e.g. whether a polygon 
> intersects is not going to change under scale/translation/rotation, so 
> suddenly under those cases even tessellation becomes a O(n) process) and I 
> was not interested in smartness of caching but raw rendering speed as seen 
> from the top.

I have to disagree about the relevance of this test method if you want
to test "raw rendering speed". Especially with graphics HW the whole
system is built so that the application should keep as much of the
data as possible on the HW side. If the application re-feeds data that is
not really modified every frame, it's just doing things simply wrong and
causing unnecessary overhead everywhere.

There are use cases (like morphing) where the path data gets animated,
but in general in all vector graphics formats (such as Flash or SVG) the
animation is mainly done by modifying the transformations, not the path
itself. So if the players for these formats are implemented correctly, they
should let the vector graphics framework handle the path data and
update it only when the data changes, instead of re-feeding the same
data over and over again every frame.

Furthermore, for example OpenVG API has an interface for path morphing
as well. This enables for instance GPU or the driver to perform the morph
operations as efficiently as possible, so that the client application doesn't
have to ever modify the path data directly. (So for instance with SVG only
scripts that modify the paths would be the source of data that is really
modified on the client side.)

A realistic test suite should test both cases, but have a lot more emphasis
on the case where the path data doesn't change. (See for instance
http://www.futuremark.com/products/vgmark/vgmark10/ for a commercial
vector graphics test suite.)

You are testing the case where a lot of complex, dynamic, polygonal data
is fed to the API every frame. This is not "raw rendering speed", but a test
on the API overhead with dynamic polygonal data. This aspect in
performance is necessary (and a reason for irritating jerks in the
animation with tesselating 3D algorithms when the data is modified), but it's
not the whole truth about rendering performance. Stencil buffer based
approaches are far more inefficient than tesselation-based approaches
when comparing the rendering performance of static data, since they
consume more bandwidth and require dual pass rendering. As a trade-off
for the improved speed, tesselation-based algorithms have a longer setup
time - and the data you are feeding (very complex polygons) is just about
the worst case for these algorithms.

While waiting for the GPU that can rasterize complex polygons directly
instead of resorting to triangles, these are the options for HW accelerated
vector graphic rasterization, and neither of them is perfect. Your test favors
the stencil buffer based approach, and you can claim hundreds of percent
better SETUP performance than tesselation-based approach. What comes
to the "raw rendering speed" ... I would really like to see results from the
same set of tests were you create the data just once and then render
without re-feeding it.

Maybe it's a bit pointless to go into this discussion here, as pretty much
everything has already been said in the comments of your blog... :) I'm in
no favour of either of the approach - mostly because with current HW you
can't have proper antialiasing but are limited to multisampling with
relatively low amount of samples or shader-based antialiasing that has
undersampling bugs...

Maybe the overall outcome is just that yes, it's really difficult to come up
with tests that are relevant and unbiased. I have tried my best.;)

> As to the other points I agree with David. 
> The problem with algorithms for graphics /frameworks/ is that benchmarking is 
> difficult because no test can really accommodate the variety of uses from the 
> client side. The algorithm has to be generic enough to handle all the cases.
> Then there's the question of how the algorithm fits within the current 
> pipeline, i.e. what is the raw setup time necessary to move from the 
> framework's public api to its internal representation. 
> It is only in this broad spectrum that we can judge its performance, which 
> tends to be difficult unless one has actually integrated the algorithm in.

If you are evaluating which algorithm to use for a specific framework, it's of
course difficult to evaluate them if they are parts of different frameworks.
That's the reason why I attempted to create cases where all frameworks
need to resort to somewhat similar behaviour but still provide the best
performance they can achieve.

Still there can be things that are actually outside the scope of the algorithm -
for instance some libraries may use SSE operations for vertex transform,
which would give advantage over those that don't in a part of a pipeline
that's not relevant for the rasterization algorithm.

However, in order to detect differences in behaviour regarding things like
this, I've included some tests that render the exact same shape, but with
different characteristics regarding the vertex count and edge lengths. These
are the square tests, "basic", "many edges" and "long edges". "Basic" has as
many vertices as "long edges", but the total length of the edges is larger in
"long edges" (as they go round and round the shape). "Many edges" on the
other hand has the same total edge length as the "basic", but the edges are
shorter and there are lot of vertices. The results from these tests indicate
that there are no great differences in the transformation speeds of different
libraries.

> I certainly think your algorithm is extremely interesting (plus I love that 
> you actually provided benchmarking code for it, which is something that you 
> rarely see for research papers) and well worth the time needed to integrate 
> it and test how it performs in the global scope of the framework. 
> But when it comes to the final outcome, years of experience in implementing 
> research papers for graphics frameworks taught me to remain skeptic until the 
> end.

I completely agree about implementing the research papers, and that's
exactly the reason why I published the source code and the test framework.
I have long background in the industry and this academic work is more like
a hobby for me, so I've tried to address this more from the practical point
of view and create something that is really usable, instead of introducing yet
another algorithm that fails with the special cases (like the antialiasing in
Loop & Blinn's paper from Siggraph '05).

> z

Kiia