[cairo] [PATCH 2/2] gl: Remove GL fixed-function matrix usage.

Alexandros Frantzis alexandros.frantzis at linaro.org
Tue Feb 1 18:00:01 PST 2011


On Tue, Feb 01, 2011 at 11:37:49AM -0800, Eric Anholt wrote:
> On Tue, 1 Feb 2011 11:38:15 +0200, Alexandros Frantzis <alexandros.frantzis at linaro.org> wrote:
> > On Mon, Jan 31, 2011 at 08:20:00PM -0800, Eric Anholt wrote:
> > > No significant performance difference (though a simpler diff showed a
> > > 2% hit)
> > > +    double dest_x1 = x1, dest_y1 = y1, dest_x2 = x2, dest_y2 = y2;
> > > +
> > > +    cairo_matrix_transform_point (&ctx->dest_matrix, &dest_x1, &dest_y1);
> > > +    cairo_matrix_transform_point (&ctx->dest_matrix, &dest_x2, &dest_y2);
> > > +
> > 
> > Hi Eric,
> > 
> > my alternative take on this can be found in the WIP branch
> > 'gl-replace-builtin-shader-variables':
> > 
> > http://git.linaro.org/gitweb?p=people/afrantzis/cairo.git;a=shortlog;h=refs/heads/gl-replace-builtin-shader-variables
> > 
> > I am using a ModelViewProjectionMatrix custom uniform to pass the
> > transformation to the shader.
> > 
> > I am tempted to say that my approach may be more efficient, as the per
> > vertex transformation is happening in the GPU instead of the CPU.
> > However, I haven't benchmarked this yet, so I'll wait until I have more
> > information before I make any bold statements.
> 
> Uploading constants is relatively expensive (a compare to see if we need
> to upload, an extra WC copy to the constant buffer which is "slow", then
> the constant change actually getting pipelined through the hw), and we
> change source and destination a lot.  This also cuts the VS from 5
> instructions to 3 on 965, and it should be 2 if we did better VS
> optimization.  What I'm benchmarking is CPU bound, so I think I captured
> that, and for GPU-bound workloads this patch would be better than doing
> the math in the VS.
> 
> I was actually surprised I didn't find this patch to be a win -- it
> looks like it mostly had to do with the double precision math being
> expensive.  Given that we're operating on limited-size surfaces, I think
> we could justify storing floating point matrices here and doing the math
> in floats.

Hi Eric,

in order to get a better view of the situation, I added some simple
instrumentation to cairo-gl. The results for sufficiently long runs of
various benchmarks are:

                                                             vertices/update
                     Vertices   CPU xforms  Uniform updates  (xforms/update)
firefox-talos-gfx    365825262  121941756   2262973          160 (53)
gnome-system-monitor 126700506  42233504    47897            2645 (881)
evolution            7236340    2412114     86182            83 (28)
poppler              15121080   5040360     253440           60 (20)

Where:
    Vertices: How many vertices were emitted
    CPU xforms: the number of point transforms on the CPU with Eric's patch
                (always ~Vertices/3)
    Uniform updates: the number of ModelViewProjection uniform updates
                     with my branch
    vertices/update: Ratio of vertices per uniform update
    xforms/update: Ratio of CPU transforms per uniform update
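
As a rough illustration of how the two ratio columns follow from the raw
counters, here is a minimal sketch in C. It is not the actual
instrumentation code: the gl_stats_t struct and the stats_ratio() and
stats_print() helpers are hypothetical names made up for this example, and
truncating integer division is an assumption about how the ratios were
rounded.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical counters; these are not cairo-gl identifiers. */
typedef struct {
    uint64_t vertices;        /* vertices emitted */
    uint64_t cpu_xforms;      /* point transforms done on the CPU */
    uint64_t uniform_updates; /* ModelViewProjection uniform uploads */
} gl_stats_t;

/* Truncating ratio; returns 0 when no uniform update was recorded. */
static uint64_t
stats_ratio (uint64_t count, uint64_t updates)
{
    return updates ? count / updates : 0;
}

/* Print one row in the "vertices/update (xforms/update)" format. */
static void
stats_print (const char *name, const gl_stats_t *s)
{
    printf ("%-20s %llu (%llu)\n", name,
            (unsigned long long) stats_ratio (s->vertices, s->uniform_updates),
            (unsigned long long) stats_ratio (s->cpu_xforms, s->uniform_updates));
}
```

Feeding it the gnome-system-monitor counters from the table (126700506
vertices, 42233504 CPU transforms, 47897 uniform updates) gives 2645 and
881 for the two ratios.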


