[cairo] [PATCH 2/2] gl: Remove GL fixed-function matrix usage.
Alexandros Frantzis
alexandros.frantzis at linaro.org
Tue Feb 1 18:00:01 PST 2011
On Tue, Feb 01, 2011 at 11:37:49AM -0800, Eric Anholt wrote:
> On Tue, 1 Feb 2011 11:38:15 +0200, Alexandros Frantzis <alexandros.frantzis at linaro.org> wrote:
> > On Mon, Jan 31, 2011 at 08:20:00PM -0800, Eric Anholt wrote:
> > > No significant performance difference (though a simpler diff showed a
> > > 2% hit)
> > > + double dest_x1 = x1, dest_y1 = y1, dest_x2 = x2, dest_y2 = y2;
> > > +
> > > + cairo_matrix_transform_point (&ctx->dest_matrix, &dest_x1, &dest_y1);
> > > + cairo_matrix_transform_point (&ctx->dest_matrix, &dest_x2, &dest_y2);
> > > +
> >
> > Hi Eric,
> >
> > my alternative take on this can be found in the WIP branch
> > 'gl-replace-builtin-shader-variables':
> >
> > http://git.linaro.org/gitweb?p=people/afrantzis/cairo.git;a=shortlog;h=refs/heads/gl-replace-builtin-shader-variables
> >
> > I am using a ModelViewProjectionMatrix custom uniform to pass the
> > transformation to the shader.
> >
> > I am tempted to say that my approach may be more efficient, as the per
> > vertex transformation is happening in the GPU instead of the CPU.
> > However, I haven't benchmarked this yet, so I'll wait until I have more
> > information before I make any bold statements.
>
> Uploading constants is relatively expensive (a compare to see if we need
> to upload, an extra WC copy to the constant buffer which is "slow", then
> the constant change actually getting pipelined through the hw), and we
> change source and destination a lot. This also cuts the VS from 5
> instructions to 3 on 965, and it should be 2 if we did better VS
> optimization. What I'm benchmarking is CPU bound, so I think I captured
> that, and for GPU-bound workloads this patch would be better than doing
> the math in the VS.
>
> I was actually surprised I didn't find this patch to be a win -- it
> looks like it mostly had to do with the double precision math being
> expensive. Given that we're operating on limited-size surfaces, I think
> we could justify storing floating point matrices here and doing the math
> in floats.
Hi Eric,
in order to get a better view of the situation, I added some simple
instrumentation to cairo-gl. The results for sufficiently long runs of
various benchmarks are:
                       Vertices  CPU xforms  Uniform updates  vertices/update (xforms/update)
firefox-talos-gfx     365825262   121941756          2262973   160 (53)
gnome-system-monitor  126700506    42233504            47897  2645 (881)
evolution               7236340     2412114            86182    83 (28)
poppler                15121080     5040360           253440    60 (20)
Where:
Vertices: the number of vertices that were emitted
CPU xforms: the number of point transforms on the CPU with Eric's patch
(always ~Vertices/3)
Uniform updates: the number of ModelViewProjection uniform updates
with my branch
vertices/update: Ratio of vertices per uniform update
xforms/update: Ratio of CPU transforms per uniform update
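
The ratio columns can be recomputed from the raw counts in the table. The following is a minimal sketch in C, not part of either patch: the struct, its field names, and the helper functions are hypothetical and exist only for this illustration. The counts are copied from the table above.

```c
#include <assert.h>
#include <stdio.h>

/* One row of the results table. The struct and field names are
 * illustrative only; they do not come from cairo-gl. */
struct bench_row {
    const char *name;
    unsigned long vertices;        /* vertices emitted */
    unsigned long cpu_xforms;      /* CPU point transforms (Eric's patch) */
    unsigned long uniform_updates; /* ModelViewProjection uniform updates */
};

/* Raw counts copied from the table. */
static const struct bench_row rows[] = {
    { "firefox-talos-gfx",    365825262UL, 121941756UL, 2262973UL },
    { "gnome-system-monitor", 126700506UL,  42233504UL,   47897UL },
    { "evolution",              7236340UL,   2412114UL,   86182UL },
    { "poppler",               15121080UL,   5040360UL,  253440UL },
};

/* Floating-point ratio of two counts; the denominator must be non-zero. */
static double ratio(unsigned long num, unsigned long den)
{
    assert(den != 0);
    return (double) num / (double) den;
}

/* Print vertices/update, xforms/update and vertices/xform for each row. */
static void print_ratios(FILE *out)
{
    size_t i;

    for (i = 0; i < sizeof rows / sizeof rows[0]; i++) {
        const struct bench_row *r = &rows[i];

        fprintf(out, "%-22s %8.1f vertices/update %7.1f xforms/update "
                     "%5.2f vertices/xform\n",
                r->name,
                ratio(r->vertices, r->uniform_updates),
                ratio(r->cpu_xforms, r->uniform_updates),
                ratio(r->vertices, r->cpu_xforms));
    }
}
```

Calling print_ratios (stdout) from a main function prints one line per benchmark. The recomputed ratios are not rounded, so they can differ slightly from the integer values shown in the table.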