[Glamor] glamor: Changes to 'master'
Michel Dänzer
michel at daenzer.net
Thu Jul 12 02:45:03 PDT 2012
On Don, 2012-07-12 at 15:51 +0800, Zhigang Gong wrote:
> On Wed, Jul 11, 2012 at 01:00:12PM +0200, Michel Dänzer wrote:
> > On Mit, 2012-07-11 at 00:59 -0700, Zhigang Gong wrote:
> > >
> > > commit 593bd206bf2794dc1450e8e7d20142b8d0ca074b
> > > Author: Zhigang Gong <zhigang.gong at linux.intel.com>
> > > Date: Tue Jun 26 15:39:24 2012 +0800
> > >
> > > glamor_render: Don't allocate buffer for vbo each time.
> > >
> > > We can reuse the last one if the last one is big enough
> > > to contain current vertext data. In the meantime, Use
> > > MapBufferRange instead of MapBuffer.
> > >
> > > Testing shows, this patch brings some benefit for
> > > aa10text/rgb10text. Not too much, but indeed faster.
> > >
> > > Signed-off-by: Zhigang Gong <zhigang.gong at linux.intel.com>
> >
> > This commit decreases glyph throughput by about a factor of 4 with the
> > r600g driver on an RS880 laptop.
> >
> > AFAICT the new code always maps the VBO from offset 0, which I'm not
> > sure can be expected to perform well. However, I've tried sliding the
> > mapping range over a larger VBO, and it doesn't seem to help. Maybe the
> > r600g driver doesn't handle this kind of usage very well yet.
>
> Without this commit, each time we start to flush the compositing, we
> need to call glBufferData(GL_ARRAY_BUFFER,..) to allocate resources
> on GPU each time. With this commit, we will allocate new buffer only
> if current required buffer if larger than the allocated buffer. If current
> required buffer can be fit into the allocated buffer, we only need to map
> the needed range to vma.
The problem is that this commit always reuses the same range of the VBO
that was previously used by glDrawElements. This forces the GL
implementation to either copy the vertex data before submitting it to
the GPU, or flush the previous rendering commands and wait for them to
finish. Ideally, it should be possible to write the vertex data to the
VBO once and have the GPU consume it without any intervening CPU copies
or waits for GPU idle.
From my experience with EXA drivers, a good way to achieve that is to
allocate a VBO large enough to keep the VBO allocation overhead low and
to keep appending vertices to it until they no longer fit, at which
point the next VBO needs to be allocated.
The patch below implements something like that. The GL API usage and VBO
size could probably use some more tuning, but this already performs
better for me than reverting commit
593bd206bf2794dc1450e8e7d20142b8d0ca074b or disabling VBO usage with
your patch. How does it work for you?
> > Still, how would you feel about reverting this change for now?
> You can see that this commit does save the time to create new vertex
> buffers on GPU side each time. And on Intel's platform, this commit
> can get about 3% gain on the x11perf -aa10text. So IMO, it's not a
> good idea to revert it.
As a side note, a 3% gain on one platform vs. a 75% loss on another
platform seems like a no-brainer to me. :)
> I found an alternative way for yor, I added a new patch to disable the vbo
> usage at all. And we allocated vb array in system memory directly. Please
> check the below patch out and test on your platform. You just need to
> change the use_vbo to 0 at glamor_priv.h. Does that work for you?
It does provide the same performance as reverting commit
593bd206bf2794dc1450e8e7d20142b8d0ca074b on this machine. However, again
I'm not sure this allows the GL implementation to avoid CPU copies of
the vertex data, which could be an issue with faster GPUs.
commit c47545d0217b947155d90481e00f9fbbe3072bd4
Author: Michel Dänzer <michel.daenzer at amd.com>
Date: Thu Jul 12 11:34:54 2012 +0200
Stream vertex data to VBOs.
diff --git a/src/glamor_render.c b/src/glamor_render.c
index d986e9a..966d7e4 100644
--- a/src/glamor_render.c
+++ b/src/glamor_render.c
@@ -725,16 +725,6 @@ glamor_setup_composite_vbo(ScreenPtr screen, int n_verts)
glamor_get_screen_private(screen);
glamor_gl_dispatch *dispatch;
int vert_size;
- Bool need_new_buffer = FALSE;
-
- glamor_priv->vbo_offset = 0;
- glamor_priv->render_nr_verts = 0;
- vert_size = n_verts * sizeof(float) * 2;
-
- if (glamor_priv->vbo_size < vert_size) {
- glamor_priv->vbo_size = vert_size;
- need_new_buffer = TRUE;
- }
glamor_priv->vb_stride = 2 * sizeof(float);
if (glamor_priv->has_source_coords)
@@ -742,17 +732,26 @@ glamor_setup_composite_vbo(ScreenPtr screen, int n_verts)
if (glamor_priv->has_mask_coords)
glamor_priv->vb_stride += 2 * sizeof(float);
+ vert_size = n_verts * glamor_priv->vb_stride;
+
dispatch = glamor_get_dispatch(glamor_priv);
dispatch->glBindBuffer(GL_ARRAY_BUFFER, glamor_priv->vbo);
if (glamor_priv->gl_flavor == GLAMOR_GL_DESKTOP) {
- if (need_new_buffer)
+ if (glamor_priv->vbo_size < (glamor_priv->vbo_offset + vert_size)) {
+ glamor_priv->vbo_size = GLAMOR_COMPOSITE_VBO_VERT_CNT *
+ glamor_priv->vb_stride;
+ glamor_priv->vbo_offset = 0;
dispatch->glBufferData(GL_ARRAY_BUFFER,
- vert_size,
- NULL, GL_DYNAMIC_DRAW);
- glamor_priv->vb = dispatch->glMapBufferRange(GL_ARRAY_BUFFER, 0,
- vert_size,
- GL_MAP_READ_BIT | GL_MAP_WRITE_BIT);
+ glamor_priv->vbo_size,
+ NULL, GL_STREAM_DRAW);
+ }
+
+ glamor_priv->vb = dispatch->glMapBufferRange(GL_ARRAY_BUFFER,
+ glamor_priv->vbo_offset,
+ vert_size,
+ GL_MAP_WRITE_BIT | GL_MAP_UNSYNCHRONIZED_BIT);
assert(glamor_priv->vb != NULL);
+ glamor_priv->vb -= glamor_priv->vbo_offset;
}
dispatch->glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, glamor_priv->ebo);
@@ -1428,6 +1427,7 @@ glamor_composite_with_shader(CARD8 op,
&key, shader,
&op_info);
}
+ glamor_priv->render_nr_verts = 0;
}
dispatch->glBindBuffer(GL_ARRAY_BUFFER, 0);
--
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Debian, X and DRI developer
More information about the Glamor
mailing list