[Mesa-dev] [PATCH 10/18] i965: Speculatively flush the batch after transform feedback

Tue Jul 7 16:14:05 PDT 2015

On Tuesday, July 07, 2015 09:02:16 PM Chris Wilson wrote:
> On Tue, Jul 07, 2015 at 10:31:07AM -0700, Kenneth Graunke wrote:
> > On Tuesday, July 07, 2015 04:46:22 PM Chris Wilson wrote:
> > > On Tue, Jul 07, 2015 at 10:12:20AM +0100, Chris Wilson wrote:
> > > > On Mon, Jul 06, 2015 at 09:05:18PM -0700, Kristian Høgsberg wrote:
> > > > > On Mon, Jul 6, 2015 at 12:36 PM, Kenneth Graunke <kenneth at whitecape.org> wrote:
> > > > > > On Monday, July 06, 2015 11:33:15 AM Chris Wilson wrote:
> > > > > >> Since the purpose of transform feedback tends to be for the client to
> > > > > >> act upon the results to change the geometry in the scene, it is likely
> > > > > >> that the client will soon be waiting upon the results. Flush the batch
> > > > > >> early so that we don't build up a long queue of commands afterwards that
> > > > > >> could delay the readback.
> > > > > >> ---
> > > > > >>  src/mesa/drivers/dri/i965/gen7_sol_state.c | 6 ++++++
> > > > > >>  1 file changed, 6 insertions(+)
> > > > > >>
> > > > > >> diff --git a/src/mesa/drivers/dri/i965/gen7_sol_state.c b/src/mesa/drivers/dri/i965/gen7_sol_state.c
> > > > > >> index 857ebe5..13dbe5b 100644
> > > > > >> --- a/src/mesa/drivers/dri/i965/gen7_sol_state.c
> > > > > >> +++ b/src/mesa/drivers/dri/i965/gen7_sol_state.c
> > > > > >> @@ -494,6 +494,12 @@ gen7_end_transform_feedback(struct gl_context *ctx,
> > > > > >>
> > > > > >>     brw_batch_end(&brw->batch);
> > > > > >>
> > > > > >> +   /* We will likely want to read the results in the very near future, so
> > > > > >> +    * push this primitive to hardware if it is currently idle.
> > > > > >> +    */
> > > > > >> +   if (!brw_batch_busy(&brw->batch))
> > > > > >> +      brw_batch_flush(&brw->batch);
> > > > > >> +
> > > > > >>     /* EndTransformFeedback() means that we need to update the number of
> > > > > >>      * vertices written.  Since it's only necessary if DrawTransformFeedback()
> > > > > >>      * is called and it means mapping a buffer object, we delay computing it
> > > > > >>
> > > > > >
> > > > > > We need some data to justify this change.
> > > > > 
> > > > > I think even the theory is not correct - transform feedback is
> > > > > typically fed back into the GPU (as new geometry, eg) rather than
> > > > > consumed by the CPU, and in that case the flush is not helpful. But at
> > > > > the end of the day, data will tell.
> > > > 
> > > > How are they fed back? Can the xfb buffer be bound to the vertex buffer?
> > > > (Genuine question! The only examples I've seen were for testing by the
> > > > CPU.)
> > 
> > Yes, it can.  Just glBindBuffer() some buffers around.  Or, I suspect
> > one could bind it as a texture buffer object or SSBO and then use a
> > compute shader on the results.
> > 
> > With GL 4.x, the "avoid synchronizing with the CPU" mentality is a lot
> > more prevalent, due to the advent of compute shaders.
> > 
> > > 
> > > I've reviewed the code again, and gen7_end_transform_feedback() is always
> > > followed by brw_compute_xfb_vertices_written (and a read of the sol
> > > buffer) afaict, maybe not immediately but always before the next
> > > transform feedback.
> > 
> > Sadly, yes.  We have a primitive count and we need a vertex count - so,
> > a tiny bit of math.  Ideally, we would use the Gen7.5 MI_MATH+ feature
> > to do this, eliminating the CPU-GPU synchronization point.
> > 
> > > Also afaict it is not possible to map the sol buffer directly into the
> > > application.
> > > -Chris
> > 
> > It definitely is - the application creates GL buffer objects and binds
> > them for use with transform feedback.  They can certainly
> > glMapBufferRange() those buffers.
> 
> The trouble I see is that the values stored currently are implementation
> dependent and often reset. How is the application meant to use them
> directly?
> 
> (Just trying to understand a bit better. If it is that the current
> implementation is stalling when not required, then trying to speed
> those stalls up really is just lipstick on a pig and irrelevant. The
> patch was just trying to make a suggestion that feeding the gpu around
> expected stall points works best with the current batch-level granularity
> of our fences. Using intrabatch semaphores for the query objects seems a
> more promising avenue than doing batch flushes anyway.)
> -Chris

I think we misunderstood each other.  By "SOL buffer" do you mean
prim_count_bo?  If so, that's not visible to applications.

Stream out (aka transform feedback) works by writing geometry data
coming out of the VS/HS/DS/GS stages (whichever is last) into an
application buffer.  So I assumed you meant that buffer.  But the
format of /that/ data is absolutely controlled by the application.

The mechanism for counting the primitives written (to implement
glDrawTransformFeedback()) is entirely up to the driver.  The current
one is not ideal, but before MI_MATH existed it was the best I could
think of.
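
For reference, the "tiny bit of math" that turns a primitive count into
a vertex count can be sketched roughly as below.  This is only an
illustration, not the driver's actual code: the enum, the function
names, and the layout of the counter snapshots (consecutive start/end
pairs) are all assumptions made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical enum for this sketch; these are not Mesa's real types. */
enum xfb_prim_mode { XFB_POINTS, XFB_LINES, XFB_TRIANGLES };

/* Transform feedback records independent points, lines, or triangles,
 * so each primitive contributes a fixed number of vertices.
 */
static unsigned
verts_per_prim(enum xfb_prim_mode mode)
{
   switch (mode) {
   case XFB_POINTS:    return 1;
   case XFB_LINES:     return 2;
   case XFB_TRIANGLES: return 3;
   }
   return 0;
}

/* Assumed layout: snapshots[] holds num_pairs consecutive (start, end)
 * readings of a "primitives written" counter.  Sum the deltas to get
 * the primitive count, then scale to a vertex count.
 */
static uint64_t
xfb_vertices_written(enum xfb_prim_mode mode,
                     const uint64_t *snapshots, size_t num_pairs)
{
   uint64_t prims = 0;

   for (size_t i = 0; i < num_pairs; i++) {
      uint64_t start = snapshots[2 * i];
      uint64_t end = snapshots[2 * i + 1];

      assert(end >= start);   /* the counter only moves forward */
      prims += end - start;
   }

   return prims * verts_per_prim(mode);
}
```

Doing this on the CPU means reading the counter values back, which is
the synchronization point being discussed; the arithmetic itself is
just a subtract, an add, and a multiply.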

--Ken