[Mesa-dev] [PATCH 10/18] i965: Speculatively flush the batch after transform feedback

Tue Jul 7 02:12:20 PDT 2015

On Mon, Jul 06, 2015 at 09:05:18PM -0700, Kristian Høgsberg wrote:
> On Mon, Jul 6, 2015 at 12:36 PM, Kenneth Graunke <kenneth at whitecape.org> wrote:
> > On Monday, July 06, 2015 11:33:15 AM Chris Wilson wrote:
> >> Since the purpose of transform feedback tends to be for the client to
> >> act upon the results to change the geometry in the scene, it is likely
> >> that the client will soon be waiting upon the results. Flush the batch
> >> early so that we don't build up a long queue of commands afterwards that
> >> could delay the readback.
> >> ---
> >>  src/mesa/drivers/dri/i965/gen7_sol_state.c | 6 ++++++
> >>  1 file changed, 6 insertions(+)
> >>
> >> diff --git a/src/mesa/drivers/dri/i965/gen7_sol_state.c b/src/mesa/drivers/dri/i965/gen7_sol_state.c
> >> index 857ebe5..13dbe5b 100644
> >> --- a/src/mesa/drivers/dri/i965/gen7_sol_state.c
> >> +++ b/src/mesa/drivers/dri/i965/gen7_sol_state.c
> >> @@ -494,6 +494,12 @@ gen7_end_transform_feedback(struct gl_context *ctx,
> >>
> >>     brw_batch_end(&brw->batch);
> >>
> >> +   /* We will likely want to read the results in the very near future, so
> >> +    * push this primitive to hardware if it is currently idle.
> >> +    */
> >> +   if (!brw_batch_busy(&brw->batch))
> >> +      brw_batch_flush(&brw->batch);
> >> +
> >>     /* EndTransformFeedback() means that we need to update the number of
> >>      * vertices written.  Since it's only necessary if DrawTransformFeedback()
> >>      * is called and it means mapping a buffer object, we delay computing it
> >>
> >
> > We need some data to justify this change.
> 
> I think even the theory is not correct - transform feedback is
> typically fed back into the GPU (as new geometry, eg) rather than
> consumed by the CPU, and in that case the flush is not helpful. But at
> the end of the day, data will tell.

How are they fed back? Can the xfb buffer be bound to the vertex buffer?
(Genuine question! The only examples I've seen were for testing by the
CPU.)

The point of the patch was really more to get people thinking about
queueing early any work whose results we need in the near future, and
about breaking such work up into packets that are naturally fenced by
the kernel.

However, Jesse made a good point that spinning on a manual semaphore for
such feedback (if needed by the CPU) is likely far superior to using
the kernel wait interfaces.

For the query object, we would reserve the first slot for the semaphore
tracking, then after every query pair we would add a PIPE_CONTROL dword
write to that slot with the new seqno. For reporting we need only map
the buffer async and spin until that value has reached the seqno of the
query we want to report back to the user.
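
A rough CPU-side sketch of that scheme, to make the idea concrete. This
is not driver code: struct query_bo, fake_gpu_complete(), query_ready()
and query_result() are hypothetical names, the GPU's PIPE_CONTROL write
is stood in for by a plain atomic store, and I am assuming that a query
is complete once the slot holds a seqno at or past its own (seqno
wraparound is ignored).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_QUERIES 16

/* Hypothetical buffer layout: slot 0 tracks the last completed seqno,
 * followed by one (begin, end) counter pair per query. */
struct query_bo {
   _Atomic uint32_t seqno;
   uint64_t pairs[MAX_QUERIES][2];
};

static void query_bo_init(struct query_bo *bo)
{
   atomic_init(&bo->seqno, 0);   /* nothing has completed yet */
}

/* Stand-in for the GPU: write the query pair, then publish the new
 * seqno to slot 0 (the PIPE_CONTROL dword write in the real scheme).
 * Queries are numbered from 1 so that 0 can mean "none completed". */
static void fake_gpu_complete(struct query_bo *bo, uint32_t seqno,
                              uint64_t begin, uint64_t end)
{
   assert(seqno >= 1 && seqno <= MAX_QUERIES);
   bo->pairs[seqno - 1][0] = begin;
   bo->pairs[seqno - 1][1] = end;
   atomic_store_explicit(&bo->seqno, seqno, memory_order_release);
}

/* Non-blocking check: has the slot reached this query's seqno? */
static bool query_ready(struct query_bo *bo, uint32_t seqno)
{
   return atomic_load_explicit(&bo->seqno, memory_order_acquire) >= seqno;
}

/* Spin on the slot instead of calling into the kernel to wait, then
 * report the difference between the end and begin counters. */
static uint64_t query_result(struct query_bo *bo, uint32_t seqno)
{
   assert(seqno >= 1 && seqno <= MAX_QUERIES);
   while (!query_ready(bo, seqno))
      ; /* busy-wait */
   return bo->pairs[seqno - 1][1] - bo->pairs[seqno - 1][0];
}
```

query_ready() alone would be enough for a "result available?" check,
with query_result() only spinning when the caller really has to block.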
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre