[Mesa-dev] [PATCH 2/2] i965: Fix brw_finish_batch to grow the batchbuffer.

Kenneth Graunke kenneth at whitecape.org
Mon Sep 18 20:44:57 UTC 2017


On Monday, September 18, 2017 11:14:35 AM PDT Chris Wilson wrote:
> Quoting Kenneth Graunke (2017-09-18 18:56:57)
> > brw_finish_batch emits commands needed at the end of every batch buffer,
> > including any workarounds.  In the past, we freed up some "reserved"
> > batch space before calling it, so we would never have to flush during
> > it.  This was error prone and easy to screw up, so I deleted it a while
> > back in favor of growing the batch.
> > 
> > There were two problems:
> > 
> > 1. We're in the middle of flushing, so brw->no_batch_wrap is guaranteed
> >    not to be set.  Using BEGIN_BATCH() to emit commands would cause a
> >    recursive flush rather than growing the buffer as intended.
> 
> My bad. I've tripped over that same issue before.
> 
> > 2. We already recorded the throttling batch before growing, which
> >    replaces brw->batch.bo with a different (larger) buffer.  So growing
> >    would break throttling.
> 
> Ooh, in most cases that means we stored a later active handle.

Not exactly... we would set brw->throttle_batch[0] = brw->batch.bo, then
growing would allocate a new brw->batch.bo and throw the old one away
(putting it back in the BO cache).  So brw->throttle_batch[0] might never
be execbuf'd at all, or its BO could get reused for a future batchbuffer.

Seems like nothing good could come of this.
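To make the ordering hazard concrete, here's a tiny standalone model of it.
This is not Mesa code -- toy_bo, toy_batch, grow_batch() and throttle_batch0
are all made-up names, just to illustrate the record-then-grow ordering:

/* Toy model of the bug: record the throttle BO, then grow the batch.
 * The recorded BO is never the one that actually gets submitted. */
#include <stdio.h>

struct toy_bo { int handle; int refcount; };
struct toy_batch { struct toy_bo *bo; };

static struct toy_bo *throttle_batch0;   /* stands in for brw->throttle_batch[0] */

static void grow_batch(struct toy_batch *batch)
{
   /* Growing swaps in a brand-new, larger BO; the old one is released
    * back to the BO cache and may be handed out again later. */
   static struct toy_bo bigger = { .handle = 2, .refcount = 1 };
   batch->bo->refcount--;
   batch->bo = &bigger;
}

int main(void)
{
   struct toy_bo small = { .handle = 1, .refcount = 1 };
   struct toy_batch batch = { .bo = &small };

   /* Buggy order: record the throttle BO first... */
   throttle_batch0 = batch.bo;
   throttle_batch0->refcount++;

   /* ...then grow.  The batch now points at a different BO, so the
    * recorded one is never execbuf'd. */
   grow_batch(&batch);

   printf("throttling on handle %d, but submitting handle %d\n",
          throttle_batch0->handle, batch.bo->handle);
   return 0;
}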

> > These are easily remedied by shuffling some code around and whacking
> > brw->no_batch_wrap in brw_finish_batch().  This also now includes the
> > final workarounds in the batch usage statistics.  Found by inspection.
> 
> Ok, if it was ever hit it would cause infinite recursion in
> intel_batchbuffer_flush_fence, so pretty recognisable and doesn't need
> any assert to convert into something fatal.

Yeah, I don't think I've ever actually seen it happen.  If we recursed
there, we'd crash, so it would be pretty obvious.
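For the record, the recursion in question is roughly the following (from
memory -- the exact call chain through intel_batchbuffer_require_space()
is my reading of the code, not something I've re-verified):

/*
 * brw_finish_batch()
 *   -> BEGIN_BATCH() / intel_batchbuffer_require_space()
 *        (batch nearly full, brw->no_batch_wrap not set, so it flushes
 *         instead of growing)
 *   -> intel_batchbuffer_flush()
 *   -> brw_finish_batch()        <- and around we go
 */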

> > Fixes: 2c46a67b4138631217141f (i965: Delete BATCH_RESERVED handling.)
> > ---
> >  src/mesa/drivers/dri/i965/intel_batchbuffer.c | 17 ++++++++++-------
> >  1 file changed, 10 insertions(+), 7 deletions(-)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
> > index dd584f533b9..8d7a738dc28 100644
> > --- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
> > +++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
> > @@ -631,6 +631,8 @@ brw_finish_batch(struct brw_context *brw)
> >  {
> >     const struct gen_device_info *devinfo = &brw->screen->devinfo;
> >  
> > +   brw->no_batch_wrap = true;
> > +
> >     /* Capture the closing pipeline statistics register values necessary to
> >      * support query objects (in the non-hardware context world).
> >      */
> > @@ -672,6 +674,8 @@ brw_finish_batch(struct brw_context *brw)
> >        /* Round batchbuffer usage to 2 DWORDs. */
> >        intel_batchbuffer_emit_dword(&brw->batch, MI_NOOP);
> 
> You really don't have to emit a MI_NOOP after the bbe. You aren't
> emitting into the ring, so you don't need a qword tail update, and
> batches are allocated in pages, so no worries about that qword read
> going beyond the end.

Good point.  I can write a patch to drop that.
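For reference, the tail of brw_finish_batch() would then end up looking
roughly like this -- just a sketch of that follow-up, assuming the
MI_BATCH_BUFFER_END emit sits right above the hunk quoted here:

   /* Mark the end of the buffer. */
   intel_batchbuffer_emit_dword(&brw->batch, MI_BATCH_BUFFER_END);

   /* No MI_NOOP padding needed: we aren't submitting through the ring,
    * so there's no qword tail update, and the batch BO is allocated in
    * whole pages, so a qword read at the end can't run past it. */

   brw->no_batch_wrap = false;
}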

> >     }
> > +
> > +   brw->no_batch_wrap = false;
> >  }
> >  
> >  static void
> > @@ -885,6 +889,12 @@ _intel_batchbuffer_flush_fence(struct brw_context *brw,
> >     if (USED_BATCH(brw->batch) == 0)
> >        return 0;
> >  
> > +   /* Check that we didn't just wrap our batchbuffer at a bad time. */
> > +   assert(!brw->no_batch_wrap);
> > +
> > +   brw_finish_batch(brw);
> > +   intel_upload_finish(brw);
> > +
> >     if (brw->throttle_batch[0] == NULL) {
> >        brw->throttle_batch[0] = brw->batch.bo;
> >        brw_bo_reference(brw->throttle_batch[0]);
> > @@ -904,13 +914,6 @@ _intel_batchbuffer_flush_fence(struct brw_context *brw,
> >                brw->batch.state_relocs.reloc_count);
> >     }
> >  
> > -   brw_finish_batch(brw);
> > -
> > -   intel_upload_finish(brw);
> > -
> > -   /* Check that we didn't just wrap our batchbuffer at a bad time. */
> > -   assert(!brw->no_batch_wrap);
> 
> Keep a sanity check !brw->no_batch_wrap after flushing?

I moved the assert(!brw->no_batch_wrap) before brw_finish_batch(), so we
still guard against flushing while flushing.  We have to do that because
brw_finish_batch() now whacks it.  I could leave one here too, if you
like, but that would only guard against bugs in brw_finish_batch() itself,
rather than bugs in callers, so it didn't seem as useful.
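Concretely, with the extra check you're suggesting, the top of
_intel_batchbuffer_flush_fence() would look something like this (sketch
only; everything except the second assert is already in the patch above):

   if (USED_BATCH(brw->batch) == 0)
      return 0;

   /* Check that we didn't just wrap our batchbuffer at a bad time,
    * i.e. guard against flushing while flushing. */
   assert(!brw->no_batch_wrap);

   brw_finish_batch(brw);
   intel_upload_finish(brw);

   /* Optional extra sanity check: brw_finish_batch() sets and must
    * restore no_batch_wrap, so it should be clear again here. */
   assert(!brw->no_batch_wrap);

   if (brw->throttle_batch[0] == NULL) {
      brw->throttle_batch[0] = brw->batch.bo;
      brw_bo_reference(brw->throttle_batch[0]);
   }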

> > -
> >     ret = do_flush_locked(brw, in_fence_fd, out_fence_fd);
> >  
> >     if (unlikely(INTEL_DEBUG & DEBUG_SYNC)) {
> 
