[Intel-gfx] [PATCH] drm/i915: Force synchronous TLB invalidations on the BLT ring for SNB+

Wed Nov 9 21:17:06 CET 2011

On Wed, 09 Nov 2011 19:15:06 +0000, Chris Wilson <chris at chris-wilson.co.uk> wrote:
> On Wed, 09 Nov 2011 10:57:00 -0800, Eric Anholt <eric at anholt.net> wrote:
> > On Wed,  9 Nov 2011 17:32:26 +0000, Chris Wilson <chris at chris-wilson.co.uk> wrote:
> > > We are advised that in order to workaround the TLB invalidation being
> > > performed asynchronously with the command streamer that we need to use
> > > a post-sync write along with the invalidations in the MI_FLUSH_DW so
> > > that all operations are complete before executing the next command in
> > > the ringbuffer.
> > > 
> > > References: https://bugs.freedesktop.org/show_bug.cgi?id=37990
> > > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > > Cc: Jesse Barnes <jbarnes at virtuousgeek.org>
> > 
> > I find the "References:" to a bug that this is not confirmed to fix to
> > be very misleading.
> 
> Because it references a bug I think is due to TLB invalidations as the
> glitches look very similar to earlier TLB invalidation.

That sort of context included with the bz reference would make me feel
more comfortable.

> > > ---
> > >  drivers/gpu/drm/i915/i915_reg.h         |   11 +++++++++--
> > >  drivers/gpu/drm/i915/intel_ringbuffer.c |   13 +++++++++++--
> > >  drivers/gpu/drm/i915/intel_ringbuffer.h |    1 +
> > >  3 files changed, 21 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> > > index cbf5f9f..bfda4b2 100644
> > > --- a/drivers/gpu/drm/i915/i915_reg.h
> > > +++ b/drivers/gpu/drm/i915/i915_reg.h
> > > @@ -183,8 +183,15 @@
> > >   */
> > >  #define MI_LOAD_REGISTER_IMM(x)	MI_INSTR(0x22, 2*x-1)
> > >  #define MI_FLUSH_DW		MI_INSTR(0x26, 1) /* for GEN6 */
> > > -#define   MI_INVALIDATE_TLB	(1<<18)
> > > -#define   MI_INVALIDATE_BSD	(1<<7)
> > > +#define   FLUSH_DW_USE_INDEX		(1<<21)
> > > +#define   MI_INVALIDATE_TLB		(1<<18)
> > > +#define   FLUSH_DW_SYNC_GFDT		(1<<17)
> > > +#define   FLUSH_DW_NO_WRITE		(0<<14)
> > > +#define   FLUSH_DW_WRITE_QWORD		(1<<14)
> > > +#define   FLUSH_DW_WRITE_TIMESTAMP	(3<<14)
> > > +#define   MI_INVALIDATE_BSD		(1<<7)
> > > +#define   FLUSH_DW_PPGTT		(0<<2)
> > > +#define   FLUSH_DW_GTT			(1<<2)
> > >  #define MI_BATCH_BUFFER		MI_INSTR(0x30, 1)
> > >  #define   MI_BATCH_NON_SECURE	(1)
> > >  #define   MI_BATCH_NON_SECURE_I965 (1<<8)
> > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > index 3c30dba..a77858c 100644
> > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > @@ -1431,10 +1431,19 @@ static int blt_ring_flush(struct intel_ring_buffer *ring,
> > >  		return ret;
> > >  
> > >  	cmd = MI_FLUSH_DW;
> > > -	if (invalidate & I915_GEM_DOMAIN_RENDER)
> > > +	if (invalidate & I915_GEM_DOMAIN_RENDER) {
> > >  		cmd |= MI_INVALIDATE_TLB;
> > > +		/* In order to actually force the invalidations to take place
> > > +		 * before proceeding, we need to employ a workaround of
> > > +		 * enabling a post-sync write.
> > > +		 */
> > > +		cmd |= FLUSH_DW_USE_INDEX;
> > > +		cmd |= FLUSH_DW_WRITE_TIMESTAMP;
> > > +	}
> > 
> > The commit message says this is about TLB flushes, while the comment
> > here talks about invalidations in general.  Which is true?
> 
> No the commit log doesn't say that.

I didn't think from "We are advised that in order to workaround the TLB
invalidation being performed asynchronously", that the workaround is
about things other than TLB invalidations too.  (Also, "advised"?
Where, in the PRM, B-Spec, or some other document?)  The plural on
"along with the invalidations" just made me think you were saying "it
doesn't need to be done in a separate flush_dw before/after the one we
wanted", since most other flush workarounds tend to need to be in
separate packets.

> > >  	intel_ring_emit(ring, cmd);
> > > -	intel_ring_emit(ring, 0);
> > > +	intel_ring_emit(ring,
> > > +			I915_GEM_TIMESTAMP_INDEX << MI_STORE_DWORD_INDEX_SHIFT |
> > > +			FLUSH_DW_GTT);
> > >  	intel_ring_emit(ring, 0);
> > 
> > My docs have "This field specifies Bits 31:3 of the Address where the
> > DWord or QWord will be stored.  Note that the address can only be QWord
> > aligned, irrespective of data size." for MI_FLUSH_DW, but
> > MI_STORE_DWORD_INDEX_SHIFT is 2 (which is correct for
> > MI_STORE_DATA_IMM).
> 
> The shift is to convert the index into the HWS from being a dword index
> into an offset, which itself is qword aligned.

OK, so we use this shift, but we just have to be sure to use
even-numbered indices.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 197 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/intel-gfx/attachments/20111109/d121ebf8/attachment.sig>