[Intel-gfx] [PATCH] drm/i915: Force synchronous TLB invalidations on the BLT ring for SNB+

Wed Nov 9 19:57:00 CET 2011

On Wed,  9 Nov 2011 17:32:26 +0000, Chris Wilson <chris at chris-wilson.co.uk> wrote:
> We are advised that in order to workaround the TLB invalidation being
> performed asynchronously with the command streamer that we need to use
> a post-sync write along with the invalidations in the MI_FLUSH_DW so
> that all operations are complete before executing the next command in
> the ringbuffer.
> 
> References: https://bugs.freedesktop.org/show_bug.cgi?id=37990
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Jesse Barnes <jbarnes at virtuousgeek.org>

I find the "References:" to a bug that this is not confirmed to fix to
be very misleading.

> ---
>  drivers/gpu/drm/i915/i915_reg.h         |   11 +++++++++--
>  drivers/gpu/drm/i915/intel_ringbuffer.c |   13 +++++++++++--
>  drivers/gpu/drm/i915/intel_ringbuffer.h |    1 +
>  3 files changed, 21 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index cbf5f9f..bfda4b2 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -183,8 +183,15 @@
>   */
>  #define MI_LOAD_REGISTER_IMM(x)	MI_INSTR(0x22, 2*x-1)
>  #define MI_FLUSH_DW		MI_INSTR(0x26, 1) /* for GEN6 */
> -#define   MI_INVALIDATE_TLB	(1<<18)
> -#define   MI_INVALIDATE_BSD	(1<<7)
> +#define   FLUSH_DW_USE_INDEX		(1<<21)
> +#define   MI_INVALIDATE_TLB		(1<<18)
> +#define   FLUSH_DW_SYNC_GFDT		(1<<17)
> +#define   FLUSH_DW_NO_WRITE		(0<<14)
> +#define   FLUSH_DW_WRITE_QWORD		(1<<14)
> +#define   FLUSH_DW_WRITE_TIMESTAMP	(3<<14)
> +#define   MI_INVALIDATE_BSD		(1<<7)
> +#define   FLUSH_DW_PPGTT		(0<<2)
> +#define   FLUSH_DW_GTT			(1<<2)
>  #define MI_BATCH_BUFFER		MI_INSTR(0x30, 1)
>  #define   MI_BATCH_NON_SECURE	(1)
>  #define   MI_BATCH_NON_SECURE_I965 (1<<8)
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 3c30dba..a77858c 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1431,10 +1431,19 @@ static int blt_ring_flush(struct intel_ring_buffer *ring,
>  		return ret;
>  
>  	cmd = MI_FLUSH_DW;
> -	if (invalidate & I915_GEM_DOMAIN_RENDER)
> +	if (invalidate & I915_GEM_DOMAIN_RENDER) {
>  		cmd |= MI_INVALIDATE_TLB;
> +		/* In order to actually force the invalidations to take place
> +		 * before proceeding, we need to employ a workaround of
> +		 * enabling a post-sync write.
> +		 */
> +		cmd |= FLUSH_DW_USE_INDEX;
> +		cmd |= FLUSH_DW_WRITE_TIMESTAMP;
> +	}

The commit message says this is about TLB flushes, while the comment
here talks about invalidations in general.  Which is true?

>  	intel_ring_emit(ring, cmd);
> -	intel_ring_emit(ring, 0);
> +	intel_ring_emit(ring,
> +			I915_GEM_TIMESTAMP_INDEX << MI_STORE_DWORD_INDEX_SHIFT |
> +			FLUSH_DW_GTT);
>  	intel_ring_emit(ring, 0);

My docs have "This field specifies Bits 31:3 of the Address where the
DWord or QWord will be stored.  Note that the address can only be QWord
aligned, irrespective of data size." for MI_FLUSH_DW, but
MI_STORE_DWORD_INDEX_SHIFT is 2 (which is correct for
MI_STORE_DATA_IMM).

>  	intel_ring_emit(ring, MI_NOOP);
>  	intel_ring_advance(ring);
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index c8b9cc0..b09605c 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -164,6 +164,7 @@ intel_read_status_page(struct intel_ring_buffer *ring,
>  #define READ_BREADCRUMB(dev_priv) READ_HWSP(dev_priv, I915_BREADCRUMB_INDEX)
>  #define I915_GEM_HWS_INDEX		0x20
>  #define I915_BREADCRUMB_INDEX		0x21
> +#define I915_TIMESTAMP_INDEX		0x22

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 197 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/intel-gfx/attachments/20111109/0c08ed57/attachment.sig>