[Intel-gfx] [PATCH v4] drm/i915: Adjust size of PIPE_CONTROL used for gen8 render seqno write

Tue Apr 12 16:18:46 UTC 2016

On 12/04/16 14:51, Michał Winiarski wrote:
> We started to use PIPE_CONTROL to write render ring seqno in order to
> combat seqno write vs interrupt generation problems. This was introduced
> by commit 7c17d377374d ("drm/i915: Use ordered seqno write interrupt
> generation on gen8+ execlists").
>
> On gen8+, the size of PIPE_CONTROL with Post Sync Operation should be
> 6 dwords. When we're using the older 5-dword variant, it's possible to
> observe inconsistent values written by PIPE_CONTROL with Post
> Sync Operation from user batches, resulting in rendering corruption.
>
> v2: Fix BAT failures
> v3: Comments on alignment and thrashing high dword of seqno (Chris)
> v4: Updated commit msg (Mika)
>
> Testcase: igt/gem_pipe_control_store_loop/*-qword-write
> Issue: VIZ-7393
> Cc: stable at vger.kernel.org
> Cc: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala at intel.com>
> Cc: Abdiel Janulgue <abdiel.janulgue at linux.intel.com>
> Signed-off-by: Michał Winiarski <michal.winiarski at intel.com>
> ---
>   drivers/gpu/drm/i915/intel_lrc.c | 10 ++++++++--
>   1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 0d6dc5e..30abe53 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1945,15 +1945,18 @@ static int gen8_emit_request_render(struct drm_i915_gem_request *request)
>   	struct intel_ringbuffer *ringbuf = request->ringbuf;
>   	int ret;
>
> -	ret = intel_logical_ring_begin(request, 6 + WA_TAIL_DWORDS);
> +	ret = intel_logical_ring_begin(request, 8 + WA_TAIL_DWORDS);
>   	if (ret)
>   		return ret;
>
> +	/* We're using qword write, seqno should be aligned to 8 bytes. */
> +	BUILD_BUG_ON(I915_GEM_HWS_INDEX & 1);
> +
>   	/* w/a for post sync ops following a GPGPU operation we
>   	 * need a prior CS_STALL, which is emitted by the flush
>   	 * following the batch.
>   	 */
> -	intel_logical_ring_emit(ringbuf, GFX_OP_PIPE_CONTROL(5));
> +	intel_logical_ring_emit(ringbuf, GFX_OP_PIPE_CONTROL(6));
>   	intel_logical_ring_emit(ringbuf,
>   				(PIPE_CONTROL_GLOBAL_GTT_IVB |
>   				 PIPE_CONTROL_CS_STALL |
> @@ -1961,7 +1964,10 @@ static int gen8_emit_request_render(struct drm_i915_gem_request *request)
>   	intel_logical_ring_emit(ringbuf, hws_seqno_address(request->engine));
>   	intel_logical_ring_emit(ringbuf, 0);
>   	intel_logical_ring_emit(ringbuf, i915_gem_request_get_seqno(request));
> +	/* We're thrashing one dword of HWS. */
> +	intel_logical_ring_emit(ringbuf, 0);
>   	intel_logical_ring_emit(ringbuf, MI_USER_INTERRUPT);
> +	intel_logical_ring_emit(ringbuf, MI_NOOP);
>   	return intel_logical_ring_advance_and_submit(request);
>   }
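
For readers counting dwords, the emission sequence after the patch can be sketched as below. This is only an illustration: `struct sketch_ring`, `sketch_emit()`, `sketch_emit_request()` and the `SKETCH_*` values are hypothetical stand-ins, not the i915 ring API or the real opcode encodings. It shows the six PIPE_CONTROL dwords (header, flags, address low/high, data low/high) followed by the user interrupt and a trailing no-op, which gives the 8 dwords passed to intel_logical_ring_begin() before WA_TAIL_DWORDS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder command values for illustration; NOT the real i915 encodings. */
#define SKETCH_PIPE_CONTROL(len) (0xC0000000u | (uint32_t)(len))
#define SKETCH_USER_INTERRUPT    0xA0000000u
#define SKETCH_NOOP              0x00000000u

/* Hypothetical ring: a small array of dwords plus a fill count. */
struct sketch_ring {
	uint32_t dw[16];
	size_t n;
};

static void sketch_emit(struct sketch_ring *ring, uint32_t value)
{
	assert(ring->n < sizeof(ring->dw) / sizeof(ring->dw[0]));
	ring->dw[ring->n++] = value;
}

/* Emits the post-patch sequence and returns the number of dwords used. */
static size_t sketch_emit_request(struct sketch_ring *ring, uint32_t flags,
				  uint32_t seqno_addr, uint32_t seqno)
{
	size_t start = ring->n;

	sketch_emit(ring, SKETCH_PIPE_CONTROL(6)); /* dword 0: header */
	sketch_emit(ring, flags);                  /* dword 1: flags */
	sketch_emit(ring, seqno_addr);             /* dword 2: address, low */
	sketch_emit(ring, 0);                      /* dword 3: address, high */
	sketch_emit(ring, seqno);                  /* dword 4: data, low */
	sketch_emit(ring, 0);                      /* dword 5: data, high */
	sketch_emit(ring, SKETCH_USER_INTERRUPT);  /* dword 6 */
	sketch_emit(ring, SKETCH_NOOP);            /* dword 7: trailing no-op */

	return ring->n - start;
}
```

With the older 5-dword form, dword 5 would be absent and only the low dword of the seqno slot would be written.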

In the scheduler+preemption patches, we make use of the fact that
we're writing a QWord, so that we can set the completed-seqno and
clear the in-progress seqno in one operation (it doesn't actually matter
if the h/w turns it into two DWord writes, though).
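
A minimal sketch of that idea follows. It is not taken from the scheduler patches: the slot index, the `sketch_*` names, and the layout (completed seqno in the low dword, in-progress seqno in the high dword) are assumptions made for illustration. The layout matches the patch only insofar as the patch writes the seqno to the low dword and 0 to the high dword.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_HWS_DWORDS  16
#define SKETCH_SEQNO_INDEX 4 /* hypothetical slot in the status page */

/* A qword write covers two dwords, so the slot index must be even. */
_Static_assert((SKETCH_SEQNO_INDEX & 1) == 0,
	       "seqno slot must be 8-byte aligned");

/* Hypothetical model of a hardware status page as an array of dwords. */
struct sketch_hws {
	uint32_t dw[SKETCH_HWS_DWORDS];
};

/* Assumed layout: the high dword records the seqno currently executing. */
static void sketch_mark_in_progress(struct sketch_hws *hws, uint32_t seqno)
{
	hws->dw[SKETCH_SEQNO_INDEX + 1] = seqno;
}

/*
 * Models the qword post-sync write: the low dword receives the completed
 * seqno and the high dword receives 0, clearing the in-progress marker.
 * It is written here as two dword stores, which gives the same end state.
 */
static void sketch_qword_write(struct sketch_hws *hws, uint32_t completed)
{
	hws->dw[SKETCH_SEQNO_INDEX] = completed;
	hws->dw[SKETCH_SEQNO_INDEX + 1] = 0;
}
```

After sketch_mark_in_progress(&hws, 7) followed by sketch_qword_write(&hws, 7), the low dword of the slot holds 7 and the high dword holds 0.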

.Dave.