[Intel-gfx] [PATCH v2 3/5] drm/i915: Hold forcewake for the duration of reset+restart

Mon Oct 9 11:32:16 UTC 2017

Chris Wilson <chris at chris-wilson.co.uk> writes:

> Resetting the engine requires us to hold the forcewake wakeref to
> prevent RC6 trying to happen in the middle of the reset sequence. The
> consequence of an unwanted RC6 event in the middle is that random state
> is then saved to the powercontext and restored later, which may
> overwrite the mmio state we need to preserve (e.g. PD_DIR_BASE in the
> legacy ringbuffer reset_ring_common()).
>
> This was noticed in the live_hangcheck selftests when Haswell would
> sporadically fail to restart during igt_reset_queue().
>
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 17 +++++++++++++++--
>  1 file changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 82a10036fb38..eba23c239aae 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2832,7 +2832,17 @@ i915_gem_reset_prepare_engine(struct intel_engine_cs *engine)
>  {
>  	struct drm_i915_gem_request *request = NULL;
>  
> -	/* Prevent the signaler thread from updating the request
> +	/*
> +	 * During the reset sequence, we must prevent the engine from
> +	 * entering RC6. As the context state is undefined until we restart
> +	 * the engine, if it does enter RC6 during the reset, the state
> +	 * written to the powercontext is undefined and so we may lose
> +	 * GPU state upon resume, i.e. fail to restart after a reset.
> +	 */
> +	intel_uncore_forcewake_get(engine->i915, FORCEWAKE_ALL);

We already do a nested get when actually issuing the hw commands. I
would still keep those nested gets there and consider changing them to
asserts some day.

Reviewed-by: Mika Kuoppala <mika.kuoppala at linux.intel.com>
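
To illustrate why the nested gets are harmless under an outer wakeref, here
is a minimal userspace sketch of a reference-counted wake domain. It is a
model only, not the i915 implementation: struct fw_domain, fw_get(),
fw_put() and reset_sequence_stays_awake() are hypothetical names invented
for this example, and the real forcewake code is considerably more
involved. The point it shows is that an inner get/put pair no longer drops
the count to zero, so the hardware stays awake in the gap between the reset
and the restart.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a forcewake domain; not the i915 structures. */
struct fw_domain {
	unsigned int wake_count; /* outstanding get() calls */
	bool hw_awake;           /* models "RC6 is blocked" */
	unsigned int hw_writes;  /* times we touched the pretend hardware */
};

void fw_get(struct fw_domain *d)
{
	/* Only the first reference has to wake the hardware. */
	if (d->wake_count++ == 0) {
		d->hw_awake = true;
		d->hw_writes++;
	}
}

void fw_put(struct fw_domain *d)
{
	/* An unbalanced put is a bug in the caller. */
	assert(d->wake_count > 0);

	/* Only the last reference lets the hardware sleep again. */
	if (--d->wake_count == 0) {
		d->hw_awake = false;
		d->hw_writes++;
	}
}

/*
 * Models the patched flow: an outer reference spans reset and restart,
 * while each step still takes its own nested reference. Returns whether
 * the hardware was awake in the gap between the two steps.
 */
bool reset_sequence_stays_awake(struct fw_domain *d)
{
	bool awake_between;

	fw_get(d);                   /* outer: taken in reset prepare */

	fw_get(d);                   /* nested: issuing the reset */
	fw_put(d);

	awake_between = d->hw_awake; /* window where RC6 used to sneak in */

	fw_get(d);                   /* nested: restarting the engine */
	fw_put(d);

	fw_put(d);                   /* outer: dropped in reset finish */

	return awake_between;
}
```

Without the outer fw_get()/fw_put() pair, the first nested fw_put() would
drop the count to zero and the model would report the hardware asleep in
the gap, which corresponds to the sporadic restart failure described in the
commit message.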

> +
> +	/*
> +	 * Prevent the signaler thread from updating the request
>  	 * state (by calling dma_fence_signal) as we are processing
>  	 * the reset. The write from the GPU of the seqno is
>  	 * asynchronous and the signaler thread may see a different
> @@ -2843,7 +2853,8 @@ i915_gem_reset_prepare_engine(struct intel_engine_cs *engine)
>  	 */
>  	kthread_park(engine->breadcrumbs.signaler);
>  
> -	/* Prevent request submission to the hardware until we have
> +	/*
> +	 * Prevent request submission to the hardware until we have
>  	 * completed the reset in i915_gem_reset_finish(). If a request
>  	 * is completed by one engine, it may then queue a request
>  	 * to a second via its engine->irq_tasklet *just* as we are
> @@ -3033,6 +3044,8 @@ void i915_gem_reset_finish_engine(struct intel_engine_cs *engine)
>  {
>  	tasklet_enable(&engine->execlists.irq_tasklet);
>  	kthread_unpark(engine->breadcrumbs.signaler);
> +
> +	intel_uncore_forcewake_put(engine->i915, FORCEWAKE_ALL);
>  }
>  
>  void i915_gem_reset_finish(struct drm_i915_private *dev_priv)
> -- 
> 2.14.2