[Intel-gfx] [PATCH] drm/i915/lrc: Scrub the GPU state of the guilty hanging request

Michel Thierry michel.thierry at intel.com
Fri Apr 27 20:12:38 UTC 2018


On 4/27/2018 12:32 PM, Chris Wilson wrote:
> Previously, we just reset the ring register in the context image such
> that we could skip over the broken batch and emit the closing
> breadcrumb. However, on resume the context image and GPU state would be
> reloaded, which may have been left in an inconsistent state by the
> reset. The presumption was that at worst it would just cause another
> reset and skip again until it recovered; however, it seems just as likely
> to cause an unrecoverable hang. Instead of risking loading an incomplete
> context image, restore it back to the default state.
> 
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> Cc: Michał Winiarski <michal.winiarski at intel.com>
> Cc: Michel Thierry <michel.thierry at intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> ---
>   drivers/gpu/drm/i915/intel_lrc.c | 24 +++++++++++++++++-------
>   1 file changed, 17 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index ce23d5116482..422b05290ed6 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1804,8 +1804,8 @@ static void reset_common_ring(struct intel_engine_cs *engine,
>   			      struct i915_request *request)
>   {
>   	struct intel_engine_execlists * const execlists = &engine->execlists;
> -	struct intel_context *ce;
>   	unsigned long flags;
> +	u32 *regs;
>   
>   	GEM_TRACE("%s request global=%x, current=%d\n",
>   		  engine->name, request ? request->global_seqno : 0,
> @@ -1855,14 +1855,24 @@ static void reset_common_ring(struct intel_engine_cs *engine,
>   	 * future request will be after userspace has had the opportunity
>   	 * to recreate its own state.
>   	 */
> -	ce = &request->ctx->engine[engine->id];
> -	execlists_init_reg_state(ce->lrc_reg_state,
> -				 request->ctx, engine, ce->ring);
> +	regs = request->ctx->engine[engine->id].lrc_reg_state;
> +	if (engine->default_state) {
> +		void *defaults;
> +
> +		defaults = i915_gem_object_pin_map(engine->default_state,
> +						   I915_MAP_WB);
> +		if (!IS_ERR(defaults)) {
> +			memcpy(regs,
> +			       defaults + LRC_HEADER_PAGES * PAGE_SIZE,
> +			       engine->context_size);
Hi,

engine->context_size takes the PP_HWSP page into account; do we also
need to rewrite the PP_HWSP, or just the logical state?

Also, regs already points to the start of the logical state
(vaddr + LRC_STATE_PN * PAGE_SIZE).

So if we want to overwrite starting from the PP_HWSP, regs is not the
right destination offset; if we only want to restore the logical state,
the source should be 'defaults + LRC_STATE_PN * PAGE_SIZE'.

-Michel

> +			i915_gem_object_unpin_map(engine->default_state);
> +		}
> +	}
> +	execlists_init_reg_state(regs, request->ctx, engine, request->ring);
>   
>   	/* Move the RING_HEAD onto the breadcrumb, past the hanging batch */
> -	ce->lrc_reg_state[CTX_RING_BUFFER_START+1] =
> -		i915_ggtt_offset(ce->ring->vma);
> -	ce->lrc_reg_state[CTX_RING_HEAD+1] = request->postfix;
> +	regs[CTX_RING_BUFFER_START + 1] = i915_ggtt_offset(request->ring->vma);
> +	regs[CTX_RING_HEAD + 1] = request->postfix;
>   
>   	request->ring->head = request->postfix;
>   	intel_ring_update_space(request->ring);
> 

