[Intel-gfx] [PATCH 5/8] drm/i915: Double check hangcheck.seqno after reset

Mika Kuoppala mika.kuoppala at linux.intel.com
Mon Oct 3 13:14:39 UTC 2016


Chris Wilson <chris at chris-wilson.co.uk> writes:

> Check that there was not a late recovery between us declaring the GPU
> hung and processing the reset. If the GPU did recover by itself, let the
> request remain on the active list and see if it hangs again!
>

Did you see this in action? Makes sense to recheck
after reset. I don't remember how TDR will deal with multiple
reset on the same engine but we should start tracking the seqno
that cause it and make sure we don't get stuck by replaying the same.

Do we check the banning on resubmission and/or do we trust that
the breadcrumb update always succeedes?

I envision that if we get multiple resets on same seqno, we
just write the breadcrumbs through cpu and move on. But let's
hope we don't need to and the gpu breadcrumps are always enough.

Regardless, it's improvement and should weed out false positives
on some hangs.

Reviewed-by: Mika Kuoppala <mika.kuoppala at intel.com>
-Mika

> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala at intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 0cae8acdf906..a89a88922448 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2589,6 +2589,9 @@ static void i915_gem_reset_engine(struct intel_engine_cs *engine)
>  		return;
>  
>  	ring_hung = engine->hangcheck.score >= HANGCHECK_SCORE_RING_HUNG;
> +	if (engine->hangcheck.seqno != intel_engine_get_seqno(engine))
> +		ring_hung = false;
> +
>  	i915_set_reset_status(request->ctx, ring_hung);
>  	if (!ring_hung)
>  		return;
> -- 
> 2.9.3


More information about the Intel-gfx mailing list