[Intel-gfx] [PATCH 2/3] drm/i915/gt: Don't declare hangs if engine is stalled

Mika Kuoppala mika.kuoppala at linux.intel.com
Thu May 28 16:23:18 UTC 2020


Chris Wilson <chris at chris-wilson.co.uk> writes:

> If the ring submission is stalled on an external request, nothing can be
> submitted, not even the heartbeat in the kernel context. Since nothing
> is running, resetting the engine/device does not unblock the system and
> is pointless. We can see if the heartbeat is supposed to be running
> before declaring foul.
>
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> ---
>  .../gpu/drm/i915/gt/intel_engine_heartbeat.c  | 19 ++++++++++++++++---
>  1 file changed, 16 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
> index 5136c8bf112d..f67ad937eefb 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
> @@ -48,8 +48,10 @@ static void show_heartbeat(const struct i915_request *rq,
>  	struct drm_printer p = drm_debug_printer("heartbeat");
>  
>  	intel_engine_dump(engine, &p,
> -			  "%s heartbeat {prio:%d} not ticking\n",
> +			  "%s heartbeat {seqno:%llx:%lld, prio:%d} not ticking\n",
>  			  engine->name,
> +			  rq->fence.context,
> +			  rq->fence.seqno,
>  			  rq->sched.attr.priority);
>  }
>  
> @@ -76,8 +78,19 @@ static void heartbeat(struct work_struct *wrk)
>  		goto out;
>  
>  	if (engine->heartbeat.systole) {
> -		if (engine->schedule &&
> -		    rq->sched.attr.priority < I915_PRIORITY_BARRIER) {
> +		if (!i915_sw_fence_signaled(&rq->submit)) {
> +			/*
> +			 * Not yet submitted, system is stalled.
> +			 *
> +			 * This more often happens for ring submission,
> +			 * where all contexts are funnelled into a common
> +			 * ringbuffer. If one context is blocked on an
> +			 * external fence, not only is it not submitted,
> +			 * but all other contexts, including the kernel
> +			 * context are stuck waiting for the signal.
> +			 */

How to rescue the system in this situation evades me.
But piling the heartbeat on top of the stall does not help
in any case.

Reviewed-by: Mika Kuoppala <mika.kuoppala at linux.intel.com>

> +		} else if (engine->schedule &&
> +			   rq->sched.attr.priority < I915_PRIORITY_BARRIER) {
>  			/*
>  			 * Gradually raise the priority of the heartbeat to
>  			 * give high priority work [which presumably desires
> -- 
> 2.20.1
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
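
For readers following the thread, the branch ordering the patch introduces can be sketched in isolation. This is only an illustrative model, not the driver code: `struct hb_request`, `enum hb_action`, `hb_check()` and the value of `HB_PRIORITY_BARRIER` are hypothetical stand-ins for `rq->submit`, `rq->sched.attr.priority` and `I915_PRIORITY_BARRIER`. The final branch is not visible in the quoted hunk and is assumed, from the patch subject, to be where the hang is declared.

```c
#include <stdbool.h>

/* Hypothetical outcomes of one heartbeat check; not i915 names. */
enum hb_action {
	HB_WAIT_STALLED,  /* heartbeat was never submitted: a reset cannot help */
	HB_BUMP_PRIORITY, /* submitted, but still below barrier priority */
	HB_DECLARE_HANG,  /* submitted at barrier priority and still pending */
};

/* Hypothetical stand-in for the fields the patch inspects on the request. */
struct hb_request {
	bool submitted; /* models i915_sw_fence_signaled(&rq->submit) */
	int priority;   /* models rq->sched.attr.priority */
};

/* Arbitrary illustrative value, not the kernel's I915_PRIORITY_BARRIER. */
#define HB_PRIORITY_BARRIER 1000

/*
 * Decide what to do with a heartbeat request that has not completed.
 * has_scheduler models the engine->schedule check in the quoted diff.
 */
static enum hb_action hb_check(const struct hb_request *rq, bool has_scheduler)
{
	/* New first test: an unsubmitted heartbeat means submission is stalled. */
	if (!rq->submitted)
		return HB_WAIT_STALLED;

	/* Pre-existing test, now demoted to the second branch. */
	if (has_scheduler && rq->priority < HB_PRIORITY_BARRIER)
		return HB_BUMP_PRIORITY;

	/* Assumed fallthrough: the engine had its chance and did not tick. */
	return HB_DECLARE_HANG;
}
```

The point of the ordering is that the stall test comes before the priority test, so an unsubmitted heartbeat never reaches the fallthrough regardless of its priority.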

