[Intel-gfx] [PATCH] drm/i915: Show if we consider the engine is idle in the GPU error state

Rodrigo Vivi rodrigo.vivi at intel.com
Tue Dec 19 20:49:54 UTC 2017


On Tue, Dec 19, 2017 at 01:14:19PM +0000, Chris Wilson wrote:
> When we encounter a GPU hang, knowing whether we think the engine is
> idle at that moment is useful for verifying our bookkeeping.
> 
> References: https://bugs.freedesktop.org/show_bug.cgi?id=104305

Here you describe the hang as a "false positive"...
If it is a false positive and we have this idle information,
shouldn't we handle this differently instead of throwing the error
information and resetting the GPU?

Or am I misunderstanding what you meant by "false positive"?

> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> Cc: Michal Wajdeczko <michal.wajdeczko at intel.com>

Anyway, the info here seems interesting, so

Reviewed-by: Rodrigo Vivi <rodrigo.vivi at intel.com>

> ---
>  drivers/gpu/drm/i915/i915_drv.h       | 1 +
>  drivers/gpu/drm/i915/i915_gpu_error.c | 2 ++
>  2 files changed, 3 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 1aba5657f5f0..8ca836851365 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -948,6 +948,7 @@ struct i915_gpu_state {
>  	struct drm_i915_error_engine {
>  		int engine_id;
>  		/* Software tracked state */
> +		bool idle;
>  		bool waiting;
>  		int num_waiters;
>  		unsigned long hangcheck_timestamp;
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index aba50aa613f1..50feec87c3a3 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -416,6 +416,7 @@ static void error_print_engine(struct drm_i915_error_state_buf *m,
>  	int n;
>  
>  	err_printf(m, "%s command stream:\n", engine_str(ee->engine_id));
> +	err_printf(m, "  IDLE?: %s\n", yesno(ee->idle));
>  	err_printf(m, "  START: 0x%08x\n", ee->start);
>  	err_printf(m, "  HEAD:  0x%08x [0x%08x]\n", ee->head, ee->rq_head);
>  	err_printf(m, "  TAIL:  0x%08x [0x%08x, 0x%08x]\n",
> @@ -1256,6 +1257,7 @@ static void error_record_engine_registers(struct i915_gpu_state *error,
>  		ee->hws = I915_READ(mmio);
>  	}
>  
> +	ee->idle = intel_engine_is_idle(engine);
>  	ee->hangcheck_timestamp = engine->hangcheck.action_timestamp;
>  	ee->hangcheck_action = engine->hangcheck.action;
>  	ee->hangcheck_stalled = engine->hangcheck.stalled;
> -- 
> 2.15.1
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
