[Intel-gfx] [PATCH] drm/i915: Show if we consider the engine is idle in the GPU error state
Rodrigo Vivi
rodrigo.vivi at intel.com
Tue Dec 19 20:49:54 UTC 2017
On Tue, Dec 19, 2017 at 01:14:19PM +0000, Chris Wilson wrote:
> When we encounter a GPU hang, it is useful for verifying our bookkeeping
> to know whether we think the engine is idle at that time.
>
> References: https://bugs.freedesktop.org/show_bug.cgi?id=104305
Here you mention the hang as a "false positive"...
If it is a false positive and we have this idle information,
shouldn't we handle it differently instead of throwing the error
information and resetting the GPU?
Or am I misunderstanding what you meant by "false positive"?
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> Cc: Michal Wajdeczko <michal.wajdeczko at intel.com>
Anyway, the info here seems interesting, so
Reviewed-by: Rodrigo Vivi <rodrigo.vivi at intel.com>
> ---
> drivers/gpu/drm/i915/i915_drv.h | 1 +
> drivers/gpu/drm/i915/i915_gpu_error.c | 2 ++
> 2 files changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 1aba5657f5f0..8ca836851365 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -948,6 +948,7 @@ struct i915_gpu_state {
>  	struct drm_i915_error_engine {
>  		int engine_id;
>  		/* Software tracked state */
> +		bool idle;
>  		bool waiting;
>  		int num_waiters;
>  		unsigned long hangcheck_timestamp;
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index aba50aa613f1..50feec87c3a3 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -416,6 +416,7 @@ static void error_print_engine(struct drm_i915_error_state_buf *m,
>  	int n;
>
>  	err_printf(m, "%s command stream:\n", engine_str(ee->engine_id));
> +	err_printf(m, " IDLE?: %s\n", yesno(ee->idle));
>  	err_printf(m, " START: 0x%08x\n", ee->start);
>  	err_printf(m, " HEAD: 0x%08x [0x%08x]\n", ee->head, ee->rq_head);
>  	err_printf(m, " TAIL: 0x%08x [0x%08x, 0x%08x]\n",
> @@ -1256,6 +1257,7 @@ static void error_record_engine_registers(struct i915_gpu_state *error,
>  		ee->hws = I915_READ(mmio);
>  	}
>
> +	ee->idle = intel_engine_is_idle(engine);
>  	ee->hangcheck_timestamp = engine->hangcheck.action_timestamp;
>  	ee->hangcheck_action = engine->hangcheck.action;
>  	ee->hangcheck_stalled = engine->hangcheck.stalled;
> --
> 2.15.1
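
For readers following along outside the kernel tree, the record-then-print pattern the patch uses can be sketched in plain userspace C. This is only an illustration under stated assumptions: `struct engine`, `struct error_engine`, `engine_is_idle()`, `record_engine()` and `print_engine()` are hypothetical stand-ins, not the i915 types or helpers, and "idle" is simplified here to "no requests in flight", which is not necessarily what intel_engine_is_idle() checks.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the live engine; NOT struct intel_engine_cs. */
struct engine {
	const char *name;
	unsigned int inflight;	/* requests submitted but not yet retired */
};

/* Hypothetical stand-in for the per-engine error snapshot. */
struct error_engine {
	const char *name;
	bool idle;		/* software-tracked belief at capture time */
	unsigned int head;
	unsigned int tail;
};

static const char *yesno(bool v)
{
	return v ? "yes" : "no";
}

/* Assumption for this sketch only: idle means nothing is in flight. */
static bool engine_is_idle(const struct engine *e)
{
	return e->inflight == 0;
}

/* Capture the software view alongside the (fake) register values. */
static void record_engine(struct error_engine *ee, const struct engine *e,
			  unsigned int head, unsigned int tail)
{
	ee->name = e->name;
	ee->idle = engine_is_idle(e);
	ee->head = head;
	ee->tail = tail;
}

/* Format the snapshot later, from the saved copy rather than live state. */
static int print_engine(char *buf, size_t len, const struct error_engine *ee)
{
	return snprintf(buf, len,
			"%s command stream:\n"
			"  IDLE?: %s\n"
			"  HEAD: 0x%08x\n"
			"  TAIL: 0x%08x\n",
			ee->name, yesno(ee->idle), ee->head, ee->tail);
}
```

The point of saving the flag at capture time is that the dump is read long after the hang, when the live bookkeeping may have moved on. An "IDLE?: yes" next to HEAD and TAIL values that differ would suggest the software view and the hardware disagreed at the moment of capture.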