[Intel-gfx] [PATCH v2] drm/i915/gt: Convert reset prepare failure log to trace

John Harrison john.c.harrison at intel.com
Tue Dec 5 09:10:57 UTC 2023


On 12/5/2023 00:52, Nirmoy Das wrote:
> gen8_engine_reset_prepare() can fail when HW fails to set
> RESET_CTL_READY_TO_RESET bit. In some cases this is not fatal
> error as driver will retry.
>
> Convert the log to a trace log for debugging without triggering
> unnecessary concerns in CI or for end-users during non-fatal scenarios.
I strongly disagree with this change. The hardware spec for the 
RESET_CTL and GDRST registers are that they will self clear within a 
matter of microseconds. If something is so badly wrong with the hardware 
that it can't even manage to reset then that is something that very much 
warrants more than a completely silent trace event. It most certainly 
should be flagged as a failure in CI.

Just because the driver will retry does not mean that this is not a 
serious error. And if the first attempt failed, why would a subsequent 
attempt succeed? Escalating to FLR may have more success, but that is 
not something that i915 currently does.

John.


>
> v2: Improve commit message(Tvrtko)
>
> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> Cc: John Harrison <John.C.Harrison at Intel.com>
> Cc: Andi Shyti <andi.shyti at linux.intel.com>
> Cc: Andrzej Hajda <andrzej.hajda at intel.com>
> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/5591
> Signed-off-by: Nirmoy Das <nirmoy.das at intel.com>
> Reviewed-by: Andi Shyti <andi.shyti at linux.intel.com>
> Reviewed-by: Andrzej Hajda <andrzej.hajda at intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_reset.c | 8 ++++----
>   1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
> index d5ed904f355d..e6fbc6202c80 100644
> --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> @@ -593,10 +593,10 @@ static int gen8_engine_reset_prepare(struct intel_engine_cs *engine)
>   	ret = __intel_wait_for_register_fw(uncore, reg, mask, ack,
>   					   700, 0, NULL);
>   	if (ret)
> -		gt_err(engine->gt,
> -		       "%s reset request timed out: {request: %08x, RESET_CTL: %08x}\n",
> -		       engine->name, request,
> -		       intel_uncore_read_fw(uncore, reg));
> +		GT_TRACE(engine->gt,
> +			 "%s reset request timed out: {request: %08x, RESET_CTL: %08x}\n",
> +			 engine->name, request,
> +			 intel_uncore_read_fw(uncore, reg));
>   
>   	return ret;
>   }



More information about the Intel-gfx mailing list