[Intel-gfx] [PATCH v2] drm/i915/gt: Convert reset prepare failure log to trace

Nirmoy Das nirmoy.das at linux.intel.com
Tue Dec 5 10:39:39 UTC 2023


Hi John,

On 12/5/2023 10:10 AM, John Harrison wrote:
> On 12/5/2023 00:52, Nirmoy Das wrote:
>> gen8_engine_reset_prepare() can fail when HW fails to set
>> RESET_CTL_READY_TO_RESET bit. In some cases this is not fatal
>> error as driver will retry.
>>
>> Convert the log to a trace log for debugging without triggering
>> unnecessary concerns in CI or for end-users during non-fatal scenarios.
> I strongly disagree with this change. The hardware spec for the 
> RESET_CTL and GDRST registers are that they will self clear within a 
> matter of microseconds. If something is so badly wrong with the 
> hardware that it can't even manage to reset


This message is for reset readiness  poll timeout not that the reset is 
failed which doesn't sound so serious if the subsequent attempt managed 
reset the engine.

I couldn't get enough details when this can happen that HW takes very 
long time to set the readiness bit.


> then that is something that very much warrants more than a completely 
> silent trace event. It most certainly should be flagged as a failure 
> in CI.
>
> Just because the driver will retry does not mean that this is not a 
> serious error. And if the first attempt failed, why would a subsequent 
> attempt succeed?

The patch is not ignoring the failure. If the subsequent attempt fails 
then driver load will fail or it will be wedged if that happens after 
driver load.


> Escalating to FLR may have more success, but that is not something 
> that i915 currently does.

Do we still need to do FLR if a subsequent engine reset failure ?


Regards,

Nirmoy

>
> John.
>
>
>>
>> v2: Improve commit message(Tvrtko)
>>
>> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>> Cc: John Harrison <John.C.Harrison at Intel.com>
>> Cc: Andi Shyti <andi.shyti at linux.intel.com>
>> Cc: Andrzej Hajda <andrzej.hajda at intel.com>
>> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/5591
>> Signed-off-by: Nirmoy Das <nirmoy.das at intel.com>
>> Reviewed-by: Andi Shyti <andi.shyti at linux.intel.com>
>> Reviewed-by: Andrzej Hajda <andrzej.hajda at intel.com>
>> ---
>>   drivers/gpu/drm/i915/gt/intel_reset.c | 8 ++++----
>>   1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c 
>> b/drivers/gpu/drm/i915/gt/intel_reset.c
>> index d5ed904f355d..e6fbc6202c80 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_reset.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
>> @@ -593,10 +593,10 @@ static int gen8_engine_reset_prepare(struct 
>> intel_engine_cs *engine)
>>       ret = __intel_wait_for_register_fw(uncore, reg, mask, ack,
>>                          700, 0, NULL);
>>       if (ret)
>> -        gt_err(engine->gt,
>> -               "%s reset request timed out: {request: %08x, 
>> RESET_CTL: %08x}\n",
>> -               engine->name, request,
>> -               intel_uncore_read_fw(uncore, reg));
>> +        GT_TRACE(engine->gt,
>> +             "%s reset request timed out: {request: %08x, RESET_CTL: 
>> %08x}\n",
>> +             engine->name, request,
>> +             intel_uncore_read_fw(uncore, reg));
>>         return ret;
>>   }
>


More information about the Intel-gfx mailing list