[Intel-gfx] [PATCH] drm/i915/selftests: Allow engine reset failure to do a GT reset in hangcheck selftest

Thomas Hellström thomas.hellstrom at linux.intel.com
Sat Oct 23 17:46:48 UTC 2021


On 10/22/21 20:09, John Harrison wrote:
> And to be clear, the engine reset is not supposed to fail. Whether 
> issued by GuC or i915, the GDRST register is supposed to self clear 
> according to the bspec. If we are being sent the G2H notification for 
> an engine reset failure then the assumption is that the hardware is 
> broken. This is not a situation that is ever intended to occur in a 
> production system. Therefore, it is not something we should spend huge 
> amounts of effort on making a perfect selftest for.

I don't agree. Selftests are there to verify that the assumptions made and
the contracts in the code hold, and that hardware behaves as intended /
assumed. Ideally, no selftest should trigger in a production driver /
system. That doesn't mean we can remove all selftests or ignore updating
them for altered assumptions / contracts. I think it's important here to
acknowledge the fact that this and the perf selftest have found two
problems that need to be considered for fixing in a production system.

>
> The current theory is that the timeout in GuC is not quite long enough 
> for DG1. Given that the bspec does not specify any kind of timeout, it 
> is only a best guess anyway! Once that has been tuned correctly, we 
> should never hit this case again. Not ever. Not in a selftest, not in
> an end user use case, just not ever.

...until we introduce new hardware for which the tuning doesn't hold
anymore, or somebody in two years wants to lower the timeout, wondering
why it was set so long?

/Thomas



