[Intel-gfx] [PATCH] drm/i915/selftests: Allow engine reset failure to do a GT reset in hangcheck selftest
Thomas Hellström
thomas.hellstrom at linux.intel.com
Sat Oct 23 17:46:48 UTC 2021
On 10/22/21 20:09, John Harrison wrote:
> And to be clear, the engine reset is not supposed to fail. Whether
> issued by GuC or i915, the GDRST register is supposed to self clear
> according to the bspec. If we are being sent the G2H notification for
> an engine reset failure then the assumption is that the hardware is
> broken. This is not a situation that is ever intended to occur in a
> production system. Therefore, it is not something we should spend huge
> amounts of effort on making a perfect selftest for.
I don't agree. Selftests are there to verify that the assumptions and
contracts in the code hold and that the hardware behaves as intended /
assumed. Ideally, no selftest should trigger in a production driver /
system. That doesn't mean we can remove all selftests or skip updating
them for altered assumptions / contracts. I think it's important here to
acknowledge the fact that this and the perf selftest have found two
problems that need to be considered for fixing in a production system.
>
> The current theory is that the timeout in GuC is not quite long enough
> for DG1. Given that the bspec does not specify any kind of timeout, it
> is only a best guess anyway! Once that has been tuned correctly, we
> should never hit this case again. Not ever. Not in a selftest, not in
> an end user use case, just not ever.
...until we introduce new hardware for which the tuning doesn't hold
anymore, or somebody in two years wants to lower the timeout, wondering
why it was set so long?
/Thomas