[Intel-gfx] [PATCH] drm/i915/selftests: Allow engine reset failure to do a GT reset in hangcheck selftest

Thomas Hellström thomas.hellstrom at linux.intel.com
Sat Oct 23 18:36:47 UTC 2021


On 10/23/21 20:18, Matthew Brost wrote:
> On Sat, Oct 23, 2021 at 07:46:48PM +0200, Thomas Hellström wrote:
>> On 10/22/21 20:09, John Harrison wrote:
>>> And to be clear, the engine reset is not supposed to fail. Whether
>>> issued by GuC or i915, the GDRST register is supposed to self clear
>>> according to the bspec. If we are being sent the G2H notification for an
>>> engine reset failure then the assumption is that the hardware is broken.
>>> This is not a situation that is ever intended to occur in a production
>>> system. Therefore, it is not something we should spend huge amounts of
>>> effort on making a perfect selftest for.
>> I don't agree. Selftests are there to verify that assumptions made and
>> contracts in the code hold and that hardware behaves as intended / assumed.
>> No selftest should ideally trigger in a production driver / system. That
>> doesn't mean we can remove all selftests or ignore updating them for altered
>> assumptions / contracts. I think it's important here to acknowledge the fact
>> that this and the perf selftest have found two problems that need
>> consideration for fixing for a production system.
>>
> I'm confused - we are going down the rabbit hole here.
>
> Back to this patch. This test was written for very specific execlists
> behavior. It was updated to also support the GuC. In that update we
> missed fixing the failure path, well because it always passed. Now it
> has failed, we see that it doesn't fail gracefully, and takes down the
> machine. This patch fixes that. It also opened my eyes to the horror
> show reset locking that needs to be fixed long term.

Well, the email above wasn't really about the correctness of this 
particular patch (I should probably have altered the subject to reflect 
that) but rather about the assumption that failures that should never 
occur in a production system are not worth writing selftests for.

As for the patch itself, I'll take a deeper look and get back to you.

/Thomas