[Intel-gfx] [PATCH] drm/i915/reset: Add Wa_22011802037 for gen11 and execlist backend

Fri May 6 17:13:28 UTC 2022

On Wed, May 04, 2022 at 07:09:09PM +0100, Tvrtko Ursulin wrote:
>
>On 04/05/2022 18:35, Umesh Nerlige Ramappa wrote:
>>On Wed, May 04, 2022 at 09:10:42AM +0100, Tvrtko Ursulin wrote:
>>>
>>>On 03/05/2022 20:49, Umesh Nerlige Ramappa wrote:
>>>>On Tue, May 03, 2022 at 09:42:52AM +0100, Tvrtko Ursulin wrote:
>>>>>
>>>>>On 02/05/2022 23:18, Umesh Nerlige Ramappa wrote:
>>>>>>Current implementation of Wa_22011802037 is limited to the GuC backend
>>>>>>and gen12. Add support for execlist backend and gen11 as well.
>>>>>
>>>>>Is the implication f6aa0d713c88 ("drm/i915: Add Wa_22011802037 
>>>>>force cs halt") does not work on Tigerlake? Fixes: tag 
>>>>>probably required in that case since I have sold that fix as 
>>>>>a, well, fix.
>>>>
>>>>After the fix was made, the WA has evolved and added some more 
>>>>steps for handling pending MI_FORCE_WAKEs. This patch is the 
>>>>additional set of steps needed for the WA. As you mentioned 
>>>>offline, I should correct the commit message to indicate that 
>>>>the WA does exist for execlists, but needs additional steps. 
>>>>Will add Fixes: tag.
>>>
>>>Ok, that would be good then since it does sound they need to be 
>>>tied together (as in cherry picked for fixes).
>>>
>>>Will it be followed up with preempt-to-idle implementation to 
>>>avoid the, as I understand it, potential for activity on one CCS 
>>>engine defeating the WA on another by timing out the wait for 
>>>idle?
>>
>>fwiu, for the case where we want to limit the reset to a single 
>>engine, the preempt-to-idle implementation may be required - 
>>https://patchwork.freedesktop.org/series/101432/. If preempt-to-idle 
>>fails, the hangcheck should kick in and then do a gt-reset. If that 
>>happens, then the WA flow in the patch should be applied.
>
>Okay I read that as yes. That is fine by me since this patch alone is 
>better than without it.
>

I have a general question about engines that do NOT share a reset
domain, specifically with the execlist backend.

What is the expected behavior of a hangcheck-initiated reset? It says
it is resetting the chip for the engine that it decides is hung. In that
path it calls gt_reset, which loops through the engines (reset_prepare,
rewind, etc.). Are all running contexts victimized? Or is there an
attempt to preempt-to-idle the contexts on the other (innocent) engines
and then resubmit them if they were successfully preempted?
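
To make the two possibilities concrete, here is a toy model of what I am
asking. This is NOT i915 code: every name below (toy_engine, ctx_fate,
the toy_gt_reset_* helpers, the preemptible flag) is hypothetical and
only illustrates the two outcomes, not what the driver actually does.

```c
#include <stdbool.h>

/* Toy model only: none of these names are real i915 symbols. */
enum ctx_fate { CTX_UNTOUCHED, CTX_VICTIMIZED, CTX_RESUBMITTED };

struct toy_engine {
	const char *name;
	bool hung;         /* engine blamed by hangcheck */
	bool preemptible;  /* assumption: preempt-to-idle would succeed */
	enum ctx_fate fate;
};

/* Possibility A: the full GT reset victimizes every running context. */
static void toy_gt_reset_victimize_all(struct toy_engine *e, int n)
{
	for (int i = 0; i < n; i++)
		e[i].fate = CTX_VICTIMIZED;
}

/*
 * Possibility B: innocent engines are preempted to idle before the
 * reset, and their contexts are resubmitted afterwards if the
 * preemption succeeded. The hung engine, and any innocent engine that
 * could not be preempted, still loses its context.
 */
static void toy_gt_reset_preempt_innocent(struct toy_engine *e, int n)
{
	for (int i = 0; i < n; i++) {
		if (!e[i].hung && e[i].preemptible)
			e[i].fate = CTX_RESUBMITTED;
		else
			e[i].fate = CTX_VICTIMIZED;
	}
}
```

With one hung engine and two innocent ones, A marks all three contexts
as victimized, while B marks only the hung engine (plus any innocent
engine that failed to preempt) as victimized. My question is which of
these the execlist path is expected to follow.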

Thanks,
Umesh