[Intel-gfx] [PATCH] drm/i915/reset: Add Wa_22011802037 for gen11 and execlist backend

Umesh Nerlige Ramappa umesh.nerlige.ramappa at intel.com
Fri May 6 21:08:37 UTC 2022


On Fri, May 06, 2022 at 10:13:28AM -0700, Umesh Nerlige Ramappa wrote:
>On Wed, May 04, 2022 at 07:09:09PM +0100, Tvrtko Ursulin wrote:
>>
>>On 04/05/2022 18:35, Umesh Nerlige Ramappa wrote:
>>>On Wed, May 04, 2022 at 09:10:42AM +0100, Tvrtko Ursulin wrote:
>>>>
>>>>On 03/05/2022 20:49, Umesh Nerlige Ramappa wrote:
>>>>>On Tue, May 03, 2022 at 09:42:52AM +0100, Tvrtko Ursulin wrote:
>>>>>>
>>>>>>On 02/05/2022 23:18, Umesh Nerlige Ramappa wrote:
>>>>>>>Current implementation of Wa_22011802037 is limited to the GuC backend
>>>>>>>and gen12. Add support for execlist backend and gen11 as well.
>>>>>>
>>>>>>Is the implication that f6aa0d713c88 ("drm/i915: Add 
>>>>>>Wa_22011802037 force cs halt") does not work on Tigerlake? 
>>>>>>Fixes: tag probably required in that case since I have sold 
>>>>>>that fix as a, well, fix.
>>>>>
>>>>>After the fix was made, the WA has evolved and added some more 
>>>>>steps for handling pending MI_FORCE_WAKEs. This patch is the 
>>>>>additional set of steps needed for the WA. As you mentioned 
>>>>>offline, I should correct the commit message to indicate that 
>>>>>the WA does exist for execlists, but needs additional steps. 
>>>>>Will add Fixes: tag.
>>>>
>>>>Ok, that would be good then, since it does sound like they need to be 
>>>>tied together (as in cherry-picked for fixes).
>>>>
>>>>Will it be followed up with preempt-to-idle implementation to 
>>>>avoid the, as I understand it, potential for activity on one CCS 
>>>>engine defeating the WA on another by timing out the wait for 
>>>>idle?
>>>
>>>fwiu, for the case where we want to limit the reset to a single 
>>>engine, the preempt-to-idle implementation may be required - 
>>>https://patchwork.freedesktop.org/series/101432/. If 
>>>preempt-to-idle fails, the hangcheck should kick in and then do a 
>>>gt-reset. If that happens, then the WA flow in the patch should be 
>>>applied.
>>
>>Okay I read that as yes. That is fine by me since this patch alone 
>>is better than without it.
>>
>
>I have a general question about engines that do NOT share a reset 
>domain, specifically with the execlist backend.
>
>What is the expected behavior of a hangcheck-initiated reset? It 
>reports "resetting chip" for the engine that it decides is hung. In 
>that path it calls gt_reset, which loops through the engines 
>(reset_prepare, rewind, etc.). Are all running contexts victimized, or 
>is there an attempt to preempt-to-idle contexts on other (innocent) 
>engines and then resubmit them if they are successfully preempted?

nvm, I notice that all active contexts are marked as guilty, which is 
what I was expecting. I think I ran into a test bug while trying to 
understand this behavior, so I wanted to ask. The test runs more than 
one batch on the targeted engine and checks that both batches are 
marked guilty, but that cannot happen: only the active contexts are 
marked guilty.

Regards,
Umesh
>
>Thanks,
>Umesh

