[Intel-gfx] [PATCH] drm/i915/reset: Add Wa_22011802037 for gen11 and execlist backend
Umesh Nerlige Ramappa
umesh.nerlige.ramappa at intel.com
Fri May 6 17:13:28 UTC 2022
On Wed, May 04, 2022 at 07:09:09PM +0100, Tvrtko Ursulin wrote:
>
>On 04/05/2022 18:35, Umesh Nerlige Ramappa wrote:
>>On Wed, May 04, 2022 at 09:10:42AM +0100, Tvrtko Ursulin wrote:
>>>
>>>On 03/05/2022 20:49, Umesh Nerlige Ramappa wrote:
>>>>On Tue, May 03, 2022 at 09:42:52AM +0100, Tvrtko Ursulin wrote:
>>>>>
>>>>>On 02/05/2022 23:18, Umesh Nerlige Ramappa wrote:
>>>>>>The current implementation of Wa_22011802037 is limited to the GuC backend
>>>>>>and gen12. Add support for the execlist backend and gen11 as well.
>>>>>
>>>>>Is the implication that f6aa0d713c88 ("drm/i915: Add Wa_22011802037
>>>>>force cs halt") does not work on Tigerlake? A Fixes: tag is
>>>>>probably required in that case, since I have sold that fix as
>>>>>a, well, fix.
>>>>
>>>>Since the fix was made, the WA has evolved and gained some more
>>>>steps for handling pending MI_FORCE_WAKEs. This patch adds that
>>>>additional set of steps. As you mentioned offline, I should
>>>>correct the commit message to indicate that the WA does exist
>>>>for execlists but needs additional steps. I will add a Fixes:
>>>>tag.
>>>
>>>OK, that would be good then, since it does sound like they need
>>>to be tied together (as in cherry-picked for fixes).
>>>
>>>Will it be followed up with a preempt-to-idle implementation, to
>>>avoid what I understand is a potential for activity on one CCS
>>>engine to defeat the WA on another by causing the wait for idle
>>>to time out?
>>
>>From what I understand, for the case where we want to limit the reset
>>to a single engine, the preempt-to-idle implementation may be required
>>(https://patchwork.freedesktop.org/series/101432/). If preempt-to-idle
>>fails, the hangcheck should kick in and then do a gt-reset. If that
>>happens, then the WA flow in this patch should be applied.
>
>Okay, I read that as a yes. That is fine by me, since this patch alone is
>better than not having it.
>
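The reset escalation described in the exchange above (try to limit the reset to the hung engine via preempt-to-idle; if that fails, fall back to a full GT reset in which the Wa_22011802037 flow is applied) can be modelled roughly as follows. This is a minimal sketch in Python, not i915 code: `Engine`, `preempt_to_idle`, `apply_wa_22011802037` and `reset_hung_engine` are hypothetical names, the hangcheck step is collapsed into a single function call, and it assumes the WA steps run for every engine in the GT-reset path.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical model of the reset escalation discussed in this thread.
# None of these names come from i915; they only illustrate the control flow.


@dataclass
class Engine:
    name: str
    idles_on_preempt: bool  # assumption: whether preempt-to-idle succeeds in time
    log: List[str] = field(default_factory=list)


def preempt_to_idle(engine: Engine) -> bool:
    """Stand-in for preempting the engine and waiting (with a timeout) for idle."""
    engine.log.append("preempt-to-idle")
    return engine.idles_on_preempt


def apply_wa_22011802037(engine: Engine) -> None:
    """Stand-in for the WA steps (e.g. handling pending MI_FORCE_WAKEs)."""
    engine.log.append("wa_22011802037")


def reset_hung_engine(hung: Engine, all_engines: List[Engine]) -> str:
    # First choice: limit the reset to the hung engine only.
    if preempt_to_idle(hung):
        hung.log.append("engine-reset")
        return "engine"
    # Preempt-to-idle failed (in reality, hangcheck would kick in here):
    # escalate to a full GT reset, applying the WA flow on every engine first.
    for engine in all_engines:
        apply_wa_22011802037(engine)
    for engine in all_engines:
        engine.log.append("gt-reset")
    return "gt"


if __name__ == "__main__":
    ccs0 = Engine("ccs0", idles_on_preempt=False)
    ccs1 = Engine("ccs1", idles_on_preempt=True)
    print(reset_hung_engine(ccs0, [ccs0, ccs1]))  # prints "gt"
```

Note that in this model the innocent engine (`ccs1`) is reset along with the hung one once the GT-reset path is taken, which is exactly the behaviour the question below asks about.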
I have a general question about engines that do NOT share a reset domain,
specifically for the execlist backend.
What is the expected behavior of a hangcheck-initiated reset? It says
it is resetting the chip for the engine that it decides is hung. In that
path it calls gt_reset, which loops through the engines (reset_prepare,
rewind, etc.). Are all running contexts victimized, or is there an attempt
to preempt-to-idle the contexts on the other (innocent) engines and then
resubmit them if they are successfully preempted?
Thanks,
Umesh