[Intel-gfx] [PATCH] drm/i915/reset: Add Wa_22011802037 for gen11 and execlist backend
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Wed May 4 08:10:42 UTC 2022
On 03/05/2022 20:49, Umesh Nerlige Ramappa wrote:
> On Tue, May 03, 2022 at 09:42:52AM +0100, Tvrtko Ursulin wrote:
>>
>> On 02/05/2022 23:18, Umesh Nerlige Ramappa wrote:
>>> Current implementation of Wa_22011802037 is limited to the GuC backend
>>> and gen12. Add support for execlist backend and gen11 as well.
>>
>> Is the implication that f6aa0d713c88 ("drm/i915: Add Wa_22011802037 force
>> cs halt") does not work on Tigerlake? A Fixes: tag is probably required in
>> that case, since I have sold that fix as a, well, fix.
>
> After the fix was made, the WA has evolved and added some more steps for
> handling pending MI_FORCE_WAKEs. This patch is the additional set of
> steps needed for the WA. As you mentioned offline, I should correct the
> commit message to indicate that the WA does exist for execlists, but
> needs additional steps. Will add Fixes: tag.
Ok, that would be good then, since it does sound like they need to be tied
together (as in cherry-picked for fixes).
Will it be followed up with a preempt-to-idle implementation to avoid the,
as I understand it, potential for activity on one CCS engine defeating
the WA on another by timing out the wait for idle?
>>> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa at intel.com>
>>> ---
>>> drivers/gpu/drm/i915/gt/intel_engine.h | 2 +
>>> drivers/gpu/drm/i915/gt/intel_engine_cs.c | 81 ++++++++++++++++++-
>>> .../drm/i915/gt/intel_execlists_submission.c | 7 ++
>>> .../gpu/drm/i915/gt/intel_ring_submission.c | 7 ++
>>> drivers/gpu/drm/i915/gt/uc/intel_guc.c | 4 +-
>>> .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 81 ++-----------------
>>> 6 files changed, 103 insertions(+), 79 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h
>>> b/drivers/gpu/drm/i915/gt/intel_engine.h
>>> index 1431f1e9dbee..04e435bce79b 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_engine.h
>>> +++ b/drivers/gpu/drm/i915/gt/intel_engine.h
>>> @@ -201,6 +201,8 @@ int intel_ring_submission_setup(struct
>>> intel_engine_cs *engine);
>>> int intel_engine_stop_cs(struct intel_engine_cs *engine);
>>> void intel_engine_cancel_stop_cs(struct intel_engine_cs *engine);
>>> +void intel_engine_wait_for_pending_mi_fw(struct intel_engine_cs
>>> *engine);
>>> +
>>> void intel_engine_set_hwsp_writemask(struct intel_engine_cs *engine,
>>> u32 mask);
>>> u64 intel_engine_get_active_head(const struct intel_engine_cs *engine);
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
>>> b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
>>> index 14c6ddbbfde8..0bda665a407c 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
>>> @@ -1282,10 +1282,10 @@ static int __intel_engine_stop_cs(struct
>>> intel_engine_cs *engine,
>>> intel_uncore_write_fw(uncore, mode, _MASKED_BIT_ENABLE(STOP_RING));
>>> /*
>>> - * Wa_22011802037 : gen12, Prior to doing a reset, ensure CS is
>>> + * Wa_22011802037 : gen11, gen12, Prior to doing a reset, ensure CS is
>>> * stopped, set ring stop bit and prefetch disable bit to halt CS
>>> */
>>> - if (GRAPHICS_VER(engine->i915) == 12)
>>> + if (GRAPHICS_VER(engine->i915) == 11 || GRAPHICS_VER(engine->i915) == 12)
>>
>> IS_GRAPHICS_VER(11, 12)
>>
>>> intel_uncore_write_fw(uncore,
>>> RING_MODE_GEN7(engine->mmio_base),
>>> _MASKED_BIT_ENABLE(GEN12_GFX_PREFETCH_DISABLE));
>>> @@ -1308,6 +1308,14 @@ int intel_engine_stop_cs(struct
>>> intel_engine_cs *engine)
>>> return -ENODEV;
>>> ENGINE_TRACE(engine, "\n");
>>> + /*
>>> + * TODO: Occasionally trying to stop the cs times out, but does not
>>
>> What is the TODO part? To figure out why it sometimes does not work?
>>
>>> + * adversely affect functionality. The timeout is set as a config
>>> + * parameter that defaults to 100ms. Assuming that this timeout is
>>> + * sufficient for any pending MI_FORCEWAKEs to complete. Once root
>>> + * caused, the caller must check and handler the return from this
>>
>> s/handler/handle/
>>
>> TODO is to convert the function to return an error?
>
> TODO: Find out why occasionally stopping the CS times out. Seen
> especially with gem_eio tests.
>
> I will update the comment to be clear.
Is this timeout seen in general, or only with this patch?
>>
>>> + * function.
>>> + */
>>> if (__intel_engine_stop_cs(engine, 1000, stop_timeout(engine))) {
>>> ENGINE_TRACE(engine,
>>> "timed out on STOP_RING -> IDLE; HEAD:%04x,
>>> TAIL:%04x\n",
>>> @@ -1334,6 +1342,75 @@ void intel_engine_cancel_stop_cs(struct
>>> intel_engine_cs *engine)
>>> ENGINE_WRITE_FW(engine, RING_MI_MODE,
>>> _MASKED_BIT_DISABLE(STOP_RING));
>>> }
>>> +static u32 __cs_pending_mi_force_wakes(struct intel_engine_cs *engine)
>>> +{
>>> + static const i915_reg_t _reg[I915_NUM_ENGINES] = {
>>> + [RCS0] = MSG_IDLE_CS,
>>> + [BCS0] = MSG_IDLE_BCS,
>>> + [VCS0] = MSG_IDLE_VCS0,
>>> + [VCS1] = MSG_IDLE_VCS1,
>>> + [VCS2] = MSG_IDLE_VCS2,
>>> + [VCS3] = MSG_IDLE_VCS3,
>>> + [VCS4] = MSG_IDLE_VCS4,
>>> + [VCS5] = MSG_IDLE_VCS5,
>>> + [VCS6] = MSG_IDLE_VCS6,
>>> + [VCS7] = MSG_IDLE_VCS7,
>>> + [VECS0] = MSG_IDLE_VECS0,
>>> + [VECS1] = MSG_IDLE_VECS1,
>>> + [VECS2] = MSG_IDLE_VECS2,
>>> + [VECS3] = MSG_IDLE_VECS3,
>>> + [CCS0] = MSG_IDLE_CS,
>>> + [CCS1] = MSG_IDLE_CS,
>>> + [CCS2] = MSG_IDLE_CS,
>>> + [CCS3] = MSG_IDLE_CS,
>>> + };
>>> + u32 val;
>>> +
>>> + if (!_reg[engine->id].reg)
>>> + return 0;
>>> +
>>> + val = intel_uncore_read(engine->uncore, _reg[engine->id]);
>>> +
>>> + /* bits[29:25] & bits[13:9] >> shift */
>>> + return (val & (val >> 16) & MSG_IDLE_FW_MASK) >> MSG_IDLE_FW_SHIFT;
>>> +}
>>> +
>>> +static void __gpm_wait_for_fw_complete(struct intel_gt *gt, u32
>>> fw_mask)
>>> +{
>>> + int ret;
>>> +
>>> + /* Ensure GPM receives fw up/down after CS is stopped */
>>> + udelay(1);
>>
>> What's with the udealys in here when __intel_wait_for_register_fw
>> already does some waiting?
>
> Once idle, the WA instructs us to wait 1us around checking this status.
> The assumption here is that __intel_wait_for_register_fw could just exit
> as soon as the condition is met by HW.
Okay, that one sounds plausible. But what about the udelay before
__intel_wait_for_register_fw? What difference does it make between:
1)
udelay
loop:
read
break if idle
udelay
2)
loop:
read
break if idle
udelay
Is the read without the udelay dangerous?
Also, why is this doing a 5ms atomic wait? Why not a fast-slow wait to be
more CPU-friendly? Does the workaround suggest a typical wait time?
Regards,
Tvrtko
>
> Thanks,
> Umesh
>
>>
>>> +
>>> + /* Wait for forcewake request to complete in GPM */
>>> + ret = __intel_wait_for_register_fw(gt->uncore,
>>> + GEN9_PWRGT_DOMAIN_STATUS,
>>> + fw_mask, fw_mask, 5000, 0, NULL);
>>> +
>>> + /* Ensure CS receives fw ack from GPM */
>>> + udelay(1);
>>> +
>>> + if (ret)
>>> + GT_TRACE(gt, "Failed to complete pending forcewake %d\n", ret);
>>> +}
>>> +
>>> +/*
>>> + * Wa_22011802037:gen12: In addition to stopping the cs, we need to
>>> wait for any
>>> + * pending MI_FORCE_WAKEUP requests that the CS has initiated to
>>> complete. The
>>> + * pending status is indicated by bits[13:9] (masked by bits[
>>> 29:25]) in the
>>
>> Extra space in square brackets.
>>
>>> + * MSG_IDLE register. There's one MSG_IDLE register per reset
>>> domain. Since we
>>> + * are concerned only with the gt reset here, we use a logical OR of
>>> pending
>>> + * forcewakeups from all reset domains and then wait for them to
>>> complete by
>>> + * querying PWRGT_DOMAIN_STATUS.
>>> + */
>>> +void intel_engine_wait_for_pending_mi_fw(struct intel_engine_cs
>>> *engine)
>>> +{
>>> + u32 fw_pending = __cs_pending_mi_force_wakes(engine);
>>> +
>>> + if (fw_pending)
>>> + __gpm_wait_for_fw_complete(engine->gt, fw_pending);
>>> +}
>>> +
>>> static u32
>>> read_subslice_reg(const struct intel_engine_cs *engine,
>>> int slice, int subslice, i915_reg_t reg)
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
>>> b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
>>> index 86f7a9ac1c39..e139dc1e44eb 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
>>> @@ -2958,6 +2958,13 @@ static void execlists_reset_prepare(struct
>>> intel_engine_cs *engine)
>>> ring_set_paused(engine, 1);
>>> intel_engine_stop_cs(engine);
>>> + /*
>>> + * Wa_22011802037:gen11/gen12: In addition to stopping the cs,
>>> we need
>>> + * to wait for any pending mi force wakeups
>>> + */
>>> + if (GRAPHICS_VER(engine->i915) == 11 || GRAPHICS_VER(engine->i915) == 12)
>>> + intel_engine_wait_for_pending_mi_fw(engine);
>>> +
>>> engine->execlists.reset_ccid = active_ccid(engine);
>>> }
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
>>> b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
>>> index 5423bfd301ad..75884e0a552e 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
>>> @@ -323,6 +323,13 @@ static void reset_prepare(struct intel_engine_cs
>>> *engine)
>>> ENGINE_TRACE(engine, "\n");
>>> intel_engine_stop_cs(engine);
>>> + /*
>>> + * Wa_22011802037:gen11/gen12: In addition to stopping the cs,
>>> we need
>>> + * to wait for any pending mi force wakeups
>>> + */
>>> + if (GRAPHICS_VER(engine->i915) == 11 || GRAPHICS_VER(engine->i915) == 12)
>>
>> IS_GRAPHICS_VER
>>
>>> + intel_engine_wait_for_pending_mi_fw(engine);
>>> +
>>> if (!stop_ring(engine)) {
>>> /* G45 ring initialization often fails to reset head to zero */
>>> ENGINE_TRACE(engine,
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
>>> index 2c4ad4a65089..f69a9464585e 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
>>> @@ -310,8 +310,8 @@ static u32 guc_ctl_wa_flags(struct intel_guc *guc)
>>> if (IS_DG2(gt->i915))
>>> flags |= GUC_WA_DUAL_QUEUE;
>>> - /* Wa_22011802037: graphics version 12 */
>>> - if (GRAPHICS_VER(gt->i915) == 12)
>>> + /* Wa_22011802037: graphics version 11/12 */
>>> + if (GRAPHICS_VER(gt->i915) == 11 || GRAPHICS_VER(gt->i915) == 12)
>>
>> Ditto.
>>
>>> flags |= GUC_WA_PRE_PARSER;
>>> /* Wa_16011777198:dg2 */
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>> index 75291e9846c5..a3fe832eff0d 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>> @@ -1527,87 +1527,18 @@ static void guc_reset_state(struct
>>> intel_context *ce, u32 head, bool scrub)
>>> lrc_update_regs(ce, engine, head);
>>> }
>>> -static u32 __cs_pending_mi_force_wakes(struct intel_engine_cs *engine)
>>> -{
>>> - static const i915_reg_t _reg[I915_NUM_ENGINES] = {
>>> - [RCS0] = MSG_IDLE_CS,
>>> - [BCS0] = MSG_IDLE_BCS,
>>> - [VCS0] = MSG_IDLE_VCS0,
>>> - [VCS1] = MSG_IDLE_VCS1,
>>> - [VCS2] = MSG_IDLE_VCS2,
>>> - [VCS3] = MSG_IDLE_VCS3,
>>> - [VCS4] = MSG_IDLE_VCS4,
>>> - [VCS5] = MSG_IDLE_VCS5,
>>> - [VCS6] = MSG_IDLE_VCS6,
>>> - [VCS7] = MSG_IDLE_VCS7,
>>> - [VECS0] = MSG_IDLE_VECS0,
>>> - [VECS1] = MSG_IDLE_VECS1,
>>> - [VECS2] = MSG_IDLE_VECS2,
>>> - [VECS3] = MSG_IDLE_VECS3,
>>> - [CCS0] = MSG_IDLE_CS,
>>> - [CCS1] = MSG_IDLE_CS,
>>> - [CCS2] = MSG_IDLE_CS,
>>> - [CCS3] = MSG_IDLE_CS,
>>> - };
>>> - u32 val;
>>> -
>>> - if (!_reg[engine->id].reg)
>>> - return 0;
>>> -
>>> - val = intel_uncore_read(engine->uncore, _reg[engine->id]);
>>> -
>>> - /* bits[29:25] & bits[13:9] >> shift */
>>> - return (val & (val >> 16) & MSG_IDLE_FW_MASK) >> MSG_IDLE_FW_SHIFT;
>>> -}
>>> -
>>> -static void __gpm_wait_for_fw_complete(struct intel_gt *gt, u32
>>> fw_mask)
>>> -{
>>> - int ret;
>>> -
>>> - /* Ensure GPM receives fw up/down after CS is stopped */
>>> - udelay(1);
>>> -
>>> - /* Wait for forcewake request to complete in GPM */
>>> - ret = __intel_wait_for_register_fw(gt->uncore,
>>> - GEN9_PWRGT_DOMAIN_STATUS,
>>> - fw_mask, fw_mask, 5000, 0, NULL);
>>> -
>>> - /* Ensure CS receives fw ack from GPM */
>>> - udelay(1);
>>> -
>>> - if (ret)
>>> - GT_TRACE(gt, "Failed to complete pending forcewake %d\n", ret);
>>> -}
>>> -
>>> -/*
>>> - * Wa_22011802037:gen12: In addition to stopping the cs, we need to
>>> wait for any
>>> - * pending MI_FORCE_WAKEUP requests that the CS has initiated to
>>> complete. The
>>> - * pending status is indicated by bits[13:9] (masked by bits[
>>> 29:25]) in the
>>> - * MSG_IDLE register. There's one MSG_IDLE register per reset
>>> domain. Since we
>>> - * are concerned only with the gt reset here, we use a logical OR of
>>> pending
>>> - * forcewakeups from all reset domains and then wait for them to
>>> complete by
>>> - * querying PWRGT_DOMAIN_STATUS.
>>> - */
>>> static void guc_engine_reset_prepare(struct intel_engine_cs *engine)
>>> {
>>> - u32 fw_pending;
>>> -
>>> - if (GRAPHICS_VER(engine->i915) != 12)
>>> + if (GRAPHICS_VER(engine->i915) != 11 && GRAPHICS_VER(engine->i915) != 12)
>>
>> !IS_GRAPHICS_VER
>>
>>> return;
>>> - /*
>>> - * Wa_22011802037
>>> - * TODO: Occasionally trying to stop the cs times out, but does not
>>> - * adversely affect functionality. The timeout is set as a config
>>> - * parameter that defaults to 100ms. Assuming that this timeout is
>>> - * sufficient for any pending MI_FORCEWAKEs to complete, ignore the
>>> - * timeout returned here until it is root caused.
>>> - */
>>> intel_engine_stop_cs(engine);
>>> - fw_pending = __cs_pending_mi_force_wakes(engine);
>>> - if (fw_pending)
>>> - __gpm_wait_for_fw_complete(engine->gt, fw_pending);
>>> + /*
>>> + * Wa_22011802037:gen11/gen12: In addition to stopping the cs,
>>> we need
>>> + * to wait for any pending mi force wakeups
>>> + */
>>> + intel_engine_wait_for_pending_mi_fw(engine);
>>> }
>>> static void guc_reset_nop(struct intel_engine_cs *engine)
>>
>> Regards,
>>
>> Tvrtko