[PATCH 3/3] drm/i915: Fix gt reset with GuC submission disabled
Nirmoy Das
nirmoy.das at intel.com
Fri Apr 19 09:46:50 UTC 2024
Hi John,
On 4/19/2024 1:38 AM, John Harrison wrote:
> On 4/18/2024 10:10, Nirmoy Das wrote:
>> Currently intel_gt_reset() happens as follows:
>>
>> reset_prepare() ---> Sends GDRST to GuC, GuC is in GS_MIA_IN_RESET
>> do_reset()
>> intel_gt_reset_all_engines()
>> *_engine_reset_prepare() -->RESET_CTL expects running GuC
> Not technically correct. There is no direct connection between
> RESET_CTL and GuC.
>
>> *_reset_engines()
>> intel_gt_init_hw() --> GuC comes out of GS_MIA_IN_RESET with FW loaded.
>>
>> Fix the issue by sanitizing the GuC only after resetting requested
>> engines and before intel_gt_init_hw().
> You never actually state what the issue is.
>
> The problem is that there is a dedicated CSB FIFO going to GuC (and
> nothing else has access to it). If that FIFO fills up, the hardware
> will block on the next context switch until there is space. If no-one
> (i.e. GuC) is draining it, that means the system is effectively hung.
> If an engine is reset whilst actively executing a context, a CSB entry
> will be sent to say that the context has gone idle. Thus if you reset
> a very busy system and start with killing GuC before killing the
> engines and only then re-enabling GuC, you run the risk of generating
> more CSB entries than will fit in the FIFO and deadlocking. Whereas,
> if the system is idle then you can reset the engines as much as you
> like while GuC is dead and it won't be a problem.
I wasn't sure if I could talk about internal details so kept it to
minimal. I will borrow above explanation and resend :)
>
>>
>> Note intel_uc_reset_finish() and intel_uc_reset() are nop when
>> guc submission is disabled.
>>
>> Signed-off-by: Nirmoy Das <nirmoy.das at intel.com>
>> ---
>> drivers/gpu/drm/i915/gt/intel_reset.c | 16 ++++++++++++++--
>> 1 file changed, 14 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c
>> b/drivers/gpu/drm/i915/gt/intel_reset.c
>> index 6504e8ba9c58..bd166f5aca4b 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_reset.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
>> @@ -907,8 +907,17 @@ static intel_engine_mask_t reset_prepare(struct
>> intel_gt *gt)
>> intel_engine_mask_t awake = 0;
>> enum intel_engine_id id;
>> - /* For GuC mode, ensure submission is disabled before stopping
>> ring */
>> - intel_uc_reset_prepare(>->uc);
>> + /**
>> + * For GuC mode with submission enabled, ensure submission
>> + * is disabled before stopping ring.
>> + *
>> + * For GuC mode with submission disabled, ensure that GuC is not
>> + * sanitized, do that at the end in reset_finish(). reset_prepare()
>> + * is followed by engine reset which in this mode requires GuC to
>> + * be functional to process engine reset events.
> -> to process any CSB FIFO entries generated by the resets.
I will add this.
Thanks,
Nirmoy
>
> John.
>
>> + */
>> + if (intel_uc_uses_guc_submission(>->uc))
>> + intel_uc_reset_prepare(>->uc);
>> for_each_engine(engine, gt, id) {
>> if (intel_engine_pm_get_if_awake(engine))
>> @@ -1255,6 +1264,9 @@ void intel_gt_reset(struct intel_gt *gt,
>> intel_overlay_reset(gt->i915);
>> + /* sanitize uC after engine reset */
>> + if (!intel_uc_uses_guc_submission(>->uc))
>> + intel_uc_reset_prepare(>->uc);
>> /*
>> * Next we need to restore the context, but we don't use those
>> * yet either...
>
More information about the Intel-gfx
mailing list