[PATCH 3/3] drm/i915: Fix gt reset with GuC submission disabled

Nirmoy Das nirmoy.das at intel.com
Fri Apr 19 09:46:50 UTC 2024


Hi John,

On 4/19/2024 1:38 AM, John Harrison wrote:
> On 4/18/2024 10:10, Nirmoy Das wrote:
>> Currently intel_gt_reset() happens as follows:
>>
>> reset_prepare() ---> Sends GDRST to GuC, GuC is in GS_MIA_IN_RESET
>> do_reset()
>>    intel_gt_reset_all_engines()
>>      *_engine_reset_prepare() -->RESET_CTL expects running GuC
> Not technically correct. There is no direct connection between 
> RESET_CTL and GuC.
>
>>      *_reset_engines()
>> intel_gt_init_hw() --> GuC comes out of GS_MIA_IN_RESET with FW loaded.
>>
>> Fix the issue by sanitizing the GuC only after resetting requested
>> engines and before intel_gt_init_hw().
> You never actually state what the issue is.
>
> The problem is that there is a dedicated CSB FIFO going to GuC (and 
> nothing else has access to it). If that FIFO fills up, the hardware 
> will block on the next context switch until there is space. If no-one 
> (i.e. GuC) is draining it, that means the system is effectively hung. 
> If an engine is reset whilst actively executing a context, a CSB entry 
> will be sent to say that the context has gone idle. Thus if you reset 
> a very busy system and start with killing GuC before killing the 
> engines and only then re-enabling GuC, you run the risk of generating 
> more CSB entries than will fit in the FIFO and deadlocking. Whereas, 
> if the system is idle then you can reset the engines as much as you 
> like while GuC is dead and it won't be a problem.


I wasn't sure if I could talk about internal details so kept it to 
minimal. I will borrow above explanation and resend :)

>
>>
>> Note intel_uc_reset_finish() and intel_uc_reset() are nop when
>> guc submission is disabled.
>>
>> Signed-off-by: Nirmoy Das <nirmoy.das at intel.com>
>> ---
>>   drivers/gpu/drm/i915/gt/intel_reset.c | 16 ++++++++++++++--
>>   1 file changed, 14 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c 
>> b/drivers/gpu/drm/i915/gt/intel_reset.c
>> index 6504e8ba9c58..bd166f5aca4b 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_reset.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
>> @@ -907,8 +907,17 @@ static intel_engine_mask_t reset_prepare(struct 
>> intel_gt *gt)
>>       intel_engine_mask_t awake = 0;
>>       enum intel_engine_id id;
>>   -    /* For GuC mode, ensure submission is disabled before stopping 
>> ring */
>> -    intel_uc_reset_prepare(&gt->uc);
>> +    /**
>> +     * For GuC mode with submission enabled, ensure submission
>> +     * is disabled before stopping ring.
>> +     *
>> +     * For GuC mode with submission disabled, ensure that GuC is not
>> +     * sanitized, do that at the end in reset_finish(). reset_prepare()
>> +     * is followed by engine reset which in this mode requires GuC to
>> +     * be functional to process engine reset events.
> -> to process any CSB FIFO entries generated by the resets.

I will add this.


Thanks,

Nirmoy

>
> John.
>
>> +     */
>> +    if (intel_uc_uses_guc_submission(&gt->uc))
>> +        intel_uc_reset_prepare(&gt->uc);
>>         for_each_engine(engine, gt, id) {
>>           if (intel_engine_pm_get_if_awake(engine))
>> @@ -1255,6 +1264,9 @@ void intel_gt_reset(struct intel_gt *gt,
>>         intel_overlay_reset(gt->i915);
>>   +    /* sanitize uC after engine reset */
>> +    if (!intel_uc_uses_guc_submission(&gt->uc))
>> +        intel_uc_reset_prepare(&gt->uc);
>>       /*
>>        * Next we need to restore the context, but we don't use those
>>        * yet either...
>


More information about the Intel-gfx mailing list