[Intel-gfx] [PATCH 1/2] drm/i915: limit double GT reset to pre-MTL

Ceraolo Spurio, Daniele daniele.ceraolospurio at intel.com
Wed Mar 22 21:03:20 UTC 2023



On 3/22/2023 12:44 PM, John Harrison wrote:
> On 3/20/2023 14:10, Daniele Ceraolo Spurio wrote:
>> Commit 3db9d590557d ("drm/i915/gt: Reset twice") modified the code to
>> always hit the GDRST register twice when doing a reset, with the
>> reported aim to fix invalid post-reset engine state on some platforms
>> (Jasperlake being the only one actually mentioned).
> It still concerns me that there are no actual details about this issue 
> from a hardware perspective as to what/why it goes wrong, the comment 
> is fully of non-definitive language - 'appears to', 'should', 'is 
> still a concern that'. And there is no w/a number associated with it. 
> It all feels extremely suspect and warrants a great big FIXME tag as a 
> minimum.

I agree that the whole thing is unclear and we could add a FIXME, but 
IMO that is outside the scope of this patch as I'm not adding the code 
in question. This should be discussed with the original author/reviewers.

Daniele

>
> John.
>
>
>>
>> This is a problem on MTL, due to the fact that we have to apply a time
>> consuming WA (coming in the next patch) every time we hit the GDRST
>> register in a way that can include the GSC engine. Even post MTL, the
>> expectation is that we'll have some work to do before and after hitting
>> the GDRST if the GSC is involved.
>>
>> Since the issue requiring the double reset seems to be limited to older
>> platforms, instead of trying to handle the double-reset on MTL and
>> future platforms it is just easier to turn it off. The default on MTL is
>> also for GuC to own engine reset, with i915 only covering full-GT reset.
>>
>> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
>> Cc: Chris Wilson <chris at chris-wilson.co.uk>
>> Cc: Andi Shyti <andi.shyti at linux.intel.com>
>> Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
>> Cc: Gwan-gyeong Mun <gwan-gyeong.mun at intel.com>
>> Cc: John Harrison <John.C.Harrison at Intel.com>
>> ---
>>   drivers/gpu/drm/i915/gt/intel_reset.c | 35 +++++++++++++++------------
>>   1 file changed, 20 insertions(+), 15 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c 
>> b/drivers/gpu/drm/i915/gt/intel_reset.c
>> index 0bb9094fdacd..2c3463f77e5c 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_reset.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
>> @@ -268,9 +268,27 @@ static int ilk_do_reset(struct intel_gt *gt, 
>> intel_engine_mask_t engine_mask,
>>   static int gen6_hw_domain_reset(struct intel_gt *gt, u32 
>> hw_domain_mask)
>>   {
>>       struct intel_uncore *uncore = gt->uncore;
>> -    int loops = 2;
>> +    int loops;
>>       int err;
>>   +    /*
>> +     * On some platforms, e.g. Jasperlake, we see that the engine 
>> register
>> +     * state is not cleared until shortly after GDRST reports 
>> completion,
>> +     * causing a failure as we try to immediately resume while the 
>> internal
>> +     * state is still in flux. If we immediately repeat the reset, the
>> +     * second reset appears to serialise with the first, and since 
>> it is a
>> +     * no-op, the registers should retain their reset value. 
>> However, there
>> +     * is still a concern that upon leaving the second reset, the 
>> internal
>> +     * engine state is still in flux and not ready for resuming.
>> +     *
>> +     * Starting on MTL, there are some prep steps that we need to do 
>> when
>> +     * resetting some engines that need to be applied every time we 
>> write to
>> +     * GEN6_GDRST. As those are time consuming (tens of ms), we 
>> don't want
>> +     * to perform that twice, so, since the Jasperlake issue hasn't 
>> been
>> +     * observed on MTL, we avoid repeating the reset on newer 
>> platforms.
>> +     */
>> +    loops = GRAPHICS_VER_FULL(gt->i915) < IP_VER(12, 70) ? 2 : 1;
>> +
>>       /*
>>        * GEN6_GDRST is not in the gt power well, no need to check
>>        * for fifo space for the write or forcewake the chip for
>> @@ -279,20 +297,7 @@ static int gen6_hw_domain_reset(struct intel_gt 
>> *gt, u32 hw_domain_mask)
>>       do {
>>           intel_uncore_write_fw(uncore, GEN6_GDRST, hw_domain_mask);
>>   -        /*
>> -         * Wait for the device to ack the reset requests.
>> -         *
>> -         * On some platforms, e.g. Jasperlake, we see that the
>> -         * engine register state is not cleared until shortly after
>> -         * GDRST reports completion, causing a failure as we try
>> -         * to immediately resume while the internal state is still
>> -         * in flux. If we immediately repeat the reset, the second
>> -         * reset appears to serialise with the first, and since
>> -         * it is a no-op, the registers should retain their reset
>> -         * value. However, there is still a concern that upon
>> -         * leaving the second reset, the internal engine state
>> -         * is still in flux and not ready for resuming.
>> -         */
>> +        /* Wait for the device to ack the reset requests. */
>>           err = __intel_wait_for_register_fw(uncore, GEN6_GDRST,
>>                              hw_domain_mask, 0,
>>                              2000, 0,
>



More information about the Intel-gfx mailing list