[Intel-gfx] [PATCH 1/2] drm/i915: limit double GT reset to pre-MTL
John Harrison
john.c.harrison at intel.com
Wed Mar 22 19:44:56 UTC 2023
On 3/20/2023 14:10, Daniele Ceraolo Spurio wrote:
> Commit 3db9d590557d ("drm/i915/gt: Reset twice") modified the code to
> always hit the GDRST register twice when doing a reset, with the
> reported aim to fix invalid post-reset engine state on some platforms
> (Jasperlake being the only one actually mentioned).
It still concerns me that there are no actual details about this issue
from a hardware perspective as to what/why it goes wrong, the comment is
fully of non-definitive language - 'appears to', 'should', 'is still a
concern that'. And there is no w/a number associated with it. It all
feels extremely suspect and warrants a great big FIXME tag as a minimum.
John.
>
> This is a problem on MTL, due to the fact that we have to apply a time
> consuming WA (coming in the next patch) every time we hit the GDRST
> register in a way that can include the GSC engine. Even post MTL, the
> expectation is that we'll have some work to do before and after hitting
> the GDRST if the GSC is involved.
>
> Since the issue requiring the double reset seems to be limited to older
> platforms, instead of trying to handle the double-reset on MTL and
> future platforms it is just easier to turn it off. The default on MTL is
> also for GuC to own engine reset, with i915 only covering full-GT reset.
>
> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
> Cc: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Andi Shyti <andi.shyti at linux.intel.com>
> Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> Cc: Gwan-gyeong Mun <gwan-gyeong.mun at intel.com>
> Cc: John Harrison <John.C.Harrison at Intel.com>
> ---
> drivers/gpu/drm/i915/gt/intel_reset.c | 35 +++++++++++++++------------
> 1 file changed, 20 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
> index 0bb9094fdacd..2c3463f77e5c 100644
> --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> @@ -268,9 +268,27 @@ static int ilk_do_reset(struct intel_gt *gt, intel_engine_mask_t engine_mask,
> static int gen6_hw_domain_reset(struct intel_gt *gt, u32 hw_domain_mask)
> {
> struct intel_uncore *uncore = gt->uncore;
> - int loops = 2;
> + int loops;
> int err;
>
> + /*
> + * On some platforms, e.g. Jasperlake, we see that the engine register
> + * state is not cleared until shortly after GDRST reports completion,
> + * causing a failure as we try to immediately resume while the internal
> + * state is still in flux. If we immediately repeat the reset, the
> + * second reset appears to serialise with the first, and since it is a
> + * no-op, the registers should retain their reset value. However, there
> + * is still a concern that upon leaving the second reset, the internal
> + * engine state is still in flux and not ready for resuming.
> + *
> + * Starting on MTL, there are some prep steps that we need to do when
> + * resetting some engines that need to be applied every time we write to
> + * GEN6_GDRST. As those are time consuming (tens of ms), we don't want
> + * to perform that twice, so, since the Jasperlake issue hasn't been
> + * observed on MTL, we avoid repeating the reset on newer platforms.
> + */
> + loops = GRAPHICS_VER_FULL(gt->i915) < IP_VER(12, 70) ? 2 : 1;
> +
> /*
> * GEN6_GDRST is not in the gt power well, no need to check
> * for fifo space for the write or forcewake the chip for
> @@ -279,20 +297,7 @@ static int gen6_hw_domain_reset(struct intel_gt *gt, u32 hw_domain_mask)
> do {
> intel_uncore_write_fw(uncore, GEN6_GDRST, hw_domain_mask);
>
> - /*
> - * Wait for the device to ack the reset requests.
> - *
> - * On some platforms, e.g. Jasperlake, we see that the
> - * engine register state is not cleared until shortly after
> - * GDRST reports completion, causing a failure as we try
> - * to immediately resume while the internal state is still
> - * in flux. If we immediately repeat the reset, the second
> - * reset appears to serialise with the first, and since
> - * it is a no-op, the registers should retain their reset
> - * value. However, there is still a concern that upon
> - * leaving the second reset, the internal engine state
> - * is still in flux and not ready for resuming.
> - */
> + /* Wait for the device to ack the reset requests. */
> err = __intel_wait_for_register_fw(uncore, GEN6_GDRST,
> hw_domain_mask, 0,
> 2000, 0,
More information about the Intel-gfx
mailing list