[Intel-gfx] [PATCH] drm/i915/gsc: Fix the Driver-FLR completion
Ceraolo Spurio, Daniele
daniele.ceraolospurio at intel.com
Thu Feb 23 23:49:13 UTC 2023
On 2/22/2023 1:01 PM, Alan Previn wrote:
> The Driver-FLR flow may inadvertently exit early before the full
> completion of the re-init of the internal HW state if we only poll
> GU_DEBUG Bit31 (polling for it to toggle from 0 -> 1). Instead
> we need a two-step completion wait-for-completion flow that also
> involves GU_CNTL. See the patch and new code comments for detail.
> This is new direction from HW architecture folks.
>
> v2: - Add error message for the teardown timeout (Anshuman)
> - Don't duplicate code in comments (Jani)
>
> Signed-off-by: Alan Previn <alan.previn.teres.alexis at intel.com>
> Fixes: 5a44fcd73498 ("drm/i915/gsc: Do a driver-FLR on unload if GSC was loaded")
I'm not sure we need a Fixes tag, given that this is MTL-specific
code and that's still under force probe.
> ---
> drivers/gpu/drm/i915/intel_uncore.c | 13 ++++++++++++-
> 1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> index f018da7ebaac..f3c46352db89 100644
> --- a/drivers/gpu/drm/i915/intel_uncore.c
> +++ b/drivers/gpu/drm/i915/intel_uncore.c
> @@ -2749,14 +2749,25 @@ static void driver_initiated_flr(struct intel_uncore *uncore)
> /* Trigger the actual Driver-FLR */
> intel_uncore_rmw_fw(uncore, GU_CNTL, 0, DRIVERFLR);
>
> + /* Wait for hardware teardown to complete */
> + ret = intel_wait_for_register_fw(uncore, GU_CNTL,
> + DRIVERFLR_STATUS, 0,
Shouldn't this bit be DRIVERFLR instead of DRIVERFLR_STATUS? I know
they're both BIT(31), but DRIVERFLR_STATUS is the definition for the
GU_DEBUG bit, while this wait is on GU_CNTL.
> + flr_timeout_ms);
> + if (ret) {
> + drm_err(&i915->drm, "Driver-FLR-teardown wait completion failed! %d\n", ret);
> + return;
> + }
> +
> + /* Wait for hardware/firmware re-init to complete */
> ret = intel_wait_for_register_fw(uncore, GU_DEBUG,
> DRIVERFLR_STATUS, DRIVERFLR_STATUS,
> flr_timeout_ms);
I was wondering if we could reduce the timeouts here to avoid 2 waits of 3
seconds, as the 3 seconds should cover the full process. However, the
specs don't say how long each step can take, so I agree that to be safe
it is better to keep both timeouts at 3 seconds. If the FLR fails the HW is
toast anyway, so waiting a few seconds more to detect it on driver
unload is not going to have additional consequences that we wouldn't
already have.
With the bit in the wait above fixed:
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
Daniele
> if (ret) {
> - drm_err(&i915->drm, "wait for Driver-FLR completion failed! %d\n", ret);
> + drm_err(&i915->drm, "Driver-FLR-reinit wait completion failed! %d\n", ret);
> return;
> }
>
> + /* Clear sticky completion status */
> intel_uncore_write_fw(uncore, GU_DEBUG, DRIVERFLR_STATUS);
> }
>
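For readers following the thread, the two-step wait discussed above can be
modelled outside the driver. This is only a minimal sketch under stated
assumptions, not the i915 implementation: struct mock_hw, mock_tick(),
wait_for_bits() and driver_flr_sketch() are hypothetical names, the
registers are plain variables advanced by a simulated 1 ms clock, and the
GU_CNTL wait uses DRIVERFLR as suggested in the review.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define BIT31            (UINT32_C(1) << 31)
#define DRIVERFLR        BIT31 /* GU_CNTL: trigger bit, polled until it reads 0 */
#define DRIVERFLR_STATUS BIT31 /* GU_DEBUG: sticky completion status */

/* Hypothetical stand-in for the hardware; not the real MMIO interface. */
struct mock_hw {
	uint32_t gu_cntl;
	uint32_t gu_debug;
	unsigned int now_ms;           /* simulated clock */
	unsigned int teardown_done_ms; /* tick at which teardown ends, 0 = never */
	unsigned int reinit_done_ms;   /* tick at which re-init ends, 0 = never */
};

/* Advance the simulated clock by 1 ms and apply any hardware event due. */
static void mock_tick(struct mock_hw *hw)
{
	hw->now_ms++;
	if (hw->now_ms == hw->teardown_done_ms)
		hw->gu_cntl &= ~DRIVERFLR;
	if (hw->now_ms == hw->reinit_done_ms)
		hw->gu_debug |= DRIVERFLR_STATUS;
}

/* Poll until (*reg & mask) == value, or give up after timeout_ms ticks. */
static int wait_for_bits(struct mock_hw *hw, const uint32_t *reg,
			 uint32_t mask, uint32_t value,
			 unsigned int timeout_ms)
{
	unsigned int waited;

	assert(timeout_ms > 0);
	for (waited = 0; waited < timeout_ms; waited++) {
		if ((*reg & mask) == value)
			return 0;
		mock_tick(hw);
	}
	return ((*reg & mask) == value) ? 0 : -ETIMEDOUT;
}

static int driver_flr_sketch(struct mock_hw *hw, unsigned int timeout_ms)
{
	int ret;

	/* Trigger the FLR */
	hw->gu_cntl |= DRIVERFLR;

	/* Step 1: teardown is done once GU_CNTL.DRIVERFLR reads back 0 */
	ret = wait_for_bits(hw, &hw->gu_cntl, DRIVERFLR, 0, timeout_ms);
	if (ret)
		return ret;

	/* Step 2: re-init is done once GU_DEBUG.DRIVERFLR_STATUS reads 1 */
	ret = wait_for_bits(hw, &hw->gu_debug, DRIVERFLR_STATUS,
			    DRIVERFLR_STATUS, timeout_ms);
	if (ret)
		return ret;

	/* The patch writes the bit back to clear it; the mock just clears it */
	hw->gu_debug &= ~DRIVERFLR_STATUS;
	return 0;
}
```

Because each step gets its own timeout, a failure in the second step is only
reported after the first step's duration plus one full timeout. In this model,
with teardown finishing at 10 ms and re-init never completing, a 3000 ms
timeout gives up at 3010 simulated ms, and the upper bound is twice the
timeout.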