[Intel-gfx] [PATCH] drm/i915/gsc: Fix the Driver-FLR completion

Ceraolo Spurio, Daniele daniele.ceraolospurio at intel.com
Thu Feb 23 23:49:13 UTC 2023



On 2/22/2023 1:01 PM, Alan Previn wrote:
> The Driver-FLR flow may inadvertently exit early, before the internal
> HW state has been fully re-initialized, if we only poll GU_DEBUG
> Bit31 (waiting for it to toggle from 0 -> 1). Instead, we need a
> two-step wait-for-completion flow that also involves GU_CNTL. See the
> patch and the new code comments for details. This is a new direction
> from the HW architecture folks.
>
>     v2: - Add error message for the teardown timeout (Anshuman)
>         - Don't duplicate code in comments (Jani)
>
> Signed-off-by: Alan Previn <alan.previn.teres.alexis at intel.com>
> Fixes: 5a44fcd73498 ("drm/i915/gsc: Do a driver-FLR on unload if GSC was loaded")

I'm not sure we need a Fixes tag, given that this is MTL-specific
code and MTL is still under force probe.

> ---
>   drivers/gpu/drm/i915/intel_uncore.c | 13 ++++++++++++-
>   1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> index f018da7ebaac..f3c46352db89 100644
> --- a/drivers/gpu/drm/i915/intel_uncore.c
> +++ b/drivers/gpu/drm/i915/intel_uncore.c
> @@ -2749,14 +2749,25 @@ static void driver_initiated_flr(struct intel_uncore *uncore)
>   	/* Trigger the actual Driver-FLR */
>   	intel_uncore_rmw_fw(uncore, GU_CNTL, 0, DRIVERFLR);
>   
> +	/* Wait for hardware teardown to complete */
> +	ret = intel_wait_for_register_fw(uncore, GU_CNTL,
> +					 DRIVERFLR_STATUS, 0,

Shouldn't this bit be DRIVERFLR instead of DRIVERFLR_STATUS? I know
they're both BIT(31), but DRIVERFLR_STATUS is the definition for the
GU_DEBUG bit, while this wait is on GU_CNTL.
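To illustrate why this is a naming fix rather than a behavioral one, here is a minimal standalone sketch. It assumes a local BIT() stand-in for the kernel macro and uses hypothetical helpers reg_matches() and gu_cntl_teardown_done(), which are not i915 functions. reg_matches() mirrors the (reg & mask) == value condition that intel_wait_for_register_fw() polls for. Because both macros expand to the same value, the compiled check is identical; spelling it with the GU_CNTL name simply documents which register the wait is on.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for the kernel's BIT() macro (assumes 32-bit registers). */
#define BIT(n) (1u << (n))

/* Same bit position, but each name belongs to a different register. */
#define DRIVERFLR        BIT(31) /* GU_CNTL bit: set to trigger Driver-FLR */
#define DRIVERFLR_STATUS BIT(31) /* GU_DEBUG bit: re-init completion status */

/* Hypothetical helper: the (reg & mask) == value test a register wait polls. */
static bool reg_matches(uint32_t reg_val, uint32_t mask, uint32_t value)
{
	return (reg_val & mask) == value;
}

/* Hypothetical helper: teardown check on a GU_CNTL value, using the GU_CNTL name. */
static bool gu_cntl_teardown_done(uint32_t gu_cntl)
{
	return reg_matches(gu_cntl, DRIVERFLR, 0);
}
```

For example, gu_cntl_teardown_done(0) is true and gu_cntl_teardown_done(DRIVERFLR) is false, and the result would be the same with DRIVERFLR_STATUS as the mask. Only the readability of the wait changes.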

> +					 flr_timeout_ms);
> +	if (ret) {
> +		drm_err(&i915->drm, "Driver-FLR-teardown wait completion failed! %d\n", ret);
> +		return;
> +	}
> +
> +	/* Wait for hardware/firmware re-init to complete */
>   	ret = intel_wait_for_register_fw(uncore, GU_DEBUG,
>   					 DRIVERFLR_STATUS, DRIVERFLR_STATUS,
>   					 flr_timeout_ms);

I was wondering if we could reduce the timeout here to avoid two waits of
3 seconds each, since the 3 seconds should cover the full process. However,
the specs don't say how long each step can take, so I agree that, to be
safe, it is better to keep both timeouts at 3 seconds. If the FLR fails, the
HW is toast anyway, so waiting a few more seconds to detect it on driver
unload won't have any consequences beyond the ones we already have.
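The timeout trade-off above can be sketched with a small simulation. It does not use i915 code: struct fake_hw, read_gu_cntl(), read_gu_debug(), wait_for_reg() and driver_flr_wait() are all hypothetical. The model assumes each register read costs 1 ms of fake time, that GU_CNTL bit 31 reads back 0 once teardown finishes, and that GU_DEBUG bit 31 reads back 1 once re-init finishes. Each step gets its own 3000 ms budget, matching the patch. As a result, a teardown that completes just before its deadline, followed by a re-init that never completes, is reported only after about 6 seconds, which is the worst case discussed here.

```c
#include <errno.h>
#include <limits.h>
#include <stdint.h>

#define BIT(n) (1u << (n))
#define DRIVERFLR        BIT(31) /* GU_CNTL bit */
#define DRIVERFLR_STATUS BIT(31) /* GU_DEBUG bit */
#define FLR_TIMEOUT_MS   3000u   /* per-step budget discussed in the review */

/*
 * Hypothetical hardware model, not i915 code: every register read costs
 * 1 ms of fake time, GU_CNTL.DRIVERFLR reads back 0 from teardown_done_ms
 * onwards, and GU_DEBUG.DRIVERFLR_STATUS reads 1 from reinit_done_ms on.
 */
struct fake_hw {
	unsigned int now_ms;
	unsigned int teardown_done_ms;
	unsigned int reinit_done_ms;
};

static uint32_t read_gu_cntl(struct fake_hw *hw)
{
	uint32_t val = hw->now_ms < hw->teardown_done_ms ? DRIVERFLR : 0;

	hw->now_ms++;
	return val;
}

static uint32_t read_gu_debug(struct fake_hw *hw)
{
	uint32_t val = hw->now_ms >= hw->reinit_done_ms ? DRIVERFLR_STATUS : 0;

	hw->now_ms++;
	return val;
}

/* Poll until (reg & mask) == value, giving up after timeout_ms of fake time. */
static int wait_for_reg(struct fake_hw *hw,
			uint32_t (*read_reg)(struct fake_hw *),
			uint32_t mask, uint32_t value, unsigned int timeout_ms)
{
	unsigned int start = hw->now_ms;

	for (;;) {
		if ((read_reg(hw) & mask) == value)
			return 0;
		if (hw->now_ms - start >= timeout_ms)
			return -ETIMEDOUT;
	}
}

/* Two-step completion: each step gets its own full FLR_TIMEOUT_MS budget. */
static int driver_flr_wait(struct fake_hw *hw)
{
	int ret;

	/* Step 1: teardown, GU_CNTL bit 31 must read back 0 */
	ret = wait_for_reg(hw, read_gu_cntl, DRIVERFLR, 0, FLR_TIMEOUT_MS);
	if (ret)
		return ret;

	/* Step 2: re-init, GU_DEBUG bit 31 must read back 1 */
	return wait_for_reg(hw, read_gu_debug, DRIVERFLR_STATUS,
			    DRIVERFLR_STATUS, FLR_TIMEOUT_MS);
}
```

With struct fake_hw hw = { 0, 2999, UINT_MAX }, driver_flr_wait() returns -ETIMEDOUT with hw.now_ms at 6000. If the first step itself never completes, the failure surfaces after 3000 ms, because step 2 is skipped, just as the patch returns early after the teardown error.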

With the bit in the wait above fixed:
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>

Daniele

>   	if (ret) {
> -		drm_err(&i915->drm, "wait for Driver-FLR completion failed! %d\n", ret);
> +		drm_err(&i915->drm, "Driver-FLR-reinit wait completion failed! %d\n", ret);
>   		return;
>   	}
>   
> +	/* Clear sticky completion status */
>   	intel_uncore_write_fw(uncore, GU_DEBUG, DRIVERFLR_STATUS);
>   }
>   


