[Intel-gfx] [PATCH 1/3] drm/i915/guc: Temporarily bump the GuC load timeout

Matthew Brost matthew.brost at intel.com
Tue Dec 21 01:13:04 UTC 2021


On Mon, Dec 20, 2021 at 04:52:19PM -0800, John.C.Harrison at Intel.com wrote:
> From: John Harrison <John.C.Harrison at Intel.com>
> 
> There is a known (but exceedingly unlikely) race condition where the
> asynchronous frequency management code could reduce the GT clock while
> a GuC reload is in progress (during a full GT reset). A fix is in
> progress but there are complex locking issues to be resolved. In the
> meantime bump the timeout to 500ms. Even at slowest clock, this
> should be sufficient. And in the working case, a larger timeout makes
> no difference.
> 
> Signed-off-by: John Harrison <John.C.Harrison at Intel.com>

Any idea of the ETA for the proper fix? Also if the proper fix makes the
locking more complicated I'm probably of the opinion we just live with a
longer timer as full GTs shouldn't really ever happen in practice and if
they take a longer time, so be it.

Anyways for this patch:
Reviewed-by: Matthew Brost <matthew.brost at intel.com>

> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c
> index 31420ce1ce6b..c03bde5ec61f 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c
> @@ -105,12 +105,21 @@ static int guc_wait_ucode(struct intel_uncore *uncore)
>  	/*
>  	 * Wait for the GuC to start up.
>  	 * NB: Docs recommend not using the interrupt for completion.
> -	 * Measurements indicate this should take no more than 20ms, so a
> +	 * Measurements indicate this should take no more than 20ms
> +	 * (assuming the GT clock is at maximum frequency). So, a
>  	 * timeout here indicates that the GuC has failed and is unusable.
>  	 * (Higher levels of the driver may decide to reset the GuC and
>  	 * attempt the ucode load again if this happens.)
> +	 *
> +	 * FIXME: There is a known (but exceedingly unlikely) race condition
> +	 * where the asynchronous frequency management code could reduce
> +	 * the GT clock while a GuC reload is in progress (during a full
> +	 * GT reset). A fix is in progress but there are complex locking
> +	 * issues to be resolved. In the meantime bump the timeout to
> +	 * 500ms. Even at slowest clock, this should be sufficient. And
> +	 * in the working case, a larger timeout makes no difference.
>  	 */
> -	ret = wait_for(guc_ready(uncore, &status), 100);
> +	ret = wait_for(guc_ready(uncore, &status), 500);
>  	if (ret) {
>  		struct drm_device *drm = &uncore->i915->drm;
>  
> -- 
> 2.25.1
> 


More information about the dri-devel mailing list