[Intel-gfx] [PATCH 1/3] drm/i915/guc: Temporarily bump the GuC load timeout
Matthew Brost
matthew.brost at intel.com
Tue Dec 21 01:13:04 UTC 2021
On Mon, Dec 20, 2021 at 04:52:19PM -0800, John.C.Harrison at Intel.com wrote:
> From: John Harrison <John.C.Harrison at Intel.com>
>
> There is a known (but exceedingly unlikely) race condition where the
> asynchronous frequency management code could reduce the GT clock while
> a GuC reload is in progress (during a full GT reset). A fix is in
> progress but there are complex locking issues to be resolved. In the
> meantime bump the timeout to 500ms. Even at slowest clock, this
> should be sufficient. And in the working case, a larger timeout makes
> no difference.
>
> Signed-off-by: John Harrison <John.C.Harrison at Intel.com>
Any idea of the ETA for the proper fix? Also, if the proper fix makes the
locking more complicated, I'm of the opinion that we should just live with
a longer timeout: full GT resets shouldn't really ever happen in practice,
and if they take longer, so be it.
Anyway, for this patch:
Reviewed-by: Matthew Brost <matthew.brost at intel.com>
> ---
> drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c | 13 +++++++++++--
> 1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c
> index 31420ce1ce6b..c03bde5ec61f 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c
> @@ -105,12 +105,21 @@ static int guc_wait_ucode(struct intel_uncore *uncore)
>  	/*
>  	 * Wait for the GuC to start up.
>  	 * NB: Docs recommend not using the interrupt for completion.
> -	 * Measurements indicate this should take no more than 20ms, so a
> +	 * Measurements indicate this should take no more than 20ms
> +	 * (assuming the GT clock is at maximum frequency). So, a
>  	 * timeout here indicates that the GuC has failed and is unusable.
>  	 * (Higher levels of the driver may decide to reset the GuC and
>  	 * attempt the ucode load again if this happens.)
> +	 *
> +	 * FIXME: There is a known (but exceedingly unlikely) race condition
> +	 * where the asynchronous frequency management code could reduce
> +	 * the GT clock while a GuC reload is in progress (during a full
> +	 * GT reset). A fix is in progress but there are complex locking
> +	 * issues to be resolved. In the meantime bump the timeout to
> +	 * 500ms. Even at slowest clock, this should be sufficient. And
> +	 * in the working case, a larger timeout makes no difference.
>  	 */
> -	ret = wait_for(guc_ready(uncore, &status), 100);
> +	ret = wait_for(guc_ready(uncore, &status), 500);
>  	if (ret) {
>  		struct drm_device *drm = &uncore->i915->drm;
>
> --
> 2.25.1
>
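For anyone following along, the reasoning in the commit message (a larger
timeout costs nothing when the load succeeds, and only matters when the
device is slow) can be illustrated with a small userspace sketch. This is
not the i915 wait_for() macro or any driver code; fake_guc, poll_ready()
and simulate_load() are hypothetical names made up for the illustration,
and the "device" is just a timestamp at which it starts reporting ready.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the firmware: ready once a point in time passes. */
struct fake_guc {
	int64_t ready_at_ms;	/* monotonic time at which it reports ready */
};

/* Current monotonic time in milliseconds. */
static int64_t now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

static bool fake_guc_ready(const struct fake_guc *guc)
{
	return now_ms() >= guc->ready_at_ms;
}

/*
 * Poll until the fake device is ready or timeout_ms has elapsed.
 * Returns 0 as soon as the device is ready, -ETIMEDOUT otherwise.
 * The expiry flag is sampled before the final readiness check so a
 * device that becomes ready right at the deadline is still accepted.
 */
static int poll_ready(const struct fake_guc *guc, int timeout_ms)
{
	const struct timespec nap = { .tv_sec = 0, .tv_nsec = 1000000 };
	int64_t deadline;

	assert(guc != NULL);
	deadline = now_ms() + timeout_ms;

	for (;;) {
		bool expired = now_ms() >= deadline;

		if (fake_guc_ready(guc))
			return 0;
		if (expired)
			return -ETIMEDOUT;
		nanosleep(&nap, NULL);	/* back off for roughly 1 ms */
	}
}

/* Simulate a load that takes load_ms, waiting at most timeout_ms for it. */
static int simulate_load(int load_ms, int timeout_ms)
{
	struct fake_guc guc = { .ready_at_ms = now_ms() + load_ms };

	return poll_ready(&guc, timeout_ms);
}
```

With these made-up numbers, a load that finishes in a few milliseconds
returns at the same point whether the limit is 100 or 500, while a load
stretched to 300 ms (standing in for a reduced GT clock) only succeeds
under the larger limit.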