[Intel-gfx] [RFC PATCH] drm/i915/gt: Do not treat MCR locking timeouts as errors

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Wed Oct 4 10:49:48 UTC 2023


On 04/10/2023 10:43, Andi Shyti wrote:
> The MCR steering semaphore is a shared lock entry between i915
> and various firmware components.
> 
> Getting the lock might synchronize on some shared resources.
> Sometimes, though, the firmware forgets to unlock, causing
> unnecessary noise in the driver, which keeps doing what it was
> supposed to do, ignoring the problem.
> 
> Do not consider this failure as an error, but just print a debug
> message stating that the MCR locking has been skipped.
> 
> On the driver side we still have spinlocks that make sure that
> the access to the resources is serialized.
> 
> Signed-off-by: Andi Shyti <andi.shyti at linux.intel.com>
> Cc: Jonathan Cavitt <jonathan.cavitt at intel.com>
> Cc: Matt Roper <matthew.d.roper at intel.com>
> Cc: Nirmoy Das <nirmoy.das at intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_gt_mcr.c | 6 ++----
>   1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_mcr.c b/drivers/gpu/drm/i915/gt/intel_gt_mcr.c
> index 326c2ed1d99b..51eb693df39b 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_mcr.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_mcr.c
> @@ -395,10 +395,8 @@ void intel_gt_mcr_lock(struct intel_gt *gt, unsigned long *flags)
>   	 * would indicate some hardware/firmware is misbehaving and not
>   	 * releasing it properly.
>   	 */
> -	if (err == -ETIMEDOUT) {
> -		gt_err_ratelimited(gt, "hardware MCR steering semaphore timed out");
> -		add_taint_for_CI(gt->i915, TAINT_WARN);  /* CI is now unreliable */
> -	}
> +	if (err == -ETIMEDOUT)
> +		gt_dbg(gt, "hardware MCR steering semaphore timed out");
>   }
>   
>   /**

Are we sure this does not warrant a level higher than dbg, such as 
notice/warn? How can we be sure the two entities will not stomp on 
each other's toes if we failed to obtain the lock? (How can we tell 
"forgot to unlock" apart from "in prolonged active use"? And if we can 
tell, can we force the unlock and take the lock for the driver 
explicitly?)
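
To make the question concrete, here is a minimal userspace sketch of
the "warn instead of dbg" alternative. It is not i915 code: struct
fake_sem, fake_sem_acquire(), mcr_lock_sketch() and the SKETCH_LOG_*
levels are all hypothetical names, and the polling model only stands
in for the real hardware semaphore. The point it illustrates is that a
timeout is reported at a visible level and the caller is told that the
lock is not actually held.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of the shared semaphore; not the i915 register layout. */
struct fake_sem {
	bool held_by_fw;       /* "firmware" currently owns the lock */
	unsigned int fw_polls; /* polls until firmware releases; 0 means never */
};

/* Hypothetical log levels, used only to make the choice observable. */
enum sketch_log_level { SKETCH_LOG_NONE, SKETCH_LOG_WARN };

/* Poll at most max_polls times; 0 on success, -ETIMEDOUT otherwise. */
static int fake_sem_acquire(struct fake_sem *sem, unsigned int max_polls)
{
	assert(sem != NULL);

	for (unsigned int i = 0; i < max_polls; i++) {
		if (!sem->held_by_fw)
			return 0; /* lock obtained */

		/* Model the firmware letting go after fw_polls polls. */
		if (sem->fw_polls && --sem->fw_polls == 0)
			sem->held_by_fw = false;
	}

	return -ETIMEDOUT;
}

/*
 * Take the lock; on timeout, warn (rather than debug-log) and report
 * through *locked that the caller is running without the semaphore.
 */
static enum sketch_log_level mcr_lock_sketch(struct fake_sem *sem, bool *locked)
{
	int err = fake_sem_acquire(sem, 100);

	*locked = (err == 0);
	if (err == -ETIMEDOUT) {
		fprintf(stderr,
			"warn: MCR steering semaphore timed out, continuing unlocked\n");
		return SKETCH_LOG_WARN;
	}

	return SKETCH_LOG_NONE;
}
```

Note that the sketch cannot distinguish "forgot to unlock" from
"in prolonged active use" either: both look like a timeout, which is
why the level of the message matters.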

Regards,

Tvrtko

