[Intel-gfx] [RFC PATCH] drm/i915/gt: Do not treat MCR locking timeouts as errors
Andi Shyti
andi.shyti at linux.intel.com
Wed Oct 4 14:08:18 UTC 2023
Hi Tvrtko,
> > The MCR steering semaphore is a shared lock entry between i915
> > and various firmware components.
> >
> > Getting the lock might sinchronize on some shared resources.
> > Sometimes though, it might happen that the firmware forgets to
> > unlock causing unnecessary noise in the driver which keeps doing
> > what was supposed to do, ignoring the problem.
> >
> > Do not consider this failure as an error, but just print a debug
> > message stating that the MCR locking has been skipped.
> >
> > On the driver side we still have spinlocks that make sure that
> > the access to the resources is serialized.
> >
> > Signed-off-by: Andi Shyti <andi.shyti at linux.intel.com>
> > Cc: Jonathan Cavitt <jonathan.cavitt at intel.com>
> > Cc: Matt Roper <matthew.d.roper at intel.com>
> > Cc: Nirmoy Das <nirmoy.das at intel.com>
> > ---
> > drivers/gpu/drm/i915/gt/intel_gt_mcr.c | 6 ++----
> > 1 file changed, 2 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_mcr.c b/drivers/gpu/drm/i915/gt/intel_gt_mcr.c
> > index 326c2ed1d99b..51eb693df39b 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gt_mcr.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_gt_mcr.c
> > @@ -395,10 +395,8 @@ void intel_gt_mcr_lock(struct intel_gt *gt, unsigned long *flags)
> > * would indicate some hardware/firmware is misbehaving and not
> > * releasing it properly.
> > */
> > - if (err == -ETIMEDOUT) {
> > - gt_err_ratelimited(gt, "hardware MCR steering semaphore timed out");
> > - add_taint_for_CI(gt->i915, TAINT_WARN); /* CI is now unreliable */
> > - }
> > + if (err == -ETIMEDOUT)
> > + gt_dbg(gt, "hardware MCR steering semaphore timed out");
> > }
> > /**
>
> Are we sure this does not warrant a level higher than dbg, such as
> notice/warn?
We might make it a warn, but this doesn't change much the economy
of the driver as we will keep doing what we were supposed to do.
> Because how can we be sure the two entities will not stomp on
> each other toes if we failed to obtain lock?
So far, in all the research I've done, no one looks like using
MCR lock, but yet someone is stuck in it.
> (How can we be sure about
> "forgot to unlock" vs "in prolonged active use"?
There is a patch from Jonathan that is testing a different
timeout.
> Or if we can be sure, can
> we force unlock and take the lock for the driver explicitly?)
I sent a patch with this solution and Matt turned it down.
Andi
More information about the Intel-gfx
mailing list