[Intel-gfx] [PATCH v2] drm/i915/display: Reset message bus after each read/write operation

Mon Oct 16 11:36:34 UTC 2023

> -----Original Message-----
> From: Vivi, Rodrigo <rodrigo.vivi at intel.com>
> Sent: Friday, October 13, 2023 11:22 PM
> To: Kahola, Mika <mika.kahola at intel.com>
> Cc: intel-gfx at lists.freedesktop.org; Sousa, Gustavo <gustavo.sousa at intel.com>
> Subject: Re: [PATCH v2] drm/i915/display: Reset message bus after each read/write operation
> 
> On Fri, Oct 13, 2023 at 09:55:32AM +0300, Mika Kahola wrote:
> > Every know and then we receive the following error when running for
> > example IGT test kms_flip.
> >
> > [drm] *ERROR* PHY G Read 0d80 failed after 3 retries.
> > [drm] *ERROR* PHY G Write 0d81 failed after 3 retries.
> >
> > Since the error is sporadic in nature, the patch proposes to reset the
> > message bus after every successful or unsuccessful read or write
> > operation. However, the testing revealed that this alone is not
> > sufficient method and therefore an additional delay is introduced
> > anything from 200us to 300us to let HW to settle down. This delay is
> > experimental value and has no specification to back it up.
> >
> > v2: Add FIXME's to indicate the experimental nature of
> >     this workaround (Rodrigo)
> >
> > Signed-off-by: Mika Kahola <mika.kahola at intel.com>
> > ---
> >  drivers/gpu/drm/i915/display/intel_cx0_phy.c | 16 ++++++++++++++++
> >  1 file changed, 16 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> > b/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> > index 6e6a1818071e..7c48ec5e54bd 100644
> > --- a/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> > +++ b/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> > @@ -221,6 +221,14 @@ static u8 __intel_cx0_read(struct drm_i915_private *i915, enum port port,
> >  	for (i = 0; i < 3; i++) {
> >  		status = __intel_cx0_read_once(i915, port, lane, addr);
> >
> > +		/*
> > +		 * FIXME: Workaround to let HW to settle
> > +		 * down and let the message bus to end up
> > +		 * in a known state
> > +		 */
> > +		intel_cx0_bus_reset(i915, port, lane);
> > +		usleep_range(200, 300);
> > +
> >  		if (status >= 0)
> >  			return status;
> >  	}
> > @@ -300,6 +308,14 @@ static void __intel_cx0_write(struct drm_i915_private *i915, enum port port,
> >  	for (i = 0; i < 3; i++) {
> >  		status = __intel_cx0_write_once(i915, port, lane, addr, data,
> > committed);
> >
> > +		/*
> > +		 * FIXME: Workaround to let HW to settle
> > +		 * down and let the message bus to end up
> > +		 * in a known state
> > +		 */
> > +		intel_cx0_bus_reset(i915, port, lane);
> > +		usleep_range(200, 300);
> 
> what cases trigger these paths?
I have noticed this when executing IGT test kms_flip with 4k monitors and eDP connected. Specially with 2x- cases.

> and how many calls in the modset case?
I haven't put any counters for this but quite a few anyways. This surely introduces additional delay.

> what about suspend/resume cylces?
> 
> if we do a single rmw we are introducing at least 400us of delay here.
> have we measured the overall final impact of these extra sleeps on the resume and modeset latencies?
We haven't measured overall impact. I did some further testing and 200-300us delay is an overkill solution. I tested with 1-10us delay and with my test vehicle, I didn't see any issue to use that. 

In fact, I moved the bus reset routine to be part of *_read_once() and *_write_once() functions and to me despite it looks more cleaner solution I can get rid of the delay. It must be noted that my test vehicle has changed to different MTL. Anyway, I will float the patch with revised function placement and delay drop. We can continue discussion from there.

Thanks!

-Mika-  

> 
> > +
> >  		if (status == 0)
> >  			return;
> >  	}
> > --
> > 2.34.1
> >