[Intel-gfx] [PATCH] drm/i915/display: Reset message bus after each read/write operation

Thu Oct 5 07:05:31 UTC 2023

> -----Original Message-----
> From: Vivi, Rodrigo <rodrigo.vivi at intel.com>
> Sent: Wednesday, October 4, 2023 3:56 PM
> To: Kahola, Mika <mika.kahola at intel.com>
> Cc: intel-gfx at lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH] drm/i915/display: Reset message bus after each read/write operation
> 
> On Wed, Oct 04, 2023 at 01:25:04PM +0300, Mika Kahola wrote:
> > Every know and then we receive the following error when running for
> > example IGT test kms_flip.
> >
> > [drm] *ERROR* PHY G Read 0d80 failed after 3 retries.
> > [drm] *ERROR* PHY G Write 0d81 failed after 3 retries.
> >
> > Since the error is sporadic in nature, the patch proposes to reset the
> > message bus after every successful or unsuccessful read or write
> > operation. However, testing revealed that this alone is not sufficient
> > method an additiona delay is also introduces anything from 200us to
> > 300us. This delay is experimental value and has no specification to
> > back it up.
> 
> have you tried the delays without the bus_reset?
Yes, we have bumped up the delay, first from 0x100 to 0x200 and then as per 
BSpec change 0xa000 and I have tried 0xf000. Increasing the timeout reduces
the frequency of this error but doesn't solve this issue.

> have you talked to hw architects about this?
Yes, HW guys requested traces which I provided but based on these the sequence we use in i915
is correct.

> 
> I wonder if we should add the delay inside the bus_reset itself?
> although the bit 15 clear check should be enough by itself and it doesn't look like it is a hw/fw reset involved to justify the extra
> delay.
That should be enough. To me, it looks like when reading/writing to the bus maybe too fast, the hw cannot handle that and we need
to reset and let things settle down before trying again.

> 
> well, at least some /* FIXME: */ or /* XXX: */ comments is desired along with the messages if we are going with this hack without
> understanding why...
True, I will add these the the patch.

Thanks for review!

-Mika-
> 
> >
> > Signed-off-by: Mika Kahola <mika.kahola at intel.com>
> > ---
> >  drivers/gpu/drm/i915/display/intel_cx0_phy.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> > b/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> > index abd607b564f1..a71b8a29d6b0 100644
> > --- a/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> > +++ b/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> > @@ -220,9 +220,12 @@ static u8 __intel_cx0_read(struct drm_i915_private *i915, enum port port,
> >  	/* 3 tries is assumed to be enough to read successfully */
> >  	for (i = 0; i < 3; i++) {
> >  		status = __intel_cx0_read_once(i915, port, lane, addr);
> > +		intel_cx0_bus_reset(i915, port, lane);
> >
> >  		if (status >= 0)
> >  			return status;
> > +
> > +		usleep_range(200, 300);
> >  	}
> >
> >  	drm_err_once(&i915->drm, "PHY %c Read %04x failed after %d
> > retries.\n", @@ -299,9 +302,12 @@ static void __intel_cx0_write(struct drm_i915_private *i915, enum port port,
> >  	/* 3 tries is assumed to be enough to write successfully */
> >  	for (i = 0; i < 3; i++) {
> >  		status = __intel_cx0_write_once(i915, port, lane, addr, data,
> > committed);
> > +		intel_cx0_bus_reset(i915, port, lane);
> >
> >  		if (status == 0)
> >  			return;
> > +
> > +		usleep_range(200, 300);
> >  	}
> >
> >  	drm_err_once(&i915->drm,
> > --
> > 2.34.1
> >