[Intel-gfx] [PATCH] drm/i915/mtl: Clear possible sticky bits on PICA message bus

Fri Nov 3 14:47:08 UTC 2023

> -----Original Message-----
> From: Intel-gfx <intel-gfx-bounces at lists.freedesktop.org> On Behalf Of Kahola, Mika
> Sent: Thursday, November 2, 2023 4:54 PM
> To: Sousa, Gustavo <gustavo.sousa at intel.com>; intel-gfx at lists.freedesktop.org
> Cc: Nikula, Jani <jani.nikula at intel.com>; Syrjala, Ville <ville.syrjala at intel.com>
> Subject: Re: [Intel-gfx] [PATCH] drm/i915/mtl: Clear possible sticky bits on PICA message bus
> 
> > -----Original Message-----
> > From: Sousa, Gustavo <gustavo.sousa at intel.com>
> > Sent: Thursday, November 2, 2023 4:23 PM
> > To: Kahola, Mika <mika.kahola at intel.com>;
> > intel-gfx at lists.freedesktop.org
> > Cc: Kahola, Mika <mika.kahola at intel.com>; Nikula, Jani
> > <jani.nikula at intel.com>; Syrjala, Ville <ville.syrjala at intel.com>
> > Subject: Re: [PATCH] drm/i915/mtl: Clear possible sticky bits on PICA
> > message bus
> >
> > Quoting Mika Kahola (2023-11-01 07:31:01-03:00)
> > >It is possible that sticky bits or error bits are left on message bus
> > >status register. Reading and then writing the value back to
> > >messagebus status register clears all possible sticky bits and errors.
> > >
> > >Signed-off-by: Mika Kahola <mika.kahola at intel.com>
> > >---
> > > drivers/gpu/drm/i915/display/intel_cx0_phy.c | 14 ++++++++++++++
> > > 1 file changed, 14 insertions(+)
> > >
> > >diff --git a/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> > >b/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> > >index b2ad4c6172f6..f439f0c7b400 100644
> > >--- a/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> > >+++ b/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> > >@@ -195,6 +195,13 @@ static int __intel_cx0_read_once(struct drm_i915_private *i915, enum port port,
> > >                 return -ETIMEDOUT;
> > >         }
> > >
> > >+        /*
> > >+         * write XELPDP_PORT_P2M_MSGBUS_STATUS register after read to clear
> > >+         * any error sticky bits set from previous transactions
> > >+         */
> > >+        val = intel_de_read(i915, XELPDP_PORT_P2M_MSGBUS_STATUS(port, lane));
> > >+        intel_de_write(i915, XELPDP_PORT_P2M_MSGBUS_STATUS(port,
> > >+ lane), val);
> > >+
> > >         intel_de_write(i915, XELPDP_PORT_M2P_MSGBUS_CTL(port, lane),
> > >                        XELPDP_PORT_M2P_TRANSACTION_PENDING |
> > >                        XELPDP_PORT_M2P_COMMAND_READ | @@ -262,6
> > >+269,13 @@ static int __intel_cx0_write_once(struct drm_i915_private
> > >+*i915, enum port port,
> > >                 return -ETIMEDOUT;
> > >         }
> > >
> > >+        /*
> > >+         * write XELPDP_PORT_P2M_MSGBUS_STATUS register after read to clear
> > >+         * any error sticky bits set from previous transactions
> > >+         */
> > >+        val = intel_de_read(i915, XELPDP_PORT_P2M_MSGBUS_STATUS(port, lane));
> > >+        intel_de_write(i915, XELPDP_PORT_P2M_MSGBUS_STATUS(port,
> > >+ lane), val);
> > >+
> >
> > Looking at the current state of the code, looks like to me that we already clear the bits from both the "success" and "failure"
> paths.
> > For the "success"
> > paths, that is done by a direct call to
> > intel_clear_response_ready_flag(); for the "failure" case, the call to
> > intel_clear_response_ready_flag() is done as part of intel_cx0_bus_reset().
> >
> > Thus, considering that we start using the msgbus from a clean state,
> > maybe these extra steps are not necessary? Have you tried adding a
> > call to
> > intel_cx0_bus_reset() as part of intel_cx0_phy_transaction_begin()?
> That I haven't try to reset bus at the stage. I can give it a go and check what happens. To me it seems, that we are sometimes
> stuck at waiting the ack and eventually we time out and bail out. I have no clue why this happens from time to time. We already
> reset the bus after every read/write operation. In addition, a small delay seem to help and clear the sporadic read/write failures
> to the bus. However, this is more like a workaround and I'm not really happy about this sort of hack.
> 
> I will give a go for your suggestion and come back once I have the results.

I ran a test with intel_cx0_bus_reset() when placed in intel_cx0_phy_transaction_begin(). This didn't help either as with kms_flip IGT test I was able to trigger this read failure

[drm] *ERROR* PHY G Read 0d80 failed after 3 retries.

This was with the configuration where the test vehicle had 4K and eDP monitors connected.

> 
> Thanks!
> -Mika-
> 
> >
> > Also, I think it would be good if we understood better were those uncleared bits are coming from...
> >
> > --
> > Gustavo Sousa
> >
> > >         intel_de_write(i915, XELPDP_PORT_M2P_MSGBUS_CTL(port, lane),
> > >                        XELPDP_PORT_M2P_TRANSACTION_PENDING |
> > >                        (committed ? XELPDP_PORT_M2P_COMMAND_WRITE_COMMITTED :
> > >--
> > >2.34.1
> > >