[Intel-gfx] [PATCH] drm/i915/cx0: Check and increase msgbus timeout threshold

Kahola, Mika mika.kahola at intel.com
Thu Aug 31 10:23:06 UTC 2023


> -----Original Message-----
> From: Sousa, Gustavo <gustavo.sousa at intel.com>
> Sent: Wednesday, August 30, 2023 3:19 PM
> To: Kahola, Mika <mika.kahola at intel.com>; Sripada, Radhakrishna <radhakrishna.sripada at intel.com>; intel-
> gfx at lists.freedesktop.org
> Subject: RE: [Intel-gfx] [PATCH] drm/i915/cx0: Check and increase msgbus timeout threshold
> 
> Quoting Gustavo Sousa (2023-08-29 08:45:41-03:00)
> >Quoting Kahola, Mika (2023-08-29 06:35:17-03:00)
> >>> -----Original Message-----
> >>> From: Intel-gfx <intel-gfx-bounces at lists.freedesktop.org> On Behalf
> >>> Of Sripada, Radhakrishna
> >>> Sent: Tuesday, August 29, 2023 1:54 AM
> >>> To: Sousa, Gustavo <gustavo.sousa at intel.com>;
> >>> intel-gfx at lists.freedesktop.org
> >>> Subject: Re: [Intel-gfx] [PATCH] drm/i915/cx0: Check and increase
> >>> msgbus timeout threshold
> >>>
> >>> Hi Gustavo,
> >>>
> >>> > -----Original Message-----
> >>> > From: Sousa, Gustavo <gustavo.sousa at intel.com>
> >>> > Sent: Monday, August 28, 2023 2:46 PM
> >>> > To: Sripada, Radhakrishna <radhakrishna.sripada at intel.com>; intel-
> >>> > gfx at lists.freedesktop.org
> >>> > Subject: RE: [Intel-gfx] [PATCH] drm/i915/cx0: Check and increase
> >>> > msgbus timeout threshold
> >>> >
> >>> > Quoting Sripada, Radhakrishna (2023-08-28 17:30:19-03:00)
> >>> > >Hi Gustavo,
> >>> >
> >>> > Hi, RK.
> >>> >
> >>> > Thanks for the feedback! Please, see my replies below.
> >>> >
> >>> > >
> >>> > >> -----Original Message-----
> >>> > >> From: Intel-gfx <intel-gfx-bounces at lists.freedesktop.org> On
> >>> > >> Behalf Of
> >>> > Gustavo
> >>> > >> Sousa
> >>> > >> Sent: Monday, August 28, 2023 10:34 AM
> >>> > >> To: intel-gfx at lists.freedesktop.org
> >>> > >> Subject: [Intel-gfx] [PATCH] drm/i915/cx0: Check and increase
> >>> > >> msgbus
> >>> > timeout
> >>> > >> threshold
> >>> > >>
> >>> > >> We have experienced timeout issues when accessing C20 SRAM registers.
> >>> > >This wording is misleading. It seems to imply that we directly
> >>> > >access SRAM registers via msg bus but the reads/writes go through
> >>> > >intermediate registers and it is this read/write that is failing. Adding extra clarity here would be helpful.
> >>> >
> >>> > Hm... Okay. I thought that would already be implied to someone
> >>> > familiar with the code. What about:
> >>> >
> >>> >     s/when accessing/when going through the sequence to access/
> >>> This is better to indicate that it is not direct access.
> >>>
> >>> >
> >>> > ?
> >>> >
> >>> > >
> >>> > >> Experimentation showed that bumping the message bus timer
> >>> > >> threshold helped on getting display Type-C connection on the C20 PHY to work.
> >>> > >>
> >>> > >> While the timeout is still under investigation with the HW
> >>> > >> team, having logic to allow forward progress (with the proper warnings) seems useful.
> >>> > >> Thus, let's bump the threshold when a timeout is detected.
> >>> > >>
> >>> > >> The maximum value of 0x200 pclk cycles was somewhat arbitrary -
> >>> > >> 2x the default value. That value was successfully tested on
> >>> > >> real hardware that was displaying timeouts otherwise.
> >>> > >>
> >>> > >> BSpec: 65156
> >>> > >> Signed-off-by: Gustavo Sousa <gustavo.sousa at intel.com>
> >>> > >> ---
> >>> > >>  drivers/gpu/drm/i915/display/intel_cx0_phy.c  | 41
> >>> > >> +++++++++++++++++++
> >>> > >> +++++++++++++++++++ .../gpu/drm/i915/display/intel_cx0_phy_regs
> >>> > >> +++++++++++++++++++ .h
> >>> > >> | 12 ++++++
> >>> > >>  2 files changed, 53 insertions(+)
> >>> > >>
> >>> > >> diff --git a/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> >>> > >> b/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> >>> > >> index dd489b50ad60..55d2333032e3 100644
> >>> > >> --- a/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> >>> > >> +++ b/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> >>> > >> @@ -5,6 +5,7 @@
> >>> > >>
> >>> > >>  #include <linux/log2.h>
> >>> > >>  #include <linux/math64.h>
> >>> > >> +#include <linux/minmax.h>
> >>> > >>  #include "i915_reg.h"
> >>> > >>  #include "intel_cx0_phy.h"
> >>> > >>  #include "intel_cx0_phy_regs.h"
> >>> > >> @@ -29,6 +30,8 @@
> >>> > >>  #define INTEL_CX0_LANE1                BIT(1)
> >>> > >>  #define INTEL_CX0_BOTH_LANES        (INTEL_CX0_LANE1 |
> >>> > >> INTEL_CX0_LANE0)
> >>> > >>
> >>> > >> +#define INTEL_CX0_MSGBUS_TIMER_VAL_MAX        0x200
> >>> > >> +
> >>> > >>  bool intel_is_c10phy(struct drm_i915_private *i915, enum phy
> >>> > >> phy) {
> >>> > >>          if (DISPLAY_VER_FULL(i915) == IP_VER(14, 0) && phy <
> >>> > >> PHY_C) @@ -119,6 +122,43 @@ static void
> >>> > >> intel_cx0_bus_reset(struct
> >>> > drm_i915_private
> >>> > >> *i915, enum port port, i
> >>> > >>          intel_clear_response_ready_flag(i915, port, lane);  }
> >>> > >>
> >>> > >> +/*
> >>> > >> + * Check if there was a timeout detected by the hardware in
> >>> > >> +the message bus
> >>> > >> + * and bump the threshold if so.
> >>
> >>Just a thought but wouldn't it be simpler if we just set the timeout to it's maximum and discard the check here?
> >
> >I'm not sure which exactly check you are talking about here:
> >
> >  1) The check on the XELPDP_PORT_MSGBUS_TIMER_TIMED_OUT.
> >
> >     I think this could give us useful debugging info, for example, if our code
> >     thinks the message bus timed out while the bus hardware did not say so. I
> >     would prefer to keep this one, if you are okay with it.
> >
> >  2) The check on the current value of the threshold and the exponential
> >     increase up to the maximum.
> >
> >     I would agree that I did a bit of over-engineering here. I guess I could
> >     simply to a rmw using INTEL_CX0_MSGBUS_TIMER_VAL_MAX directly as you
> >     suggested; and maybe rename INTEL_CX0_MSGBUS_TIMER_VAL_MAX to
> >     INTEL_CX0_MSGBUS_TIMER_BUMPED_VAL, just not to give the impression that
> >     0x200 is the maximum accepted by the hardware.
> >
> >What do you think?
> 
> I've sent a v2 addressing (2).

Yes, you're right, we would benefit from this timeout check. I think your approach to directly bump up timeout to it's max is a way to go.

I will have a look at the v2 patch.

-Mika-
> 
> --
> Gustavo Sousa
> 
> >
> >--
> >Gustavo Sousa
> >
> >>
> >>-Mika-
> >>
> >>> > >> + */
> >>> > >> +static void intel_cx0_bus_check_and_bump_timer(struct
> >>> > >> +drm_i915_private
> >>> > >> *i915,
> >>> > >> +                                               enum port port,
> >>> > >> +int
> >>> > >> +lane) {
> >>> > >> +        enum phy phy = intel_port_to_phy(i915, port);
> >>> > >> +        i915_reg_t reg;
> >>> > >> +        u32 val;
> >>> > >> +        u32 timer_val;
> >>> > >> +
> >>> > >> +        reg = XELPDP_PORT_MSGBUS_TIMER(port, lane);
> >>> > >> +        val = intel_de_read(i915, reg);
> >>> > >> +
> >>> > >> +        if (!(val & XELPDP_PORT_MSGBUS_TIMER_TIMED_OUT)) {
> >>> > >> +                drm_warn(&i915->drm,
> >>> > >> +                         "PHY %c lane %d: hardware did not
> >>> > >> + detect a
> >>> > >> timeout\n",
> >>> > >> +                         phy_name(phy), lane);
> >>> > >> +                return;
> >>> > >> +        }
> >>> > >> +
> >>> > >> +        timer_val =
> >>> > >> REG_FIELD_GET(XELPDP_PORT_MSGBUS_TIMER_VAL_MASK, val);
> >>> > >> +
> >>> > >> +        if (timer_val == INTEL_CX0_MSGBUS_TIMER_VAL_MAX)
> >>> > >> +                return;
> >>> > >> +
> >>> > >> +        timer_val = min(2 * timer_val,
> >>> > >> (u32)INTEL_CX0_MSGBUS_TIMER_VAL_MAX);
> >>> > >     ^ is this cast necessary?
> >>> >
> >>> > Yes. I remember getting a warning without it. Let me remove it to remember...
> >>> Got it thanks for the quick check.
> >>>
> >>> >
> >>> > ...done! I think it complains because the arguments are of different types:
> >>> >
> >>> >     In file included from drivers/gpu/drm/i915/display/intel_cx0_phy.c:8:
> >>> >     drivers/gpu/drm/i915/display/intel_cx0_phy.c: In function
> >>> > ‘intel_cx0_bus_check_and_bump_timer’:
> >>> >     ./include/linux/minmax.h:20:35: error: comparison of distinct
> >>> > pointer types lacks a cast [-Werror]
> >>> >        20 |         (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
> >>> >           |                                   ^~
> >>> >     ./include/linux/minmax.h:26:18: note: in expansion of macro ‘__typecheck’
> >>> >        26 |                 (__typecheck(x, y) && __no_side_effects(x, y))
> >>> >           |                  ^~~~~~~~~~~
> >>> >     ./include/linux/minmax.h:36:31: note: in expansion of macro ‘__safe_cmp’
> >>> >        36 |         __builtin_choose_expr(__safe_cmp(x, y), \
> >>> >           |                               ^~~~~~~~~~
> >>> >     ./include/linux/minmax.h:67:25: note: in expansion of macro ‘__careful_cmp’
> >>> >        67 | #define min(x, y)       __careful_cmp(x, y, <)
> >>> >           |                         ^~~~~~~~~~~~~
> >>> >     drivers/gpu/drm/i915/display/intel_cx0_phy.c:152:21: note: in
> >>> > expansion of macro ‘min’
> >>> >       152 |         timer_val = min(2 * timer_val,
> >>> > INTEL_CX0_MSGBUS_TIMER_VAL_MAX);
> >>> >           |
> >>> >
> >>> > >
> >>> > >> +        val &= ~XELPDP_PORT_MSGBUS_TIMER_VAL_MASK;
> >>> > >> +        val |=
> >>> > >> + REG_FIELD_PREP(XELPDP_PORT_MSGBUS_TIMER_VAL_MASK,
> >>> > >> timer_val);
> >>> > >We do not use REG_FIELD_PREP in the function. Instead we use it
> >>> > >in register definition in intel_cx0_phy_regs.h.
> >>> >
> >>> > I think it makes sense have definitions using REG_FIELD_PREP() in
> >>> > header files and use those definitions in .c files for fields
> >>> > whose values are are enumerated.
> >>> >
> >>> > Now, for fields without enumeration, like this one in discussion,
> >>> > I think it is fine to use REG_FIELD_PREP() directly. The MASK is already exposed anyway...
> >>> >
> >>> > Besides, it seems we already have lines of code in *.c files using
> >>> > the pattern
> >>> > above:
> >>> >
> >>> >     git grep REG_FIELD_PREP drm-tip/drm-tip -- ':/drivers/gpu/drm/i915/**/*.c'
> >>> >
> >>> > Let me know what you think. I could add a
> >>> > XELPDP_PORT_MSGBUS_TIMER_VAL() macro if you think it is necessary.
> >>> > My personal take is that using REG_FIELD_PREP() for this case is fine.
> >>> Let us conform with the norms for the macro-fields used in this file instead of starting a new pattern.
> >>>
> >>> --Radhakrishna(RK) Sripada
> >>> >
> >>> > --
> >>> > Gustavo Sousa
> >>> >
> >>> > >
> >>> > >Thanks,
> >>> > >Radhakrishna Sripada
> >>> > >
> >>> > >> +
> >>> > >> +        drm_dbg_kms(&i915->drm,
> >>> > >> +                    "PHY %c lane %d: increasing msgbus timer
> >>> > >> + threshold to
> >>> > >> %#x\n",
> >>> > >> +                    phy_name(phy), lane, timer_val);
> >>> > >> +        intel_de_write(i915, reg, val); }
> >>> > >> +
> >>> > >>  static int intel_cx0_wait_for_ack(struct drm_i915_private
> >>> > >> *i915, enum port
> >>> > port,
> >>> > >>                                    int command, int lane, u32
> >>> > >> *val) { @@ -132,6 +172,7 @@ static int
> >>> > >> intel_cx0_wait_for_ack(struct drm_i915_private *i915, enum port
> >>> > >> port,
> >>> > >>
> >>> > >> XELPDP_MSGBUS_TIMEOUT_SLOW,
> >>> > >> val)) {
> >>> > >>                  drm_dbg_kms(&i915->drm, "PHY %c Timeout
> >>> > >> waiting for message ACK. Status: 0x%x\n",
> >>> > >>                              phy_name(phy), *val);
> >>> > >> +                intel_cx0_bus_check_and_bump_timer(i915, port,
> >>> > >> + lane);
> >>> > >>                  intel_cx0_bus_reset(i915, port, lane);
> >>> > >>                  return -ETIMEDOUT;
> >>> > >>          }
> >>> > >> diff --git a/drivers/gpu/drm/i915/display/intel_cx0_phy_regs.h
> >>> > >> b/drivers/gpu/drm/i915/display/intel_cx0_phy_regs.h
> >>> > >> index cb5d1be2ba19..17cc3385aabb 100644
> >>> > >> --- a/drivers/gpu/drm/i915/display/intel_cx0_phy_regs.h
> >>> > >> +++ b/drivers/gpu/drm/i915/display/intel_cx0_phy_regs.h
> >>> > >> @@ -110,6 +110,18 @@
> >>> > >>  #define   CX0_P4PG_STATE_DISABLE                        0xC
> >>> > >>  #define   CX0_P2_STATE_RESET                                0x2
> >>> > >>
> >>> > >> +#define _XELPDP_PORT_MSGBUS_TIMER_LN0_A
> >>> > >>         0x640d8
> >>> > >> +#define _XELPDP_PORT_MSGBUS_TIMER_LN0_B
> >>> > >>         0x641d8
> >>> > >> +#define _XELPDP_PORT_MSGBUS_TIMER_LN0_USBC1                0x16f258
> >>> > >> +#define _XELPDP_PORT_MSGBUS_TIMER_LN0_USBC2                0x16f458
> >>> > >> +#define XELPDP_PORT_MSGBUS_TIMER(port, lane)
> >>> > >>         _MMIO(_PICK_EVEN_2RANGES(port, PORT_TC1, \
> >>> > >> +
> >>> > >>          _XELPDP_PORT_MSGBUS_TIMER_LN0_A, \
> >>> > >> +
> >>> > >>          _XELPDP_PORT_MSGBUS_TIMER_LN0_B, \
> >>> > >> +
> >>> > >>          _XELPDP_PORT_MSGBUS_TIMER_LN0_USBC1, \
> >>> > >> +
> >>> > >>          _XELPDP_PORT_MSGBUS_TIMER_LN0_USBC2) + (lane) * 4)
> >>> > >> +#define   XELPDP_PORT_MSGBUS_TIMER_TIMED_OUT                REG_BIT(31)
> >>> > >> +#define   XELPDP_PORT_MSGBUS_TIMER_VAL_MASK
> >>> > >>         REG_GENMASK(23, 0)
> >>> > >> +
> >>> > >>  #define _XELPDP_PORT_CLOCK_CTL_A                        0x640E0
> >>> > >>  #define _XELPDP_PORT_CLOCK_CTL_B                        0x641E0
> >>> > >>  #define _XELPDP_PORT_CLOCK_CTL_USBC1                        0x16F260
> >>> > >> --
> >>> > >> 2.41.0
> >>> > >


More information about the Intel-gfx mailing list