[Intel-gfx] [PATCH] drm/i915/cx0: Check and increase msgbus timeout threshold

Gustavo Sousa gustavo.sousa at intel.com
Tue Aug 29 11:45:41 UTC 2023


Quoting Kahola, Mika (2023-08-29 06:35:17-03:00)
>> -----Original Message-----
>> From: Intel-gfx <intel-gfx-bounces at lists.freedesktop.org> On Behalf Of Sripada, Radhakrishna
>> Sent: Tuesday, August 29, 2023 1:54 AM
>> To: Sousa, Gustavo <gustavo.sousa at intel.com>; intel-gfx at lists.freedesktop.org
>> Subject: Re: [Intel-gfx] [PATCH] drm/i915/cx0: Check and increase msgbus timeout threshold
>> 
>> Hi Gustavo,
>> 
>> > -----Original Message-----
>> > From: Sousa, Gustavo <gustavo.sousa at intel.com>
>> > Sent: Monday, August 28, 2023 2:46 PM
>> > To: Sripada, Radhakrishna <radhakrishna.sripada at intel.com>; intel-
>> > gfx at lists.freedesktop.org
>> > Subject: RE: [Intel-gfx] [PATCH] drm/i915/cx0: Check and increase
>> > msgbus timeout threshold
>> >
>> > Quoting Sripada, Radhakrishna (2023-08-28 17:30:19-03:00)
>> > >Hi Gustavo,
>> >
>> > Hi, RK.
>> >
>> > Thanks for the feedback! Please, see my replies below.
>> >
>> > >
>> > >> -----Original Message-----
>> > >> From: Intel-gfx <intel-gfx-bounces at lists.freedesktop.org> On Behalf
>> > >> Of
>> > Gustavo
>> > >> Sousa
>> > >> Sent: Monday, August 28, 2023 10:34 AM
>> > >> To: intel-gfx at lists.freedesktop.org
>> > >> Subject: [Intel-gfx] [PATCH] drm/i915/cx0: Check and increase
>> > >> msgbus
>> > timeout
>> > >> threshold
>> > >>
>> > >> We have experienced timeout issues when accessing C20 SRAM registers.
>> > >This wording is misleading. It seems to imply that we directly access
>> > >SRAM registers via msg bus but the reads/writes go through
>> > >intermediate registers and it is this read/write that is failing. Adding extra clarity here would be helpful.
>> >
>> > Hm... Okay. I thought that would already be implied to someone
>> > familiar with the code. What about:
>> >
>> >     s/when accessing/when going through the sequence to access/
>> This is better to indicate that it is not direct access.
>> 
>> >
>> > ?
>> >
>> > >
>> > >> Experimentation showed that bumping the message bus timer threshold
>> > >> helped on getting display Type-C connection on the C20 PHY to work.
>> > >>
>> > >> While the timeout is still under investigation with the HW team,
>> > >> having logic to allow forward progress (with the proper warnings) seems useful.
>> > >> Thus, let's bump the threshold when a timeout is detected.
>> > >>
>> > >> The maximum value of 0x200 pclk cycles was somewhat arbitrary - 2x
>> > >> the default value. That value was successfully tested on real
>> > >> hardware that was displaying timeouts otherwise.
>> > >>
>> > >> BSpec: 65156
>> > >> Signed-off-by: Gustavo Sousa <gustavo.sousa at intel.com>
>> > >> ---
>> > >>  drivers/gpu/drm/i915/display/intel_cx0_phy.c  | 41
>> > >> +++++++++++++++++++  .../gpu/drm/i915/display/intel_cx0_phy_regs.h
>> > >> | 12 ++++++
>> > >>  2 files changed, 53 insertions(+)
>> > >>
>> > >> diff --git a/drivers/gpu/drm/i915/display/intel_cx0_phy.c
>> > >> b/drivers/gpu/drm/i915/display/intel_cx0_phy.c
>> > >> index dd489b50ad60..55d2333032e3 100644
>> > >> --- a/drivers/gpu/drm/i915/display/intel_cx0_phy.c
>> > >> +++ b/drivers/gpu/drm/i915/display/intel_cx0_phy.c
>> > >> @@ -5,6 +5,7 @@
>> > >>
>> > >>  #include <linux/log2.h>
>> > >>  #include <linux/math64.h>
>> > >> +#include <linux/minmax.h>
>> > >>  #include "i915_reg.h"
>> > >>  #include "intel_cx0_phy.h"
>> > >>  #include "intel_cx0_phy_regs.h"
>> > >> @@ -29,6 +30,8 @@
>> > >>  #define INTEL_CX0_LANE1                BIT(1)
>> > >>  #define INTEL_CX0_BOTH_LANES        (INTEL_CX0_LANE1 |
>> > >> INTEL_CX0_LANE0)
>> > >>
>> > >> +#define INTEL_CX0_MSGBUS_TIMER_VAL_MAX        0x200
>> > >> +
>> > >>  bool intel_is_c10phy(struct drm_i915_private *i915, enum phy phy)
>> > >> {
>> > >>          if (DISPLAY_VER_FULL(i915) == IP_VER(14, 0) && phy <
>> > >> PHY_C) @@ -119,6 +122,43 @@ static void intel_cx0_bus_reset(struct
>> > drm_i915_private
>> > >> *i915, enum port port, i
>> > >>          intel_clear_response_ready_flag(i915, port, lane);  }
>> > >>
>> > >> +/*
>> > >> + * Check if there was a timeout detected by the hardware in the
>> > >> +message bus
>> > >> + * and bump the threshold if so.
>
>Just a thought but wouldn't it be simpler if we just set the timeout to it's maximum and discard the check here?

I'm not sure which exactly check you are talking about here:

  1) The check on the XELPDP_PORT_MSGBUS_TIMER_TIMED_OUT.

     I think this could give us useful debugging info, for example, if our code
     thinks the message bus timed out while the bus hardware did not say so. I
     would prefer to keep this one, if you are okay with it.

  2) The check on the current value of the threshold and the exponential
     increase up to the maximum.

     I would agree that I did a bit of over-engineering here. I guess I could
     simply to a rmw using INTEL_CX0_MSGBUS_TIMER_VAL_MAX directly as you
     suggested; and maybe rename INTEL_CX0_MSGBUS_TIMER_VAL_MAX to
     INTEL_CX0_MSGBUS_TIMER_BUMPED_VAL, just not to give the impression that
     0x200 is the maximum accepted by the hardware.

What do you think?

--
Gustavo Sousa

>
>-Mika- 
>
>> > >> + */
>> > >> +static void intel_cx0_bus_check_and_bump_timer(struct
>> > >> +drm_i915_private
>> > >> *i915,
>> > >> +                                               enum port port, int
>> > >> +lane) {
>> > >> +        enum phy phy = intel_port_to_phy(i915, port);
>> > >> +        i915_reg_t reg;
>> > >> +        u32 val;
>> > >> +        u32 timer_val;
>> > >> +
>> > >> +        reg = XELPDP_PORT_MSGBUS_TIMER(port, lane);
>> > >> +        val = intel_de_read(i915, reg);
>> > >> +
>> > >> +        if (!(val & XELPDP_PORT_MSGBUS_TIMER_TIMED_OUT)) {
>> > >> +                drm_warn(&i915->drm,
>> > >> +                         "PHY %c lane %d: hardware did not detect
>> > >> + a
>> > >> timeout\n",
>> > >> +                         phy_name(phy), lane);
>> > >> +                return;
>> > >> +        }
>> > >> +
>> > >> +        timer_val =
>> > >> REG_FIELD_GET(XELPDP_PORT_MSGBUS_TIMER_VAL_MASK, val);
>> > >> +
>> > >> +        if (timer_val == INTEL_CX0_MSGBUS_TIMER_VAL_MAX)
>> > >> +                return;
>> > >> +
>> > >> +        timer_val = min(2 * timer_val,
>> > >> (u32)INTEL_CX0_MSGBUS_TIMER_VAL_MAX);
>> > >     ^ is this cast necessary?
>> >
>> > Yes. I remember getting a warning without it. Let me remove it to remember...
>> Got it thanks for the quick check.
>> 
>> >
>> > ...done! I think it complains because the arguments are of different types:
>> >
>> >     In file included from drivers/gpu/drm/i915/display/intel_cx0_phy.c:8:
>> >     drivers/gpu/drm/i915/display/intel_cx0_phy.c: In function
>> > ‘intel_cx0_bus_check_and_bump_timer’:
>> >     ./include/linux/minmax.h:20:35: error: comparison of distinct
>> > pointer types lacks a cast [-Werror]
>> >        20 |         (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
>> >           |                                   ^~
>> >     ./include/linux/minmax.h:26:18: note: in expansion of macro ‘__typecheck’
>> >        26 |                 (__typecheck(x, y) && __no_side_effects(x, y))
>> >           |                  ^~~~~~~~~~~
>> >     ./include/linux/minmax.h:36:31: note: in expansion of macro ‘__safe_cmp’
>> >        36 |         __builtin_choose_expr(__safe_cmp(x, y), \
>> >           |                               ^~~~~~~~~~
>> >     ./include/linux/minmax.h:67:25: note: in expansion of macro ‘__careful_cmp’
>> >        67 | #define min(x, y)       __careful_cmp(x, y, <)
>> >           |                         ^~~~~~~~~~~~~
>> >     drivers/gpu/drm/i915/display/intel_cx0_phy.c:152:21: note: in
>> > expansion of macro ‘min’
>> >       152 |         timer_val = min(2 * timer_val,
>> > INTEL_CX0_MSGBUS_TIMER_VAL_MAX);
>> >           |
>> >
>> > >
>> > >> +        val &= ~XELPDP_PORT_MSGBUS_TIMER_VAL_MASK;
>> > >> +        val |= REG_FIELD_PREP(XELPDP_PORT_MSGBUS_TIMER_VAL_MASK,
>> > >> timer_val);
>> > >We do not use REG_FIELD_PREP in the function. Instead we use it in
>> > >register definition in intel_cx0_phy_regs.h.
>> >
>> > I think it makes sense have definitions using REG_FIELD_PREP() in
>> > header files and use those definitions in .c files for fields whose
>> > values are are enumerated.
>> >
>> > Now, for fields without enumeration, like this one in discussion, I
>> > think it is fine to use REG_FIELD_PREP() directly. The MASK is already exposed anyway...
>> >
>> > Besides, it seems we already have lines of code in *.c files using the
>> > pattern
>> > above:
>> >
>> >     git grep REG_FIELD_PREP drm-tip/drm-tip -- ':/drivers/gpu/drm/i915/**/*.c'
>> >
>> > Let me know what you think. I could add a
>> > XELPDP_PORT_MSGBUS_TIMER_VAL() macro if you think it is necessary. My
>> > personal take is that using REG_FIELD_PREP() for this case is fine.
>> Let us conform with the norms for the macro-fields used in this file instead of starting a new pattern.
>> 
>> --Radhakrishna(RK) Sripada
>> >
>> > --
>> > Gustavo Sousa
>> >
>> > >
>> > >Thanks,
>> > >Radhakrishna Sripada
>> > >
>> > >> +
>> > >> +        drm_dbg_kms(&i915->drm,
>> > >> +                    "PHY %c lane %d: increasing msgbus timer
>> > >> + threshold to
>> > >> %#x\n",
>> > >> +                    phy_name(phy), lane, timer_val);
>> > >> +        intel_de_write(i915, reg, val); }
>> > >> +
>> > >>  static int intel_cx0_wait_for_ack(struct drm_i915_private *i915,
>> > >> enum port
>> > port,
>> > >>                                    int command, int lane, u32 *val)
>> > >> { @@ -132,6 +172,7 @@ static int intel_cx0_wait_for_ack(struct
>> > >> drm_i915_private *i915, enum port port,
>> > >>
>> > >> XELPDP_MSGBUS_TIMEOUT_SLOW,
>> > >> val)) {
>> > >>                  drm_dbg_kms(&i915->drm, "PHY %c Timeout waiting
>> > >> for message ACK. Status: 0x%x\n",
>> > >>                              phy_name(phy), *val);
>> > >> +                intel_cx0_bus_check_and_bump_timer(i915, port,
>> > >> + lane);
>> > >>                  intel_cx0_bus_reset(i915, port, lane);
>> > >>                  return -ETIMEDOUT;
>> > >>          }
>> > >> diff --git a/drivers/gpu/drm/i915/display/intel_cx0_phy_regs.h
>> > >> b/drivers/gpu/drm/i915/display/intel_cx0_phy_regs.h
>> > >> index cb5d1be2ba19..17cc3385aabb 100644
>> > >> --- a/drivers/gpu/drm/i915/display/intel_cx0_phy_regs.h
>> > >> +++ b/drivers/gpu/drm/i915/display/intel_cx0_phy_regs.h
>> > >> @@ -110,6 +110,18 @@
>> > >>  #define   CX0_P4PG_STATE_DISABLE                        0xC
>> > >>  #define   CX0_P2_STATE_RESET                                0x2
>> > >>
>> > >> +#define _XELPDP_PORT_MSGBUS_TIMER_LN0_A
>> > >>         0x640d8
>> > >> +#define _XELPDP_PORT_MSGBUS_TIMER_LN0_B
>> > >>         0x641d8
>> > >> +#define _XELPDP_PORT_MSGBUS_TIMER_LN0_USBC1                0x16f258
>> > >> +#define _XELPDP_PORT_MSGBUS_TIMER_LN0_USBC2                0x16f458
>> > >> +#define XELPDP_PORT_MSGBUS_TIMER(port, lane)
>> > >>         _MMIO(_PICK_EVEN_2RANGES(port, PORT_TC1, \
>> > >> +
>> > >>          _XELPDP_PORT_MSGBUS_TIMER_LN0_A, \
>> > >> +
>> > >>          _XELPDP_PORT_MSGBUS_TIMER_LN0_B, \
>> > >> +
>> > >>          _XELPDP_PORT_MSGBUS_TIMER_LN0_USBC1, \
>> > >> +
>> > >>          _XELPDP_PORT_MSGBUS_TIMER_LN0_USBC2) + (lane) * 4)
>> > >> +#define   XELPDP_PORT_MSGBUS_TIMER_TIMED_OUT                REG_BIT(31)
>> > >> +#define   XELPDP_PORT_MSGBUS_TIMER_VAL_MASK
>> > >>         REG_GENMASK(23, 0)
>> > >> +
>> > >>  #define _XELPDP_PORT_CLOCK_CTL_A                        0x640E0
>> > >>  #define _XELPDP_PORT_CLOCK_CTL_B                        0x641E0
>> > >>  #define _XELPDP_PORT_CLOCK_CTL_USBC1                        0x16F260
>> > >> --
>> > >> 2.41.0
>> > >


More information about the Intel-gfx mailing list