[Intel-gfx] [PATCH v4 3/8] drm/i915: Decode system memory bandwidth
Paulo Zanoni
paulo.r.zanoni at intel.com
Thu Nov 3 19:06:46 UTC 2016
Em Qui, 2016-10-13 às 16:28 +0530, Kumar, Mahesh escreveu:
> This patch adds support to decode system memory bandwidth
> which will be used for arbitrated display memory percentage
> calculation in GEN9 based system.
>
> Changes from v1:
> - Address comments from Paulo
> - implement decode function for SKL/KBL also
Again, my understanding of these registers is not good so I will ask
some questions that will help me properly understand and review the
code. I may also end up asking some questions that don't make sense.
Please correct me in case any of my assumptions is wrong.
Perhaps answers to my questions could become code comments since I
suppose most of the gfx team is not super familiar with the memory
regs, and we're going to have to maintain the code.
>
> Signed-off-by: "Kumar, Mahesh" <mahesh1.kumar at intel.com>
> ---
> drivers/gpu/drm/i915/i915_drv.c | 170
> ++++++++++++++++++++++++++++++++++++++++
> drivers/gpu/drm/i915/i915_drv.h | 14 ++++
> drivers/gpu/drm/i915/i915_reg.h | 38 +++++++++
> 3 files changed, 222 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.c
> b/drivers/gpu/drm/i915/i915_drv.c
> index 89d3222..b5f601c 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -979,6 +979,170 @@ static void intel_sanitize_options(struct
> drm_i915_private *dev_priv)
> DRM_DEBUG_DRIVER("use GPU sempahores? %s\n",
> yesno(i915.semaphores));
> }
>
> +static inline void skl_memdev_info_fill(struct memdev_info *info,
> uint32_t val)
> +{
> + uint8_t channel_width, rank;
> +
> + info->rank_valid = true;
> + channel_width = (val >> SKL_DRAM_CHANNEL_WIDTH_SHIFT) &
> + SKL_DRAM_CHANNEL_WIDTH_MASK;
Why are we looking at the L bits instead of the S bits? What's the
difference between them?
> + if (channel_width == SKL_DRAM_WIDTH_1)
> + info->channel_width_bytes = 1;
> + else if (channel_width == SKL_DRAM_WIDTH_2)
> + info->channel_width_bytes = 2;
> + else
> + info->channel_width_bytes = 4;
This is not checking for the invalid value of 0x3. Perhaps a switch()
instead of an if() ladder would be better here.
> +
> + rank = (val >> SKL_DRAM_RANK_SHIFT) & SKL_DRAM_RANK_MASK;
> + if (rank == SKL_DRAM_RANK_SINGLE)
> + info->rank = DRAM_RANK_SINGLE;
> + else
> + info->rank = DRAM_RANK_DUAL;
> +}
> +
> +static int
> +skl_get_memdev_info(struct drm_i915_private *dev_priv)
> +{
> + uint32_t val = 0;
> + uint32_t mem_speed = 0;
> + struct memdev_info *memdev_info = &dev_priv->memdev_info;
> +
> + val = I915_READ(SKL_MC_BIOS_DATA_0_0_0_MCHBAR_PCU);
> + mem_speed = div_u64((uint64_t) (val & SKL_REQ_DATA_MASK) *
> + SKL_MEMORY_FREQ_MULTIPLIER, 1000);
But but but isn't it possible to have more than 2GHz in SKL? I think I
have a big SKL machine with more than 2GHz here... I'll check it later.
Also, don't we need to check FREQ_TYPE and ODD_RATIO?
Some digging suggests that maybe we need to figure out somehow if we're
using DCLK or QCLK and multiply accordingly (based on the BIOS_REQ
register instead of BIOS_DATA).
I'd really need some clarification on how all of this works.
Also, it looks like there's no need for this to be a 64bit calculation
due to the mask limit.
> +
> + if (mem_speed == 0)
> + return -EINVAL;
> +
> + memdev_info->valid = true;
> + memdev_info->mem_speed_khz = mem_speed;
> + memdev_info->num_channels = 0;
> +
> + memdev_info->rank_valid = false;
> + val = I915_READ(SKL_MAD_DIMM_CH0_0_0_0_MCHBAR_MCMAIN);
> +
> + if (val != 0xFFFFFFFF) {
> + memdev_info->num_channels++;
> + skl_memdev_info_fill(memdev_info, val);
> + }
> +
> + val = I915_READ(SKL_MAD_DIMM_CH1_0_0_0_MCHBAR_MCMAIN);
> +
> + if (val != 0xFFFFFFFF) {
> + memdev_info->num_channels++;
> + if (!memdev_info->rank_valid)
> + skl_memdev_info_fill(memdev_info, val);
> + }
A simple iteration over the two register addresses would have saved
some code :).
Also, isn't it possible to have channels with different width/rank?
Shouldn't our code be sanity-checking this? Perhaps grab the width/rank
for each channel and compare. When the specs are unclear or don't
exist, it's probably better if our code is able to check/validate its
own assumptions.
> +
> + if (memdev_info->num_channels == 0)
> + memdev_info->valid = false;
> + return 0;
> +}
> +
> +static int
> +bxt_get_memdev_info(struct drm_i915_private *dev_priv)
> +{
> + struct memdev_info *memdev_info = &dev_priv->memdev_info;
> + uint32_t dram_channel;
> + uint32_t mem_speed, val;
> + uint8_t num_channel, dram_type;
> + int i;
> +
> + val = I915_READ(BXT_P_CR_MC_BIOS_REQ_0_0_0);
> + mem_speed = div_u64((uint64_t) (val & BXT_REQ_DATA_MASK) *
> + SKL_MEMORY_FREQ_MULTIPLIER, 1000);
AFAIU, there's no need for 64 bit division since the maximum value
is 8399790.
> +
> + if (mem_speed == 0)
> + return -EINVAL;
> +
> + memdev_info->valid = true;
> + memdev_info->mem_speed_khz = mem_speed;
> + dram_type = (val >> BXT_DRAM_TYPE_SHIFT) &
> BXT_DRAM_TYPE_MASK;
> + dram_channel = (val >> BXT_DRAM_CHANNEL_SHIFT) &
> BXT_DRAM_CHANNEL_MASK;
> + num_channel = hweight32(dram_channel);
> +
> + /*
> + * The lpddr3 and lpddr4 technologies can have 1-4 channels
> and the
> + * channels are 32bits wide; while ddr3l technologies can
> have 1-2
> + * channels and the channels are 64 bits wide. In case of
> single 64 bit
> + * wide DDR3L dimm sets two channel-active bits in
> + * P_CR_MC_BIOS_REQ_0_0_0 register and system with two DDR3L
> 64bit dimm
> + * will set all four channel-active bits in above register.
> + * In order to get actual number of channels in DDR3L DRAM
> we need to
> + * device total channel-active bits set by 2.
> + */
> +
> + switch (dram_type) {
> + case BXT_DRAM_TYPE_LPDDR3:
> + case BXT_DRAM_TYPE_LPDDR4:
> + memdev_info->channel_width_bytes = 4;
> + memdev_info->num_channels = num_channel;
> + break;
> + case BXT_DRAM_TYPE_DDR3L:
> + memdev_info->channel_width_bytes = 8;
> + memdev_info->num_channels = num_channel / 2;
> + break;
> + default:
> + DRM_DEBUG_KMS("Unknown DRAM type\n");
> + memdev_info->valid = false;
> + return -EINVAL;
> + }
Shouldn't channel_width_bytes just be num_channels * 4 for LPDDR3/4 and
num_channels * 8 for DDR3L?
Or just multiply by 4 before we do the "num_channel / 2"... This would
avoid even having to find out the DRAM type (in case it's not needed
anywhere else).
> +
> + /*
> + * Now read each DUNIT8/9/10/11 to check the rank of each
> dimms.
> + * all the dimms should have same rank as in first valid
> Dimm
> + */
I think we need to explain in the comment why we're iterating over the
4 dunits but just looking at the first one that's not 0xFFFFFFFF.
Isn't there a way to discover which dunits to look at based on the
number of channels and DRAM type?
Also, does the valid dunit amount reflect the number of channels like
in SKL?
> + memdev_info->rank_valid = false;
> + for (i = 0; i < BXT_D_CR_DRP0_DUNIT_MAX; i++) {
> + val = I915_READ(BXT_D_CR_DRP0_DUNIT(i));
> + if (val != 0xFFFFFFFF) {
> + uint8_t rank;
> +
> + memdev_info->rank_valid = true;
> + rank = val & BXT_DRAM_RANK_MASK;
> + if (rank == BXT_DRAM_RANK_SINGLE)
> + memdev_info->rank =
> DRAM_RANK_SINGLE;
> + else if (rank == BXT_DRAM_RANK_DUAL)
> + memdev_info->rank = DRAM_RANK_DUAL;
> + else
> + memdev_info->rank = DRAM_RANK_NONE;
Probably DRM_DEBUG_KMS("something\n") here? Also maybe set rank_valid
back to false? Return non-zero?
> +
> + break;
> + }
> + }
> + return 0;
> +}
> +
> +static void
> +intel_get_memdev_info(struct drm_i915_private *dev_priv)
> +{
> + struct memdev_info *memdev_info = &dev_priv->memdev_info;
> + int ret;
> +
> + memdev_info->valid = false;
> + if (!IS_GEN9(dev_priv))
> + return;
> +
> + if (IS_BROXTON(dev_priv))
> + ret = bxt_get_memdev_info(dev_priv);
> + else
> + ret = skl_get_memdev_info(dev_priv);
> + if (ret)
> + return;
What I find a little hard to read is that we set memdev_info->valid
both here and in the functions we call. Due to the early returns it's
not super simple to see the possible value it may get, and things get
even more complicated due to the fact that we may return 0 from
skl_get_memdev_info() while still setting memdev_info->valid to false.
Things would be much simpler and easier to read if we only set
memdev_info->valid in this function, based on the return value of the
get_memdev_info() functions. And then the get_memdev_info() functions
would appropriately return zero/nonzero based on valid/invalid
configuration.
> +
> + if (memdev_info->valid) {
> + DRM_DEBUG_DRIVER("DRAM speed-%d Khz total-channels-
> %d channel-width-%d bytes\n",
> + memdev_info->mem_speed_khz,
> + memdev_info->num_channels,
> + memdev_info->channel_width_bytes);
As I mentioned in my previous review, please use the appropriate
specifiers here instead of %d for everybody.
Also, we usually print "variable: value, other variable: value" instead
of "variable-value other-variable-value". Please use the established
convention. Reading "total-channels-1" is less clear than "total
channels: 1" and doesn't raise ambiguity concerns with negative values.
> + if (memdev_info->rank_valid)
> + DRM_DEBUG_DRIVER("DRAM rank-%s\n",
> + (memdev_info->rank == DRAM_RANK_DUAL) ?
> + "dual" : "single");
But there's also DRAM_RANK_NONE.
> + }
> +}
> +
> +
> /**
> * i915_driver_init_hw - setup state requiring device access
> * @dev_priv: device private
> @@ -1082,6 +1246,12 @@ static int i915_driver_init_hw(struct
> drm_i915_private *dev_priv)
> DRM_DEBUG_DRIVER("can't enable MSI");
> }
>
> + /*
> + * Fill the memdev structure to get the system raw bandwidth
> + * This will be used by WM algorithm, to implement GEN9
> based WA
> + */
> + intel_get_memdev_info(dev_priv);
> +
> return 0;
>
> out_ggtt:
> diff --git a/drivers/gpu/drm/i915/i915_drv.h
> b/drivers/gpu/drm/i915/i915_drv.h
> index a219a35..adbd9aa 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2057,6 +2057,20 @@ struct drm_i915_private {
> bool distrust_bios_wm;
> } wm;
>
> + struct memdev_info {
> + bool valid;
> + uint32_t mem_speed_khz;
> + uint8_t channel_width_bytes;
> + uint8_t num_channels;
> + bool rank_valid;
> + enum {
> + DRAM_RANK_NONE = 0,
> + DRAM_RANK_SINGLE,
> + DRAM_RANK_DUAL
> + } rank;
What's the meaning of DRAM_RANK_NONE? Does DRAM_RANK_NONE imply
rank_valid=false? What's the meaning of rank=DRAM_RANK_NONE when
rank_valid=true?
Perhaps we could get rid of the rank_valid variable? Maybe rename
DRAM_RANK_NONE to DRAM_RANK_INVALID?
> + } memdev_info;
> +
> +
> struct i915_runtime_pm pm;
>
> /* Abstract the submission mechanism (legacy ringbuffer or
> execlists) away */
> diff --git a/drivers/gpu/drm/i915/i915_reg.h
> b/drivers/gpu/drm/i915/i915_reg.h
> index acc767a..a9c467c 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -7721,6 +7721,44 @@ enum {
> #define DC_STATE_DEBUG_MASK_CORES (1<<0)
> #define DC_STATE_DEBUG_MASK_MEMORY_UP (1<<1)
>
> +#define _MMIO_MCHBAR_PIPE(x, a, b) _MMIO(MCHBAR_MIRROR_BASE_S
> NB + _PIPE(x, a, b))
Wait, what? I may have misunderstood something, but AFAICS the
registers that use this macro are not per-pipe, so using the _PIPE
macro makes things super confusing. In this case it should be fine to
just do something a little more hardcoded, like:
#define BXT_D_CR_DRP0_DUNIT(unit) _MMIO(MCHBAR_MIRROR_BASE_SNB + (unit)
* 0x200)
> +#define BXT_P_CR_MC_BIOS_REQ_0_0_0 _MMIO(MCHBAR_MIRROR_BASE_S
> NB + 0x7114)
> +#define BXT_REQ_DATA_MASK (0x3F << 0)
Or just 0x3F to be consistent with the other definitions.
> +#define BXT_DRAM_TYPE_SHIFT 24
> +#define BXT_DRAM_TYPE_MASK 0x7
The spec you sent me lists these bits as reserved. Are you sure these
come from this register?
> +#define BXT_DRAM_CHANNEL_SHIFT 12
> +#define BXT_DRAM_CHANNEL_MASK 0xF
I'd rename these to BXT_DRAM_CHANNEL_ACTIVE_{SHIFT,MASK}.
> +
> +#define BXT_DRAM_TYPE_LPDDR3 0x1
> +#define BXT_DRAM_TYPE_LPDDR4 0x2
> +#define BXT_DRAM_TYPE_DDR3L 0x4
> +/*
> + * BIOS programs this field of REQ_DATA [5:0] in integer
> + * multiple of 133330 KHz (133.33MHz)
> + */
> +#define SKL_MEMORY_FREQ_MULTIPLIER 0x208D2
There's a bad tab here that should be a space.
Also, it makes a looooot more sense to use the decimal value instead of
0x208D2.
> +#define BXT_D_CR_DRP0_DUNIT8 0x1000
> +#define BXT_D_CR_DRP0_DUNIT9 0x1200
> +#define BXT_D_CR_DRP0_DUNIT_MAX 4
> +#define BXT_D_CR_DRP0_DUNIT(x) _MMIO_MCHBAR_PIPE(x,
> BXT_D_CR_DRP0_DUNIT8, BXT_D_CR_DRP0_DUNIT9)
> +#define BXT_DRAM_RANK_MASK 0x3
> +#define BXT_DRAM_RANK_SINGLE 0x1
> +#define BXT_DRAM_RANK_DUAL 0x3
> +
> +#define SKL_MC_BIOS_DATA_0_0_0_MCHBAR_PCU _MMIO(MCHBAR_MIRROR
> _BASE_SNB + 0x5E04)
Why does BXT look at BIOS_REQ while SKL looks at BIOS_DATA? Shouldn't
SKL also look at BIOS_DATA or BXT look at BIOS_REQ? What's the big
difference? Also, that comment mentioning that we may sometimes get
zero is a little worrying: shouldn't we make sure we always read when
it's non-zero?
Thanks for the patch, and sorry for the huge amount of questions here:
learning while doing the review is sometimes tricky.
> +#define SKL_REQ_DATA_MASK (0xF << 0)
> +#define SKL_MAD_DIMM_CH0_0_0_0_MCHBAR_MCMAIN _MMIO(MCHBAR_MIR
> ROR_BASE_SNB + 0x500C)
> +#define SKL_MAD_DIMM_CH1_0_0_0_MCHBAR_MCMAIN _MMIO(MCHBAR_MIR
> ROR_BASE_SNB + 0x5010)
> +#define SKL_DRAM_CHANNEL_WIDTH_MASK 0x3
> +#define SKL_DRAM_CHANNEL_WIDTH_SHIFT 8
> +#define SKL_DRAM_WIDTH_1 0x0
> +#define SKL_DRAM_WIDTH_2 0x1
> +#define SKL_DRAM_WIDTH_4 0x2
> +#define SKL_DRAM_RANK_MASK 0x1
> +#define SKL_DRAM_RANK_SHIFT 10
> +#define SKL_DRAM_RANK_SINGLE 0x0
> +#define SKL_DRAM_RANK_DUAL 0x1
> +
> /* Please see hsw_read_dcomp() and hsw_write_dcomp() before using
> this register,
> * since on HSW we can't write to it using I915_WRITE. */
> #define D_COMP_HSW _MMIO(MCHBAR_MIRROR_BASE_S
> NB + 0x5F0C)
More information about the Intel-gfx
mailing list