[Intel-gfx] [PATCH v4 3/8] drm/i915: Decode system memory bandwidth
Mahesh Kumar
mahesh1.kumar at intel.com
Wed Nov 16 11:48:51 UTC 2016
Hi,
On Friday 04 November 2016 12:36 AM, Paulo Zanoni wrote:
> Em Qui, 2016-10-13 às 16:28 +0530, Kumar, Mahesh escreveu:
>> This patch adds support to decode system memory bandwidth
>> which will be used for arbitrated display memory percentage
>> calculation in GEN9 based system.
>>
>> Changes from v1:
>> - Address comments from Paulo
>> - implement decode function for SKL/KBL also
> Again, my understanding of these registers is not good so I will ask
> some questions that will help me properly understand and review the
> code. I may also end up asking some questions that don't make sense.
> Please correct me in case any of my assumptions is wrong.
>
> Perhaps answers to my questions could become code comments since I
> suppose most of the gfx team is not super familiar with the memory
> regs, and we're going to have to maintain the code.
>
>
>> Signed-off-by: "Kumar, Mahesh" <mahesh1.kumar at intel.com>
>> ---
>> drivers/gpu/drm/i915/i915_drv.c | 170
>> ++++++++++++++++++++++++++++++++++++++++
>> drivers/gpu/drm/i915/i915_drv.h | 14 ++++
>> drivers/gpu/drm/i915/i915_reg.h | 38 +++++++++
>> 3 files changed, 222 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_drv.c
>> b/drivers/gpu/drm/i915/i915_drv.c
>> index 89d3222..b5f601c 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.c
>> +++ b/drivers/gpu/drm/i915/i915_drv.c
>> @@ -979,6 +979,170 @@ static void intel_sanitize_options(struct
>> drm_i915_private *dev_priv)
>> DRM_DEBUG_DRIVER("use GPU sempahores? %s\n",
>> yesno(i915.semaphores));
>> }
>>
>> +static inline void skl_memdev_info_fill(struct memdev_info *info,
>> uint32_t val)
>> +{
>> + uint8_t channel_width, rank;
>> +
>> + info->rank_valid = true;
>> + channel_width = (val >> SKL_DRAM_CHANNEL_WIDTH_SHIFT) &
>> + SKL_DRAM_CHANNEL_WIDTH_MASK;
> Why are we looking at the L bits instead of the S bits? What's the
> difference between them?
We need to see both L & S bits. This happen due to incomplete
interpretation/explanation of register.
There can be two dimm per channel, & each nibble tells the information
about L/S dimm
>
>> + if (channel_width == SKL_DRAM_WIDTH_1)
>> + info->channel_width_bytes = 1;
>> + else if (channel_width == SKL_DRAM_WIDTH_2)
>> + info->channel_width_bytes = 2;
>> + else
>> + info->channel_width_bytes = 4;
> This is not checking for the invalid value of 0x3. Perhaps a switch()
> instead of an if() ladder would be better here.
>
>> +
>> + rank = (val >> SKL_DRAM_RANK_SHIFT) & SKL_DRAM_RANK_MASK;
>> + if (rank == SKL_DRAM_RANK_SINGLE)
>> + info->rank = DRAM_RANK_SINGLE;
>> + else
>> + info->rank = DRAM_RANK_DUAL;
>> +}
>> +
>> +static int
>> +skl_get_memdev_info(struct drm_i915_private *dev_priv)
>> +{
>> + uint32_t val = 0;
>> + uint32_t mem_speed = 0;
>> + struct memdev_info *memdev_info = &dev_priv->memdev_info;
>> +
>> + val = I915_READ(SKL_MC_BIOS_DATA_0_0_0_MCHBAR_PCU);
>> + mem_speed = div_u64((uint64_t) (val & SKL_REQ_DATA_MASK) *
>> + SKL_MEMORY_FREQ_MULTIPLIER, 1000);
> But but but isn't it possible to have more than 2GHz in SKL? I think I
> have a big SKL machine with more than 2GHz here... I'll check it later.
> Also, don't we need to check FREQ_TYPE and ODD_RATIO?
It seems it's possible,
here we have to multiply frequency by 2 again to get the actual freq. So
we can have upto 4GHz. current MAX is 2133MHz
This was not mentioned in the CSpec, but got confirmation from HW team.
>
> Some digging suggests that maybe we need to figure out somehow if we're
> using DCLK or QCLK and multiply accordingly (based on the BIOS_REQ
> register instead of BIOS_DATA).
>
> I'd really need some clarification on how all of this works.
>
> Also, it looks like there's no need for this to be a 64bit calculation
> due to the mask limit.
>
>
>> +
>> + if (mem_speed == 0)
>> + return -EINVAL;
>> +
>> + memdev_info->valid = true;
>> + memdev_info->mem_speed_khz = mem_speed;
>> + memdev_info->num_channels = 0;
>> +
>> + memdev_info->rank_valid = false;
>> + val = I915_READ(SKL_MAD_DIMM_CH0_0_0_0_MCHBAR_MCMAIN);
>> +
>> + if (val != 0xFFFFFFFF) {
>> + memdev_info->num_channels++;
>> + skl_memdev_info_fill(memdev_info, val);
>> + }
>> +
>> + val = I915_READ(SKL_MAD_DIMM_CH1_0_0_0_MCHBAR_MCMAIN);
>> +
>> + if (val != 0xFFFFFFFF) {
>> + memdev_info->num_channels++;
>> + if (!memdev_info->rank_valid)
>> + skl_memdev_info_fill(memdev_info, val);
>> + }
> A simple iteration over the two register addresses would have saved
> some code :).
>
> Also, isn't it possible to have channels with different width/rank?
Yes, channel can have different RANK, assumption made here is wrong.
need to rewrite the code according to new information got.
> Shouldn't our code be sanity-checking this? Perhaps grab the width/rank
> for each channel and compare. When the specs are unclear or don't
> exist, it's probably better if our code is able to check/validate its
> own assumptions.
>
>
>> +
>> + if (memdev_info->num_channels == 0)
>> + memdev_info->valid = false;
>> + return 0;
>> +}
>> +
>> +static int
>> +bxt_get_memdev_info(struct drm_i915_private *dev_priv)
>> +{
>> + struct memdev_info *memdev_info = &dev_priv->memdev_info;
>> + uint32_t dram_channel;
>> + uint32_t mem_speed, val;
>> + uint8_t num_channel, dram_type;
>> + int i;
>> +
>> + val = I915_READ(BXT_P_CR_MC_BIOS_REQ_0_0_0);
>> + mem_speed = div_u64((uint64_t) (val & BXT_REQ_DATA_MASK) *
>> + SKL_MEMORY_FREQ_MULTIPLIER, 1000);
> AFAIU, there's no need for 64 bit division since the maximum value
> is 8399790.
Will remove 64 bit operation
>
>
>> +
>> + if (mem_speed == 0)
>> + return -EINVAL;
>> +
>> + memdev_info->valid = true;
>> + memdev_info->mem_speed_khz = mem_speed;
>> + dram_type = (val >> BXT_DRAM_TYPE_SHIFT) &
>> BXT_DRAM_TYPE_MASK;
>> + dram_channel = (val >> BXT_DRAM_CHANNEL_SHIFT) &
>> BXT_DRAM_CHANNEL_MASK;
>> + num_channel = hweight32(dram_channel);
>> +
>> + /*
>> + * The lpddr3 and lpddr4 technologies can have 1-4 channels
>> and the
>> + * channels are 32bits wide; while ddr3l technologies can
>> have 1-2
>> + * channels and the channels are 64 bits wide. In case of
>> single 64 bit
>> + * wide DDR3L dimm sets two channel-active bits in
>> + * P_CR_MC_BIOS_REQ_0_0_0 register and system with two DDR3L
>> 64bit dimm
>> + * will set all four channel-active bits in above register.
>> + * In order to get actual number of channels in DDR3L DRAM
>> we need to
>> + * device total channel-active bits set by 2.
>> + */
>> +
>> + switch (dram_type) {
>> + case BXT_DRAM_TYPE_LPDDR3:
>> + case BXT_DRAM_TYPE_LPDDR4:
>> + memdev_info->channel_width_bytes = 4;
>> + memdev_info->num_channels = num_channel;
>> + break;
>> + case BXT_DRAM_TYPE_DDR3L:
>> + memdev_info->channel_width_bytes = 8;
>> + memdev_info->num_channels = num_channel / 2;
>> + break;
>> + default:
>> + DRM_DEBUG_KMS("Unknown DRAM type\n");
>> + memdev_info->valid = false;
>> + return -EINVAL;
>> + }
> Shouldn't channel_width_bytes just be num_channels * 4 for LPDDR3/4 and
> num_channels * 8 for DDR3L?
>
> Or just multiply by 4 before we do the "num_channel / 2"... This would
> avoid even having to find out the DRAM type (in case it's not needed
> anywhere else).
DRAM type is not needed anywhere else, but it's require to correctly
find the number of channels,
which will be needed while deciding on WA.
>> +
>> + /*
>> + * Now read each DUNIT8/9/10/11 to check the rank of each
>> dimms.
>> + * all the dimms should have same rank as in first valid
>> Dimm
>> + */
>
> I think we need to explain in the comment why we're iterating over the
> 4 dunits but just looking at the first one that's not 0xFFFFFFFF.
As told above, assumption made that all channel will have same RANK is
wrong, need to rework.
> Isn't there a way to discover which dunits to look at based on the
> number of channels and DRAM type?
In case of DDR3L dunit-9/10 only can be set as per number of channels.
Don't you think it'll add more complexity in the code?
> Also, does the valid dunit amount reflect the number of channels like
> in SKL?
Yes, number of channel will be equal to DUNIT set.
Oh, using this we can re-factor the code & calculate num of channels
same way as in SKL & always assume
channel_width_bytes * channel as "4 * active_bits set above",
even instead of storing freq & channel width separately, we can store
system bandwidth itself. This will avoid same "system bandwidth"
calculation in each flip.
>
>
>> + memdev_info->rank_valid = false;
>> + for (i = 0; i < BXT_D_CR_DRP0_DUNIT_MAX; i++) {
>> + val = I915_READ(BXT_D_CR_DRP0_DUNIT(i));
>> + if (val != 0xFFFFFFFF) {
>> + uint8_t rank;
>> +
>> + memdev_info->rank_valid = true;
>> + rank = val & BXT_DRAM_RANK_MASK;
>> + if (rank == BXT_DRAM_RANK_SINGLE)
>> + memdev_info->rank =
>> DRAM_RANK_SINGLE;
>> + else if (rank == BXT_DRAM_RANK_DUAL)
>> + memdev_info->rank = DRAM_RANK_DUAL;
>> + else
>> + memdev_info->rank = DRAM_RANK_NONE;
> Probably DRM_DEBUG_KMS("something\n") here? Also maybe set rank_valid
> back to false? Return non-zero?
>
>
>> +
>> + break;
>> + }
>> + }
>> + return 0;
>> +}
>> +
>> +static void
>> +intel_get_memdev_info(struct drm_i915_private *dev_priv)
>> +{
>> + struct memdev_info *memdev_info = &dev_priv->memdev_info;
>> + int ret;
>> +
>> + memdev_info->valid = false;
>> + if (!IS_GEN9(dev_priv))
>> + return;
>> +
>> + if (IS_BROXTON(dev_priv))
>> + ret = bxt_get_memdev_info(dev_priv);
>> + else
>> + ret = skl_get_memdev_info(dev_priv);
>> + if (ret)
>> + return;
> What I find a little hard to read is that we set memdev_info->valid
> both here and in the functions we call. Due to the early returns it's
> not super simple to see the possible value it may get, and things get
> even more complicated due to the fact that we may return 0 from
> skl_get_memdev_info() while still setting memdev_info->valid to false.
>
> Things would be much simpler and easier to read if we only set
> memdev_info->valid in this function, based on the return value of the
> get_memdev_info() functions. And then the get_memdev_info() functions
> would appropriately return zero/nonzero based on valid/invalid
> configuration.
>
>
>> +
>> + if (memdev_info->valid) {
>> + DRM_DEBUG_DRIVER("DRAM speed-%d Khz total-channels-
>> %d channel-width-%d bytes\n",
>> + memdev_info->mem_speed_khz,
>> + memdev_info->num_channels,
>> + memdev_info->channel_width_bytes);
> As I mentioned in my previous review, please use the appropriate
> specifiers here instead of %d for everybody.
>
> Also, we usually print "variable: value, other variable: value" instead
> of "variable-value other-variable-value". Please use the established
> convention. Reading "total-channels-1" is less clear than "total
> channels: 1" and doesn't raise ambiguity concerns with negative values.
>
>
>> + if (memdev_info->rank_valid)
>> + DRM_DEBUG_DRIVER("DRAM rank-%s\n",
>> + (memdev_info->rank == DRAM_RANK_DUAL) ?
>> + "dual" : "single");
> But there's also DRAM_RANK_NONE.
>
>
>> + }
>> +}
>> +
>> +
>> /**
>> * i915_driver_init_hw - setup state requiring device access
>> * @dev_priv: device private
>> @@ -1082,6 +1246,12 @@ static int i915_driver_init_hw(struct
>> drm_i915_private *dev_priv)
>> DRM_DEBUG_DRIVER("can't enable MSI");
>> }
>>
>> + /*
>> + * Fill the memdev structure to get the system raw bandwidth
>> + * This will be used by WM algorithm, to implement GEN9
>> based WA
>> + */
>> + intel_get_memdev_info(dev_priv);
>> +
>> return 0;
>>
>> out_ggtt:
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h
>> b/drivers/gpu/drm/i915/i915_drv.h
>> index a219a35..adbd9aa 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -2057,6 +2057,20 @@ struct drm_i915_private {
>> bool distrust_bios_wm;
>> } wm;
>>
>> + struct memdev_info {
>> + bool valid;
>> + uint32_t mem_speed_khz;
>> + uint8_t channel_width_bytes;
>> + uint8_t num_channels;
>> + bool rank_valid;
>> + enum {
>> + DRAM_RANK_NONE = 0,
>> + DRAM_RANK_SINGLE,
>> + DRAM_RANK_DUAL
>> + } rank;
> What's the meaning of DRAM_RANK_NONE? Does DRAM_RANK_NONE imply
> rank_valid=false? What's the meaning of rank=DRAM_RANK_NONE when
> rank_valid=true?
>
> Perhaps we could get rid of the rank_valid variable? Maybe rename
> DRAM_RANK_NONE to DRAM_RANK_INVALID?
Agree, will make change to use DRAM_RANK_INVALID instead of rank_valid
variable :)
>
>
>> + } memdev_info;
>> +
>> +
>> struct i915_runtime_pm pm;
>>
>> /* Abstract the submission mechanism (legacy ringbuffer or
>> execlists) away */
>> diff --git a/drivers/gpu/drm/i915/i915_reg.h
>> b/drivers/gpu/drm/i915/i915_reg.h
>> index acc767a..a9c467c 100644
>> --- a/drivers/gpu/drm/i915/i915_reg.h
>> +++ b/drivers/gpu/drm/i915/i915_reg.h
>> @@ -7721,6 +7721,44 @@ enum {
>> #define DC_STATE_DEBUG_MASK_CORES (1<<0)
>> #define DC_STATE_DEBUG_MASK_MEMORY_UP (1<<1)
>>
>> +#define _MMIO_MCHBAR_PIPE(x, a, b) _MMIO(MCHBAR_MIRROR_BASE_S
>> NB + _PIPE(x, a, b))
> Wait, what? I may have misunderstood something, but AFAICS the
> registers that use this macro are not per-pipe, so using the _PIPE
> macro makes things super confusing. In this case it should be fine to
> just do something a little more hardcoded, like:
>
> #define BXT_D_CR_DRP0_DUNIT(unit) _MMIO(MCHBAR_MIRROR_BASE_SNB + (unit)
> * 0x200)
I was trying to use already available MACRO, will make changes as
suggested by you.
>
>
>> +#define BXT_P_CR_MC_BIOS_REQ_0_0_0 _MMIO(MCHBAR_MIRROR_BASE_S
>> NB + 0x7114)
>> +#define BXT_REQ_DATA_MASK (0x3F << 0)
> Or just 0x3F to be consistent with the other definitions.
>
>
>> +#define BXT_DRAM_TYPE_SHIFT 24
>> +#define BXT_DRAM_TYPE_MASK 0x7
> The spec you sent me lists these bits as reserved. Are you sure these
> come from this register?
bits 26:24 tells the DRAM_TYPE information in
P_CR_MC_BIOS_REQ_0_0_0_MCHBAR register
thanks for review,
Regards,
-Mahesh
>
>
>> +#define BXT_DRAM_CHANNEL_SHIFT 12
>> +#define BXT_DRAM_CHANNEL_MASK 0xF
> I'd rename these to BXT_DRAM_CHANNEL_ACTIVE_{SHIFT,MASK}.
>
>
>> +
>> +#define BXT_DRAM_TYPE_LPDDR3 0x1
>> +#define BXT_DRAM_TYPE_LPDDR4 0x2
>> +#define BXT_DRAM_TYPE_DDR3L 0x4
>> +/*
>> + * BIOS programs this field of REQ_DATA [5:0] in integer
>> + * multiple of 133330 KHz (133.33MHz)
>> + */
>> +#define SKL_MEMORY_FREQ_MULTIPLIER 0x208D2
> There's a bad tab here that should be a space.
>
> Also, it makes a looooot more sense to use the decimal value instead of
> 0x208D2.
>
>
>> +#define BXT_D_CR_DRP0_DUNIT8 0x1000
>> +#define BXT_D_CR_DRP0_DUNIT9 0x1200
>> +#define BXT_D_CR_DRP0_DUNIT_MAX 4
>> +#define BXT_D_CR_DRP0_DUNIT(x) _MMIO_MCHBAR_PIPE(x,
>> BXT_D_CR_DRP0_DUNIT8, BXT_D_CR_DRP0_DUNIT9)
>> +#define BXT_DRAM_RANK_MASK 0x3
>> +#define BXT_DRAM_RANK_SINGLE 0x1
>> +#define BXT_DRAM_RANK_DUAL 0x3
>> +
>> +#define SKL_MC_BIOS_DATA_0_0_0_MCHBAR_PCU _MMIO(MCHBAR_MIRROR
>> _BASE_SNB + 0x5E04)
> Why does BXT look at BIOS_REQ while SKL looks at BIOS_DATA? Shouldn't
> SKL also look at BIOS_DATA or BXT look at BIOS_REQ? What's the big
> difference? Also, that comment mentioning that we may sometimes get
> zero is a little worrying: shouldn't we make sure we always read when
> it's non-zero?
>
> Thanks for the patch, and sorry for the huge amount of questions here:
> learning while doing the review is sometimes tricky.
>
>> +#define SKL_REQ_DATA_MASK (0xF << 0)
>> +#define SKL_MAD_DIMM_CH0_0_0_0_MCHBAR_MCMAIN _MMIO(MCHBAR_MIR
>> ROR_BASE_SNB + 0x500C)
>> +#define SKL_MAD_DIMM_CH1_0_0_0_MCHBAR_MCMAIN _MMIO(MCHBAR_MIR
>> ROR_BASE_SNB + 0x5010)
>> +#define SKL_DRAM_CHANNEL_WIDTH_MASK 0x3
>> +#define SKL_DRAM_CHANNEL_WIDTH_SHIFT 8
>> +#define SKL_DRAM_WIDTH_1 0x0
>> +#define SKL_DRAM_WIDTH_2 0x1
>> +#define SKL_DRAM_WIDTH_4 0x2
>> +#define SKL_DRAM_RANK_MASK 0x1
>> +#define SKL_DRAM_RANK_SHIFT 10
>> +#define SKL_DRAM_RANK_SINGLE 0x0
>> +#define SKL_DRAM_RANK_DUAL 0x1
>> +
>> /* Please see hsw_read_dcomp() and hsw_write_dcomp() before using
>> this register,
>> * since on HSW we can't write to it using I915_WRITE. */
>> #define D_COMP_HSW _MMIO(MCHBAR_MIRROR_BASE_S
>> NB + 0x5F0C)
More information about the Intel-gfx
mailing list