[Intel-gfx] [PATCH v4 3/8] drm/i915: Decode system memory bandwidth

Mahesh Kumar mahesh1.kumar at intel.com
Wed Nov 16 11:48:51 UTC 2016


Hi,


On Friday 04 November 2016 12:36 AM, Paulo Zanoni wrote:
> Em Qui, 2016-10-13 às 16:28 +0530, Kumar, Mahesh escreveu:
>> This patch adds support to decode system memory bandwidth
>> which will be used for arbitrated display memory percentage
>> calculation in GEN9 based system.
>>
>> Changes from v1:
>>   - Address comments from Paulo
>>   - implement decode function for SKL/KBL also
> Again, my understanding of these registers is not good so I will ask
> some questions that will help me properly understand and review the
> code. I may also end up asking some questions that don't make sense.
> Please correct me in case any of my assumptions is wrong.
>
> Perhaps answers to my questions could become code comments since I
> suppose most of the gfx team is not super familiar with the memory
> regs, and we're going to have to maintain the code.
>
>
>> Signed-off-by: "Kumar, Mahesh" <mahesh1.kumar at intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_drv.c | 170
>> ++++++++++++++++++++++++++++++++++++++++
>>   drivers/gpu/drm/i915/i915_drv.h |  14 ++++
>>   drivers/gpu/drm/i915/i915_reg.h |  38 +++++++++
>>   3 files changed, 222 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_drv.c
>> b/drivers/gpu/drm/i915/i915_drv.c
>> index 89d3222..b5f601c 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.c
>> +++ b/drivers/gpu/drm/i915/i915_drv.c
>> @@ -979,6 +979,170 @@ static void intel_sanitize_options(struct
>> drm_i915_private *dev_priv)
>>   	DRM_DEBUG_DRIVER("use GPU sempahores? %s\n",
>> yesno(i915.semaphores));
>>   }
>>   
>> +static inline void skl_memdev_info_fill(struct memdev_info *info,
>> uint32_t val)
>> +{
>> +	uint8_t channel_width, rank;
>> +
>> +	info->rank_valid = true;
>> +	channel_width = (val >> SKL_DRAM_CHANNEL_WIDTH_SHIFT) &
>> +		SKL_DRAM_CHANNEL_WIDTH_MASK;
> Why are we looking at the L bits instead of the S bits? What's the
> difference between them?
We need to see both L & S bits. This happen due to incomplete 
interpretation/explanation of register.
There can be two dimm per channel, & each nibble tells the information 
about L/S dimm
>
>> +	if (channel_width == SKL_DRAM_WIDTH_1)
>> +		info->channel_width_bytes = 1;
>> +	else if (channel_width == SKL_DRAM_WIDTH_2)
>> +		info->channel_width_bytes = 2;
>> +	else
>> +		info->channel_width_bytes = 4;
> This is not checking for the invalid value of 0x3. Perhaps a switch()
> instead of an if() ladder would be better here.
>
>> +
>> +	rank = (val >> SKL_DRAM_RANK_SHIFT) & SKL_DRAM_RANK_MASK;
>> +	if (rank == SKL_DRAM_RANK_SINGLE)
>> +		info->rank = DRAM_RANK_SINGLE;
>> +	else
>> +		info->rank = DRAM_RANK_DUAL;
>> +}
>> +
>> +static int
>> +skl_get_memdev_info(struct drm_i915_private *dev_priv)
>> +{
>> +	uint32_t val = 0;
>> +	uint32_t mem_speed = 0;
>> +	struct memdev_info *memdev_info = &dev_priv->memdev_info;
>> +
>> +	val = I915_READ(SKL_MC_BIOS_DATA_0_0_0_MCHBAR_PCU);
>> +	mem_speed = div_u64((uint64_t) (val & SKL_REQ_DATA_MASK) *
>> +			SKL_MEMORY_FREQ_MULTIPLIER, 1000);
> But but but isn't it possible to have more than 2GHz in SKL? I think I
> have a big SKL machine with more than 2GHz here... I'll check it later.
> Also, don't we need to check FREQ_TYPE and ODD_RATIO?
It seems it's possible,
here we have to multiply frequency by 2 again to get the actual freq. So 
we can have upto 4GHz. current MAX is 2133MHz
This was not mentioned in the CSpec, but got confirmation from HW team.
>
> Some digging suggests that maybe we need to figure out somehow if we're
> using DCLK or QCLK and multiply accordingly (based on the BIOS_REQ
> register instead of BIOS_DATA).
>
> I'd really need some clarification on how all of this works.
>
> Also, it looks like there's no need for this to be a 64bit calculation
> due to the mask limit.
>
>
>> +
>> +	if (mem_speed == 0)
>> +		return -EINVAL;
>> +
>> +	memdev_info->valid = true;
>> +	memdev_info->mem_speed_khz = mem_speed;
>> +	memdev_info->num_channels  = 0;
>> +
>> +	memdev_info->rank_valid = false;
>> +	val = I915_READ(SKL_MAD_DIMM_CH0_0_0_0_MCHBAR_MCMAIN);
>> +
>> +	if (val != 0xFFFFFFFF) {
>> +		memdev_info->num_channels++;
>> +		skl_memdev_info_fill(memdev_info, val);
>> +	}
>> +
>> +	val = I915_READ(SKL_MAD_DIMM_CH1_0_0_0_MCHBAR_MCMAIN);
>> +
>> +	if (val != 0xFFFFFFFF) {
>> +		memdev_info->num_channels++;
>> +		if (!memdev_info->rank_valid)
>> +			skl_memdev_info_fill(memdev_info, val);
>> +	}
> A simple iteration over the two register addresses would have saved
> some code :).
>
> Also, isn't it possible to have channels with different width/rank?
Yes, channel can have different RANK, assumption made here is wrong.
need to rewrite the code according to new information got.
> Shouldn't our code be sanity-checking this? Perhaps grab the width/rank
> for each channel and compare. When the specs are unclear or don't
> exist, it's probably better if our code is able to check/validate its
> own assumptions.
>
>
>> +
>> +	if (memdev_info->num_channels == 0)
>> +		memdev_info->valid = false;
>> +	return 0;
>> +}
>> +
>> +static int
>> +bxt_get_memdev_info(struct drm_i915_private *dev_priv)
>> +{
>> +	struct memdev_info *memdev_info = &dev_priv->memdev_info;
>> +	uint32_t dram_channel;
>> +	uint32_t mem_speed, val;
>> +	uint8_t num_channel, dram_type;
>> +	int i;
>> +
>> +	val = I915_READ(BXT_P_CR_MC_BIOS_REQ_0_0_0);
>> +	mem_speed = div_u64((uint64_t) (val & BXT_REQ_DATA_MASK) *
>> +			SKL_MEMORY_FREQ_MULTIPLIER, 1000);
> AFAIU, there's no need for 64 bit division since the maximum value
> is 8399790.
Will remove 64 bit operation
>
>
>> +
>> +	if (mem_speed == 0)
>> +		return -EINVAL;
>> +
>> +	memdev_info->valid = true;
>> +	memdev_info->mem_speed_khz = mem_speed;
>> +	dram_type = (val >> BXT_DRAM_TYPE_SHIFT) &
>> BXT_DRAM_TYPE_MASK;
>> +	dram_channel = (val >> BXT_DRAM_CHANNEL_SHIFT) &
>> BXT_DRAM_CHANNEL_MASK;
>> +	num_channel = hweight32(dram_channel);
>> +
>> +	/*
>> +	 * The lpddr3 and lpddr4 technologies can have 1-4 channels
>> and the
>> +	 * channels are 32bits wide; while ddr3l technologies can
>> have 1-2
>> +	 * channels and the channels are 64 bits wide. In case of
>> single 64 bit
>> +	 * wide DDR3L dimm sets two channel-active bits in
>> +	 * P_CR_MC_BIOS_REQ_0_0_0 register and system with two DDR3L
>> 64bit dimm
>> +	 * will set all four channel-active bits in above register.
>> +	 * In order to get actual number of channels in DDR3L DRAM
>> we need to
>> +	 * device total channel-active bits set by 2.
>> +	 */
>> +
>> +	switch (dram_type) {
>> +	case BXT_DRAM_TYPE_LPDDR3:
>> +	case BXT_DRAM_TYPE_LPDDR4:
>> +		memdev_info->channel_width_bytes = 4;
>> +		memdev_info->num_channels = num_channel;
>> +		break;
>> +	case BXT_DRAM_TYPE_DDR3L:
>> +		memdev_info->channel_width_bytes = 8;
>> +		memdev_info->num_channels = num_channel / 2;
>> +		break;
>> +	default:
>> +		DRM_DEBUG_KMS("Unknown DRAM type\n");
>> +		memdev_info->valid = false;
>> +		return -EINVAL;
>> +	}
> Shouldn't channel_width_bytes just be num_channels * 4 for LPDDR3/4 and
> num_channels * 8 for DDR3L?
>
> Or just multiply by 4 before we do the "num_channel / 2"... This would
> avoid even having to find out the DRAM type (in case it's not needed
> anywhere else).
DRAM type is not needed anywhere else, but it's require to correctly 
find the number of channels,
which will be needed while deciding on WA.
>> +
>> +	/*
>> +	 * Now read each DUNIT8/9/10/11 to check the rank of each
>> dimms.
>> +	 * all the dimms should have same rank as in first valid
>> Dimm
>> +	 */
>
> I think we need to explain in the comment why we're iterating over the
> 4 dunits but just looking at the first one that's not 0xFFFFFFFF.
As told above, assumption made that all channel will have same RANK is 
wrong, need to rework.
> Isn't there a way to discover which dunits to look at based on the
> number of channels and DRAM type?
In case of DDR3L dunit-9/10 only can be set as per number of channels.
Don't you think it'll add more complexity in the code?
> Also, does the valid dunit amount reflect the number of channels like
> in SKL?
Yes, number of channel will be equal to DUNIT set.
Oh, using this we can re-factor the code & calculate num of channels 
same way as in SKL & always assume
channel_width_bytes * channel as "4 * active_bits set above",
even instead of storing freq & channel width separately, we can store 
system bandwidth itself. This will avoid same "system bandwidth" 
calculation in each flip.
>
>
>> +	memdev_info->rank_valid = false;
>> +	for (i = 0; i < BXT_D_CR_DRP0_DUNIT_MAX; i++) {
>> +		val = I915_READ(BXT_D_CR_DRP0_DUNIT(i));
>> +		if (val != 0xFFFFFFFF) {
>> +			uint8_t rank;
>> +
>> +			memdev_info->rank_valid = true;
>> +			rank = val & BXT_DRAM_RANK_MASK;
>> +			if (rank == BXT_DRAM_RANK_SINGLE)
>> +				memdev_info->rank =
>> DRAM_RANK_SINGLE;
>> +			else if (rank == BXT_DRAM_RANK_DUAL)
>> +				memdev_info->rank = DRAM_RANK_DUAL;
>> +			else
>> +				memdev_info->rank = DRAM_RANK_NONE;
> Probably DRM_DEBUG_KMS("something\n") here? Also maybe set rank_valid
> back to false? Return non-zero?
>
>
>> +
>> +			break;
>> +		}
>> +	}
>> +	return 0;
>> +}
>> +
>> +static void
>> +intel_get_memdev_info(struct drm_i915_private *dev_priv)
>> +{
>> +	struct memdev_info *memdev_info = &dev_priv->memdev_info;
>> +	int ret;
>> +
>> +	memdev_info->valid = false;
>> +	if (!IS_GEN9(dev_priv))
>> +		return;
>> +
>> +	if (IS_BROXTON(dev_priv))
>> +		ret = bxt_get_memdev_info(dev_priv);
>> +	else
>> +		ret = skl_get_memdev_info(dev_priv);
>> +	if (ret)
>> +		return;
> What I find a little hard to read is that we set memdev_info->valid
> both here and in the functions we call. Due to the early returns it's
> not super simple to see the possible value it may get, and things get
> even more complicated due to the fact that we may return 0 from
> skl_get_memdev_info() while still setting memdev_info->valid to false.
>
> Things would be much simpler and easier to read if we only set
> memdev_info->valid in this function, based on the return value of the
> get_memdev_info() functions. And then the get_memdev_info() functions
> would appropriately return zero/nonzero based on valid/invalid
> configuration.
>
>
>> +
>> +	if (memdev_info->valid) {
>> +		DRM_DEBUG_DRIVER("DRAM speed-%d Khz total-channels-
>> %d channel-width-%d bytes\n",
>> +				memdev_info->mem_speed_khz,
>> +				memdev_info->num_channels,
>> +				memdev_info->channel_width_bytes);
> As I mentioned in my previous review, please use the appropriate
> specifiers here instead of %d for everybody.
>
> Also, we usually print "variable: value, other variable: value" instead
> of "variable-value other-variable-value". Please use the established
> convention. Reading "total-channels-1" is less clear than "total
> channels: 1" and doesn't raise ambiguity concerns with negative values.
>
>
>> +		if (memdev_info->rank_valid)
>> +			DRM_DEBUG_DRIVER("DRAM rank-%s\n",
>> +			(memdev_info->rank == DRAM_RANK_DUAL) ?
>> +						"dual" : "single");
> But there's also DRAM_RANK_NONE.
>
>
>> +	}
>> +}
>> +
>> +
>>   /**
>>    * i915_driver_init_hw - setup state requiring device access
>>    * @dev_priv: device private
>> @@ -1082,6 +1246,12 @@ static int i915_driver_init_hw(struct
>> drm_i915_private *dev_priv)
>>   			DRM_DEBUG_DRIVER("can't enable MSI");
>>   	}
>>   
>> +	/*
>> +	 * Fill the memdev structure to get the system raw bandwidth
>> +	 * This will be used by WM algorithm, to implement GEN9
>> based WA
>> +	 */
>> +	intel_get_memdev_info(dev_priv);
>> +
>>   	return 0;
>>   
>>   out_ggtt:
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h
>> b/drivers/gpu/drm/i915/i915_drv.h
>> index a219a35..adbd9aa 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -2057,6 +2057,20 @@ struct drm_i915_private {
>>   		bool distrust_bios_wm;
>>   	} wm;
>>   
>> +	struct memdev_info {
>> +		bool valid;
>> +		uint32_t mem_speed_khz;
>> +		uint8_t channel_width_bytes;
>> +		uint8_t num_channels;
>> +		bool rank_valid;
>> +		enum {
>> +			DRAM_RANK_NONE = 0,
>> +			DRAM_RANK_SINGLE,
>> +			DRAM_RANK_DUAL
>> +		} rank;
> What's the meaning of DRAM_RANK_NONE? Does DRAM_RANK_NONE imply
> rank_valid=false? What's the meaning of rank=DRAM_RANK_NONE when
> rank_valid=true?
>
> Perhaps we could get rid of the rank_valid variable? Maybe rename
> DRAM_RANK_NONE to DRAM_RANK_INVALID?
Agree, will make change to use DRAM_RANK_INVALID instead of rank_valid 
variable :)
>
>
>> +	} memdev_info;
>> +
>> +
>>   	struct i915_runtime_pm pm;
>>   
>>   	/* Abstract the submission mechanism (legacy ringbuffer or
>> execlists) away */
>> diff --git a/drivers/gpu/drm/i915/i915_reg.h
>> b/drivers/gpu/drm/i915/i915_reg.h
>> index acc767a..a9c467c 100644
>> --- a/drivers/gpu/drm/i915/i915_reg.h
>> +++ b/drivers/gpu/drm/i915/i915_reg.h
>> @@ -7721,6 +7721,44 @@ enum {
>>   #define  DC_STATE_DEBUG_MASK_CORES	(1<<0)
>>   #define  DC_STATE_DEBUG_MASK_MEMORY_UP	(1<<1)
>>   
>> +#define _MMIO_MCHBAR_PIPE(x, a, b)	_MMIO(MCHBAR_MIRROR_BASE_S
>> NB + _PIPE(x, a, b))
> Wait, what? I may have misunderstood something, but AFAICS the
> registers that use this macro are not per-pipe, so using the _PIPE
> macro makes things super confusing. In this case it should be fine to
> just do something a little more hardcoded, like:
>
> #define BXT_D_CR_DRP0_DUNIT(unit) _MMIO(MCHBAR_MIRROR_BASE_SNB + (unit)
> * 0x200)
I was trying to use already available MACRO, will make changes as 
suggested by you.
>
> 	
>> +#define BXT_P_CR_MC_BIOS_REQ_0_0_0	_MMIO(MCHBAR_MIRROR_BASE_S
>> NB + 0x7114)
>> +#define BXT_REQ_DATA_MASK		(0x3F << 0)
> Or just 0x3F to be consistent with the other definitions.
>
>
>> +#define BXT_DRAM_TYPE_SHIFT		24
>> +#define BXT_DRAM_TYPE_MASK		0x7
> The spec you sent me lists these bits as reserved. Are you sure these
> come from this register?
bits 26:24 tells the DRAM_TYPE information in 
P_CR_MC_BIOS_REQ_0_0_0_MCHBAR register
thanks for review,

Regards,
-Mahesh
>
>
>> +#define BXT_DRAM_CHANNEL_SHIFT		12
>> +#define BXT_DRAM_CHANNEL_MASK		0xF
> I'd rename these to BXT_DRAM_CHANNEL_ACTIVE_{SHIFT,MASK}.
>
>
>> +
>> +#define BXT_DRAM_TYPE_LPDDR3		0x1
>> +#define BXT_DRAM_TYPE_LPDDR4		0x2
>> +#define BXT_DRAM_TYPE_DDR3L		0x4
>> +/*
>> + * BIOS programs this field of REQ_DATA [5:0] in integer
>> + * multiple of 133330 KHz (133.33MHz)
>> + */
>> +#define	SKL_MEMORY_FREQ_MULTIPLIER		0x208D2
> There's a bad tab here that should be a space.
>
> Also, it makes a looooot more sense to use the decimal value instead of
> 0x208D2.
>
>
>> +#define BXT_D_CR_DRP0_DUNIT8			0x1000
>> +#define BXT_D_CR_DRP0_DUNIT9			0x1200
>> +#define BXT_D_CR_DRP0_DUNIT_MAX			4
>> +#define BXT_D_CR_DRP0_DUNIT(x)	_MMIO_MCHBAR_PIPE(x,
>> BXT_D_CR_DRP0_DUNIT8, BXT_D_CR_DRP0_DUNIT9)
>> +#define BXT_DRAM_RANK_MASK			0x3
>> +#define BXT_DRAM_RANK_SINGLE			0x1
>> +#define BXT_DRAM_RANK_DUAL			0x3
>> +
>> +#define SKL_MC_BIOS_DATA_0_0_0_MCHBAR_PCU	_MMIO(MCHBAR_MIRROR
>> _BASE_SNB + 0x5E04)
> Why does BXT look at BIOS_REQ while SKL looks at BIOS_DATA? Shouldn't
> SKL also look at BIOS_DATA or BXT look at BIOS_REQ? What's the big
> difference? Also, that comment mentioning that we may sometimes get
> zero is a little worrying: shouldn't we make sure we always read when
> it's non-zero?
>
> Thanks for the patch, and sorry for the huge amount of questions here:
> learning while doing the review is sometimes tricky.
>
>> +#define SKL_REQ_DATA_MASK			(0xF << 0)
>> +#define SKL_MAD_DIMM_CH0_0_0_0_MCHBAR_MCMAIN	_MMIO(MCHBAR_MIR
>> ROR_BASE_SNB + 0x500C)
>> +#define SKL_MAD_DIMM_CH1_0_0_0_MCHBAR_MCMAIN	_MMIO(MCHBAR_MIR
>> ROR_BASE_SNB + 0x5010)
>> +#define SKL_DRAM_CHANNEL_WIDTH_MASK		0x3
>> +#define SKL_DRAM_CHANNEL_WIDTH_SHIFT		8
>> +#define SKL_DRAM_WIDTH_1			0x0
>> +#define SKL_DRAM_WIDTH_2			0x1
>> +#define SKL_DRAM_WIDTH_4			0x2
>> +#define SKL_DRAM_RANK_MASK			0x1
>> +#define SKL_DRAM_RANK_SHIFT			10
>> +#define SKL_DRAM_RANK_SINGLE			0x0
>> +#define SKL_DRAM_RANK_DUAL			0x1
>> +
>>   /* Please see hsw_read_dcomp() and hsw_write_dcomp() before using
>> this register,
>>    * since on HSW we can't write to it using I915_WRITE. */
>>   #define D_COMP_HSW			_MMIO(MCHBAR_MIRROR_BASE_S
>> NB + 0x5F0C)



More information about the Intel-gfx mailing list