[Intel-gfx] [PATCH] drm/i915/icl: Fix context RPCS programming

Lionel Landwerlin lionel.g.landwerlin at intel.com
Fri Aug 31 16:52:55 UTC 2018


On 31/08/2018 12:53, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>
> There are two issues with the current RPCS programming for Icelake:
>
> Expansion of the slice count bitfield has been missed, as well as the
> required programming workaround for the subslice count bitfield size
> limitation.
>
> 1)
>
> Bitfield width for configuring the active slice count has grown so we need
> to program the GEN8_R_PWR_CLK_STATE accordingly.
>
> Current code was always requesting eight times the number of slices (due
> writting to a bitfield starting three bits higher than it should). These
> requests were luckily a) capped by the hardware to the available number of
> slices, and b) we haven't yet exported the code to ask for reduced slice
> configurations.
>
> Due both of the above there was no impact from this incorrect programming
> but we should still fix it.
>
> 2)
>
> Due subslice count bitfield being only three bits wide and furthermore
> capped to a maximum documented value of four, special programming
> workaround is needed to enable more than four subslices.
>
> With this programming driver has to consider the GT configuration as
> 2x4x8, while the hardware internally translates this to 1x8x8.
>
> A limitation stemming from this is that either a subslice count between
> one and four can be selected, or a subslice count equaling the total
> number of subslices in all selected slices. In other words, odd subslice
> counts greater than four are impossible, as are odd subslice counts
> greater than a single slice subslice count.
>
> This also had no impact in the current code base due breakage from 1)
> always reqesting more than one slice.
>
> While fixing this we also add some asserts to flag up any future bitfield
> overflows.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> Bspec: 12247
> Reported-by: tony.ye at intel.com
> Suggested-by: Lionel Landwerlin <lionel.g.landwerlin at intel.com>
> Cc: Lionel Landwerlin <lionel.g.landwerlin at intel.com>
> Cc: tony.ye at intel.com
> Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> ---
>   drivers/gpu/drm/i915/i915_reg.h  |  2 +
>   drivers/gpu/drm/i915/intel_lrc.c | 89 +++++++++++++++++++++++++++-----
>   2 files changed, 78 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index f2321785cbd6..09bc8e730ee1 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -344,6 +344,8 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
>   #define   GEN8_RPCS_S_CNT_ENABLE	(1 << 18)
>   #define   GEN8_RPCS_S_CNT_SHIFT		15
>   #define   GEN8_RPCS_S_CNT_MASK		(0x7 << GEN8_RPCS_S_CNT_SHIFT)
> +#define   GEN11_RPCS_S_CNT_SHIFT	12
> +#define   GEN11_RPCS_S_CNT_MASK		(0x3f << GEN11_RPCS_S_CNT_SHIFT)
>   #define   GEN8_RPCS_SS_CNT_ENABLE	(1 << 11)
>   #define   GEN8_RPCS_SS_CNT_SHIFT	8
>   #define   GEN8_RPCS_SS_CNT_MASK		(0x7 << GEN8_RPCS_SS_CNT_SHIFT)
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index f8ceb9c99dd6..323c46319cb8 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -2480,6 +2480,9 @@ int logical_xcs_ring_init(struct intel_engine_cs *engine)
>   static u32
>   make_rpcs(struct drm_i915_private *dev_priv)
>   {
> +	bool subslice_pg = INTEL_INFO(dev_priv)->sseu.has_subslice_pg;
> +	u8 slices = hweight8(INTEL_INFO(dev_priv)->sseu.slice_mask);
> +	u8 subslices = hweight8(INTEL_INFO(dev_priv)->sseu.subslice_mask[0]);
>   	u32 rpcs = 0;
>   
>   	/*
> @@ -2489,6 +2492,38 @@ make_rpcs(struct drm_i915_private *dev_priv)
>   	if (INTEL_GEN(dev_priv) < 9)
>   		return 0;
>   
> +	/*
> +	 * Since the SScount bitfield in GEN8_R_PWR_CLK_STATE is only three bits
> +	 * wide and Icelake has up to eight subslices, specfial programming is
> +	 * needed in order to correctly enable all subslices.
> +	 *
> +	 * According to documentation software must consider the configuration
> +	 * as 2x4x8 and hardware will translate this to 1x8x8.
> +	 *
> +	 * Furthemore, even though SScount is three bits, maximum documented
> +	 * value for it is four. From this some rules/restrictions follow:
> +	 *
> +	 * 1.
> +	 * If enabled subslice count is greater than four, two whole slices must
> +	 * be enabled instead.
> +	 *
> +	 * 2.
> +	 * When more than one slice is enabled, hardware ignores the subslice
> +	 * count altogether.
> +	 *
> +	 * From these restrictions it follows that it is not possible to enable
> +	 * a count of subslices between the SScount maximum of four restriction,
> +	 * and the maximum available number on a particular SKU. Either all
> +	 * subslices are enabled, or a count between one and four on the first
> +	 * slice.
> +	 */
> +	if (IS_GEN11(dev_priv) && slices == 1 && subslices >= 4) {
> +		GEM_BUG_ON(subslices & 1);
> +
> +		subslice_pg = false;


Err... Now I'm looking at the documentation again and I see this for the 
subslice enable field :


Enable Subslice Count Request.

0 = Use Async subslice count

1 = Use SScount in this register


Searching for "Async subslice count" leads me to another register (BSpec 
9497).

Really confused as to what this does :|

Is the hardware then reading for that second register if you set 
SSCountEn to 0?


We could put max(hweight8(sseu.subslice_mask[0]), 4) in SSCount, but 
it's unclear what's right...


If you feel like this needs to be investigated, go ahead.

Otherwise I have a minor style suggestion below, either way :


Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin at intel.com>


> +		slices *= 2;
> +	}
> +
>   	/*
>   	 * Starting in Gen9, render power gating can leave
>   	 * slice/subslice/EU in a partially enabled state. We
> @@ -2496,24 +2531,52 @@ make_rpcs(struct drm_i915_private *dev_priv)
>   	 * enablement.
>   	*/
>   	if (INTEL_INFO(dev_priv)->sseu.has_slice_pg) {
> -		rpcs |= GEN8_RPCS_S_CNT_ENABLE;
> -		rpcs |= hweight8(INTEL_INFO(dev_priv)->sseu.slice_mask) <<
> -			GEN8_RPCS_S_CNT_SHIFT;
> -		rpcs |= GEN8_RPCS_ENABLE;


I would use a u32 val; like you did for the subslice part below and just 
OR things together.

Just feels a bit more readable, but up to you.


> +		u32 mask;
> +
> +		rpcs = slices;
> +
> +		if (INTEL_GEN(dev_priv) >= 11) {
> +			mask = GEN11_RPCS_S_CNT_MASK;
> +			rpcs <<= GEN11_RPCS_S_CNT_SHIFT;
> +		} else {
> +			mask = GEN8_RPCS_S_CNT_MASK;
> +			rpcs <<= GEN8_RPCS_S_CNT_SHIFT;
> +		}
> +
> +		GEM_BUG_ON(rpcs & ~mask);
> +		rpcs &= mask;
> +
> +		rpcs |= GEN8_RPCS_ENABLE | GEN8_RPCS_S_CNT_ENABLE;
>   	}
>   
> -	if (INTEL_INFO(dev_priv)->sseu.has_subslice_pg) {
> -		rpcs |= GEN8_RPCS_SS_CNT_ENABLE;
> -		rpcs |= hweight8(INTEL_INFO(dev_priv)->sseu.subslice_mask[0]) <<
> -			GEN8_RPCS_SS_CNT_SHIFT;
> -		rpcs |= GEN8_RPCS_ENABLE;
> +	if (subslice_pg) {
> +		u32 val = subslices;
> +
> +		val <<= GEN8_RPCS_SS_CNT_SHIFT;
> +
> +		GEM_BUG_ON(val & ~GEN8_RPCS_SS_CNT_MASK);
> +		val &= GEN8_RPCS_SS_CNT_MASK;
> +
> +		rpcs |= GEN8_RPCS_ENABLE | GEN8_RPCS_SS_CNT_ENABLE | val;
>   	}
>   
>   	if (INTEL_INFO(dev_priv)->sseu.has_eu_pg) {
> -		rpcs |= INTEL_INFO(dev_priv)->sseu.eu_per_subslice <<
> -			GEN8_RPCS_EU_MIN_SHIFT;
> -		rpcs |= INTEL_INFO(dev_priv)->sseu.eu_per_subslice <<
> -			GEN8_RPCS_EU_MAX_SHIFT;
> +		u32 val;
> +
> +		val = INTEL_INFO(dev_priv)->sseu.eu_per_subslice <<
> +		      GEN8_RPCS_EU_MIN_SHIFT;
> +		GEM_BUG_ON(val & ~GEN8_RPCS_EU_MIN_MASK);
> +		val &= GEN8_RPCS_EU_MIN_MASK;
> +
> +		rpcs |= val;
> +
> +		val = INTEL_INFO(dev_priv)->sseu.eu_per_subslice <<
> +		      GEN8_RPCS_EU_MAX_SHIFT;
> +		GEM_BUG_ON(val & ~GEN8_RPCS_EU_MAX_MASK);
> +		val &= GEN8_RPCS_EU_MAX_MASK;
> +
> +		rpcs |= val;
> +
>   		rpcs |= GEN8_RPCS_ENABLE;
>   	}
>   




More information about the Intel-gfx mailing list