[Intel-gfx] [PATCH] drm/i915/icl: Fix context RPCS programming
Lionel Landwerlin
lionel.g.landwerlin at intel.com
Fri Aug 31 16:52:55 UTC 2018
On 31/08/2018 12:53, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>
> There are two issues with the current RPCS programming for Icelake:
>
> Expansion of the slice count bitfield has been missed, as well as the
> required programming workaround for the subslice count bitfield size
> limitation.
>
> 1)
>
> Bitfield width for configuring the active slice count has grown so we need
> to program the GEN8_R_PWR_CLK_STATE accordingly.
>
> Current code was always requesting eight times the number of slices (due
> writting to a bitfield starting three bits higher than it should). These
> requests were luckily a) capped by the hardware to the available number of
> slices, and b) we haven't yet exported the code to ask for reduced slice
> configurations.
>
> Due both of the above there was no impact from this incorrect programming
> but we should still fix it.
>
> 2)
>
> Due subslice count bitfield being only three bits wide and furthermore
> capped to a maximum documented value of four, special programming
> workaround is needed to enable more than four subslices.
>
> With this programming driver has to consider the GT configuration as
> 2x4x8, while the hardware internally translates this to 1x8x8.
>
> A limitation stemming from this is that either a subslice count between
> one and four can be selected, or a subslice count equaling the total
> number of subslices in all selected slices. In other words, odd subslice
> counts greater than four are impossible, as are odd subslice counts
> greater than a single slice subslice count.
>
> This also had no impact in the current code base due breakage from 1)
> always reqesting more than one slice.
>
> While fixing this we also add some asserts to flag up any future bitfield
> overflows.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> Bspec: 12247
> Reported-by: tony.ye at intel.com
> Suggested-by: Lionel Landwerlin <lionel.g.landwerlin at intel.com>
> Cc: Lionel Landwerlin <lionel.g.landwerlin at intel.com>
> Cc: tony.ye at intel.com
> Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> ---
> drivers/gpu/drm/i915/i915_reg.h | 2 +
> drivers/gpu/drm/i915/intel_lrc.c | 89 +++++++++++++++++++++++++++-----
> 2 files changed, 78 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index f2321785cbd6..09bc8e730ee1 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -344,6 +344,8 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
> #define GEN8_RPCS_S_CNT_ENABLE (1 << 18)
> #define GEN8_RPCS_S_CNT_SHIFT 15
> #define GEN8_RPCS_S_CNT_MASK (0x7 << GEN8_RPCS_S_CNT_SHIFT)
> +#define GEN11_RPCS_S_CNT_SHIFT 12
> +#define GEN11_RPCS_S_CNT_MASK (0x3f << GEN11_RPCS_S_CNT_SHIFT)
> #define GEN8_RPCS_SS_CNT_ENABLE (1 << 11)
> #define GEN8_RPCS_SS_CNT_SHIFT 8
> #define GEN8_RPCS_SS_CNT_MASK (0x7 << GEN8_RPCS_SS_CNT_SHIFT)
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index f8ceb9c99dd6..323c46319cb8 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -2480,6 +2480,9 @@ int logical_xcs_ring_init(struct intel_engine_cs *engine)
> static u32
> make_rpcs(struct drm_i915_private *dev_priv)
> {
> + bool subslice_pg = INTEL_INFO(dev_priv)->sseu.has_subslice_pg;
> + u8 slices = hweight8(INTEL_INFO(dev_priv)->sseu.slice_mask);
> + u8 subslices = hweight8(INTEL_INFO(dev_priv)->sseu.subslice_mask[0]);
> u32 rpcs = 0;
>
> /*
> @@ -2489,6 +2492,38 @@ make_rpcs(struct drm_i915_private *dev_priv)
> if (INTEL_GEN(dev_priv) < 9)
> return 0;
>
> + /*
> + * Since the SScount bitfield in GEN8_R_PWR_CLK_STATE is only three bits
> + * wide and Icelake has up to eight subslices, specfial programming is
> + * needed in order to correctly enable all subslices.
> + *
> + * According to documentation software must consider the configuration
> + * as 2x4x8 and hardware will translate this to 1x8x8.
> + *
> + * Furthemore, even though SScount is three bits, maximum documented
> + * value for it is four. From this some rules/restrictions follow:
> + *
> + * 1.
> + * If enabled subslice count is greater than four, two whole slices must
> + * be enabled instead.
> + *
> + * 2.
> + * When more than one slice is enabled, hardware ignores the subslice
> + * count altogether.
> + *
> + * From these restrictions it follows that it is not possible to enable
> + * a count of subslices between the SScount maximum of four restriction,
> + * and the maximum available number on a particular SKU. Either all
> + * subslices are enabled, or a count between one and four on the first
> + * slice.
> + */
> + if (IS_GEN11(dev_priv) && slices == 1 && subslices >= 4) {
> + GEM_BUG_ON(subslices & 1);
> +
> + subslice_pg = false;
Err... Now I'm looking at the documentation again and I see this for the
subslice enable field :
Enable Subslice Count Request.
0 = Use Async subslice count
1 = Use SScount in this register
Searching for "Async subslice count" leads me to another register (BSpec
9497).
Really confused as to what this does :|
Is the hardware then reading for that second register if you set
SSCountEn to 0?
We could put max(hweight8(sseu.subslice_mask[0]), 4) in SSCount, but
it's unclear what's right...
If you feel like this needs to be investigated, go ahead.
Otherwise I have a minor style suggestion below, either way :
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin at intel.com>
> + slices *= 2;
> + }
> +
> /*
> * Starting in Gen9, render power gating can leave
> * slice/subslice/EU in a partially enabled state. We
> @@ -2496,24 +2531,52 @@ make_rpcs(struct drm_i915_private *dev_priv)
> * enablement.
> */
> if (INTEL_INFO(dev_priv)->sseu.has_slice_pg) {
> - rpcs |= GEN8_RPCS_S_CNT_ENABLE;
> - rpcs |= hweight8(INTEL_INFO(dev_priv)->sseu.slice_mask) <<
> - GEN8_RPCS_S_CNT_SHIFT;
> - rpcs |= GEN8_RPCS_ENABLE;
I would use a u32 val; like you did for the subslice part below and just
OR things together.
Just feels a bit more readable, but up to you.
> + u32 mask;
> +
> + rpcs = slices;
> +
> + if (INTEL_GEN(dev_priv) >= 11) {
> + mask = GEN11_RPCS_S_CNT_MASK;
> + rpcs <<= GEN11_RPCS_S_CNT_SHIFT;
> + } else {
> + mask = GEN8_RPCS_S_CNT_MASK;
> + rpcs <<= GEN8_RPCS_S_CNT_SHIFT;
> + }
> +
> + GEM_BUG_ON(rpcs & ~mask);
> + rpcs &= mask;
> +
> + rpcs |= GEN8_RPCS_ENABLE | GEN8_RPCS_S_CNT_ENABLE;
> }
>
> - if (INTEL_INFO(dev_priv)->sseu.has_subslice_pg) {
> - rpcs |= GEN8_RPCS_SS_CNT_ENABLE;
> - rpcs |= hweight8(INTEL_INFO(dev_priv)->sseu.subslice_mask[0]) <<
> - GEN8_RPCS_SS_CNT_SHIFT;
> - rpcs |= GEN8_RPCS_ENABLE;
> + if (subslice_pg) {
> + u32 val = subslices;
> +
> + val <<= GEN8_RPCS_SS_CNT_SHIFT;
> +
> + GEM_BUG_ON(val & ~GEN8_RPCS_SS_CNT_MASK);
> + val &= GEN8_RPCS_SS_CNT_MASK;
> +
> + rpcs |= GEN8_RPCS_ENABLE | GEN8_RPCS_SS_CNT_ENABLE | val;
> }
>
> if (INTEL_INFO(dev_priv)->sseu.has_eu_pg) {
> - rpcs |= INTEL_INFO(dev_priv)->sseu.eu_per_subslice <<
> - GEN8_RPCS_EU_MIN_SHIFT;
> - rpcs |= INTEL_INFO(dev_priv)->sseu.eu_per_subslice <<
> - GEN8_RPCS_EU_MAX_SHIFT;
> + u32 val;
> +
> + val = INTEL_INFO(dev_priv)->sseu.eu_per_subslice <<
> + GEN8_RPCS_EU_MIN_SHIFT;
> + GEM_BUG_ON(val & ~GEN8_RPCS_EU_MIN_MASK);
> + val &= GEN8_RPCS_EU_MIN_MASK;
> +
> + rpcs |= val;
> +
> + val = INTEL_INFO(dev_priv)->sseu.eu_per_subslice <<
> + GEN8_RPCS_EU_MAX_SHIFT;
> + GEM_BUG_ON(val & ~GEN8_RPCS_EU_MAX_MASK);
> + val &= GEN8_RPCS_EU_MAX_MASK;
> +
> + rpcs |= val;
> +
> rpcs |= GEN8_RPCS_ENABLE;
> }
>
More information about the Intel-gfx
mailing list