[Intel-gfx] [PATCH] Revert "drm/i915: Apply Wa_1406680159:icl, ehl as an engine workaround"

Thu Jun 10 16:43:14 UTC 2021

On Thu, Jun 10, 2021 at 03:16:58PM +0530, Tejas Upadhyay wrote:
> This reverts commit fb899dd8ea9c4ac5928b86946e6536790981adae.
> 
> w/a on mentioned platforms not working as expected and causing
> more harm on the RC6 flow.
> 
> Fixes: fb899dd8ea9c ("drm/i915: Apply Wa_1406680159:icl,ehl as an engine workaround")
> Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> Cc: Matt Roper <matthew.d.roper at intel.com>
> Signed-off-by: Tejas Upadhyay <tejaskumarx.surendrakumar.upadhyay at intel.com>

NAK.  This patch does not do what it claims (it deletes the workaround
completely rather than reverting the original patch) and also doesn't
address the real issue here.

As we've discussed before, this workaround is working properly.  We've
already confirmed that the register write does go through, the bit is
set in hardware, and the relevant clock gating is disabled at the
hardware level.  The *only* thing that isn't working is the readback
verification of the register bit, and this is a multicast steering issue
rather than anything to do with this specific workaround (due to "luck"
this just happens to be the only documented workaround that updates a
register in the affected multicast region on this platform, but you'd
see the same type of warnings if other workarounds in the future start
touching the same general multicast region).

When we write to a multicast register like this, all instances of the
register that live behind the same MMIO address get updated.  But when
we read back those multicast registers, the value we get back always
comes from one specific instance.  For workarounds, all of the instances
will have the same value, so it doesn't matter which instance we read
back, as long as we don't read back from an instance that is fused off
(if we do, we'll read back a 0).  During driver init, we pick a register
steering configuration that will make sure reads of multicase registers
hit a non-fused-off instance and return a proper value.

However on some platforms (including EHL/JSL), there's one more thing to
consider.  On these platforms, Render Power Gating (RPG) will modify the
behavior when the hardware exits RC6 --- instead of all registers being
powered back up and accessible when forcewake is grabbed, only a single
slice/subslice will be powered up initially if there's no workload to be
run (the hardware architects refer to this as the "minconfig").  This
means that when reading from multicast registers while RPG is enabled,
we need to not only steer toward a slice/subslice that isn't fused off,
but we also need to steer toward the specific slice/subslice that is
used as the minconfig (which I believe is always the lowest numbered
one).

At the point we apply and verify workarounds, RPG is not supposed to be
active (i.e., we're not at the point of GT setup where we (re)enable it
yet).  However there seems to be some hardware misbehavior where in
certain circumstances a previous write to the A210 control register to
disable Render Power Gating did not take effect --- the A210 register
value indicates that RPG should be off, but the hardware continues to
operate as if it is on, and only minconfig instances of MCR registers
get powered up to return valid values.  We encounter this problem during
certain reset or suspend/resume operations, and that's what causes the
workaround warning message here.  We're steering our read to a valid
(non-fused-off) instance of the multicast register, but since it isn't
the specific minconfig slice/subslice we still fail to read back the
proper value.

If we just want to fix the workaround warning message here, it should be
a simple matter of making sure our default steering always targets the
minconfig slice/subslice.  I believe switching wa_init_mcr() from using
fls() to ffs() would be sufficient --- we'd always pick the lowest
possible slice/subslice in that case (i.e., the minconfig) instead of
the highest.

The real question to be answered is why the hardware continues to behave
as if RPG is on, even after its been disabled via the POWERGATE_ENABLE
register.  This may be a hardware bug, or (more likely), there's some
step that wasn't properly documented in the bspec so our driver isn't
following it (the GT power management documentation in the bspec
definitely isn't as clear and complete as we'd like it to be).

Matt

> ---
>  drivers/gpu/drm/i915/gt/intel_workarounds.c | 5 -----
>  1 file changed, 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> index b62d1e31a645..fea7fde30d4a 100644
> --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
> +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> @@ -1774,11 +1774,6 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal)
>  		wa_write_or(wal, UNSLICE_UNIT_LEVEL_CLKGATE2,
>  			    PSDUNIT_CLKGATE_DIS);
>  
> -		/* Wa_1406680159:icl,ehl */
> -		wa_write_or(wal,
> -			    SUBSLICE_UNIT_LEVEL_CLKGATE,
> -			    GWUNIT_CLKGATE_DIS);
> -
>  		/*
>  		 * Wa_1408767742:icl[a2..forever],ehl[all]
>  		 * Wa_1605460711:icl[a0..c0]
> -- 
> 2.31.1
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Matt Roper
Graphics Software Engineer
VTT-OSGC Platform Enablement
Intel Corporation
(916) 356-2795