[PATCH v4 1/3] drm/i915/gt: Disable HW load balancing for CCS
Andi Shyti
andi.shyti at linux.intel.com
Thu Mar 7 20:02:17 UTC 2024
Hi Matt,
On Wed, Mar 06, 2024 at 03:46:09PM -0800, Matt Roper wrote:
> On Wed, Mar 06, 2024 at 02:22:45AM +0100, Andi Shyti wrote:
> > The hardware should not dynamically balance the load between CCS
> > engines. Wa_14019159160 recommends disabling it across all
> > platforms.
> >
> > Fixes: d2eae8e98d59 ("drm/i915/dg2: Drop force_probe requirement")
> > Signed-off-by: Andi Shyti <andi.shyti at linux.intel.com>
> > Cc: Chris Wilson <chris.p.wilson at linux.intel.com>
> > Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> > Cc: Matt Roper <matthew.d.roper at intel.com>
> > Cc: <stable at vger.kernel.org> # v6.2+
> > ---
> > drivers/gpu/drm/i915/gt/intel_gt_regs.h | 1 +
> > drivers/gpu/drm/i915/gt/intel_workarounds.c | 5 +++++
> > 2 files changed, 6 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> > index 50962cfd1353..cf709f6c05ae 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> > @@ -1478,6 +1478,7 @@
> >
> > #define GEN12_RCU_MODE _MMIO(0x14800)
> > #define GEN12_RCU_MODE_CCS_ENABLE REG_BIT(0)
> > +#define XEHP_RCU_MODE_FIXED_SLICE_CCS_MODE REG_BIT(1)
> >
> > #define CHV_FUSE_GT _MMIO(VLV_GUNIT_BASE + 0x2168)
> > #define CHV_FGT_DISABLE_SS0 (1 << 10)
> > diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> > index d67d44611c28..a2e78cf0b5f5 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> > @@ -2945,6 +2945,11 @@ general_render_compute_wa_init(struct intel_engine_cs *engine, struct i915_wa_li
> >
> > /* Wa_18028616096 */
> > wa_mcr_write_or(wal, LSC_CHICKEN_BIT_0_UDW, UGM_FRAGMENT_THRESHOLD_TO_3);
> > +
> > + /*
> > + * Wa_14019159160: disable the automatic CCS load balancing
>
> I'm still a bit concerned that this doesn't really match what this
> specific workaround is asking us to do. There seems to be an agreement
> on various internal email threads that we need to disable load
> balancing, but there's no single specific workaround that officially
> documents that decision.
>
> This specific workaround asks us to do a bunch of different things, and
> the third item it asks for is to disable load balancing in very specific
> cases (i.e., while the RCS is active at the same time as one or more CCS
> engines). Taking this workaround in isolation, it would be valid to
> keep load balancing active if you were just using the CCS engines and
> leaving the RCS idle, or if balancing was turned on/off by the GuC
> scheduler according to engine use at the moment, as the documented
> workaround seems to assume will be the case.
>
> So in general I think we do need to disable load balancing based on
> other offline discussion, but blaming that entire change on
> Wa_14019159160 seems a bit questionable since it's not really what this
> specific workaround is asking us to do and someone may come back and try
> to "correct" the implementation of this workaround in the future without
> realizing there are other factors too. It would be great if we could
> get hardware teams to properly document this expectation somewhere
> (either in a separate dedicated workaround, or in the MMIO tuning guide)
> so that we'll have a more direct and authoritative source for such a
> large behavioral change.
On one had I think you are right, on the other hand I think this
workaround has not properly developed in what we have been
describing later.
Perhaps, one solution would be to create a new generic workaround
for all platforms with more than one CCS and put everyone at
peace. But I don't know the process.
Are you able to help here? Or Joonas?
Thanks, Matt!
Andi
More information about the Intel-gfx
mailing list