[PATCH v2 2/2] drm/i915/gt: Enable only one CCS for compute workload
Andi Shyti
andi.shyti at linux.intel.com
Wed Feb 21 00:12:18 UTC 2024
Hi Matt,
thanks a lot for looking into this.
On Tue, Feb 20, 2024 at 03:39:18PM -0800, Matt Roper wrote:
> On Tue, Feb 20, 2024 at 03:35:26PM +0100, Andi Shyti wrote:
[...]
> > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c b/drivers/gpu/drm/i915/gt/intel_engine_user.c
> > index 833987015b8b..7041acc77810 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
> > @@ -243,6 +243,15 @@ void intel_engines_driver_register(struct drm_i915_private *i915)
> > if (engine->uabi_class == I915_NO_UABI_CLASS)
> > continue;
> >
> > + /*
> > + * Do not list and do not count CCS engines other than the first
> > + */
> > + if (engine->uabi_class == I915_ENGINE_CLASS_COMPUTE &&
> > + engine->uabi_instance > 0) {
> > + i915->engine_uabi_class_count[engine->uabi_class]--;
> > + continue;
> > + }
>
> Wouldn't it be simpler to just add a workaround to the end of
> engine_mask_apply_compute_fuses() if we want to ensure only a single
> compute engine gets exposed? Then both the driver internals and uapi
> will agree that's there's just one CCS (and on which one there is).
>
> If we want to do something fancy with "hotplugging" a new engine later
> on or whatever, that can be handled in the future series (although as
> noted on the previous patch, it sounds like these changes might not
> actually be aligned with the workaround we were trying to address).
The hotplugging capability is one of the features I was looking
for, actually.
I have done some more refactoring in this piece of code in
upcoming patches.
Will check, though, if I can do something with compute_fuses(),
even though, the other cslices are not really fused off (read
below).
> > +
> > rb_link_node(&engine->uabi_node, prev, p);
> > rb_insert_color(&engine->uabi_node, &i915->uabi_engines);
> >
> > diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> > index a425db5ed3a2..e19df4ef47f6 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> > @@ -168,6 +168,14 @@ static void init_unused_rings(struct intel_gt *gt)
> > }
> > }
> >
> > +static void intel_gt_apply_ccs_mode(struct intel_gt *gt)
> > +{
> > + if (!IS_DG2(gt->i915))
> > + return;
> > +
> > + intel_uncore_write(gt->uncore, XEHP_CCS_MODE, 0);
>
> This doesn't look right to me. A value of 0 means every cslice gets
> associated with CCS0.
Yes, that's what I'm trying to do. The behavior I'm looking for
is this one:
/*
...
* With 1 engine (ccs0):
* slice 0, 1, 2, 3: ccs0
*
* With 2 engines (ccs0, ccs1):
* slice 0, 2: ccs0
* slice 1, 3: ccs1
*
* With 4 engines (ccs0, ccs1, ccs2, ccs3):
* slice 0: ccs0
* slice 1: ccs1
* slice 2: ccs2
* slice 3: ccs3
...
*/
where the user can configure runtime the mode, making sure that
no client is connected to i915.
But, this needs to be written
As we are now forcing mode '1', then all cslices are connected
with ccs0.
> On a DG2-G11 platform, that will flat out break
> compute since CCS0 is never present (G11 only has a single CCS and it's
> always the hardware's CCS1). Even on a G10 or G12 this could also break
> things depending on the fusing of your card if the hardware CCS0 happens
> to be missing.
>
> Also, the register says that we need a field value of 0x7 for each
> cslice that's fused off. By passing 0, we're telling the CCS engine
> that it can use cslices that may not actually exist.
does it? Or do I need to write the id (0x0-0x3) of the user
engine? That's how the mode is calculated in this algorithm.
> > +}
> > +
[...]
> > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> > index cf709f6c05ae..c148113770ea 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> > @@ -1605,6 +1605,8 @@
> > #define GEN12_VOLTAGE_MASK REG_GENMASK(10, 0)
> > #define GEN12_CAGF_MASK REG_GENMASK(19, 11)
> >
> > +#define XEHP_CCS_MODE _MMIO(0x14804)
>
> Nitpick: this doesn't seem to be in the proper place and also breaks
> the file's convention of using tabs to move over to column 48 for the
> definition value.
This was something I actually forgot to fix. Thanks!
More information about the Intel-gfx
mailing list