[PATCH v2 2/2] drm/i915/gt: Enable only one CCS for compute workload
Matt Roper
matthew.d.roper at intel.com
Thu Feb 22 22:33:27 UTC 2024
On Thu, Feb 22, 2024 at 11:03:27PM +0100, Andi Shyti wrote:
> Hi Matt,
>
> first of all thanks a lot for the observations you are raising.
>
> On Wed, Feb 21, 2024 at 12:51:04PM -0800, Matt Roper wrote:
> > On Wed, Feb 21, 2024 at 01:12:18AM +0100, Andi Shyti wrote:
> > > On Tue, Feb 20, 2024 at 03:39:18PM -0800, Matt Roper wrote:
> > > > On Tue, Feb 20, 2024 at 03:35:26PM +0100, Andi Shyti wrote:
>
> ...
>
> > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> > > > > index a425db5ed3a2..e19df4ef47f6 100644
> > > > > --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> > > > > +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> > > > > @@ -168,6 +168,14 @@ static void init_unused_rings(struct intel_gt *gt)
> > > > > }
> > > > > }
> > > > >
> > > > > +static void intel_gt_apply_ccs_mode(struct intel_gt *gt)
> > > > > +{
> > > > > + if (!IS_DG2(gt->i915))
> > > > > + return;
> > > > > +
> > > > > + intel_uncore_write(gt->uncore, XEHP_CCS_MODE, 0);
> > > >
> > > > This doesn't look right to me. A value of 0 means every cslice gets
> > > > associated with CCS0.
> > >
> > > Yes, that's what I'm trying to do. The behavior I'm looking for
> > > is this one:
> > >
> > > /*
> > > ...
> > > * With 1 engine (ccs0):
> > > * slice 0, 1, 2, 3: ccs0
> > > *
> > > * With 2 engines (ccs0, ccs1):
> > > * slice 0, 2: ccs0
> > > * slice 1, 3: ccs1
> > > *
> > > * With 4 engines (ccs0, ccs1, ccs2, ccs3):
> > > * slice 0: ccs0
> > > * slice 1: ccs1
> > > * slice 2: ccs2
> > > * slice 3: ccs3
> > > ...
> > > */
> > >
> > > where the user can configure runtime the mode, making sure that
> > > no client is connected to i915.
> > >
> > > But, this needs to be written
> > >
> > > As we are now forcing mode '1', then all cslices are connected
> > > with ccs0.
> >
> > Right --- and that's what I'm pointing out as illegal. I think that
> > code comment above was taken out of context from a different RFC series;
> > that's not an accurate description of the behavior we want here.
> >
> > First, the above comment is using ccs# to refer to userspace engines,
> > not hardware engines. As a simple example, DG2-G11 only ever has a
> > single CCS which userspace sees as "instance 0" but which is actually
> > CCS1 at the hardware level. If you try to follow the comment above when
> > programming CCS_MODE, you've assigned all of the cslices to a
> > non-existent engine and assigned no cslices to the CCS engine that
> > actually exists. For DG2-G10 (and I think DG2-G12), there are different
> > combinations of fused-off / not-fused-off engines that will always show
> > up in userspace as CCS0-CCSn, even if those don't match the hardware
> > IDs.
> >
> > Second, the above comment is assuming that you have a part with a
> > maximum fusing config (i.e., all cslices present). Using DG2-G11 again
> > as an example, there's also only a single cslice (cslice1), so if you
> > tell CCS1 that it's allowed to use EUs from non-existent cslice0,
> > cslice2, and cslice3, you might not get the behavior you were hoping
> > for.
>
> if the hardware slices are fused off we wouldn't see them in a
> first place, right? And that's anyway a permanent configuration
> that wouldn't affect the patch.
There are physically four possible cslices in the IP design. The
presence/absence of each of those cslices can vary both by SKU and by
part-specific fusing. Some SKUs (DG2-G11) wind up only ever having a
single possible configuration as far as I know, but the larger SKUs have
more part-to-part variation in terms of exactly which specific subset of
DSS (and by extension cslices) are present/absent. The KMD determines
the configuration at runtime by reading the DSS fuse registers and
deriving the cslice presence/absence from that.
The register you're writing in this patch tells the CCS engine which
cslice(s) it can use to execute work. If the KMD already knows that
cslice<x> doesn't exist, but it tells CCS<y> that it can go ahead and
use it anyway, things probably won't work properly. That's why the spec
mandates that we always program 0x7 in the register for any cslices that
we know aren't present so that none of the CCS engines will incorrectly
try to utilize them. If we ignore that rule, then it's a driver bug.
>
> BTW, is there any machine I can test this scenario?
There should differently-fused DG2 systems in our internal pool,
although I'm not sure what the breakdown is. I don't think the
reservation system makes the low-level cslice configuration immediately
obvious on the summary page, so you might just need to log into a few
systems and read the fuse registers to check which ones are best for
testing these cases.
>
> > > > On a DG2-G11 platform, that will flat out break
> > > > compute since CCS0 is never present (G11 only has a single CCS and it's
> > > > always the hardware's CCS1). Even on a G10 or G12 this could also break
> > > > things depending on the fusing of your card if the hardware CCS0 happens
> > > > to be missing.
> > > >
> > > > Also, the register says that we need a field value of 0x7 for each
> > > > cslice that's fused off. By passing 0, we're telling the CCS engine
> > > > that it can use cslices that may not actually exist.
> > >
> > > does it? Or do I need to write the id (0x0-0x3) of the user
> > > engine? That's how the mode is calculated in this algorithm.
> >
> > 0x0 - 0x3 are how you specify that a specific CCS engine can use the
> > cslice. If the cslice can't be used at all (i.e., it's fused off), then
> > you need to program a 0x7 to ensure no engines try to use the
> > non-existent DSS/EUs.
>
> I planned to limit this to the only DG2 (and ATSM, of course).
> Do you think it would it help?
Yes, definitely. It's mandatory programming for these platforms.
Matt
>
> Andi
--
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation
More information about the Intel-gfx
mailing list