[PATCH 1/4] drm/xe: Fix GT "for each engine" workarounds
Matt Roper
matthew.d.roper at intel.com
Thu Feb 27 19:03:35 UTC 2025
On Thu, Feb 27, 2025 at 12:38:18PM -0600, Lucas De Marchi wrote:
> On Thu, Feb 27, 2025 at 09:35:01AM -0800, Matt Roper wrote:
> > On Wed, Feb 26, 2025 at 10:47:52PM -0600, Lucas De Marchi wrote:
> > > On Wed, Feb 26, 2025 at 01:59:44PM -0800, Matt Roper wrote:
> > > > On Wed, Feb 26, 2025 at 11:20:16AM -0600, Lucas De Marchi wrote:
> > > > > On Wed, Feb 26, 2025 at 11:21:03AM +0000, Tvrtko Ursulin wrote:
> > > > > > Any rules using engine matching are currently broken due RTP processing
> > > > > > happening too in early init, before the list of hardware engines has been
> > > > > > initialised.
> > > > > >
> > > > > > Fix this by moving workaround processing to later in the driver probe
> > > > > > sequence, to just before the processed list is used for the first time.
> > > > > >
> > > > > > Looking at the debugfs gt0/workarounds on ADL-P we notice 14011060649
> > > > > > should be present while we see, before:
> > > > > >
> > > > > > GT Workarounds
> > > > > > 14011059788
> > > > > > 14015795083
> > > > > >
> > > > > > And with the patch:
> > > > > >
> > > > > > GT Workarounds
> > > > > > 14011060649
> > > > > > 14011059788
> > > > > > 14015795083
> > > > > >
> > > > > > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
> > > > > > Cc: Lucas De Marchi <lucas.demarchi at intel.com>
> > > > > > Cc: Matt Roper <matthew.d.roper at intel.com>
> > > > >
> > > > > Cc: <stable at vger.kernel.org> # v6.11+
> > > > > Reviewed-by: Lucas De Marchi <lucas.demarchi at intel.com>
> > > > >
> > > > > The version in Cc stable is because it's what matters for officially
> > > > > supported HW by xe.
> > > > >
> > > > > We've been using the reset domain to decide if a WA is an engine vs gt
> > > > > workaround. Although I'm thinking if it wouldn't be better to simply
> > > > > drop the FOREACH_ENGINE processing and move those workarounds to be
> > > > > engine workarounds. Matt, any thoughts?
> > > >
> > > > That's a common mistake we've made in the past --- it doesn't work. If
> > > > you try to put a bunch of these registers onto the GuC's save/restore
> > > > list as happens with engine workarounds, the GuC will throw an assertion
> > > > and fail to load because it considers them to be outside the very narrow
> > > > set of register ranges it's allowed to work with.
> > >
> > > but that's a GuC issue that we would try to fix? at least for new
> > > platforms. if the register is in the mmio space for
> > > that engine, why would GuC do that?
> >
> > No, that's intentional behavior. Just because something is inside a
> > register's MMIO space doesn't mean the GuC should be re-programming it
> > on resets. The proper behavior for re-applying workarounds really comes
> > down to the register's reset behavior (whether it's wiped by resets or
> > not) and that's based on the sub-unit of hardware it lives inside, not
> > where it falls in the MMIO space. If the KMD asks the GuC to reprogram
> > something that it shouldn't be, that's considered a KMD bug and the
> > GuC's assertion is there to catch that. It's a safety net to make sure
> > the GuC isn't accidentally touching parts of hardware it has no need to.
> > I might be misremembering, but I seem to recall that there may even be
> > some hardware enforcement to prevent the GuC microcontroller from
> > touching such register ranges on certain platforms.
>
> I have to disagree here. It's absolutely necessary that it be *at least*
> at the reset domain level. You can't add something to be programmed on
> gt reset if the register is reset together with the engine. However the
> opposite is not true. For me it make makes total sense to add some
> registers there even if the current hardware happens to preserve them
> across engine resets.
These are two different things --- conceptually you can restore a
register more often than is necessary due to the hardware's reset
characteristics. But that's completely independent of the security
policy about which subset of registers the GuC microcontroller is
allowed to touch. The platform architects decided that the GuC can only
touch specifically whitelisted-ranges of registers (this is a different
whitelist than the UMD whitelist we control with the driver). That
security policy is enforced by the GuC firmware (and possibly by the
hardware as well on some platforms, although I'm not positive about
that).
If we had a way of saving/restoring these registers without using the
GuC, then sure it would be fine to stick them on the engine reset list.
But since the KMD doesn't have visibility of engine resets at the proper
time and only the GuC scheduler is in charge of that, only the GuC
firmware can save/restore those registers. And by extension, that means
that we shouldn't be putting any registers on that list that the GuC is
not allowed to touch.
Matt
>
> Lucas De Marchi
>
> >
> >
> > Matt
> >
> > >
> > > Lucas De Marchi
> > >
> > > >
> > > > So no, unfortunately these do need to remain "GT" workarounds, even
> > > > though they need to be applied to registers that have per-engine
> > > > instances. :-(
> > > >
> > > >
> > > > Matt
> > > >
> > > > >
> > > > > thanks
> > > > > Lucas De Marchi
> > > > >
> > > > > > ---
> > > > > > drivers/gpu/drm/xe/xe_gt.c | 4 ++--
> > > > > > 1 file changed, 2 insertions(+), 2 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
> > > > > > index 650a0ee56e97..d59c03bc05b7 100644
> > > > > > --- a/drivers/gpu/drm/xe/xe_gt.c
> > > > > > +++ b/drivers/gpu/drm/xe/xe_gt.c
> > > > > > @@ -361,9 +361,7 @@ int xe_gt_init_early(struct xe_gt *gt)
> > > > > > if (err)
> > > > > > return err;
> > > > > >
> > > > > > - xe_wa_process_gt(gt);
> > > > > > xe_wa_process_oob(gt);
> > > > > > - xe_tuning_process_gt(gt);
> > > > > >
> > > > > > xe_force_wake_init_gt(gt, gt_to_fw(gt));
> > > > > > spin_lock_init(>->global_invl_lock);
> > > > > > @@ -450,6 +448,8 @@ static int all_fw_domain_init(struct xe_gt *gt)
> > > > > > }
> > > > > >
> > > > > > xe_gt_mcr_set_implicit_defaults(gt);
> > > > > > + xe_wa_process_gt(gt);
> > > > > > + xe_tuning_process_gt(gt);
> > > > > > xe_reg_sr_apply_mmio(>->reg_sr, gt);
> > > > > >
> > > > > > err = xe_gt_clock_init(gt);
> > > > > > --
> > > > > > 2.48.0
> > > > > >
> > > >
> > > > --
> > > > Matt Roper
> > > > Graphics Software Engineer
> > > > Linux GPU Platform Enablement
> > > > Intel Corporation
> >
> > --
> > Matt Roper
> > Graphics Software Engineer
> > Linux GPU Platform Enablement
> > Intel Corporation
--
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation
More information about the Intel-xe
mailing list