[PATCH 1/4] drm/xe: Fix GT "for each engine" workarounds

Matt Roper matthew.d.roper at intel.com
Thu Feb 27 17:35:01 UTC 2025


On Wed, Feb 26, 2025 at 10:47:52PM -0600, Lucas De Marchi wrote:
> On Wed, Feb 26, 2025 at 01:59:44PM -0800, Matt Roper wrote:
> > On Wed, Feb 26, 2025 at 11:20:16AM -0600, Lucas De Marchi wrote:
> > > On Wed, Feb 26, 2025 at 11:21:03AM +0000, Tvrtko Ursulin wrote:
> > > > Any rules using engine matching are currently broken due RTP processing
> > > > happening too in early init, before the list of hardware engines has been
> > > > initialised.
> > > >
> > > > Fix this by moving workaround processing to later in the driver probe
> > > > sequence, to just before the processed list is used for the first time.
> > > >
> > > > Looking at the debugfs gt0/workarounds on ADL-P we notice 14011060649
> > > > should be present while we see, before:
> > > >
> > > > GT Workarounds
> > > >     14011059788
> > > >     14015795083
> > > >
> > > > And with the patch:
> > > >
> > > > GT Workarounds
> > > >     14011060649
> > > >     14011059788
> > > >     14015795083
> > > >
> > > > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
> > > > Cc: Lucas De Marchi <lucas.demarchi at intel.com>
> > > > Cc: Matt Roper <matthew.d.roper at intel.com>
> > > 
> > > Cc: <stable at vger.kernel.org> # v6.11+
> > > Reviewed-by: Lucas De Marchi <lucas.demarchi at intel.com>
> > > 
> > > The version in Cc stable is because it's what matters for officially
> > > supported HW by xe.
> > > 
> > > We've been using the reset domain to decide if a WA is an engine vs gt
> > > workaround. Although I'm thinking if it wouldn't be better to simply
> > > drop the FOREACH_ENGINE processing and move those workarounds to be
> > > engine workarounds. Matt, any thoughts?
> > 
> > That's a common mistake we've made in the past --- it doesn't work.  If
> > you try to put a bunch of these registers onto the GuC's save/restore
> > list as happens with engine workarounds, the GuC will throw an assertion
> > and fail to load because it considers them to be outside the very narrow
> > set of register ranges it's allowed to work with.
> 
> but that's a GuC issue that we would try to fix? at least for new
> platforms. if the register is in the mmio space for
> that engine, why would GuC do that?

No, that's intentional behavior.  Just because something is inside a
register's MMIO space doesn't mean the GuC should be re-programming it
on resets.  The proper behavior for re-applying workarounds really comes
down to the register's reset behavior (whether it's wiped by resets or
not) and that's based on the sub-unit of hardware it lives inside, not
where it falls in the MMIO space.  If the KMD asks the GuC to reprogram
something that it shouldn't be, that's considered a KMD bug and the
GuC's assertion is there to catch that.  It's a safety net to make sure
the GuC isn't accidentally touching parts of hardware it has no need to.
I might be misremembering, but I seem to recall that there may even be
some hardware enforcement to prevent the GuC microcontroller from
touching such register ranges on certain platforms.


Matt

> 
> Lucas De Marchi
> 
> > 
> > So no, unfortunately these do need to remain "GT" workarounds, even
> > though they need to be applied to registers that have per-engine
> > instances.  :-(
> > 
> > 
> > Matt
> > 
> > > 
> > > thanks
> > > Lucas De Marchi
> > > 
> > > > ---
> > > > drivers/gpu/drm/xe/xe_gt.c | 4 ++--
> > > > 1 file changed, 2 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
> > > > index 650a0ee56e97..d59c03bc05b7 100644
> > > > --- a/drivers/gpu/drm/xe/xe_gt.c
> > > > +++ b/drivers/gpu/drm/xe/xe_gt.c
> > > > @@ -361,9 +361,7 @@ int xe_gt_init_early(struct xe_gt *gt)
> > > > 	if (err)
> > > > 		return err;
> > > >
> > > > -	xe_wa_process_gt(gt);
> > > > 	xe_wa_process_oob(gt);
> > > > -	xe_tuning_process_gt(gt);
> > > >
> > > > 	xe_force_wake_init_gt(gt, gt_to_fw(gt));
> > > > 	spin_lock_init(&gt->global_invl_lock);
> > > > @@ -450,6 +448,8 @@ static int all_fw_domain_init(struct xe_gt *gt)
> > > > 	}
> > > >
> > > > 	xe_gt_mcr_set_implicit_defaults(gt);
> > > > +	xe_wa_process_gt(gt);
> > > > +	xe_tuning_process_gt(gt);
> > > > 	xe_reg_sr_apply_mmio(&gt->reg_sr, gt);
> > > >
> > > > 	err = xe_gt_clock_init(gt);
> > > > --
> > > > 2.48.0
> > > >
> > 
> > -- 
> > Matt Roper
> > Graphics Software Engineer
> > Linux GPU Platform Enablement
> > Intel Corporation

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


More information about the Intel-xe mailing list