[Intel-xe] [PATCH] drm/xe: Move wa processing to gt_init

Thu Nov 30 00:57:20 UTC 2023

On Tue, Nov 28, 2023 at 07:07:39PM -0600, Lucas De Marchi wrote:
> On Tue, Nov 28, 2023 at 01:55:40PM +0000, Matthew Brost wrote:
> > On Tue, Nov 28, 2023 at 11:45:36AM -0600, Lucas De Marchi wrote:
> > > On Tue, Nov 28, 2023 at 08:47:59AM +0000, Matthew Brost wrote:
> > > > On Tue, Nov 21, 2023 at 08:48:47AM -0600, Lucas De Marchi wrote:
> > > > > On Tue, Nov 21, 2023 at 04:55:55PM +0530, Tejas Upadhyay wrote:
> > > > > > Currently WA processing for gt is happening in early init API.
> > > > > > It also includes processing of WAs which needs filter based on
> > > > > > hardware engine by looping over number of hardware engines in gt.
> > > > > >
> > > > > > However at early init stage hardware engines are not associated
> > > > > > with gt in software which skips processing such WAs.
> > > > > >
> > > > > > Example WAs which will be fixed to apply after this change,
> > > > > > "14011060649",
> > > > > > "16010515920",
> > > > > > "16021867713"
> > > > > >
> > > > > > Fixes: 811a9b3d262d8 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
> > > > > > Signed-off-by: Tejas Upadhyay <tejas.upadhyay at intel.com>
> > > > > > ---
> > > > > > drivers/gpu/drm/xe/xe_gt.c | 2 +-
> > > > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
> > > > > > index 0dddb751c6a4..3d0f7b6f952c 100644
> > > > > > --- a/drivers/gpu/drm/xe/xe_gt.c
> > > > > > +++ b/drivers/gpu/drm/xe/xe_gt.c
> > > > > > @@ -311,7 +311,6 @@ int xe_gt_init_early(struct xe_gt *gt)
> > > > > > 	if (err)
> > > > > > 		return err;
> > > > > >
> > > > > > -	xe_wa_process_gt(gt);
> > > > > > 	xe_wa_process_oob(gt);
> > > > > > 	xe_tuning_process_gt(gt);
> > > > > >
> > > > > > @@ -495,6 +494,7 @@ int xe_gt_init(struct xe_gt *gt)
> > > > > > 	if (err)
> > > > > > 		return err;
> > > > > >
> > > > > > +	xe_wa_process_gt(gt);
> > > > >
> > > > > the problem with that is that if we need to touch any register already
> > > > > touched until this point, the workarounds would not had been applied.
> > > > > Digging the git history, I had added this commit exactly to be able to
> > > > > initialize the gt workarounds as soon as possible:
> > > > >
> > > > > 	4eb2e8d25ab4 ("drm/xe: Add xe_hw_engines_init_early()")
> > > > >
> > > > > And then
> > > > >
> > > > > 	816d37907ac6 ("drm/xe: Plug gt WAs to gt init/reset")
> > > > >
> > > > > Note that these commits are in the xe branch, not drm-xe-next.
> > > > > So, it was:
> > > > >
> > > > > 	xe_hw_engines_init_early(gt);
> > > >
> > > > xe_hw_engines_init_early has be called after xe_uc_init_hwconfig after
> > > 
> > > did you mean "has to be" or "has been"?
> > > 
> > 
> > Sorry, "has to be".
> > 
> > > > the idea is gt->info.engine_mask would be set by reading the hwconfig
> > > > table from the GuC rather than from a hardcoded table in the driver.
> > > 
> > > like I said, the problem with postponing that is we could have hard to
> > > debug issues because we will think we have all the workarounds coded,
> > > not realizing it isn't applied to the early stages of hw initialization.
> > > 
> > > I think we should go back push back on the idea of initializing the hw
> > > engines array after loading guc... it doesn't make much sense from the
> > > engine initialization pov and we are *not* doing that.
> > > 
> > > static int gt_fw_domain_init(struct xe_gt *gt)
> > > {
> > > 	...
> > > 	err = xe_uc_init_hwconfig(&gt->uc);                                   	if
> > > (err)                                                              		goto
> > > err_force_wake;
> > > 
> > > 	xe_gt_idle_sysfs_init(&gt->gtidle);
> > > 
> > > 	/* XXX: Fake that we pull the engine mask from hwconfig blob */
> > > 	gt->info.engine_mask = gt->info.__engine_mask;
> > 
> > Right, we don't currently parse the hwconfig to get the engine_mask but
> > eventually they should be.
> > 
> > > 
> > > 	...
> > > }
> > > 
> > > The engines available are not hardcoded in the driver. We read the fuses
> > 
> > The initial value of info.engine_mask is hardcoded in the driver, the
> > fuses adjust this. Eventual I think we want to get rid of all platform
> 
> but it doesn't matter much. Even if we were to read from the hwconfig,
> we would still have gt->info.engine_mask with the upper limit and then
> use the hwconfig to trim it down. How different is it from doing the
> same thing, but with the fuses? We don't do that today and keep a trim
> down engine list, but equally we don't use the hwconfig on
> this part of the initialization. The hwconfig could be useful for
> something else, but I don't see how it helps here.
> 
> > information hardcoded in the driver and dervive info.engine_mask from
> > the hwconfig. At least that is the goal from my understanding.

I think this has been mentioned in the past, but there's no expectation
that we'll ever be able to read engine masks from hwconfig.  In general
the hwconfig is never supposed to provide device fusing details.  You
_might_ be able to get counts (e.g., "this SKU should have 3 BCS
engines") but it will never tell you which specific engine instances are
the ones that are available.  That information is always expected to
come from the fuse registers instead.  The same is true for other
hardware concepts like DSS, EUs, L3 banks, etc.

The engine masks we define in the device info are the set of engines
that might ever show up on the platform.  The full set of engines might
never be available on any SKU of the platform, but the constant engine
list serves as the starting point which you then pare down by reading
the appropriate fuse registers.

Matt

> > 
> > > to know the avilable ones. That is sufficient already to loop through all
> > > of them. Note that since it's *engine* WAs, we need to pass those to GuC
> > > via the ads since it's GuC itself that does the save and restore.
> > > 
> > 
> > Yes, but not for hwconfig load. We can load the GuC to read hwconfig and
> > then properly populate engine WAs list when the GuC is loaded for
> > submission.
> > 
> > > > Likewise for_each_hw_engine should only be called after
> > > > xe_hw_engines_init_early is called. With that, I think this patch is
> > > > correct. Does that make sense?
> > > 
> > > as per above, no. I think we are overcomplicating it for no real
> > > benefit.
> > > 
> > 
> > I agree this is complicated but I thought this ordering / reading the
> > platform information from hwconfig was requirement. Maybe we should
> > follow up to check if this is indeed a requirement.
> 
> What happens when we need a gt workaround to be applied even before the
> first GuC load? This patch moves all the GT workarounds to happen after
> the first load so it potentially creates issues if any of the current WAs
> in gt_was[] are needed before that point.
> 
> The alternative, adding yet another WA type for when we need to decide
> on the WA by looking at the engines it has, doesn't look pretty and also
> complicates the decision on where to add new workarounds. When Tejas
> asked about the possible solutions and I mentioned making the early
> stuff happen early, I was not thinking about moving the WA processing
> after the hwconfig, I was thinking about initializing the hw engines
> array earlier like it was before.
> 
> I think the possible things to do are:
> 
> 1) move the engine mask to be initialized earlier like it was and stop
> pretending we are getting it from hwconfig
> 
> 2) what this patch does, moving the gt WAs processing after the first
> guc load
> 
> 3) split the gt_was[] - all the WAs that need to check the engine in the
> GT would need to be applied later
> 
> 4) ... something else?
> 
> I'd say my preference is (1), (3), (2).
> 
> > 
> > > >
> > > > I suggested to Tejas that we add an assert to for_each_hw_engine to
> > > > enforce this is called after xe_hw_engines_init_early to catch issues
> > > > like this too.
> > > 
> > > I think that is already clear. Not sure it deserves an assert. The
> > > assert would for instance trigger right now in calculate_regset_size(),
> > > but that is harmless. Also we'd have it spread through the driver.
> > > 
> > 
> > Well an assert like this would caught this bug quite a bit early.
> 
> ok, fair enough on this one.... as long as it's behind the debug
> kconfig, it should be fine.
> 
> > 
> > Matt
> > 
> > > Lucas De Marchi
> > > 
> > > >
> > > > > 	...
> > > > > 	xe_wa_process_gt();
> > > > >
> > > > > That gt_fw_domain_init() seems a very poor place to initialize these
> > > > > things now. It started with commit b892e30c6cd7 ("drm/xe: Move load of
> > > > > minimal GuC and hwconfig read much earlier") and then people end up
> > > > > adding random stuff in that function :(
> > > > >
> > > > > So I think we need to get back to the original idea of _early()
> > > > > initializing what's needed. Then we'd probably need a few cleanups in
> > > > > the initialization sequence so the function names at least partially
> > > > > matches what's being initialized.
> > > > >
> > > >
> > > > I agree on that the naming of all of this could be better...
> > > >
> > > > Matt
> > > >
> > > > > Lucas De Marchi
> > > > >
> > > > > > 	xe_force_wake_init_engines(gt, gt_to_fw(gt));
> > > > > >
> > > > > > 	err = all_fw_domain_init(gt);
> > > > > > --
> > > > > > 2.25.1
> > > > > >

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation