[Intel-xe] [PATCH] drm/xe: Move wa processing to gt_init

Matthew Brost matthew.brost at intel.com
Tue Nov 28 13:55:40 UTC 2023


On Tue, Nov 28, 2023 at 11:45:36AM -0600, Lucas De Marchi wrote:
> On Tue, Nov 28, 2023 at 08:47:59AM +0000, Matthew Brost wrote:
> > On Tue, Nov 21, 2023 at 08:48:47AM -0600, Lucas De Marchi wrote:
> > > On Tue, Nov 21, 2023 at 04:55:55PM +0530, Tejas Upadhyay wrote:
> > > > Currently WA processing for gt is happening in early init API.
> > > > It also includes processing of WAs which needs filter based on
> > > > hardware engine by looping over number of hardware engines in gt.
> > > >
> > > > However at early init stage hardware engines are not associated
> > > > with gt in software which skips processing such WAs.
> > > >
> > > > Example WAs which will be fixed to apply after this change,
> > > > "14011060649",
> > > > "16010515920",
> > > > "16021867713"
> > > >
> > > > Fixes: 811a9b3d262d8 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
> > > > Signed-off-by: Tejas Upadhyay <tejas.upadhyay at intel.com>
> > > > ---
> > > > drivers/gpu/drm/xe/xe_gt.c | 2 +-
> > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
> > > > index 0dddb751c6a4..3d0f7b6f952c 100644
> > > > --- a/drivers/gpu/drm/xe/xe_gt.c
> > > > +++ b/drivers/gpu/drm/xe/xe_gt.c
> > > > @@ -311,7 +311,6 @@ int xe_gt_init_early(struct xe_gt *gt)
> > > > 	if (err)
> > > > 		return err;
> > > >
> > > > -	xe_wa_process_gt(gt);
> > > > 	xe_wa_process_oob(gt);
> > > > 	xe_tuning_process_gt(gt);
> > > >
> > > > @@ -495,6 +494,7 @@ int xe_gt_init(struct xe_gt *gt)
> > > > 	if (err)
> > > > 		return err;
> > > >
> > > > +	xe_wa_process_gt(gt);
> > > 
> > > the problem with that is that if we need to touch any register already
> > > touched until this point, the workarounds would not had been applied.
> > > Digging the git history, I had added this commit exactly to be able to
> > > initialize the gt workarounds as soon as possible:
> > > 
> > > 	4eb2e8d25ab4 ("drm/xe: Add xe_hw_engines_init_early()")
> > > 
> > > And then
> > > 
> > > 	816d37907ac6 ("drm/xe: Plug gt WAs to gt init/reset")
> > > 
> > > Note that these commits are in the xe branch, not drm-xe-next.
> > > So, it was:
> > > 
> > > 	xe_hw_engines_init_early(gt);
> > 
> > xe_hw_engines_init_early has be called after xe_uc_init_hwconfig after
> 
> did you mean "has to be" or "has been"?
> 

Sorry, "has to be".

> > the idea is gt->info.engine_mask would be set by reading the hwconfig
> > table from the GuC rather than from a hardcoded table in the driver.
> 
> like I said, the problem with postponing that is we could have hard to
> debug issues because we will think we have all the workarounds coded,
> not realizing it isn't applied to the early stages of hw initialization.
> 
> I think we should go back push back on the idea of initializing the hw
> engines array after loading guc... it doesn't make much sense from the
> engine initialization pov and we are *not* doing that.
> 
> static int gt_fw_domain_init(struct xe_gt *gt)
> {
> 	...
> 	err = xe_uc_init_hwconfig(&gt->uc);                                   	if
> (err)                                                              		goto
> err_force_wake;
> 
> 	xe_gt_idle_sysfs_init(&gt->gtidle);
> 
> 	/* XXX: Fake that we pull the engine mask from hwconfig blob */
> 	gt->info.engine_mask = gt->info.__engine_mask;

Right, we don't currently parse the hwconfig to get the engine_mask but
eventually they should be.

> 
> 	...
> }
> 
> The engines available are not hardcoded in the driver. We read the fuses

The initial value of info.engine_mask is hardcoded in the driver, the
fuses adjust this. Eventual I think we want to get rid of all platform
information hardcoded in the driver and dervive info.engine_mask from
the hwconfig. At least that is the goal from my understanding.

> to know the avilable ones. That is sufficient already to loop through all
> of them. Note that since it's *engine* WAs, we need to pass those to GuC
> via the ads since it's GuC itself that does the save and restore.
> 

Yes, but not for hwconfig load. We can load the GuC to read hwconfig and
then properly populate engine WAs list when the GuC is loaded for
submission.

> > Likewise for_each_hw_engine should only be called after
> > xe_hw_engines_init_early is called. With that, I think this patch is
> > correct. Does that make sense?
> 
> as per above, no. I think we are overcomplicating it for no real
> benefit.
>

I agree this is complicated but I thought this ordering / reading the
platform information from hwconfig was requirement. Maybe we should
follow up to check if this is indeed a requirement.

> > 
> > I suggested to Tejas that we add an assert to for_each_hw_engine to
> > enforce this is called after xe_hw_engines_init_early to catch issues
> > like this too.
> 
> I think that is already clear. Not sure it deserves an assert. The
> assert would for instance trigger right now in calculate_regset_size(),
> but that is harmless. Also we'd have it spread through the driver.
> 

Well an assert like this would caught this bug quite a bit early.

Matt

> Lucas De Marchi
> 
> > 
> > > 	...
> > > 	xe_wa_process_gt();
> > > 
> > > That gt_fw_domain_init() seems a very poor place to initialize these
> > > things now. It started with commit b892e30c6cd7 ("drm/xe: Move load of
> > > minimal GuC and hwconfig read much earlier") and then people end up
> > > adding random stuff in that function :(
> > > 
> > > So I think we need to get back to the original idea of _early()
> > > initializing what's needed. Then we'd probably need a few cleanups in
> > > the initialization sequence so the function names at least partially
> > > matches what's being initialized.
> > > 
> > 
> > I agree on that the naming of all of this could be better...
> > 
> > Matt
> > 
> > > Lucas De Marchi
> > > 
> > > > 	xe_force_wake_init_engines(gt, gt_to_fw(gt));
> > > >
> > > > 	err = all_fw_domain_init(gt);
> > > > --
> > > > 2.25.1
> > > >


More information about the Intel-xe mailing list