[Intel-xe] [PATCH] drm/xe: Move wa processing to gt_init
Matthew Brost
matthew.brost at intel.com
Tue Nov 28 13:55:40 UTC 2023
On Tue, Nov 28, 2023 at 11:45:36AM -0600, Lucas De Marchi wrote:
> On Tue, Nov 28, 2023 at 08:47:59AM +0000, Matthew Brost wrote:
> > On Tue, Nov 21, 2023 at 08:48:47AM -0600, Lucas De Marchi wrote:
> > > On Tue, Nov 21, 2023 at 04:55:55PM +0530, Tejas Upadhyay wrote:
> > > > Currently WA processing for gt is happening in early init API.
> > > > It also includes processing of WAs which needs filter based on
> > > > hardware engine by looping over number of hardware engines in gt.
> > > >
> > > > However at early init stage hardware engines are not associated
> > > > with gt in software which skips processing such WAs.
> > > >
> > > > Example WAs which will be fixed to apply after this change,
> > > > "14011060649",
> > > > "16010515920",
> > > > "16021867713"
> > > >
> > > > Fixes: 811a9b3d262d8 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
> > > > Signed-off-by: Tejas Upadhyay <tejas.upadhyay at intel.com>
> > > > ---
> > > > drivers/gpu/drm/xe/xe_gt.c | 2 +-
> > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
> > > > index 0dddb751c6a4..3d0f7b6f952c 100644
> > > > --- a/drivers/gpu/drm/xe/xe_gt.c
> > > > +++ b/drivers/gpu/drm/xe/xe_gt.c
> > > > @@ -311,7 +311,6 @@ int xe_gt_init_early(struct xe_gt *gt)
> > > > if (err)
> > > > return err;
> > > >
> > > > - xe_wa_process_gt(gt);
> > > > xe_wa_process_oob(gt);
> > > > xe_tuning_process_gt(gt);
> > > >
> > > > @@ -495,6 +494,7 @@ int xe_gt_init(struct xe_gt *gt)
> > > > if (err)
> > > > return err;
> > > >
> > > > + xe_wa_process_gt(gt);
> > >
> > > the problem with that is that if we need to touch any register already
> > > touched until this point, the workarounds would not had been applied.
> > > Digging the git history, I had added this commit exactly to be able to
> > > initialize the gt workarounds as soon as possible:
> > >
> > > 4eb2e8d25ab4 ("drm/xe: Add xe_hw_engines_init_early()")
> > >
> > > And then
> > >
> > > 816d37907ac6 ("drm/xe: Plug gt WAs to gt init/reset")
> > >
> > > Note that these commits are in the xe branch, not drm-xe-next.
> > > So, it was:
> > >
> > > xe_hw_engines_init_early(gt);
> >
> > xe_hw_engines_init_early has be called after xe_uc_init_hwconfig after
>
> did you mean "has to be" or "has been"?
>
Sorry, "has to be".
> > the idea is gt->info.engine_mask would be set by reading the hwconfig
> > table from the GuC rather than from a hardcoded table in the driver.
>
> like I said, the problem with postponing that is we could have hard to
> debug issues because we will think we have all the workarounds coded,
> not realizing it isn't applied to the early stages of hw initialization.
>
> I think we should go back push back on the idea of initializing the hw
> engines array after loading guc... it doesn't make much sense from the
> engine initialization pov and we are *not* doing that.
>
> static int gt_fw_domain_init(struct xe_gt *gt)
> {
> ...
> err = xe_uc_init_hwconfig(>->uc); if
> (err) goto
> err_force_wake;
>
> xe_gt_idle_sysfs_init(>->gtidle);
>
> /* XXX: Fake that we pull the engine mask from hwconfig blob */
> gt->info.engine_mask = gt->info.__engine_mask;
Right, we don't currently parse the hwconfig to get the engine_mask but
eventually they should be.
>
> ...
> }
>
> The engines available are not hardcoded in the driver. We read the fuses
The initial value of info.engine_mask is hardcoded in the driver, the
fuses adjust this. Eventual I think we want to get rid of all platform
information hardcoded in the driver and dervive info.engine_mask from
the hwconfig. At least that is the goal from my understanding.
> to know the avilable ones. That is sufficient already to loop through all
> of them. Note that since it's *engine* WAs, we need to pass those to GuC
> via the ads since it's GuC itself that does the save and restore.
>
Yes, but not for hwconfig load. We can load the GuC to read hwconfig and
then properly populate engine WAs list when the GuC is loaded for
submission.
> > Likewise for_each_hw_engine should only be called after
> > xe_hw_engines_init_early is called. With that, I think this patch is
> > correct. Does that make sense?
>
> as per above, no. I think we are overcomplicating it for no real
> benefit.
>
I agree this is complicated but I thought this ordering / reading the
platform information from hwconfig was requirement. Maybe we should
follow up to check if this is indeed a requirement.
> >
> > I suggested to Tejas that we add an assert to for_each_hw_engine to
> > enforce this is called after xe_hw_engines_init_early to catch issues
> > like this too.
>
> I think that is already clear. Not sure it deserves an assert. The
> assert would for instance trigger right now in calculate_regset_size(),
> but that is harmless. Also we'd have it spread through the driver.
>
Well an assert like this would caught this bug quite a bit early.
Matt
> Lucas De Marchi
>
> >
> > > ...
> > > xe_wa_process_gt();
> > >
> > > That gt_fw_domain_init() seems a very poor place to initialize these
> > > things now. It started with commit b892e30c6cd7 ("drm/xe: Move load of
> > > minimal GuC and hwconfig read much earlier") and then people end up
> > > adding random stuff in that function :(
> > >
> > > So I think we need to get back to the original idea of _early()
> > > initializing what's needed. Then we'd probably need a few cleanups in
> > > the initialization sequence so the function names at least partially
> > > matches what's being initialized.
> > >
> >
> > I agree on that the naming of all of this could be better...
> >
> > Matt
> >
> > > Lucas De Marchi
> > >
> > > > xe_force_wake_init_engines(gt, gt_to_fw(gt));
> > > >
> > > > err = all_fw_domain_init(gt);
> > > > --
> > > > 2.25.1
> > > >
More information about the Intel-xe
mailing list