[Intel-xe] [PATCH] drm/xe: Move wa processing to gt_init

Lucas De Marchi lucas.demarchi at intel.com
Wed Nov 29 01:07:39 UTC 2023


On Tue, Nov 28, 2023 at 01:55:40PM +0000, Matthew Brost wrote:
>On Tue, Nov 28, 2023 at 11:45:36AM -0600, Lucas De Marchi wrote:
>> On Tue, Nov 28, 2023 at 08:47:59AM +0000, Matthew Brost wrote:
>> > On Tue, Nov 21, 2023 at 08:48:47AM -0600, Lucas De Marchi wrote:
>> > > On Tue, Nov 21, 2023 at 04:55:55PM +0530, Tejas Upadhyay wrote:
>> > > > Currently WA processing for gt is happening in early init API.
>> > > > It also includes processing of WAs which needs filter based on
>> > > > hardware engine by looping over number of hardware engines in gt.
>> > > >
>> > > > However at early init stage hardware engines are not associated
>> > > > with gt in software which skips processing such WAs.
>> > > >
>> > > > Example WAs which will be fixed to apply after this change,
>> > > > "14011060649",
>> > > > "16010515920",
>> > > > "16021867713"
>> > > >
>> > > > Fixes: 811a9b3d262d8 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
>> > > > Signed-off-by: Tejas Upadhyay <tejas.upadhyay at intel.com>
>> > > > ---
>> > > > drivers/gpu/drm/xe/xe_gt.c | 2 +-
>> > > > 1 file changed, 1 insertion(+), 1 deletion(-)
>> > > >
>> > > > diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
>> > > > index 0dddb751c6a4..3d0f7b6f952c 100644
>> > > > --- a/drivers/gpu/drm/xe/xe_gt.c
>> > > > +++ b/drivers/gpu/drm/xe/xe_gt.c
>> > > > @@ -311,7 +311,6 @@ int xe_gt_init_early(struct xe_gt *gt)
>> > > > 	if (err)
>> > > > 		return err;
>> > > >
>> > > > -	xe_wa_process_gt(gt);
>> > > > 	xe_wa_process_oob(gt);
>> > > > 	xe_tuning_process_gt(gt);
>> > > >
>> > > > @@ -495,6 +494,7 @@ int xe_gt_init(struct xe_gt *gt)
>> > > > 	if (err)
>> > > > 		return err;
>> > > >
>> > > > +	xe_wa_process_gt(gt);
>> > >
>> > > the problem with that is that if we need to touch any register already
>> > > touched until this point, the workarounds would not had been applied.
>> > > Digging the git history, I had added this commit exactly to be able to
>> > > initialize the gt workarounds as soon as possible:
>> > >
>> > > 	4eb2e8d25ab4 ("drm/xe: Add xe_hw_engines_init_early()")
>> > >
>> > > And then
>> > >
>> > > 	816d37907ac6 ("drm/xe: Plug gt WAs to gt init/reset")
>> > >
>> > > Note that these commits are in the xe branch, not drm-xe-next.
>> > > So, it was:
>> > >
>> > > 	xe_hw_engines_init_early(gt);
>> >
>> > xe_hw_engines_init_early has be called after xe_uc_init_hwconfig after
>>
>> did you mean "has to be" or "has been"?
>>
>
>Sorry, "has to be".
>
>> > the idea is gt->info.engine_mask would be set by reading the hwconfig
>> > table from the GuC rather than from a hardcoded table in the driver.
>>
>> like I said, the problem with postponing that is we could have hard to
>> debug issues because we will think we have all the workarounds coded,
>> not realizing it isn't applied to the early stages of hw initialization.
>>
>> I think we should go back push back on the idea of initializing the hw
>> engines array after loading guc... it doesn't make much sense from the
>> engine initialization pov and we are *not* doing that.
>>
>> static int gt_fw_domain_init(struct xe_gt *gt)
>> {
>> 	...
>> 	err = xe_uc_init_hwconfig(&gt->uc);                                   	if
>> (err)                                                              		goto
>> err_force_wake;
>>
>> 	xe_gt_idle_sysfs_init(&gt->gtidle);
>>
>> 	/* XXX: Fake that we pull the engine mask from hwconfig blob */
>> 	gt->info.engine_mask = gt->info.__engine_mask;
>
>Right, we don't currently parse the hwconfig to get the engine_mask but
>eventually they should be.
>
>>
>> 	...
>> }
>>
>> The engines available are not hardcoded in the driver. We read the fuses
>
>The initial value of info.engine_mask is hardcoded in the driver, the
>fuses adjust this. Eventual I think we want to get rid of all platform

but it doesn't matter much. Even if we were to read from the hwconfig,
we would still have gt->info.engine_mask with the upper limit and then
use the hwconfig to trim it down. How different is it from doing the
same thing, but with the fuses? We don't do that today and keep a trim
down engine list, but equally we don't use the hwconfig on
this part of the initialization. The hwconfig could be useful for
something else, but I don't see how it helps here.

>information hardcoded in the driver and dervive info.engine_mask from
>the hwconfig. At least that is the goal from my understanding.
>
>> to know the avilable ones. That is sufficient already to loop through all
>> of them. Note that since it's *engine* WAs, we need to pass those to GuC
>> via the ads since it's GuC itself that does the save and restore.
>>
>
>Yes, but not for hwconfig load. We can load the GuC to read hwconfig and
>then properly populate engine WAs list when the GuC is loaded for
>submission.
>
>> > Likewise for_each_hw_engine should only be called after
>> > xe_hw_engines_init_early is called. With that, I think this patch is
>> > correct. Does that make sense?
>>
>> as per above, no. I think we are overcomplicating it for no real
>> benefit.
>>
>
>I agree this is complicated but I thought this ordering / reading the
>platform information from hwconfig was requirement. Maybe we should
>follow up to check if this is indeed a requirement.

What happens when we need a gt workaround to be applied even before the
first GuC load? This patch moves all the GT workarounds to happen after
the first load so it potentially creates issues if any of the current WAs
in gt_was[] are needed before that point.

The alternative, adding yet another WA type for when we need to decide
on the WA by looking at the engines it has, doesn't look pretty and also
complicates the decision on where to add new workarounds. When Tejas
asked about the possible solutions and I mentioned making the early
stuff happen early, I was not thinking about moving the WA processing
after the hwconfig, I was thinking about initializing the hw engines
array earlier like it was before.

I think the possible things to do are:

1) move the engine mask to be initialized earlier like it was and stop
pretending we are getting it from hwconfig

2) what this patch does, moving the gt WAs processing after the first
guc load

3) split the gt_was[] - all the WAs that need to check the engine in the
GT would need to be applied later

4) ... something else?

I'd say my preference is (1), (3), (2).

>
>> >
>> > I suggested to Tejas that we add an assert to for_each_hw_engine to
>> > enforce this is called after xe_hw_engines_init_early to catch issues
>> > like this too.
>>
>> I think that is already clear. Not sure it deserves an assert. The
>> assert would for instance trigger right now in calculate_regset_size(),
>> but that is harmless. Also we'd have it spread through the driver.
>>
>
>Well an assert like this would caught this bug quite a bit early.

ok, fair enough on this one.... as long as it's behind the debug
kconfig, it should be fine.

>
>Matt
>
>> Lucas De Marchi
>>
>> >
>> > > 	...
>> > > 	xe_wa_process_gt();
>> > >
>> > > That gt_fw_domain_init() seems a very poor place to initialize these
>> > > things now. It started with commit b892e30c6cd7 ("drm/xe: Move load of
>> > > minimal GuC and hwconfig read much earlier") and then people end up
>> > > adding random stuff in that function :(
>> > >
>> > > So I think we need to get back to the original idea of _early()
>> > > initializing what's needed. Then we'd probably need a few cleanups in
>> > > the initialization sequence so the function names at least partially
>> > > matches what's being initialized.
>> > >
>> >
>> > I agree on that the naming of all of this could be better...
>> >
>> > Matt
>> >
>> > > Lucas De Marchi
>> > >
>> > > > 	xe_force_wake_init_engines(gt, gt_to_fw(gt));
>> > > >
>> > > > 	err = all_fw_domain_init(gt);
>> > > > --
>> > > > 2.25.1
>> > > >


More information about the Intel-xe mailing list