[Intel-xe] [PATCH] drm/xe: Move wa processing to gt_init

Lucas De Marchi lucas.demarchi at intel.com
Tue Nov 28 17:45:36 UTC 2023


On Tue, Nov 28, 2023 at 08:47:59AM +0000, Matthew Brost wrote:
>On Tue, Nov 21, 2023 at 08:48:47AM -0600, Lucas De Marchi wrote:
>> On Tue, Nov 21, 2023 at 04:55:55PM +0530, Tejas Upadhyay wrote:
>> > Currently WA processing for gt is happening in early init API.
>> > It also includes processing of WAs which needs filter based on
>> > hardware engine by looping over number of hardware engines in gt.
>> >
>> > However at early init stage hardware engines are not associated
>> > with gt in software which skips processing such WAs.
>> >
>> > Example WAs which will be fixed to apply after this change,
>> > "14011060649",
>> > "16010515920",
>> > "16021867713"
>> >
>> > Fixes: 811a9b3d262d8 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
>> > Signed-off-by: Tejas Upadhyay <tejas.upadhyay at intel.com>
>> > ---
>> > drivers/gpu/drm/xe/xe_gt.c | 2 +-
>> > 1 file changed, 1 insertion(+), 1 deletion(-)
>> >
>> > diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
>> > index 0dddb751c6a4..3d0f7b6f952c 100644
>> > --- a/drivers/gpu/drm/xe/xe_gt.c
>> > +++ b/drivers/gpu/drm/xe/xe_gt.c
>> > @@ -311,7 +311,6 @@ int xe_gt_init_early(struct xe_gt *gt)
>> > 	if (err)
>> > 		return err;
>> >
>> > -	xe_wa_process_gt(gt);
>> > 	xe_wa_process_oob(gt);
>> > 	xe_tuning_process_gt(gt);
>> >
>> > @@ -495,6 +494,7 @@ int xe_gt_init(struct xe_gt *gt)
>> > 	if (err)
>> > 		return err;
>> >
>> > +	xe_wa_process_gt(gt);
>>
>> the problem with that is that if we need to touch any register already
>> touched until this point, the workarounds would not had been applied.
>> Digging the git history, I had added this commit exactly to be able to
>> initialize the gt workarounds as soon as possible:
>>
>> 	4eb2e8d25ab4 ("drm/xe: Add xe_hw_engines_init_early()")
>>
>> And then
>>
>> 	816d37907ac6 ("drm/xe: Plug gt WAs to gt init/reset")
>>
>> Note that these commits are in the xe branch, not drm-xe-next.
>> So, it was:
>>
>> 	xe_hw_engines_init_early(gt);
>
>xe_hw_engines_init_early has be called after xe_uc_init_hwconfig after

did you mean "has to be" or "has been"?

>the idea is gt->info.engine_mask would be set by reading the hwconfig
>table from the GuC rather than from a hardcoded table in the driver.

like I said, the problem with postponing that is we could have hard to
debug issues because we will think we have all the workarounds coded,
not realizing it isn't applied to the early stages of hw initialization.

I think we should go back push back on the idea of initializing the hw
engines array after loading guc... it doesn't make much sense from the
engine initialization pov and we are *not* doing that.

static int gt_fw_domain_init(struct xe_gt *gt)
{
	...
	err = xe_uc_init_hwconfig(&gt->uc);                                   
	if (err)                                                              
		goto err_force_wake;                                          

	xe_gt_idle_sysfs_init(&gt->gtidle);                                   

	/* XXX: Fake that we pull the engine mask from hwconfig blob */       
	gt->info.engine_mask = gt->info.__engine_mask;                        

	...
}

The engines available are not hardcoded in the driver. We read the fuses
to know the avilable ones. That is sufficient already to loop through all
of them. Note that since it's *engine* WAs, we need to pass those to GuC
via the ads since it's GuC itself that does the save and restore.

>Likewise for_each_hw_engine should only be called after
>xe_hw_engines_init_early is called. With that, I think this patch is
>correct. Does that make sense?

as per above, no. I think we are overcomplicating it for no real
benefit.

>
>I suggested to Tejas that we add an assert to for_each_hw_engine to
>enforce this is called after xe_hw_engines_init_early to catch issues
>like this too.

I think that is already clear. Not sure it deserves an assert. The
assert would for instance trigger right now in calculate_regset_size(),
but that is harmless. Also we'd have it spread through the driver.

Lucas De Marchi

>
>> 	...
>> 	xe_wa_process_gt();
>>
>> That gt_fw_domain_init() seems a very poor place to initialize these
>> things now. It started with commit b892e30c6cd7 ("drm/xe: Move load of
>> minimal GuC and hwconfig read much earlier") and then people end up
>> adding random stuff in that function :(
>>
>> So I think we need to get back to the original idea of _early()
>> initializing what's needed. Then we'd probably need a few cleanups in
>> the initialization sequence so the function names at least partially
>> matches what's being initialized.
>>
>
>I agree on that the naming of all of this could be better...
>
>Matt
>
>> Lucas De Marchi
>>
>> > 	xe_force_wake_init_engines(gt, gt_to_fw(gt));
>> >
>> > 	err = all_fw_domain_init(gt);
>> > --
>> > 2.25.1
>> >


More information about the Intel-xe mailing list