[Intel-xe] [PATCH 7/7] drm/xe/guc: Port workarounds to OOB infra

Thu May 4 22:19:50 UTC 2023

On Thu, 2023-05-04 at 14:56 -0700, Lucas De Marchi wrote:
> On Thu, May 04, 2023 at 12:37:05PM -0700, Jose Souza wrote:
> > On Thu, 2023-05-04 at 12:30 -0700, Lucas De Marchi wrote:
> > > On Thu, May 04, 2023 at 03:30:48PM +0000, Jose Souza wrote:
> > > > On Thu, 2023-05-04 at 00:32 -0700, Lucas De Marchi wrote:
> > > > > Let xe_guc.c start using XE_WA() for workarounds. One workaround
> > > > > couldn't be converted due to no support for function hook checks.
> > > > > 
> > > > > Signed-off-by: Lucas De Marchi <lucas.demarchi at intel.com>
> > > > > ---
> > > > >  drivers/gpu/drm/xe/xe_guc.c | 40 +++++++++++++------------------------
> > > > >  1 file changed, 14 insertions(+), 26 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c
> > > > > index f7d32b744247..eea5ed015282 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_guc.c
> > > > > +++ b/drivers/gpu/drm/xe/xe_guc.c
> > > > > @@ -19,7 +19,9 @@
> > > > >  #include "xe_guc_submit.h"
> > > > >  #include "xe_mmio.h"
> > > > >  #include "xe_platform_types.h"
> > > > > +#include "xe_rtp.h"
> > > > >  #include "xe_uc_fw.h"
> > > > > +#include "xe_wa.h"
> > > > >  #include "xe_wopcm.h"
> > > > > 
> > > > >  #define MEDIA_GUC_HOST_INTERRUPT        XE_REG(0x190304)
> > > > > @@ -136,29 +138,22 @@ static u32 guc_ctl_wa_flags(struct xe_guc *guc)
> > > > >  	struct xe_gt *gt = guc_to_gt(guc);
> > > > >  	u32 flags = 0;
> > > > > 
> > > > > -	/* Wa_22012773006:gen11,gen12 < XeHP */
> > > > > -	if (GRAPHICS_VER(xe) >= 11 &&
> > > > > -	    GRAPHICS_VERx100(xe) < 1250)
> > > > > +	if (XE_WA(xe, "22012773006", GRAPHICS_VERSION_RANGE(1100, 1250)))
> > > > >  		flags |= GUC_WA_POLLCS;
> > > > 
> > > > Just a suggestion but would be better to have the workaround rule defined somewhere else and here only have 'if (XE_WA(xe, "22012773006"))'.
> > > > 
> > > > 
> > > > Mesa now has something like that.
> > > > We have a Python script that generates a JSON file with workaround data from the workaround database API.
> > > > During build time, another Python script generates the intel_wa.c file using the information in JSON, setting a bit for each workaround that applies
> > > > to each platform.
> > > > In runtime it get the bit field with the workarounds that applies to running platform on given stepping.
> > > 
> > > that is one of the alternatives I shared in the RFC. There are
> > > some issues with such approach though:
> > > 
> > > 1) The condition is in a totally different place tha have to be checked
> > >     to follow the code flow.
> > > 
> > >     The upside is that if the same condition needs to be checked again in
> > >     multiple places for these OOB workarounds, the condition doesn't
> > >     need to be repeated
> > > 
> > > 2) Not possible to encode conditions that are not platform-centric or
> > > that needs to be decided in runtime (e.g. if an engine is fused off or
> > > not)
> > > 
> > > > In code Mesa we have 'if (intel_needs_workaround(devinfo, 14015055625))', Xe kmd would probably need additional checks like 'if (engine == BLIT &&
> > > > XE_WA(xe, "xxxxxx"))'.
> > > 
> > > yeah... that maybe solves the issue (2) above. We'd need to leave the
> > > non-platform checks to each callers. And maybe follow the same pattern
> > > for the gt/engine/lrc workarounds.
> > > 
> > > > 
> > > > This solve the problem we have were workarounds are left enabled in platforms that already have a HW fix for it.
> > > 
> > > not sure how that solves this issue though. We mark the stepping in
> > > which the workaround are supposedly fixed.
> > 
> > Mesa problems was with permanent workarounds, a example:
> > 
> > Some workaround is required by all gen12.0 and dg2 and was implemented with: 'if (gt_ver >= 12)' but then it was fixed in MTL.
> > It is pretty easy to forget to update it.
> 
> 
> but then the solution is not about where the check is, i.e. in the
> caller vs a centralized place. The solution is to generate the
> version range or platform from the spec.  If we were to move to a
> centralized place without generating that automatically we'd still have
> the same issue.... which goes back to the issue of having code
> generation in the kernel.
> 
>  From a quick look at the mesa impl, what I don't quite understand after
> grepping for intel_needs_workaround() and looking a the .cpp/.h
> generation:  how do you check if the workaround is implemented/active?
> Because 1) you generated a .json from the database, 2) you generated
> the condition to enable the workaround for *all* platforms; 3) Now you
> can't rely anymore on having that definition to decide it's
> implemented/active.  Do you track callers of intel_needs_workaround?
> Something else? This seems a little short based on the size
> of mesa_defs.json. Or is it just because the past platforms weren't
> converted to this new infra?

Some files are compiled one time for each gfx version, so there is also the INTEL_NEEDS_WA_ to grep.

But yeah the conversion is happening gradually, new workarounds are implemented with the new framework.
Mark probably have some ideas about how to track workarounds that we are missing implementation against database.

> 
> git grep -h intel_needs_workaround | sed -n 's/.*intel_needs_workaround(.*,\s*\([0-9]*\)).*/\1/p' | sort -u
> 14010017096
> 14012437816
> 14012688258
> 14014063774
> 14015297576
> 14015360517
> 14015590813
> 14017341140
> 16013063087
> 22012575642
> 22012785325
> 22013689345
> 
> Also Cc'ing here some other  people involved with the mesa impl to check
> if we can improve kernel side.
> 
> thanks
> Lucas De Marchi
> 
> > 
> > > 
> > > > 
> > > > Not sure if such think would be acceptable in Linux, just sharing what Mesa have done.
> > > 
> > > Generating code to be built is done in a few places in the kernel
> > > I already suggested doing that for the register definitions, but
> > > there are always push back.  I will take a look at this again.
> > > 
> > > Thanks
> > > Lucas De Marchi
> > > 
> > > > 
> > > > > 
> > > > > -	/* Wa_16011759253 */
> > > > > -	/* Wa_22011383443 */
> > > > > -	if (IS_SUBPLATFORM_STEP(xe, XE_DG2, XE_SUBPLATFORM_DG2_G10, STEP_A0, STEP_B0))
> > > > > +	if (XE_WA(xe, "16011759253, 22011383443",
> > > > > +		  SUBPLATFORM(DG2, G10), STEP(A0, B0)))
> > > > >  		flags |= GUC_WA_GAM_CREDITS;
> > > > > 
> > > > > -	/* Wa_14014475959 */
> > > > > -	if (IS_PLATFORM_STEP(xe, XE_METEORLAKE, STEP_A0, STEP_B0) ||
> > > > > -	    xe->info.platform == XE_DG2)
> > > > > +	if (XE_WA(xe, "14014475959[mtl]", PLATFORM(METEORLAKE), STEP(A0, B0)) ||
> > > > > +	    XE_WA(xe, "14014475959[dg2]", PLATFORM(DG2)))
> > > > >  		flags |= GUC_WA_HOLD_CCS_SWITCHOUT;
> > > > > 
> > > > >  	/*
> > > > > -	 * Wa_14012197797
> > > > > -	 * Wa_22011391025
> > > > > -	 *
> > > > >  	 * The same WA bit is used for both and 22011391025 is applicable to
> > > > >  	 * all DG2.
> > > > >  	 */
> > > > > -	if (xe->info.platform == XE_DG2)
> > > > > +	if (XE_WA(xe, "14012197797, 22011391025", PLATFORM(DG2)))
> > > > >  		flags |= GUC_WA_DUAL_QUEUE;
> > > > > 
> > > > >  	/*
> > > > > @@ -169,27 +164,20 @@ static u32 guc_ctl_wa_flags(struct xe_guc *guc)
> > > > >  	if (GRAPHICS_VERx100(xe) < 1270)
> > > > >  		flags |= GUC_WA_PRE_PARSER;
> > > > > 
> > > > > -	/* Wa_16011777198 */
> > > > > -	if (IS_SUBPLATFORM_STEP(xe, XE_DG2, XE_SUBPLATFORM_DG2_G10, STEP_A0, STEP_C0) ||
> > > > > -	    IS_SUBPLATFORM_STEP(xe, XE_DG2, XE_SUBPLATFORM_DG2_G11, STEP_A0,
> > > > > -				STEP_B0))
> > > > > +	if (XE_WA(xe, "16011777198[dg2_g10]", SUBPLATFORM(DG2, G10), STEP(A0, C0)) ||
> > > > > +	    XE_WA(xe, "16011777198[dg2_g11]", SUBPLATFORM(DG2, G11), STEP(A0, B0)))
> > > > >  		flags |= GUC_WA_RCS_RESET_BEFORE_RC6;
> > > > > 
> > > > > -	/*
> > > > > -	 * Wa_22012727170
> > > > > -	 * Wa_22012727685
> > > > > -	 */
> > > > > -	if (IS_SUBPLATFORM_STEP(xe, XE_DG2, XE_SUBPLATFORM_DG2_G10, STEP_A0, STEP_C0) ||
> > > > > -	    IS_SUBPLATFORM_STEP(xe, XE_DG2, XE_SUBPLATFORM_DG2_G11, STEP_A0,
> > > > > -				STEP_FOREVER))
> > > > > +	if (XE_WA(xe, "22012727170, 22012727685 [dg2_g10]", SUBPLATFORM(DG2, G10), STEP(A0, C0)) ||
> > > > > +	    XE_WA(xe, "22012727170, 22012727685 [dg2_g11]", SUBPLATFORM(DG2, G11), STEP(A0, FOREVER)))
> > > > >  		flags |= GUC_WA_CONTEXT_ISOLATION;
> > > > > 
> > > > >  	/* Wa_16015675438, Wa_18020744125 */
> > > > > +	/* FIXME: currently there is no support for gt checks in XE_WA() */
> > > > >  	if (!xe_hw_engine_mask_per_class(gt, XE_ENGINE_CLASS_RENDER))
> > > > >  		flags |= GUC_WA_RCS_REGS_IN_CCS_REGS_LIST;
> > > > > 
> > > > > -	/* Wa_1509372804 */
> > > > > -	if (IS_PLATFORM_STEP(xe, XE_PVC, STEP_B0, STEP_C0))
> > > > > +	if (XE_WA(xe, "1509372804", PLATFORM(PVC), STEP(B0, C0)))
> > > > >  		flags |= GUC_WA_RENDER_RST_RC6_EXIT;
> > > > > 
> > > > >  	return flags;
> > > > 
> >