[Intel-xe] [PATCH v2] drm/xe: Add fake workaround to maintain backward compatibility in MI_BATCH_BUFFER_START

Souza, Jose jose.souza at intel.com
Thu Feb 2 14:02:28 UTC 2023


On Wed, 2023-02-01 at 16:20 -0500, Rodrigo Vivi wrote:
> On Mon, Jan 30, 2023 at 07:43:58PM +0000, Souza, Jose wrote:
> > On Mon, 2023-01-30 at 11:27 -0800, Lucas De Marchi wrote:
> > > On Mon, Jan 30, 2023 at 10:04:25AM -0800, Jose Souza wrote:
> > > > On Mon, 2023-01-30 at 09:15 -0800, Matt Roper wrote:
> > > > > On Mon, Jan 30, 2023 at 08:17:23AM -0800, José Roberto de Souza wrote:
> > > > > > i915 has the same fake workaround to revert the nested
> > > > > > MI_BATCH_BUFFER_START behavior on DG2 and newer platforms back to
> > > > > > the behavior of older platforms.
> > > > > > 
> > > > > > So here we clear TGL_NESTED_BB_EN in MI_MODE to disable third-level
> > > > > > batch buffer chaining.
> > > > > 
> > > > > I was kind of assuming we'd just drop this setting for the Xe driver.  I
> > > > > believe hardware will be removing the option to turn off nested
> > > > > batchbuffers in an upcoming platform, so userspace is going to have to
> > > > > adapt to the new behavior soon anyway; doing it while moving to a new
> > > > > KMD seems like the easiest time to make that happen since the UMDs are
> > > > > already updating their programming models.
> > > > 
> > > > This would bring even more changes to track when debugging issues in the Xe KMD port.
> > > > Better to do this after Xe KMD has stabilized and there is better CI coverage in the UMDs and KMDs.
> > > 
> > > but if we start supporting these platforms with nested bb disabled,
> > > we can't change it later. At least not for the older platforms that had
> > > it that way.
> > 
> > Xe KMD will not support TGL, DG2... by default anyway.
> > Why have different hardware behavior between the KMDs then?
> 
> For the older platforms we will always be behind the force_probe.
> So we will be able to change the uapi behavior later.
> 
> However, I tend to agree with Matt and Lucas that it would be better
> if we align with the future, instead of trying to align with i915
> first.
> 
> But I'd like to hear from Jose: how much trouble is this for
> userspace? What are the advantages of aligning with i915's current
> behavior first and then moving later?
> 
> If you look at all the other uapi, there's not much alignment with
> i915 anyway. Why should we align on this?

The i915 vs Xe changes don't affect the files and functions that fill the batch buffers in Mesa.
Changing the behavior of MI_BATCH_BUFFER_START would be the first change that actually reaches the batch buffers.
Keeping batch buffer changes minimal means less surface to debug for issues that only happen with Xe KMD.

That is why I would like to postpone this behavior change until Xe KMD and the UMDs' support for it are more stable.
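
For reference, this is roughly what the i915 side of this fake workaround looks
like (a sketch from memory of intel_workarounds.c; the exact function and
helper names, like fakewa_disable_nestedbb_mode() and wa_masked_dis(), may not
match the tree verbatim):

static void
fakewa_disable_nestedbb_mode(struct intel_engine_cs *engine,
			     struct i915_wa_list *wal)
{
	/*
	 * MI_MODE[12] (TGL_NESTED_BB_EN) selects whether a nested
	 * MI_BATCH_BUFFER_START is interpreted the traditional way or with
	 * the TGL+ meaning that allows 3rd-level batch buffers.  Newer
	 * platforms default to the new meaning, so clear the bit to keep
	 * the legacy, backward-compatible behavior for userspace.
	 */
	wa_masked_dis(wal, RING_MI_MODE(engine->mmio_base), TGL_NESTED_BB_EN);
}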

> 
> Thanks,
> Rodrigo.
> 
> > 
> > > 
> > > Lucas De Marchi
> > > 
> > > > 
> > > > > 
> > > > > 
> > > > > Matt
> > > > > 
> > > > > > 
> > > > > > v2:
> > > > > > - replace IP_VERSION_FOREVER by XE_RTP_END_VERSION_UNDEFINED
> > > > > > - move fake workaround to lrc_additional_programming table
> > > > > > 
> > > > > > Bspec: 45974, 45718
> > > > > > Cc: Lucas De Marchi <lucas.demarchi at intel.com>
> > > > > > Cc: Matt Roper <matthew.d.roper at intel.com>
> > > > > > Signed-off-by: José Roberto de Souza <jose.souza at intel.com>
> > > > > > ---
> > > > > >  drivers/gpu/drm/xe/xe_gt.c |  1 +
> > > > > >  drivers/gpu/drm/xe/xe_wa.c | 28 ++++++++++++++++++++++++++++
> > > > > >  drivers/gpu/drm/xe/xe_wa.h |  1 +
> > > > > >  3 files changed, 30 insertions(+)
> > > > > > 
> > > > > > diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
> > > > > > index 84a73eeccd297..5d07e1e7bd506 100644
> > > > > > --- a/drivers/gpu/drm/xe/xe_gt.c
> > > > > > +++ b/drivers/gpu/drm/xe/xe_gt.c
> > > > > > @@ -311,6 +311,7 @@ int xe_gt_record_default_lrcs(struct xe_gt *gt)
> > > > > > 
> > > > > >  		xe_reg_sr_init(&hwe->reg_lrc, "LRC", xe);
> > > > > >  		xe_wa_process_lrc(hwe);
> > > > > > +		xe_wa_process_lrc_additional_programming(hwe);
> > > > > > 
> > > > > >  		default_lrc = drmm_kzalloc(&xe->drm,
> > > > > >  					   xe_lrc_size(xe, hwe->class),
> > > > > > diff --git a/drivers/gpu/drm/xe/xe_wa.c b/drivers/gpu/drm/xe/xe_wa.c
> > > > > > index 3325de3edf691..744b7d0982683 100644
> > > > > > --- a/drivers/gpu/drm/xe/xe_wa.c
> > > > > > +++ b/drivers/gpu/drm/xe/xe_wa.c
> > > > > > @@ -288,6 +288,21 @@ static const struct xe_rtp_entry lrc_was[] = {
> > > > > >  	{}
> > > > > >  };
> > > > > > 
> > > > > > +static const struct xe_rtp_entry lrc_additional_programming[] = {
> > > > > > +	{ XE_RTP_NAME("FakeWaDisableNestedBBMode"),
> > > > > > +	  /*
> > > > > > +	   * This is a "fake" workaround defined by software to ensure we
> > > > > > +	   * maintain reliable, backward-compatible behavior for userspace with
> > > > > > +	   * regards to how nested MI_BATCH_BUFFER_START commands are handled.
> > > > > > +	   */
> > > > > > +	  XE_RTP_RULES(GRAPHICS_VERSION_RANGE(1255, XE_RTP_END_VERSION_UNDEFINED)),
> > > > > > +	  XE_RTP_CLR(RING_MI_MODE(0),
> > > > > > +		     TGL_NESTED_BB_EN,
> > > > > > +		     XE_RTP_FLAG(MASKED_REG, ENGINE_BASE))
> > > > > > +	},
> > > > > > +	{}
> > > > > > +};
> > > > > > +
> > > > > >  static const struct xe_rtp_entry register_whitelist[] = {
> > > > > >  	{ XE_RTP_NAME("WaAllowPMDepthAndInvocationCountAccessFromUMD, 1408556865"),
> > > > > >  	  XE_RTP_RULES(GRAPHICS_VERSION_RANGE(1200, 1210), ENGINE_CLASS(RENDER)),
> > > > > > @@ -362,6 +377,19 @@ void xe_wa_process_lrc(struct xe_hw_engine *hwe)
> > > > > >  	xe_rtp_process(lrc_was, &hwe->reg_lrc, hwe->gt, hwe);
> > > > > >  }
> > > > > > 
> > > > > > +/**
> > > > > > + * xe_wa_process_lrc_additional_programming - process additional LRC programming
> > > > > > + * table
> > > > > > + * @hwe: engine instance to process workarounds for
> > > > > > + *
> > > > > > + * Process additional context programming table for this platform, saving in
> > > > > > + * @hwe all the registers changes that need to be applied on context restore.
> > > > > > + */
> > > > > > +void xe_wa_process_lrc_additional_programming(struct xe_hw_engine *hwe)
> > > > > > +{
> > > > > > +	xe_rtp_process(lrc_additional_programming, &hwe->reg_lrc, hwe->gt, hwe);
> > > > > > +}
> > > > > > +
> > > > > >  /**
> > > > > >   * xe_reg_whitelist_process_engine - process table of registers to whitelist
> > > > > >   * @hwe: engine instance to process whitelist for
> > > > > > diff --git a/drivers/gpu/drm/xe/xe_wa.h b/drivers/gpu/drm/xe/xe_wa.h
> > > > > > index 1a0659690a320..872f3e4ddc73c 100644
> > > > > > --- a/drivers/gpu/drm/xe/xe_wa.h
> > > > > > +++ b/drivers/gpu/drm/xe/xe_wa.h
> > > > > > @@ -12,6 +12,7 @@ struct xe_hw_engine;
> > > > > >  void xe_wa_process_gt(struct xe_gt *gt);
> > > > > >  void xe_wa_process_engine(struct xe_hw_engine *hwe);
> > > > > >  void xe_wa_process_lrc(struct xe_hw_engine *hwe);
> > > > > > +void xe_wa_process_lrc_additional_programming(struct xe_hw_engine *hwe);
> > > > > > 
> > > > > >  void xe_reg_whitelist_process_engine(struct xe_hw_engine *hwe);
> > > > > >  void xe_reg_whitelist_apply(struct xe_hw_engine *hwe);
> > > > > > --
> > > > > > 2.39.1
> > > > > > 
> > > > > 
> > > > 
> > 


