[PATCH v2 1/5] drm/xe/rtp: Prepare RTP to define SR-IOV specific rules

Tue Feb 25 03:04:09 UTC 2025

On Tue, Feb 25, 2025 at 12:36:25AM +0100, Michal Wajdeczko wrote:
> 
> 
> On 24.02.2025 20:41, Matt Roper wrote:
> > On Mon, Feb 24, 2025 at 04:06:56PM +0100, Michal Wajdeczko wrote:
> >> Almost all currently defined workaround actions are dedicated
> >> for the driver running in a privileged mode (native or SR-IOV PF)
> >> and those actions should not be used when running in the VF mode.
> >>
> >> While we can easily skip any attempt to apply tile, engine or
> >> LRC workarounds by doing early exit, it's not so easy with OOB
> >> workarounds, as on exception basis, there could be some actions
> >> expected to be done also by the VF driver.
> > 
> > A bit orthogonal to the specific patch you're adding here, but stepping
> > back and thinking about this for a moment...does it actually make sense
> > for LRC workarounds to not apply to VF?  I can understand the tile, gt,
> > and engine workarounds not applying since presumably there's only one
> > setting at the hardware level and the VFs would use the setting the PF
> > had programmed into the hardware.  But LRC workarounds seem different
> > because the hardware value is expected to update with each context
> > switch to whatever is in the incoming context's LRC.  If we don't apply
> > LRC workarounds in a VF, then the actual LRC getting switched in won't
> > be inheriting anything from the VF, but rather using whatever unmodified
> > value is present in the VF's default LRC.
> > 
> > Am I overlooking somoething here or should we actually be handling LRC
> > workarounds in a different manner?
> > 
> 
> from commit 9632dfb0def4 ("drm/xe/vf: Don't run any save-restore RTP
> actions if VF") we are doing early exit from xe_rtp_process_to_sr() that
> is also used for LRC WAs
> 
> and IIUC the Bspec:52398 says that VFs can write only to allowed
> registers within the engines, which our chicken WA regs are not, so all
> LRI will be NOP'ed anyway

Yeah, I don't think LRI's and such executed from batchbuffers work
(i.e., the way we apply LRC workarounds today), but the statement in
that table about allowing accesses to enable context save/restore makes
it sound like the contents of the LRC in memory do still get loaded into
the hardware (i.e., LRI's done as part of context switch take effect,
even if LRI's done as part of batch/ring execution don't).

So that makes me wonder if we actually need to rethink how we do LRC
workarounds, at least for VF's.  Today we do:

 1. Hardware comes out of reset with default register/state values
 2. Context switch to context (A) with "restore inhibit" to make a
    context active on the engine without actually doing a "context
    restore" or changing any register values.
 3. Submit a batchbuffer containing all the register + state adjustments
    we want to make for the purposes of workarounds, tuning, etc.
 4. Context switch to context (B) to force the current register/state
    values to be written out to the LRC in memory of context (A).  We
    grab a copy of this LRC to use as the "default LRC" that acts as the
    starting point for all subsequent contexts created.

It sounds like in a VF, step 3 here is ineffective, at least for
registers.  However in theory it should still be possible to apply the
workarounds by skipping step 3, and instead manually poking the updated
values into the proper place in the LRC in memory as a new step 5.  That
would probably be a lot more fragile (i.e., we have to be careful that
we're finding the correct offset in the LRC for the register or state
instruction, and make sure that we don't accidentally clobber any
non-workaround values that the default context is trying to set).  But
in theory I think it should still be possible (and probably what we
should be doing if we want the effect of the workarounds)?

> 
> >>
> >> Since we process OOB rules also in the SR-IOV VF mode, we had to
> >> explicitly decline the ones that were not applicable for VF, by
> >> using FUNC(xe_rtp_match_not_sriov_vf). However that may not
> >> scale well, since most future workarounds will likely not be
> >> needed in VF mode.
> >>
> >> Instead of forcing us to append more and more FUNC(not_sriov_vf)
> >> to potentially every new OOB workaround, change the RTP logic
> >> and assume that RTP rules are not applicable to VFs by default,
> >> unless there will be added a special SRIOV(VF_READY) rule.
> >>
> >> As of today only no_media_l3 OOB workaround seems to be applicable
> >> also for the VF mode, since the value of MIRROR_FUSE3(0x9118) from
> >> the PF is likely unreliable, so mark that WA as VF_READY right away.
> >>
> >> For completeness define also SRIOV(ONLY) to require PF or VF mode,
> >> SRIOV(PF_ONLY) to require PF mode and SRIOV(VF_ONLY) for VF mode.
> > 
> > For the purposes of a workaround, what would be the difference between
> > PF mode and native?  
> 
> in PF mode (controlled by xe.max_vfs param) we must retain possibility
> to enable VFs, so we might need to make different steps
> 
> > I.e., what kind of language in a workaround
> > description would signal to us that we need PF_ONLY (or SRIOV(ONLY))?
> 
> what about "in PF mode we shouldn't modify XXX or enable YYY, or we must
> use different setup for ZZZ, where XXX/YYY/ZZZ are native features, as
> otherwise we would completely block VFs enabling"
> 
> > Do we have any historic examples of these?
> 
> not yet, but in rev1 I was also assuming that VF_READY is just a ready
> to use solution for some future undefined yet exception case, but was
> shortly pointed by the CI that there is one already (no_media_l3)

Okay, sounds good.  I just want to make sure we'll be able to recognize
such workaround if/when they show up in the workaround database.

Matt

> 
> > 
> > 
> > Matt
> > 
> >>
> >> Signed-off-by: Michal Wajdeczko <michal.wajdeczko at intel.com>
> >> Cc: Lucas De Marchi <lucas.demarchi at intel.com>
> >> Cc: Matt Roper <matthew.d.roper at intel.com>
> >> ---
> >> v2: tag no_media_l3 workaround as VF_READY
> >> ---
> >>  drivers/gpu/drm/xe/xe_rtp.c        | 20 +++++++++++++++++++-
> >>  drivers/gpu/drm/xe/xe_rtp.h        |  8 ++++++++
> >>  drivers/gpu/drm/xe/xe_rtp_types.h  |  4 ++++
> >>  drivers/gpu/drm/xe/xe_wa_oob.rules |  2 +-
> >>  4 files changed, 32 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/xe/xe_rtp.c b/drivers/gpu/drm/xe/xe_rtp.c
> >> index 7a1c78fdfc92..b607be981590 100644
> >> --- a/drivers/gpu/drm/xe/xe_rtp.c
> >> +++ b/drivers/gpu/drm/xe/xe_rtp.c
> >> @@ -37,6 +37,7 @@ static bool rule_matches(const struct xe_device *xe,
> >>  {
> >>  	const struct xe_rtp_rule *r;
> >>  	unsigned int i, rcount = 0;
> >> +	bool vf_ready = false;
> >>  	bool match;
> >>  
> >>  	for (r = rules, i = 0; i < n_rules; r = &rules[++i]) {
> >> @@ -110,6 +111,22 @@ static bool rule_matches(const struct xe_device *xe,
> >>  		case XE_RTP_MATCH_FUNC:
> >>  			match = r->match_func(gt, hwe);
> >>  			break;
> >> +		case XE_RTP_MATCH_SRIOV_VF_READY:
> >> +			match = true;
> >> +			vf_ready = true;
> >> +			break;
> >> +		case XE_RTP_MATCH_SRIOV_VF_ONLY:
> >> +			match = IS_SRIOV_VF(xe);
> >> +			vf_ready = match;
> >> +			break;
> >> +		case XE_RTP_MATCH_SRIOV_PF_ONLY:
> >> +			match = IS_SRIOV_PF(xe);
> >> +			vf_ready = false;
> >> +			break;
> >> +		case XE_RTP_MATCH_SRIOV_ONLY:
> >> +			match = IS_SRIOV(xe);
> >> +			vf_ready = match;
> >> +			break;
> >>  		default:
> >>  			drm_warn(&xe->drm, "Invalid RTP match %u\n",
> >>  				 r->match_type);
> >> @@ -128,6 +145,7 @@ static bool rule_matches(const struct xe_device *xe,
> >>  				return false;
> >>  
> >>  			rcount = 0;
> >> +			vf_ready = false;
> >>  		} else {
> >>  			rcount++;
> >>  		}
> >> @@ -137,7 +155,7 @@ static bool rule_matches(const struct xe_device *xe,
> >>  	if (drm_WARN_ON(&xe->drm, !rcount))
> >>  		return false;
> >>  
> >> -	return true;
> >> +	return IS_SRIOV_VF(xe) ? vf_ready : true;
> >>  }
> >>  
> >>  static void rtp_add_sr_entry(const struct xe_rtp_action *action,
> >> diff --git a/drivers/gpu/drm/xe/xe_rtp.h b/drivers/gpu/drm/xe/xe_rtp.h
> >> index 38b9f13bba5e..7874ea8588db 100644
> >> --- a/drivers/gpu/drm/xe/xe_rtp.h
> >> +++ b/drivers/gpu/drm/xe/xe_rtp.h
> >> @@ -207,6 +207,14 @@ struct xe_reg_sr;
> >>  #define XE_RTP_RULE_IS_DISCRETE							\
> >>  	{ .match_type = XE_RTP_MATCH_DISCRETE }
> >>  
> >> +/**
> >> + * XE_RTP_RULE_SRIOV - Create a SR-IOV rule for Virtual Functions
> >> + *
> >> + * Refer to XE_RTP_RULES() for expected usage.
> >> + */
> >> +#define XE_RTP_RULE_SRIOV(TAG)							\
> >> +	{ .match_type = XE_RTP_MATCH_SRIOV_##TAG }
> >> +
> >>  /**
> >>   * XE_RTP_RULE_OR - Create an OR condition for rtp rules
> >>   *
> >> diff --git a/drivers/gpu/drm/xe/xe_rtp_types.h b/drivers/gpu/drm/xe/xe_rtp_types.h
> >> index 1b76b947c706..02c4e67906cc 100644
> >> --- a/drivers/gpu/drm/xe/xe_rtp_types.h
> >> +++ b/drivers/gpu/drm/xe/xe_rtp_types.h
> >> @@ -53,6 +53,10 @@ enum {
> >>  	XE_RTP_MATCH_ENGINE_CLASS,
> >>  	XE_RTP_MATCH_NOT_ENGINE_CLASS,
> >>  	XE_RTP_MATCH_FUNC,
> >> +	XE_RTP_MATCH_SRIOV_VF_READY,
> >> +	XE_RTP_MATCH_SRIOV_VF_ONLY,
> >> +	XE_RTP_MATCH_SRIOV_PF_ONLY,
> >> +	XE_RTP_MATCH_SRIOV_ONLY,
> >>  	XE_RTP_MATCH_OR,
> >>  };
> >>  
> >> diff --git a/drivers/gpu/drm/xe/xe_wa_oob.rules b/drivers/gpu/drm/xe/xe_wa_oob.rules
> >> index 228436532282..ff9da494efcf 100644
> >> --- a/drivers/gpu/drm/xe/xe_wa_oob.rules
> >> +++ b/drivers/gpu/drm/xe/xe_wa_oob.rules
> >> @@ -40,6 +40,6 @@
> >>  16023588340	GRAPHICS_VERSION(2001)
> >>  14019789679	GRAPHICS_VERSION(1255)
> >>  		GRAPHICS_VERSION_RANGE(1270, 2004)
> >> -no_media_l3	MEDIA_VERSION(3000)
> >> +no_media_l3	MEDIA_VERSION(3000), SRIOV(VF_READY)
> >>  14022866841	GRAPHICS_VERSION(3000), GRAPHICS_STEP(A0, B0)
> >>  		MEDIA_VERSION(3000), MEDIA_STEP(A0, B0)
> >> -- 
> >> 2.47.1
> >>
> > 
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation