[Intel-xe] [PATCH v2 3/3] drm/xe/pvc: Use fast copy engines as migrate engine on PVC

Wed Aug 23 11:51:06 UTC 2023

On Tue, Aug 22, 2023 at 08:08:40AM -0700, Lucas De Marchi wrote:
>On Fri, Aug 18, 2023 at 01:43:30PM -0700, Matt Roper wrote:
>>On Fri, Aug 18, 2023 at 05:59:02AM -0700, Niranjana Vishwanathapura wrote:
>>>On Thu, Aug 17, 2023 at 09:29:06AM -0700, Matt Roper wrote:
>>>> On Thu, Aug 17, 2023 at 04:03:52AM -0700, Niranjana Vishwanathapura wrote:
>>>> > Some copy hardware engine instances are faster than others on PVC.
>>>> > Use a virtual engine of these plus the reserved instance for the migrate
>>>> > engine on PVC. The idea being if a fast instance is available it will be
>>>> > used and the throughput of kernel copies, clears, and pagefault
>>>> > servicing will be higher.
>>>> >
>>>> > Signed-off-by: Matthew Brost <matthew.brost at intel.com>
>>>> > Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura at intel.com>
>>>> > ---
>>>> >  drivers/gpu/drm/xe/xe_hw_engine.c | 31 +++++++++++++++++++++++++++++++
>>>> >  drivers/gpu/drm/xe/xe_hw_engine.h |  1 +
>>>> >  drivers/gpu/drm/xe/xe_migrate.c   |  7 ++++---
>>>> >  3 files changed, 36 insertions(+), 3 deletions(-)
>>>> >
>>>> > diff --git a/drivers/gpu/drm/xe/xe_hw_engine.c b/drivers/gpu/drm/xe/xe_hw_engine.c
>>>> > index 4c812d04e182..00329efbd968 100644
>>>> > --- a/drivers/gpu/drm/xe/xe_hw_engine.c
>>>> > +++ b/drivers/gpu/drm/xe/xe_hw_engine.c
>>>> > @@ -791,3 +791,34 @@ bool xe_hw_engine_is_reserved(struct xe_hw_engine *hwe)
>>>> >  	return xe->info.supports_usm && hwe->class == XE_ENGINE_CLASS_COPY &&
>>>> >  		hwe->instance == gt->usm.reserved_bcs_instance;
>>>> >  }
>>>> > +
>>>> > +/**
>>>> > + * xe_hw_engine_migrate_logical_mask - Return logical mask for migrate engine.
>>>> > + * @gt: gt instance
>>>> > + *
>>>> > + * On PVC, due to workaround 16017236439, odd instance hardware copy engines
>
>s/On PVC, d/D/
>
>>>> > + * are faster than even instance ones.
>>>> > + * This function returns the mask involving all faster copy engines and the
>>>> > + * reserved copy engine to be used as logical mask for migrate engine.
>>>> > + * Including the reserved copy engine is required to avoid deadlocks due to
>>>> > + * migrate jobs servicing the faults gets stuck behind the job that faulted.
>>>> > + */
>>>> > +u32 xe_hw_engine_migrate_logical_mask(struct xe_gt *gt)
>>>> > +{
>>>> > +	struct xe_device *xe = gt_to_xe(gt);
>>>> > +	u32 logical_mask = 0;
>>>> > +	struct xe_hw_engine *hwe;
>>>> > +	enum xe_hw_engine_id id;
>>>> > +
>>>> > +	/* Only support this function on PVC for now */
>>>> > +	if (XE_WARN_ON(xe->info.platform != XE_PVC))
>>>>
>>>> Here you warn if the platform isn't PVC...
>>>>
>>>> > +		return 0;
>>>> > +
>>>> > +	for_each_hw_engine(hwe, gt, id) {
>>>> > +		if (hwe->class == XE_ENGINE_CLASS_COPY &&
>>>> > +		    (hwe->instance & 1 || xe_gt_is_usm_hwe(gt, hwe)))
>>>> > +			logical_mask |= BIT(hwe->logical_instance);
>>>> > +	}
>>>> > +
>>>> > +	return logical_mask;
>>>> > +}
>>>> > diff --git a/drivers/gpu/drm/xe/xe_hw_engine.h b/drivers/gpu/drm/xe/xe_hw_engine.h
>>>> > index 71968ee2f600..edee71e26c57 100644
>>>> > --- a/drivers/gpu/drm/xe/xe_hw_engine.h
>>>> > +++ b/drivers/gpu/drm/xe/xe_hw_engine.h
>>>> > @@ -52,6 +52,7 @@ void xe_hw_engine_handle_irq(struct xe_hw_engine *hwe, u16 intr_vec);
>>>> >  void xe_hw_engine_enable_ring(struct xe_hw_engine *hwe);
>>>> >  u32 xe_hw_engine_mask_per_class(struct xe_gt *gt,
>>>> >  				enum xe_engine_class engine_class);
>>>> > +u32 xe_hw_engine_migrate_logical_mask(struct xe_gt *gt);
>>>> >
>>>> >  struct xe_hw_engine_snapshot *
>>>> >  xe_hw_engine_snapshot_capture(struct xe_hw_engine *hwe);
>>>> > diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
>>>> > index 956a96b38346..cb9a82d13089 100644
>>>> > --- a/drivers/gpu/drm/xe/xe_migrate.c
>>>> > +++ b/drivers/gpu/drm/xe/xe_migrate.c
>>>> > @@ -338,11 +338,12 @@ struct xe_migrate *xe_migrate_init(struct xe_tile *tile)
>>>> >  							   XE_ENGINE_CLASS_COPY,
>>>> >  							   primary_gt->usm.reserved_bcs_instance,
>>>> >  							   false);
>>>> > -		if (!hwe)
>>>> > +		u32 logical_mask = xe_hw_engine_migrate_logical_mask(primary_gt);
>>>>
>>>> ...but here you call the function unconditionally on all platforms...
>>>>
>>>> > +
>>>> > +		if (!hwe || !logical_mask)
>>>> >  			return ERR_PTR(-EINVAL);
>>>> >
>>>>
>>>> ...and here we fail out if logical mask isn't set (which is will never
>>>> be on non-PVC).
>>>>
>>>
>>>This patch was derived from the below earlier work.
>>>https://patchwork.freedesktop.org/patch/528460/?series=115575&rev=2
>>>
>>>It was discussed there.
>>>
>>>We should be fine here as currently only PVC Platforms call this
>>>xe_hw_engine_migrate_logical_mask() function. The reason for having a
>>>warning there is to detect any issue while supporting any future platforms
>>>so that we fix that immediately.
>>
>>xe_hw_engine_migrate_logical_mask is called on any platform with
>>supports_usm, right?  I guess that's just PVC at the moment, but we
>>already have LNL patches pending on the mailing list which I expect to
>>eventually start setting that bit as well.  I guess we'll have to make
>>sure we remember to come back here and make some adjustments once a bit
>>of that code starts landing.
>>
>>>
>>>> I think you probably want an OOB workaround for some of this logic (see
>>>> xe_wa_oob.rules); then you can do something like
>>>>
>>>>        if (XE_WA(gt, 16017236439))
>>>>
>>>> to guard any code that should only be executed on platforms where this
>>>> workaround applies.
>>>>
>>>
>>>Hmm...the previous patch in the series adds this WA (16017236439) as a
>>>engine_was (to be moved to lrc_was). I would like to keep that as that
>>>does the register update nicely (instead of adding additional code for
>>>that).
>>>
>>>If at all we need, we need to check here whether that engine/lrc WA is
>>>active. But as you mentioned, XE_WA() only works for OOB WAs.
>>>We could add a duplicate OOB WA here (with same number 16017236439).
>>>But not sure if that is a good idea.
>>>Or, we can add a function to check if a engine/lrc WA is active by
>>>doing the name comparision (instead of XE_WA()). Third option is to
>>>support generating WA index numbering for engine/lrc WAs also (similar
>>>to OOB), so that we can use some macro similar to XE_WA().
>>
>>+cc Lucas for his thoughts.
>
>For now, just adding the workaround to oob with the same number should
>be good enough and that should take care of the other question about
>warning for !pvc too.
>
>>
>>Maybe longer term what we want is another RTP matching rule that behaves
>>as "match platform/stepping according to a specific OOB WA."  That way
>>workarounds that have both simple register programming (in lrcs_was[]
>>and such) and additional out-of-band logic can be handled in a
>>consistent way and the platform/IP matching managed in just a single
>>place.  We could accomplish the same kind of thing with an RTP "FUNC()"
>>rule and a wrapper function today, but it would be nice to avoid the
>>need to manually create a wrapper when adding new workarounds.
>
>
>yeah, that sounds good and not terribly complicated. I will add that to
>my todo list.
>
>>
>>>
>>>What do you suggest?
>>>I think we can leave the code as is for now as it gets called only
>>>for PVC.
>>
>>Since Lucas is handling the LNL enablement, I'll let him decide whether
>>he's okay leaving this as-is for now (and we'll come back and make some
>>adjustments once more of the LNL and Xe2 patches have landed) or whether
>>he'd rather see some adjustments right now.
>
>Let's use the suggested check:
>
>        if (XE_WA(gt, 16017236439))
>

Thnaks Lucas,
I noticed that OOB WA is of XE_RTP_PROCESS_TYPE_GT type and not
XE_RTP_PROCESS_TYPE_ENGINE (which the added lrc_wa is).
So, for OOB WA, we can't specify ENGINE rules. Hence we can't make
the same rule apply for both WAs here.

But for now, I don't need engine specific rules for OOB WA.
I have just set the rule to be PVC platform and posted revised
patch series.

Thanks,
Niranjana

>Lucas De Marchi
>
>>
>>
>>Matt
>>
>>>
>>>Niranjana
>>>
>>>>
>>>> Matt
>>>>
>>>> > -		m->q = xe_exec_queue_create(xe, vm,
>>>> > -					    BIT(hwe->logical_instance), 1,
>>>> > +		m->q = xe_exec_queue_create(xe, vm, logical_mask, 1,
>>>> >  					    hwe, EXEC_QUEUE_FLAG_KERNEL);
>>>> >  	} else {
>>>> >  		m->q = xe_exec_queue_create_class(xe, primary_gt, vm,
>>>> > --
>>>> > 2.21.0.rc0.32.g243a4c7e27
>>>> >
>>>>
>>>> --
>>>> Matt Roper
>>>> Graphics Software Engineer
>>>> Linux GPU Platform Enablement
>>>> Intel Corporation
>>
>>-- 
>>Matt Roper
>>Graphics Software Engineer
>>Linux GPU Platform Enablement
>>Intel Corporation