[PATCH 1/1] drm/xe: Only use reserved BCS instances for usm migrate exec queue
Matthew Brost
matthew.brost at intel.com
Wed May 8 02:42:44 UTC 2024
On Tue, May 07, 2024 at 04:59:25PM -0700, Welty, Brian wrote:
>
>
> On 4/15/2024 12:04 PM, Matthew Brost wrote:
> > The GuC context scheduling queue is 2 entries deep, thus it is possible
> > for a migration job to be stuck behind a fault if the migration exec queue
> > shares engines with user jobs. This can deadlock as the migrate exec
> > queue is required to service page faults. Avoid the deadlock by only using
> > reserved BCS instances for the usm migrate exec queue.
>
> So the underlying concept was always broken here?
>
It seems to be broken.
> With the mask of more than one engine, the virtual engine still won't always
> pick an idle engine? HW may end up picking an engine and
> mistaking it to have been idle? Because the extra 2-deep queue is not
> being considered?

I thought the GuC would always pick the idle hardware engine, but that
doesn't appear to be the case (I can see a clear deadlock before this
patch, resolved after it). Maybe the GuC considers an engine with 1 exec
queue on it to still be idle, so it doesn't pick the reserved engine?
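
To spell out the deadlock I'm seeing (my reconstruction of the
sequence, not behaviour confirmed by the GuC team; the job names are
made up):

  GuC scheduling queue (2 entries deep) on a BCS shared with user jobs:
    entry 0: user job A   -> takes a page fault, engine stalls
    entry 1: user job B   -> queued behind A
    migrate job           -> must run to service A's fault, but both
                             entries are occupied -> deadlock

With only the reserved BCS instance in the migrate exec queue's logical
mask, the migrate job can never end up queued behind a faulting user
job.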

We should probably follow up with the GuC team on this, but with PVC as
an SDV, I think we should get this merged.
Matt
>
>
> >
> > Fixes: a043fbab7af5 ("drm/xe/pvc: Use fast copy engines as migrate engine on PVC")
> > Cc: Matt Roper <matthew.d.roper at intel.com>
> > Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura at intel.com>
> > Signed-off-by: Matthew Brost <matthew.brost at intel.com>
> > ---
> > drivers/gpu/drm/xe/xe_migrate.c | 14 +++++---------
> > 1 file changed, 5 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
> > index 9f6e9b7f11c8..c37bb7dfcf1f 100644
> > --- a/drivers/gpu/drm/xe/xe_migrate.c
> > +++ b/drivers/gpu/drm/xe/xe_migrate.c
> > @@ -12,8 +12,6 @@
> > #include <drm/ttm/ttm_tt.h>
> > #include <drm/xe_drm.h>
> > -#include <generated/xe_wa_oob.h>
> > -
> > #include "instructions/xe_mi_commands.h"
> > #include "regs/xe_gpu_commands.h"
> > #include "regs/xe_gtt_defs.h"
> > @@ -34,7 +32,6 @@
> > #include "xe_sync.h"
> > #include "xe_trace.h"
> > #include "xe_vm.h"
> > -#include "xe_wa.h"
> > /**
> > * struct xe_migrate - migrate context.
> > @@ -300,10 +297,6 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
> > }
> > /*
> > - * Due to workaround 16017236439, odd instance hardware copy engines are
> > - * faster than even instance ones.
> > - * This function returns the mask involving all fast copy engines and the
> > - * reserved copy engine to be used as logical mask for migrate engine.
> > * Including the reserved copy engine is required to avoid deadlocks due to
> > * migrate jobs servicing the faults gets stuck behind the job that faulted.
> > */
> > @@ -317,8 +310,7 @@ static u32 xe_migrate_usm_logical_mask(struct xe_gt *gt)
> > if (hwe->class != XE_ENGINE_CLASS_COPY)
> > continue;
> > - if (!XE_WA(gt, 16017236439) ||
> > - xe_gt_is_usm_hwe(gt, hwe) || hwe->instance & 1)
> > + if (xe_gt_is_usm_hwe(gt, hwe))
> > logical_mask |= BIT(hwe->logical_instance);
> > }
> > @@ -369,6 +361,10 @@ struct xe_migrate *xe_migrate_init(struct xe_tile *tile)
> > if (!hwe || !logical_mask)
> > return ERR_PTR(-EINVAL);
> > + /*
> > + * XXX: Currently only reserving 1 (likely slow) BCS instance on
> > + * PVC, may want to revisit if performance is needed.
> > + */
> > m->q = xe_exec_queue_create(xe, vm, logical_mask, 1, hwe,
> > EXEC_QUEUE_FLAG_KERNEL |
> > EXEC_QUEUE_FLAG_PERMANENT |
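
To make the mask change concrete, here is a small standalone sketch
(plain C, compilable outside the kernel; the engine layout below is
invented for illustration and need not match PVC's real topology - the
real code also checks hwe->instance for oddness while using
hwe->logical_instance for the mask bit, which this sketch assumes
coincide):

	#include <stdio.h>

	#define BIT(n)		(1u << (n))
	#define NUM_COPY	9	/* hypothetical logical instances 0..8 */
	#define USM_INSTANCE	8	/* hypothetical reserved BCS instance */

	int main(void)
	{
		unsigned int before = 0, after = 0, i;

		for (i = 0; i < NUM_COPY; i++) {
			/* old logic with WA 16017236439 active:
			 * reserved engine or any odd instance */
			if (i == USM_INSTANCE || (i & 1))
				before |= BIT(i);
			/* new logic: reserved engine only */
			if (i == USM_INSTANCE)
				after |= BIT(i);
		}

		/* prints "before: 0x1aa after: 0x100" */
		printf("before: 0x%x after: 0x%x\n", before, after);
		return 0;
	}

Either way the reserved instance stays in the mask; the patch just
stops the GuC from being able to place migrate jobs anywhere else.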