[PATCH] drm/xe: Dont skip TLB invalidations on VF

Wed Jul 9 10:21:27 UTC 2025

> -----Original Message-----
> From: Wajdeczko, Michal <Michal.Wajdeczko at intel.com>
> Sent: 08 July 2025 18:09
> To: Upadhyay, Tejas <tejas.upadhyay at intel.com>; intel-
> xe at lists.freedesktop.org
> Cc: Brost, Matthew <matthew.brost at intel.com>
> Subject: Re: [PATCH] drm/xe: Dont skip TLB invalidations on VF
> 
> 
> 
> On 08.07.2025 11:01, Tejas Upadhyay wrote:
> > Skipping TLB invalidations on VF causing unrecoverable faults.
> 
> oops, my decision to drop it on VF was biased by this old comment:
> 
> 	/* XXX: Do we need this? Leaving for now. */
> 
> > Probable reason for skipping TLB invalidations on SRIOV could be lack
> > of support for instruction MI_FLUSH_DW_STORE_INDEX.
> 
> true, this variant using GGTT is not supported on VFs
> 
> > Add back TLB flush with some
> > additional handling.
> >
> > Helps in resolving,
> > [  704.913454] xe 0000:00:02.1: [drm:pf_queue_work_func [xe]]
> >                 ASID: 0
> >                 VFID: 0
> >                 PDATA: 0x0d92
> >                 Faulted Address: 0x0000000002fa0000
> >                 FaultType: 0
> >                 AccessType: 1
> >                 FaultLevel: 0
> >                 EngineClass: 3 bcs
> >                 EngineInstance: 8
> > [  704.913551] xe 0000:00:02.1: [drm:pf_queue_work_func [xe]] Fault
> > response: Unsuccessful -22
> >
> > Suggested-by: Matthew Brost <matthew.brost at intel.com>
> > Fixes: 97515d0b3ed92 ("drm/xe/vf: Don't emit access to Global HWSP if
> > VF")
> > Signed-off-by: Tejas Upadhyay <tejas.upadhyay at intel.com>
> > ---
> >  drivers/gpu/drm/xe/xe_ring_ops.c | 22 ++++++++++------------
> >  1 file changed, 10 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c
> > b/drivers/gpu/drm/xe/xe_ring_ops.c
> > index bc1689db4cd7..ee0fa208e2f8 100644
> > --- a/drivers/gpu/drm/xe/xe_ring_ops.c
> > +++ b/drivers/gpu/drm/xe/xe_ring_ops.c
> > @@ -110,13 +110,14 @@ static int emit_bb_start(u64 batch_addr, u32
> ppgtt_flag, u32 *dw, int i)
> >  	return i;
> >  }
> >
> > -static int emit_flush_invalidate(u32 *dw, int i)
> > +static int emit_flush_invalidate(u32 addr, u32 val, u32 *dw, int i)
> 
> this helper is only used once and it looks almost exactly as another open coded
> sequence at the caller - emit_migration_job_gen12(), so maybe move this
> code there as-as?
> 
> >  {
> >  	dw[i++] = MI_FLUSH_DW | MI_INVALIDATE_TLB |
> MI_FLUSH_DW_OP_STOREDW |
> > -		  MI_FLUSH_IMM_DW | MI_FLUSH_DW_STORE_INDEX;
> > -	dw[i++] = LRC_PPHWSP_FLUSH_INVAL_SCRATCH_ADDR;
> > -	dw[i++] = 0;
> > +		  MI_FLUSH_IMM_DW;
> > +
> > +	dw[i++] = addr | MI_FLUSH_DW_USE_GTT;
> >  	dw[i++] = 0;
> > +	dw[i++] = val;
> >
> >  	return i;
> >  }
> > @@ -398,22 +399,19 @@ static void emit_migration_job_gen12(struct
> xe_sched_job *job,
> >  				     struct xe_lrc *lrc, u32 seqno)  {
> >  	u32 dw[MAX_JOB_SIZE_DW], i = 0;
> > +	u32 saddr = xe_lrc_start_seqno_ggtt_addr(lrc);
> 
> please keep definitions in rev-xmas-tree order

Ok, thanks, I will change this and rest comments are answered by MattB. If no outstanding comment then I will respin with this nit change.

Tejas
> 
> >
> >  	i = emit_copy_timestamp(lrc, dw, i);
> >
> > -	i = emit_store_imm_ggtt(xe_lrc_start_seqno_ggtt_addr(lrc),
> > -				seqno, dw, i);
> > +	i = emit_store_imm_ggtt(saddr, seqno, dw, i);
> >
> >  	dw[i++] = MI_ARB_ON_OFF | MI_ARB_DISABLE; /* Enabled again
> below */
> >
> >  	i = emit_bb_start(job->ptrs[0].batch_addr, BIT(8), dw, i);
> >
> > -	if (!IS_SRIOV_VF(gt_to_xe(job->q->gt))) {
> > -		/* XXX: Do we need this? Leaving for now. */
> > -		dw[i++] = preparser_disable(true);
> > -		i = emit_flush_invalidate(dw, i);
> > -		dw[i++] = preparser_disable(false);
> > -	}
> > +	dw[i++] = preparser_disable(true);
> > +	i = emit_flush_invalidate(saddr, seqno, dw, i);
> 
> hmm, but seqno is already stored by the above emit_store_imm_ggtt(), so
> maybe to fulfill MI_INVALIDATE_TLB requirement instead of post-sync
> IMM(1) use post-sync TIMESTAMP(3)?
> 
> > +	dw[i++] = preparser_disable(false);
> >
> >  	i = emit_bb_start(job->ptrs[1].batch_addr, BIT(8), dw, i);
> >