[PATCH] drm/xe: Dont skip TLB invalidations on VF

Tue Jul 8 18:32:36 UTC 2025

On Tue, Jul 08, 2025 at 02:38:50PM +0200, Michal Wajdeczko wrote:
> 
> 
> On 08.07.2025 11:01, Tejas Upadhyay wrote:
> > Skipping TLB invalidations on VF causing unrecoverable
> > faults. 
> 
> oops, my decision to drop it on VF was biased by this old comment:
> 
> 	/* XXX: Do we need this? Leaving for now. */

Also my opps, I was always confused how this worked on VFs and never
follow up on it.

> 
> > Probable reason for skipping TLB invalidations
> > on SRIOV could be lack of support for instruction
> > MI_FLUSH_DW_STORE_INDEX. 
> 
> true, this variant using GGTT is not supported on VFs
> 
> > Add back TLB flush with some
> > additional handling.
> > 
> > Helps in resolving,
> > [  704.913454] xe 0000:00:02.1: [drm:pf_queue_work_func [xe]]
> >                 ASID: 0
> >                 VFID: 0
> >                 PDATA: 0x0d92
> >                 Faulted Address: 0x0000000002fa0000
> >                 FaultType: 0
> >                 AccessType: 1
> >                 FaultLevel: 0
> >                 EngineClass: 3 bcs
> >                 EngineInstance: 8
> > [  704.913551] xe 0000:00:02.1: [drm:pf_queue_work_func [xe]] Fault response: Unsuccessful -22
> > 
> > Suggested-by: Matthew Brost <matthew.brost at intel.com>
> > Fixes: 97515d0b3ed92 ("drm/xe/vf: Don't emit access to Global HWSP if VF")
> > Signed-off-by: Tejas Upadhyay <tejas.upadhyay at intel.com>
> > ---
> >  drivers/gpu/drm/xe/xe_ring_ops.c | 22 ++++++++++------------
> >  1 file changed, 10 insertions(+), 12 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/xe_ring_ops.c
> > index bc1689db4cd7..ee0fa208e2f8 100644
> > --- a/drivers/gpu/drm/xe/xe_ring_ops.c
> > +++ b/drivers/gpu/drm/xe/xe_ring_ops.c
> > @@ -110,13 +110,14 @@ static int emit_bb_start(u64 batch_addr, u32 ppgtt_flag, u32 *dw, int i)
> >  	return i;
> >  }
> >  
> > -static int emit_flush_invalidate(u32 *dw, int i)
> > +static int emit_flush_invalidate(u32 addr, u32 val, u32 *dw, int i)
> 
> this helper is only used once and it looks almost exactly as another
> open coded sequence at the caller - emit_migration_job_gen12(), so maybe
> move this code there as-as?
> 
> >  {
> >  	dw[i++] = MI_FLUSH_DW | MI_INVALIDATE_TLB | MI_FLUSH_DW_OP_STOREDW |
> > -		  MI_FLUSH_IMM_DW | MI_FLUSH_DW_STORE_INDEX;
> > -	dw[i++] = LRC_PPHWSP_FLUSH_INVAL_SCRATCH_ADDR;
> > -	dw[i++] = 0;
> > +		  MI_FLUSH_IMM_DW;
> > +
> > +	dw[i++] = addr | MI_FLUSH_DW_USE_GTT;
> >  	dw[i++] = 0;
> > +	dw[i++] = val;
> >  
> >  	return i;
> >  }
> > @@ -398,22 +399,19 @@ static void emit_migration_job_gen12(struct xe_sched_job *job,
> >  				     struct xe_lrc *lrc, u32 seqno)
> >  {
> >  	u32 dw[MAX_JOB_SIZE_DW], i = 0;
> > +	u32 saddr = xe_lrc_start_seqno_ggtt_addr(lrc);
> 
> please keep definitions in rev-xmas-tree order
> 
> >  
> >  	i = emit_copy_timestamp(lrc, dw, i);
> >  
> > -	i = emit_store_imm_ggtt(xe_lrc_start_seqno_ggtt_addr(lrc),
> > -				seqno, dw, i);
> > +	i = emit_store_imm_ggtt(saddr, seqno, dw, i);
> >  
> >  	dw[i++] = MI_ARB_ON_OFF | MI_ARB_DISABLE; /* Enabled again below */
> >  
> >  	i = emit_bb_start(job->ptrs[0].batch_addr, BIT(8), dw, i);
> >  
> > -	if (!IS_SRIOV_VF(gt_to_xe(job->q->gt))) {
> > -		/* XXX: Do we need this? Leaving for now. */
> > -		dw[i++] = preparser_disable(true);
> > -		i = emit_flush_invalidate(dw, i);
> > -		dw[i++] = preparser_disable(false);
> > -	}
> > +	dw[i++] = preparser_disable(true);
> > +	i = emit_flush_invalidate(saddr, seqno, dw, i);
> 
> hmm, but seqno is already stored by the above emit_store_imm_ggtt(), so
> maybe to fulfill MI_INVALIDATE_TLB requirement instead of post-sync
> IMM(1) use post-sync TIMESTAMP(3)?
> 

I'm not following this suggestion, Michal. Storing the same value twice
is harmless, and the first store is needed without an invalidate. The
second store (the store itself isn’t needed, but the invalidate is)
needs to be placed between two BBs.

Nits aside, patch looks correct to me and inclined to RB.

Matt

> > +	dw[i++] = preparser_disable(false);
> >  
> >  	i = emit_bb_start(job->ptrs[1].batch_addr, BIT(8), dw, i);
> >  
>