[PATCH] drm/xe: Don't skip TLB invalidations on VF

Michal Wajdeczko michal.wajdeczko at intel.com
Tue Jul 8 12:38:50 UTC 2025



On 08.07.2025 11:01, Tejas Upadhyay wrote:
> Skipping TLB invalidations on VF causes unrecoverable
> faults.

oops, my decision to drop it on VF was biased by this old comment:

	/* XXX: Do we need this? Leaving for now. */

> A probable reason for skipping TLB invalidations
> on SRIOV is the lack of support for the
> MI_FLUSH_DW_STORE_INDEX instruction.

true, this variant using GGTT is not supported on VFs

> Add back TLB flush with some
> additional handling.
> 
> Helps in resolving,
> [  704.913454] xe 0000:00:02.1: [drm:pf_queue_work_func [xe]]
>                 ASID: 0
>                 VFID: 0
>                 PDATA: 0x0d92
>                 Faulted Address: 0x0000000002fa0000
>                 FaultType: 0
>                 AccessType: 1
>                 FaultLevel: 0
>                 EngineClass: 3 bcs
>                 EngineInstance: 8
> [  704.913551] xe 0000:00:02.1: [drm:pf_queue_work_func [xe]] Fault response: Unsuccessful -22
> 
> Suggested-by: Matthew Brost <matthew.brost at intel.com>
> Fixes: 97515d0b3ed92 ("drm/xe/vf: Don't emit access to Global HWSP if VF")
> Signed-off-by: Tejas Upadhyay <tejas.upadhyay at intel.com>
> ---
>  drivers/gpu/drm/xe/xe_ring_ops.c | 22 ++++++++++------------
>  1 file changed, 10 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/xe_ring_ops.c
> index bc1689db4cd7..ee0fa208e2f8 100644
> --- a/drivers/gpu/drm/xe/xe_ring_ops.c
> +++ b/drivers/gpu/drm/xe/xe_ring_ops.c
> @@ -110,13 +110,14 @@ static int emit_bb_start(u64 batch_addr, u32 ppgtt_flag, u32 *dw, int i)
>  	return i;
>  }
>  
> -static int emit_flush_invalidate(u32 *dw, int i)
> +static int emit_flush_invalidate(u32 addr, u32 val, u32 *dw, int i)

this helper is only used once and it looks almost exactly like another
open-coded sequence in its caller - emit_migration_job_gen12() - so
maybe move this code there as-is?
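
For reference, a rough (untested) sketch of how the open-coded version
could look in emit_migration_job_gen12(), simply reusing the dwords the
patched helper emits, with saddr/seqno as in the hunk below:

	dw[i++] = preparser_disable(true);
	dw[i++] = MI_FLUSH_DW | MI_INVALIDATE_TLB | MI_FLUSH_DW_OP_STOREDW |
		  MI_FLUSH_IMM_DW;
	dw[i++] = saddr | MI_FLUSH_DW_USE_GTT;	/* post-sync store target in GGTT */
	dw[i++] = 0;				/* upper 32 bits of the address */
	dw[i++] = seqno;			/* immediate value to store */
	dw[i++] = preparser_disable(false);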

>  {
>  	dw[i++] = MI_FLUSH_DW | MI_INVALIDATE_TLB | MI_FLUSH_DW_OP_STOREDW |
> -		  MI_FLUSH_IMM_DW | MI_FLUSH_DW_STORE_INDEX;
> -	dw[i++] = LRC_PPHWSP_FLUSH_INVAL_SCRATCH_ADDR;
> -	dw[i++] = 0;
> +		  MI_FLUSH_IMM_DW;
> +
> +	dw[i++] = addr | MI_FLUSH_DW_USE_GTT;
>  	dw[i++] = 0;
> +	dw[i++] = val;
>  
>  	return i;
>  }
> @@ -398,22 +399,19 @@ static void emit_migration_job_gen12(struct xe_sched_job *job,
>  				     struct xe_lrc *lrc, u32 seqno)
>  {
>  	u32 dw[MAX_JOB_SIZE_DW], i = 0;
> +	u32 saddr = xe_lrc_start_seqno_ggtt_addr(lrc);

please keep definitions in rev-xmas-tree order
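
i.e. longest declaration first, something like:

	u32 saddr = xe_lrc_start_seqno_ggtt_addr(lrc);
	u32 dw[MAX_JOB_SIZE_DW], i = 0;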

>  
>  	i = emit_copy_timestamp(lrc, dw, i);
>  
> -	i = emit_store_imm_ggtt(xe_lrc_start_seqno_ggtt_addr(lrc),
> -				seqno, dw, i);
> +	i = emit_store_imm_ggtt(saddr, seqno, dw, i);
>  
>  	dw[i++] = MI_ARB_ON_OFF | MI_ARB_DISABLE; /* Enabled again below */
>  
>  	i = emit_bb_start(job->ptrs[0].batch_addr, BIT(8), dw, i);
>  
> -	if (!IS_SRIOV_VF(gt_to_xe(job->q->gt))) {
> -		/* XXX: Do we need this? Leaving for now. */
> -		dw[i++] = preparser_disable(true);
> -		i = emit_flush_invalidate(dw, i);
> -		dw[i++] = preparser_disable(false);
> -	}
> +	dw[i++] = preparser_disable(true);
> +	i = emit_flush_invalidate(saddr, seqno, dw, i);

hmm, but seqno is already stored by the above emit_store_imm_ggtt(), so
maybe, to fulfill the MI_INVALIDATE_TLB requirement, use post-sync
TIMESTAMP(3) instead of post-sync IMM(1)?
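
Something along these lines, perhaps (very rough sketch:
MI_FLUSH_DW_OP_TIMESTAMP below is a hypothetical define for post-sync
op 3, analogous to the existing MI_FLUSH_DW_OP_STOREDW for op 1,
assuming we don't already have one; the timestamp is a 64-bit write,
hence the QW length and the need for a GGTT scratch qword that may be
freely clobbered):

	dw[i++] = MI_FLUSH_DW | MI_INVALIDATE_TLB | MI_FLUSH_DW_OP_TIMESTAMP |
		  MI_FLUSH_IMM_QW;
	dw[i++] = addr | MI_FLUSH_DW_USE_GTT;	/* GGTT scratch qword */
	dw[i++] = 0;				/* upper 32 bits of the address */
	dw[i++] = 0;				/* data dwords unused for timestamp */
	dw[i++] = 0;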

> +	dw[i++] = preparser_disable(false);
>  
>  	i = emit_bb_start(job->ptrs[1].batch_addr, BIT(8), dw, i);
>  


