[PATCH 09/15] drm/xe: Convert the CPU fault handler for exhaustive eviction
Thomas Hellström
thomas.hellstrom at linux.intel.com
Fri Aug 15 15:16:54 UTC 2025
On Wed, 2025-08-13 at 15:06 -0700, Matthew Brost wrote:
> On Wed, Aug 13, 2025 at 12:51:15PM +0200, Thomas Hellström wrote:
> > The CPU fault handler may populate bos and migrate, and in doing
> > so might interfere with other tasks validating.
> >
> > Convert it for exhaustive eviction. To do this properly without
> > potentially introducing stalls with the mmap lock held requires
> > TTM work. In the meantime, let's live with those stalls that
> > would typically happen on memory pressure.
> >
> > Signed-off-by: Thomas Hellström <thomas.hellstrom at linux.intel.com>
> > ---
> > drivers/gpu/drm/xe/xe_bo.c | 11 ++++++++---
> > 1 file changed, 8 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> > index 5e40b6cb8d2a..dd1e0e9957e0 100644
> > --- a/drivers/gpu/drm/xe/xe_bo.c
> > +++ b/drivers/gpu/drm/xe/xe_bo.c
> > @@ -1720,14 +1720,18 @@ static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
> > struct xe_device *xe = to_xe_device(ddev);
> > struct xe_bo *bo = ttm_to_xe_bo(tbo);
> > bool needs_rpm = bo->flags & XE_BO_FLAG_VRAM_MASK;
> > - struct drm_exec *exec;
> > + struct xe_validation_ctx ctx;
> > + struct drm_exec exec;
> > vm_fault_t ret;
> > int idx;
> >
> > if (needs_rpm)
> > xe_pm_runtime_get(xe);
> >
> > - exec = XE_VALIDATION_UNIMPLEMENTED;
> > + if (xe_validation_ctx_init(&ctx, &xe->val, &exec,
> > + DRM_EXEC_INTERRUPTIBLE_WAIT, 0, false))
> > + return VM_FAULT_NOPAGE;
>
> Any particular reason to not use xe_validation_guard here?
Well, this is a bit complicated ATM.
We would need some serious TTM rework here to support drm_exec in these
helpers, and on closer inspection I think we'd also need an
xe_validation_ctx_init() variant that doesn't initialize a drm_exec.
ttm_bo_vm_reserve() might take a bo lock without a drm_exec, and that
will cause a lockdep splat if the drm_exec transaction has already
initialized the ww ctx, which happens in drm_exec_until_all_locked().
I should add a comment about that.
/Thomas
>
> Matt
>
> > +
> > ret = ttm_bo_vm_reserve(tbo, vmf);
> > if (ret)
> > goto out;
> > @@ -1735,7 +1739,7 @@ static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
> > if (drm_dev_enter(ddev, &idx)) {
> > trace_xe_bo_cpu_fault(bo);
> >
> > - xe_validation_assert_exec(xe, exec, &tbo->base);
> > + xe_validation_assert_exec(xe, &exec, &tbo->base);
> > ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma->vm_page_prot,
> > TTM_BO_VM_NUM_PREFAULT);
> > drm_dev_exit(idx);
> > @@ -1761,6 +1765,7 @@ static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
> >
> > dma_resv_unlock(tbo->base.resv);
> > out:
> > + xe_validation_ctx_fini(&ctx);
> > if (needs_rpm)
> > xe_pm_runtime_put(xe);
> >
> > --
> > 2.50.1
> >