[PATCH 09/15] drm/xe: Convert the CPU fault handler for exhaustive eviction

Matthew Brost matthew.brost at intel.com
Fri Aug 15 19:04:49 UTC 2025


On Fri, Aug 15, 2025 at 05:16:54PM +0200, Thomas Hellström wrote:
> On Wed, 2025-08-13 at 15:06 -0700, Matthew Brost wrote:
> > On Wed, Aug 13, 2025 at 12:51:15PM +0200, Thomas Hellström wrote:
> > > The CPU fault handler may populate bos and migrate them, and in
> > > doing so might interfere with other tasks validating.
> > > 
> > > Convert it for exhaustive eviction. To do this properly without
> > > potentially introducing stalls with the mmap lock held requires
> > > TTM work. In the meantime, let's live with those stalls that
> > > would typically happen on memory pressure.
> > > 
> > > Signed-off-by: Thomas Hellström <thomas.hellstrom at linux.intel.com>
> > > ---
> > >  drivers/gpu/drm/xe/xe_bo.c | 11 ++++++++---
> > >  1 file changed, 8 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> > > index 5e40b6cb8d2a..dd1e0e9957e0 100644
> > > --- a/drivers/gpu/drm/xe/xe_bo.c
> > > +++ b/drivers/gpu/drm/xe/xe_bo.c
> > > @@ -1720,14 +1720,18 @@ static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
> > >  	struct xe_device *xe = to_xe_device(ddev);
> > >  	struct xe_bo *bo = ttm_to_xe_bo(tbo);
> > >  	bool needs_rpm = bo->flags & XE_BO_FLAG_VRAM_MASK;
> > > -	struct drm_exec *exec;
> > > +	struct xe_validation_ctx ctx;
> > > +	struct drm_exec exec;
> > >  	vm_fault_t ret;
> > >  	int idx;
> > >  
> > >  	if (needs_rpm)
> > >  		xe_pm_runtime_get(xe);
> > >  
> > > -	exec = XE_VALIDATION_UNIMPLEMENTED;
> > > +	if (xe_validation_ctx_init(&ctx, &xe->val, &exec,
> > > +				   DRM_EXEC_INTERRUPTIBLE_WAIT, 0, false))
> > > +		return VM_FAULT_NOPAGE;
> > 
> > Any particular reason to not use xe_validation_guard here?
> 
> Well, this is a bit complicated ATM.
> We would need some serious TTM rework here to support drm_exec in these
> helpers, and on closer inspection I think we'd also need an
> xe_validation_ctx_init variant that doesn't initialize a drm_exec.
> 

Right, so I think this is an unsupported case then.

Matt
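
For readers following along: the open-coded pattern under discussion ends
up looking roughly like this once the patch is applied. This is a minimal
sketch reconstructed from the hunks quoted in this mail; the elided parts
are paraphrasing comments, not patch text.

static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
{
	/* ... locals as in the quoted hunks: xe, tbo, needs_rpm, ... */
	struct xe_validation_ctx ctx;
	struct drm_exec exec;
	vm_fault_t ret;

	/*
	 * Open-coded init/fini rather than xe_validation_guard():
	 * ttm_bo_vm_reserve() below takes the bo lock without going
	 * through a drm_exec transaction.
	 */
	if (xe_validation_ctx_init(&ctx, &xe->val, &exec,
				   DRM_EXEC_INTERRUPTIBLE_WAIT, 0, false))
		return VM_FAULT_NOPAGE;

	ret = ttm_bo_vm_reserve(tbo, vmf);
	if (ret)
		goto out;

	/* ... unchanged fault path: drm_dev_enter(), ttm_bo_vm_fault_reserved(), ... */

out:
	xe_validation_ctx_fini(&ctx);
	/* ... runtime-pm put and return, as in the quoted hunks ... */
}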

> ttm_bo_vm_reserve() might take the bo lock without a drm_exec, and that
> will cause a lockdep splat if the drm_exec transaction has already
> initialized the ww ctx, which happens in drm_exec_until_all_locked().
> 
> I should add a comment about that.
> 
> /Thomas
> 
> 
> 
> > 
> > Matt
> > 
> > > +
> > >  	ret = ttm_bo_vm_reserve(tbo, vmf);
> > >  	if (ret)
> > >  		goto out;
> > > @@ -1735,7 +1739,7 @@ static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
> > >  	if (drm_dev_enter(ddev, &idx)) {
> > >  		trace_xe_bo_cpu_fault(bo);
> > >  
> > > -		xe_validation_assert_exec(xe, exec, &tbo->base);
> > > +		xe_validation_assert_exec(xe, &exec, &tbo->base);
> > >  		ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma->vm_page_prot,
> > >  					       TTM_BO_VM_NUM_PREFAULT);
> > >  		drm_dev_exit(idx);
> > > @@ -1761,6 +1765,7 @@ static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
> > >  
> > >  	dma_resv_unlock(tbo->base.resv);
> > >  out:
> > > +	xe_validation_ctx_fini(&ctx);
> > >  	if (needs_rpm)
> > >  		xe_pm_runtime_put(xe);
> > >  
> > > -- 
> > > 2.50.1
> > > 
> 
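
To make the lockdep concern above concrete: the shape that cannot be used
yet would look something like the sketch below. This is purely illustrative
(not a proposal, and not code from this series); it only uses existing
drm_exec/ttm calls, and the reasoning is taken from the explanation quoted
above.

/*
 * Once drm_exec_until_all_locked() runs, the exec's ww acquire ctx is
 * initialized, so a bo reservation that bypasses the drm_exec -- which
 * is what ttm_bo_vm_reserve() does today -- would trigger the lockdep
 * splat described above. Supporting this shape needs the TTM rework
 * mentioned earlier in the thread.
 */
drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
drm_exec_until_all_locked(&exec) {
	ret = ttm_bo_vm_reserve(tbo, vmf); /* locks the bo without &exec */
	drm_exec_retry_on_contention(&exec);
}
drm_exec_fini(&exec);

Hence the open-coded xe_validation_ctx_init()/_fini() in this patch: as
described above, the ww ctx is only initialized in
drm_exec_until_all_locked(), which the fault path never enters, so
ttm_bo_vm_reserve() can keep taking the bo lock on its own.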

