[PATCH 09/15] drm/xe: Convert the CPU fault handler for exhaustive eviction

Mon Aug 18 09:11:42 UTC 2025

On Fri, 2025-08-15 at 12:04 -0700, Matthew Brost wrote:
> On Fri, Aug 15, 2025 at 05:16:54PM +0200, Thomas Hellström wrote:
> > On Wed, 2025-08-13 at 15:06 -0700, Matthew Brost wrote:
> > > On Wed, Aug 13, 2025 at 12:51:15PM +0200, Thomas Hellström wrote:
> > > > The CPU fault handler may populate bos and migrate, and in doing
> > > > so might interfere with other tasks validating.
> > > > 
> > > > Convert it for exhaustive eviction. To do this properly without
> > > > potentially introducing stalls with the mmap lock held requires
> > > > TTM work. In the meantime, let's live with those stalls that
> > > > would typically happen on memory pressure.
> > > > 
> > > > Signed-off-by: Thomas Hellström <thomas.hellstrom at linux.intel.com>
> > > > ---
> > > >  drivers/gpu/drm/xe/xe_bo.c | 11 ++++++++---
> > > >  1 file changed, 8 insertions(+), 3 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> > > > index 5e40b6cb8d2a..dd1e0e9957e0 100644
> > > > --- a/drivers/gpu/drm/xe/xe_bo.c
> > > > +++ b/drivers/gpu/drm/xe/xe_bo.c
> > > > @@ -1720,14 +1720,18 @@ static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
> > > >  	struct xe_device *xe = to_xe_device(ddev);
> > > >  	struct xe_bo *bo = ttm_to_xe_bo(tbo);
> > > >  	bool needs_rpm = bo->flags & XE_BO_FLAG_VRAM_MASK;
> > > > -	struct drm_exec *exec;
> > > > +	struct xe_validation_ctx ctx;
> > > > +	struct drm_exec exec;
> > > >  	vm_fault_t ret;
> > > >  	int idx;
> > > >  
> > > >  	if (needs_rpm)
> > > >  		xe_pm_runtime_get(xe);
> > > >  
> > > > -	exec = XE_VALIDATION_UNIMPLEMENTED;
> > > > +	if (xe_validation_ctx_init(&ctx, &xe->val, &exec,
> > > > +				   DRM_EXEC_INTERRUPTIBLE_WAIT, 0, false))
> > > > +		return VM_FAULT_NOPAGE;
> > > 
> > > Any particular reason to not use xe_validation_guard here?
> > 
> > Well this is a bit complicated ATM.
> > We would need some serious TTM rework here to support drm_exec in these
> > helpers, and ATM I think upon closer inspection we'd need an
> > xe_validation_ctx_init that doesn't initialize a drm_exec.
> > 
> 
> Right, so I think this is an unsupported case then.

We should be able to re-lock in write-mode, though.
Let me have a look at this in v2.

Thanks,
Thomas

> 
> Matt
> 
> > ttm_bo_vm_reserve() might use a bo lock without a drm_exec and that
> > will cause a lockdep splat if the drm_exec transaction has initialized
> > the ww ctx, which happens in drm_exec_until_all_locked().
> > 
> > I should add a comment about that.
> > 
> > /Thomas
> > 
> > 
> > 
> > > 
> > > Matt
> > > 
> > > > +
> > > >  	ret = ttm_bo_vm_reserve(tbo, vmf);
> > > >  	if (ret)
> > > >  		goto out;
> > > > @@ -1735,7 +1739,7 @@ static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
> > > >  	if (drm_dev_enter(ddev, &idx)) {
> > > >  		trace_xe_bo_cpu_fault(bo);
> > > >  
> > > > -		xe_validation_assert_exec(xe, exec, &tbo->base);
> > > > +		xe_validation_assert_exec(xe, &exec, &tbo->base);
> > > >  		ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma->vm_page_prot,
> > > >  					       TTM_BO_VM_NUM_PREFAULT);
> > > >  		drm_dev_exit(idx);
> > > > @@ -1761,6 +1765,7 @@ static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
> > > >  
> > > >  	dma_resv_unlock(tbo->base.resv);
> > > >  out:
> > > > +	xe_validation_ctx_fini(&ctx);
> > > >  	if (needs_rpm)
> > > >  		xe_pm_runtime_put(xe);
> > > >  
> > > > -- 
> > > > 2.50.1
> > > > 
> >