[PATCH 09/15] drm/xe: Convert the CPU fault handler for exhaustive eviction
Thomas Hellström
thomas.hellstrom at linux.intel.com
Fri Aug 15 15:16:54 UTC 2025
On Wed, 2025-08-13 at 15:06 -0700, Matthew Brost wrote:
> On Wed, Aug 13, 2025 at 12:51:15PM +0200, Thomas Hellström wrote:
> > The CPU fault handler may populate bos and migrate, and in doing
> > so might interfere with other tasks validating.
> >
> > Convert it for exhaustive eviction. To do this properly without
> > potentially introducing stalls with the mmap lock held requires
> > TTM work. In the meantime, let's live with those stalls that
> > would typically happen on memory pressure.
> >
> > Signed-off-by: Thomas Hellström <thomas.hellstrom at linux.intel.com>
> > ---
> > drivers/gpu/drm/xe/xe_bo.c | 11 ++++++++---
> > 1 file changed, 8 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> > index 5e40b6cb8d2a..dd1e0e9957e0 100644
> > --- a/drivers/gpu/drm/xe/xe_bo.c
> > +++ b/drivers/gpu/drm/xe/xe_bo.c
> > @@ -1720,14 +1720,18 @@ static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
> > struct xe_device *xe = to_xe_device(ddev);
> > struct xe_bo *bo = ttm_to_xe_bo(tbo);
> > bool needs_rpm = bo->flags & XE_BO_FLAG_VRAM_MASK;
> > - struct drm_exec *exec;
> > + struct xe_validation_ctx ctx;
> > + struct drm_exec exec;
> > vm_fault_t ret;
> > int idx;
> >
> > if (needs_rpm)
> > xe_pm_runtime_get(xe);
> >
> > - exec = XE_VALIDATION_UNIMPLEMENTED;
> > + if (xe_validation_ctx_init(&ctx, &xe->val, &exec,
> > + DRM_EXEC_INTERRUPTIBLE_WAIT, 0, false))
> > + return VM_FAULT_NOPAGE;
>
> Any particular reason to not use xe_validation_guard here?
Well, this is a bit complicated ATM.
We would need some serious TTM rework here to support drm_exec in these
helpers, and on closer inspection I think we'd also need an
xe_validation_ctx_init() variant that doesn't initialize a drm_exec.
ttm_bo_vm_reserve() might take a bo lock without a drm_exec, and that
will cause a lockdep splat if the drm_exec transaction has already
initialized the ww ctx, which happens in drm_exec_until_all_locked().
I should add a comment about that.
/Thomas
>
> Matt
>
> > +
> > ret = ttm_bo_vm_reserve(tbo, vmf);
> > if (ret)
> > goto out;
> > @@ -1735,7 +1739,7 @@ static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
> > if (drm_dev_enter(ddev, &idx)) {
> > trace_xe_bo_cpu_fault(bo);
> >
> > - xe_validation_assert_exec(xe, exec, &tbo->base);
> > + xe_validation_assert_exec(xe, &exec, &tbo->base);
> > ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma->vm_page_prot,
> > TTM_BO_VM_NUM_PREFAULT);
> > drm_dev_exit(idx);
> > @@ -1761,6 +1765,7 @@ static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
> >
> > dma_resv_unlock(tbo->base.resv);
> > out:
> > + xe_validation_ctx_fini(&ctx);
> > if (needs_rpm)
> > xe_pm_runtime_put(xe);
> >
> > --
> > 2.50.1
> >