[PATCH 09/15] drm/xe: Convert the CPU fault handler for exhaustive eviction
Matthew Brost
matthew.brost at intel.com
Fri Aug 15 19:04:49 UTC 2025
On Fri, Aug 15, 2025 at 05:16:54PM +0200, Thomas Hellström wrote:
> On Wed, 2025-08-13 at 15:06 -0700, Matthew Brost wrote:
> > On Wed, Aug 13, 2025 at 12:51:15PM +0200, Thomas Hellström wrote:
> > > The CPU fault handler may populate bos and migrate, and in doing
> > > so might interfere with other tasks validating.
> > >
> > > Convert it for exhaustive eviction. To do this properly without
> > > potentially introducing stalls with the mmap lock held requires
> > > TTM work. In the meantime, let's live with those stalls that
> > > would typically happen on memory pressure.
> > >
> > > Signed-off-by: Thomas Hellström <thomas.hellstrom at linux.intel.com>
> > > ---
> > > drivers/gpu/drm/xe/xe_bo.c | 11 ++++++++---
> > > 1 file changed, 8 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> > > index 5e40b6cb8d2a..dd1e0e9957e0 100644
> > > --- a/drivers/gpu/drm/xe/xe_bo.c
> > > +++ b/drivers/gpu/drm/xe/xe_bo.c
> > > @@ -1720,14 +1720,18 @@ static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
> > > struct xe_device *xe = to_xe_device(ddev);
> > > struct xe_bo *bo = ttm_to_xe_bo(tbo);
> > > bool needs_rpm = bo->flags & XE_BO_FLAG_VRAM_MASK;
> > > - struct drm_exec *exec;
> > > + struct xe_validation_ctx ctx;
> > > + struct drm_exec exec;
> > > vm_fault_t ret;
> > > int idx;
> > >
> > > if (needs_rpm)
> > > xe_pm_runtime_get(xe);
> > >
> > > - exec = XE_VALIDATION_UNIMPLEMENTED;
> > > + if (xe_validation_ctx_init(&ctx, &xe->val, &exec,
> > > + DRM_EXEC_INTERRUPTIBLE_WAIT, 0, false))
> > > + return VM_FAULT_NOPAGE;
> >
> > Any particular reason to not use xe_validation_guard here?
>
> Well this is a bit complicated ATM.
> We would need some serious TTM rework to support drm_exec in these
> helpers, and on closer inspection I think we'd also need an
> xe_validation_ctx_init() variant that doesn't initialize a drm_exec.
>
Right, so I think this is an unsupported case then.
Matt
> ttm_bo_vm_reserve() might use a bo lock without a drm_exec and that
> will cause a lockdep splat if the drm_exec transaction has initialized
> the ww ctx, which happens in drm_exec_until_all_locked().
>
> I should add a comment about that.
>
> /Thomas
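
To spell out the lockdep angle Thomas describes above, the conflict is
roughly the below. This is only a sketch against the plain drm_exec /
TTM interfaces, not the actual helpers from this series, so treat the
function name and overall shape as assumptions:

static vm_fault_t fault_with_drm_exec_sketch(struct ttm_buffer_object *tbo,
                                             struct vm_fault *vmf)
{
        struct drm_exec exec;
        vm_fault_t ret = VM_FAULT_NOPAGE;

        drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
        drm_exec_until_all_locked(&exec) {
                /*
                 * The first pass through drm_exec_until_all_locked()
                 * has already done ww_acquire_init() on exec.ticket.
                 * ttm_bo_vm_reserve() then reserves tbo->base.resv on
                 * its own (trylock / plain lock, no ww ctx passed), so
                 * the bo is locked outside the drm_exec transaction
                 * and lockdep complains.
                 */
                ret = ttm_bo_vm_reserve(tbo, vmf);
        }
        drm_exec_fini(&exec);

        return ret;
}

Which is why the patch keeps ttm_bo_vm_reserve() outside of any
drm_exec loop and only wraps it with xe_validation_ctx_init() /
xe_validation_ctx_fini().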
>
>
>
> >
> > Matt
> >
> > > +
> > > ret = ttm_bo_vm_reserve(tbo, vmf);
> > > if (ret)
> > > goto out;
> > > @@ -1735,7 +1739,7 @@ static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
> > > if (drm_dev_enter(ddev, &idx)) {
> > > trace_xe_bo_cpu_fault(bo);
> > >
> > > - xe_validation_assert_exec(xe, exec, &tbo->base);
> > > + xe_validation_assert_exec(xe, &exec, &tbo->base);
> > > ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma->vm_page_prot,
> > > TTM_BO_VM_NUM_PREFAULT);
> > > drm_dev_exit(idx);
> > > @@ -1761,6 +1765,7 @@ static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
> > >
> > > dma_resv_unlock(tbo->base.resv);
> > > out:
> > > + xe_validation_ctx_fini(&ctx);
> > > if (needs_rpm)
> > > xe_pm_runtime_put(xe);
> > >
> > > --
> > > 2.50.1
> > >
>