[PATCH v3 08/19] drm/xe/svm: Add xe_svm_ranges_zap_ptes_in_range() for PTE zapping
Matthew Brost
matthew.brost at intel.com
Fri May 30 06:29:37 UTC 2025
On Wed, May 28, 2025 at 09:00:27PM -0700, Matthew Brost wrote:
> On Thu, May 29, 2025 at 08:36:28AM +0530, Ghimiray, Himal Prasad wrote:
> >
> >
> > On 29-05-2025 04:45, Matthew Brost wrote:
> > > On Tue, May 27, 2025 at 10:09:52PM +0530, Himal Prasad Ghimiray wrote:
> > > > Introduce xe_svm_ranges_zap_ptes_in_range(), a function to zap page table
> > > > entries (PTEs) for all SVM ranges within a user-specified address range.
> > > >
> > > > Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray at intel.com>
> > > > ---
> > > > drivers/gpu/drm/xe/xe_svm.c | 43 +++++++++++++++++++++++++++++++++++++
> > > > drivers/gpu/drm/xe/xe_svm.h | 7 ++++++
> > > > 2 files changed, 50 insertions(+)
> > > >
> > > > diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
> > > > index 59e73187114d..a4d53c24fcbc 100644
> > > > --- a/drivers/gpu/drm/xe/xe_svm.c
> > > > +++ b/drivers/gpu/drm/xe/xe_svm.c
> > > > @@ -1006,6 +1006,49 @@ int xe_svm_range_get_pages(struct xe_vm *vm, struct xe_svm_range *range,
> > > > return err;
> > > > }
> > > > +/**
> > > > + * xe_svm_ranges_zap_ptes_in_range - clear ptes of svm ranges in input range
> > > > + * @vm: Pointer to the xe_vm structure
> > > > + * @start: Start of the input range
> > > > + * @end: End of the input range
> > > > + *
> > > > + * This function removes the page table entries (PTEs) associated
> > > > + * with the svm ranges within the given input start and end.
> > > > + *
> > > > + * Return: tile_mask indicating which GTs need TLB invalidation.
> > > > + */
> > > > +u8 xe_svm_ranges_zap_ptes_in_range(struct xe_vm *vm, u64 start, u64 end)
> > > > +{
> > > > + struct drm_gpusvm_notifier *notifier;
> > > > + struct xe_svm_range *range;
> > > > + u64 adj_start, adj_end;
> > > > + struct xe_tile *tile;
> > > > + u8 tile_mask = 0;
> > > > + u8 id;
> > > > +
> > > > + down_write(&vm->svm.gpusvm.notifier_lock);
> > >
> > > xe_svm_notifier_lock
> >
> > xe_pt_zap_ptes_range needs the write lock, whereas
> > xe_svm_notifier_lock/unlock only provides the read lock.
>
> Hmm, I think the assert in xe_pt_zap_ptes_range is actually wrong. I
> likely just added the in-notifier assertion because that was the only
> user of it. We want to guarantee that only one KMD thread is issuing a
> zap or modifying the PTEs at a time.
>
> - The notifier lock in read mode guarantees that an invalidation
> from MMU notifier doesn't race here.
>
> - The VM lock in write mode guarantees no one is modifying the page
> tables.
>
> - The notifier lock in write mode guarantees no one is modifying the
> page tables and invalidation from madvise doesn't race.
>
> I think this complex condition can be expressed in lockdep by:
>
> lockdep_assert(lockdep_is_held_type(notifier_lock, 0) ||
>                (lockdep_is_held_type(notifier_lock, 1) &&
>                 lockdep_is_held_type(vm_lock, 0)));
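>
> Concretely, with the actual lock names filled in, something like this
> (untested, written from memory; assumes the rwsems are
> vm->svm.gpusvm.notifier_lock and vm->lock, and that 0 / 1 are the
> lockdep write / read hold types):
>
> 	/*
> 	 * Zapping PTEs is safe if either the notifier lock is held in
> 	 * write mode (excludes concurrent PTE updates and invalidations),
> 	 * or the notifier lock is held in read mode (excludes MMU
> 	 * notifier invalidations) together with the VM lock in write
> 	 * mode (excludes concurrent PTE updates).
> 	 */
> 	lockdep_assert(lockdep_is_held_type(&vm->svm.gpusvm.notifier_lock, 0) ||
> 		       (lockdep_is_held_type(&vm->svm.gpusvm.notifier_lock, 1) &&
> 			lockdep_is_held_type(&vm->lock, 0)));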
>
> If this works, a comment explaining the above is probably warranted.
>
> If the above doesn't work, or we deem this too complex, maybe it's fine
> to just take the notifier lock in write mode...
>
> I suggest we get another opinion here, perhaps from Thomas.
>
> Matt
>
Actually, this locking is incorrect for another reason as well: the SVM
notifier lock needs to be held from the start of the zap until the TLB
invalidation completes. The reason is that an MMU notifier could race by
seeing tile_invalidated set, skipping the invalidation, and returning,
allowing the CPU pages to be moved before the GPU has actually stopped
accessing them.
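
To spell the race out:

	madvise thread                    MMU notifier / core MM
	--------------                    ----------------------
	down_write(notifier_lock)
	zap PTEs, set tile_invalidated
	up_write(notifier_lock)
	                                  down_write(notifier_lock)
	                                  sees tile_invalidated set,
	                                  skips the TLB invalidation
	                                  up_write(notifier_lock)
	                                  core MM moves the CPU pages
	issue TLB invalidation
	(too late: the GPU may have
	 accessed the moved pages via
	 stale TLB entries)
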
The same race condition exists for userptrs and for BOs being moved. So,
for each invalidation, we need to lock the dma-resv of all the BOs being
invalidated, as well as the notifiers.
Therefore, I think the invalidations need to be moved to directly after
calling the vfunc that sets the property: use a DRM exec loop to lock
the dma-resv of all the BOs in the VMA list while we have it, then take
the notifier locks, and finally issue the zap and TLB invalidation.
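
A rough, untested sketch of that ordering (the VMA list vmas_list and
its link member madvise_link are made up for illustration, and most
error handling is elided):

	struct drm_exec exec;
	struct xe_vma *vma;
	u8 tile_mask;
	int err;

	/* The madvise vfunc has already updated the VMA properties here */

	drm_exec_init(&exec, 0, 0);
	drm_exec_until_all_locked(&exec) {
		/* Lock the dma-resv of every BO backing an affected VMA */
		list_for_each_entry(vma, &vmas_list, madvise_link) {
			struct xe_bo *bo = xe_vma_bo(vma);

			if (!bo)
				continue;

			err = drm_exec_lock_obj(&exec, &bo->ttm.base);
			drm_exec_retry_on_contention(&exec);
			if (err)
				goto unlock;
		}
	}

	down_write(&vm->svm.gpusvm.notifier_lock);
	tile_mask = xe_svm_ranges_zap_ptes_in_range(vm, start, end);
	/* Issue TLB invalidations for tile_mask and wait for completion */
	up_write(&vm->svm.gpusvm.notifier_lock);

unlock:
	drm_exec_fini(&exec);

That way the notifier lock is held across both the zap and the wait for
the TLB invalidation, so a racing notifier can never observe
tile_invalidated while stale GPU TLB entries still exist.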
All my previous replies to this patch stand too.
Matt
> > >
> > > > +
> > > > + drm_gpusvm_for_each_notifier(notifier, &vm->svm.gpusvm, start, end) {
> > > > + struct drm_gpusvm_range *r = NULL;
> > > > +
> > > > + adj_start = max(start, notifier->itree.start);
> > > > + adj_end = min(end, notifier->itree.last + 1);
> > > > + drm_gpusvm_for_each_range(r, notifier, adj_start, adj_end) {
> > > > + range = to_xe_range(r);
> > > > + for_each_tile(tile, vm->xe, id) {
> > > > + if (xe_pt_zap_ptes_range(tile, vm, range)) {
> > > > + tile_mask |= BIT(id);
> > > > + range->tile_invalidated |= BIT(id);
> > > > + }
> > > > + }
> > > > + }
> > > > + }
> > > > +
> > > > + up_write(&vm->svm.gpusvm.notifier_lock);
> > > > +
> > >
> > > xe_svm_notifier_unlock
> > >
> > > Matt
> > >
> > > > + return tile_mask;
> > > > +}
> > > > +
> > > > #if IS_ENABLED(CONFIG_DRM_XE_DEVMEM_MIRROR)
> > > > static struct drm_pagemap_device_addr
> > > > diff --git a/drivers/gpu/drm/xe/xe_svm.h b/drivers/gpu/drm/xe/xe_svm.h
> > > > index 19ce4f2754a7..af8f285b6caa 100644
> > > > --- a/drivers/gpu/drm/xe/xe_svm.h
> > > > +++ b/drivers/gpu/drm/xe/xe_svm.h
> > > > @@ -91,6 +91,7 @@ bool xe_svm_range_validate(struct xe_vm *vm,
> > > > u64 xe_svm_find_vma_start(struct xe_vm *vm, u64 addr, u64 end, struct xe_vma *vma);
> > > > +u8 xe_svm_ranges_zap_ptes_in_range(struct xe_vm *vm, u64 start, u64 end);
> > > > /**
> > > > * xe_svm_range_has_dma_mapping() - SVM range has DMA mapping
> > > > * @range: SVM range
> > > > @@ -305,6 +306,12 @@ u64 xe_svm_find_vma_start(struct xe_vm *vm, u64 addr, u64 end, struct xe_vma *vm
> > > > return ULONG_MAX;
> > > > }
> > > > +static inline
> > > > +u8 xe_svm_ranges_zap_ptes_in_range(struct xe_vm *vm, u64 start, u64 end)
> > > > +{
> > > > + return 0;
> > > > +}
> > > > +
> > > > #define xe_svm_assert_in_notifier(...) do {} while (0)
> > > > #define xe_svm_range_has_dma_mapping(...) false
> > > > --
> > > > 2.34.1
> > > >
> >