[v2 31/31] drm/xe/svm: Migration from sram to vram for system allocator
Zeng, Oak
oak.zeng at intel.com
Fri Jun 7 17:22:42 UTC 2024
Hi Matt,
> -----Original Message-----
> From: Brost, Matthew <matthew.brost at intel.com>
> Sent: Wednesday, April 10, 2024 10:55 PM
> To: Zeng, Oak <oak.zeng at intel.com>
> Cc: intel-xe at lists.freedesktop.org; Ghimiray, Himal Prasad
> <himal.prasad.ghimiray at intel.com>; Bommu, Krishnaiah
> <krishnaiah.bommu at intel.com>; Thomas.Hellstrom at linux.intel.com; Welty,
> Brian <brian.welty at intel.com>
> Subject: Re: [v2 31/31] drm/xe/svm: Migration from sram to vram for system
> allocator
>
> On Tue, Apr 09, 2024 at 04:17:42PM -0400, Oak Zeng wrote:
> > If applicable, migrate a vma from sram to vram for system allocator.
> > Traditional userptr is not migrated. Only a userptr created during
> > fault (i.e., a userptr split from a system allocator vma) can be
> > migrated.
> >
> > FIXME: The migration should be conditional on the user's memory
> > attribute settings. Add this logic when memory attributes are supported.
> >
> > Signed-off-by: Oak Zeng <oak.zeng at intel.com>
> > ---
> > drivers/gpu/drm/xe/xe_gt_pagefault.c | 9 ++++++++-
> > drivers/gpu/drm/xe/xe_vm.c | 4 ----
> > 2 files changed, 8 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c
> > index 668984f0769e..c6ba00049964 100644
> > --- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
> > +++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
> > @@ -20,6 +20,7 @@
> > #include "xe_guc_ct.h"
> > #include "xe_migrate.h"
> > #include "xe_trace.h"
> > +#include "xe_svm.h"
> > #include "xe_vm.h"
> >
> > struct pagefault {
> > @@ -209,12 +210,18 @@ static int handle_pagefault(struct xe_gt *gt, struct pagefault *pf)
> >
> > if (xe_vma_is_userptr(vma) && write_locked) {
> > struct xe_userptr_vma *uvma = to_userptr_vma(vma);
> > + struct xe_userptr *userptr = &uvma->userptr;
> >
> > spin_lock(&vm->userptr.invalidated_lock);
> > - list_del_init(&uvma->userptr.invalidate_link);
> > + list_del_init(&userptr->invalidate_link);
> > spin_unlock(&vm->userptr.invalidated_lock);
> >
> > + mmap_read_lock(userptr->notifier.mm);
> > + /**FIXME: Add migration policy here*/
> > + if (xe_vma_is_fault_userptr(vma))
> > + xe_svm_migrate_vma_to_vram(vm, vma, tile);
>
> Agree we need a policy here...
>
> See my comments about locking in [1] thinking if we migrate we likely
> want to hold the mmap lock until at least the bind being issued to
> prevent races with the CPU fault handler, at least initially.
As explained in [1], I will continue to use the mmap read lock for now. The read lock covers only the migration and userptr pin-pages steps, not the bind. Bind correctness is instead guaranteed by a retry mechanism together with the userptr's notifier_lock.
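To make that concrete, below is a minimal sketch of the intended flow, not the actual v3 code: the mmap read lock covers only migration and pinning, and the bind re-validates under the notifier lock, retrying the pin if a concurrent CPU fault invalidated the range in between. The helpers pages_still_valid() and issue_bind() are hypothetical placeholders; the notifier_lock usage follows the existing Xe userptr pattern.

/*
 * Illustrative sketch only -- not the actual v3 code.
 * pages_still_valid() and issue_bind() are hypothetical placeholders.
 */
static int fault_migrate_and_bind(struct xe_vm *vm, struct xe_vma *vma,
				  struct xe_tile *tile)
{
	struct xe_userptr_vma *uvma = to_userptr_vma(vma);
	int ret;

retry:
	/* The mmap read lock covers only migration and pinning */
	mmap_read_lock(uvma->userptr.notifier.mm);
	if (xe_vma_is_fault_userptr(vma))
		xe_svm_migrate_vma_to_vram(vm, vma, tile);
	ret = xe_vma_userptr_pin_pages(uvma);
	mmap_read_unlock(uvma->userptr.notifier.mm);
	if (ret)
		return ret;

	/*
	 * The bind runs without the mmap lock. If a concurrent CPU
	 * fault invalidated the pinned range, re-check under the
	 * notifier lock and retry the pin instead of binding stale
	 * pages.
	 */
	down_read(&vm->userptr.notifier_lock);
	if (!pages_still_valid(uvma)) {
		up_read(&vm->userptr.notifier_lock);
		goto retry;
	}
	ret = issue_bind(vm, vma);
	up_read(&vm->userptr.notifier_lock);

	return ret;
}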
>
> [1] https://patchwork.freedesktop.org/patch/588542/?series=132229&rev=1
>
> > ret = xe_vma_userptr_pin_pages(uvma);
> > + mmap_read_unlock(userptr->notifier.mm);
> > if (ret)
> > goto unlock_vm;
> >
> > diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> > index 498b36469d00..8a58fe144a02 100644
> > --- a/drivers/gpu/drm/xe/xe_vm.c
> > +++ b/drivers/gpu/drm/xe/xe_vm.c
> > @@ -71,16 +71,12 @@ int xe_vma_userptr_pin_pages(struct xe_userptr_vma *uvma)
> > struct xe_vma *vma = &uvma->vma;
> > struct xe_vm *vm = xe_vma_vm(vma);
> > struct xe_device *xe = vm->xe;
> > - struct xe_userptr *userptr;
> > int ret;
> >
> > lockdep_assert_held(&vm->lock);
> > xe_assert(xe, xe_vma_is_userptr(vma));
> >
> > - userptr = &uvma->userptr;
> > - mmap_read_lock(userptr->notifier.mm);
> > ret = xe_userptr_populate_range(uvma);
> > - mmap_read_unlock(userptr->notifier.mm);
>
> Now you won't have the lock here other callers of this function...
> Probably need to have locked / unlocked version or arguments here.
This is addressed in v3.
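For reference, one possible shape of that locked/unlocked split, sketched here with assumed names (the actual v3 helpers may be named differently):

static int xe_vma_userptr_pin_pages_locked(struct xe_userptr_vma *uvma)
{
	/* Caller must already hold the mmap read lock */
	mmap_assert_locked(uvma->userptr.notifier.mm);

	return xe_userptr_populate_range(uvma);
}

int xe_vma_userptr_pin_pages(struct xe_userptr_vma *uvma)
{
	int ret;

	/* Self-locking wrapper for callers without the mmap lock */
	mmap_read_lock(uvma->userptr.notifier.mm);
	ret = xe_vma_userptr_pin_pages_locked(uvma);
	mmap_read_unlock(uvma->userptr.notifier.mm);

	return ret;
}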
Oak
>
> Matt
>
> >
> > return ret;
> > }
> > --
> > 2.26.3
> >