[PATCH v2 02/29] mm/migrate: Add migrate_device_prepopulated_range

Matthew Brost matthew.brost at intel.com
Thu Oct 17 00:56:09 UTC 2024


On Wed, Oct 16, 2024 at 04:46:52AM +0000, Matthew Brost wrote:
> On Wed, Oct 16, 2024 at 03:04:06PM +1100, Alistair Popple wrote:
> > 
> > Matthew Brost <matthew.brost at intel.com> writes:
> > 
> > > Add migrate_device_prepopulated_range which prepares an array of
> > > pre-populated device pages for migration.
> > 
> > It would be nice if the commit message could also include an explanation
> > of why the existing migrate_device_range() is inadequate for your needs.
> > 
> 
> Yea, my bad. It should explain this is required for non-contiguous
> device pages. I suppose I could always iterate for contiguous regions
> with migrate_device_range too if you think that is better.
> 
> > > v2:
> > >  - s/migrate_device_vma_range/migrate_device_prepopulated_range
> > >  - Drop extra mmu invalidation (Vetter)
> > >
> > > Cc: Andrew Morton <akpm at linux-foundation.org>
> > > Signed-off-by: Matthew Brost <matthew.brost at intel.com>
> > > ---
> > >  include/linux/migrate.h |  1 +
> > >  mm/migrate_device.c     | 35 +++++++++++++++++++++++++++++++++++
> > >  2 files changed, 36 insertions(+)
> > >
> > > diff --git a/include/linux/migrate.h b/include/linux/migrate.h
> > > index 002e49b2ebd9..9146ed39a2a3 100644
> > > --- a/include/linux/migrate.h
> > > +++ b/include/linux/migrate.h
> > > @@ -229,6 +229,7 @@ void migrate_vma_pages(struct migrate_vma *migrate);
> > >  void migrate_vma_finalize(struct migrate_vma *migrate);
> > >  int migrate_device_range(unsigned long *src_pfns, unsigned long start,
> > >  			unsigned long npages);
> > > +int migrate_device_prepopulated_range(unsigned long *src_pfns, unsigned long npages);
> > >  void migrate_device_pages(unsigned long *src_pfns, unsigned long *dst_pfns,
> > >  			unsigned long npages);
> > >  void migrate_device_finalize(unsigned long *src_pfns,
> > > diff --git a/mm/migrate_device.c b/mm/migrate_device.c
> > > index 9cf26592ac93..f163c2131022 100644
> > > --- a/mm/migrate_device.c
> > > +++ b/mm/migrate_device.c
> > > @@ -924,6 +924,41 @@ int migrate_device_range(unsigned long *src_pfns, unsigned long start,
> > >  }
> > >  EXPORT_SYMBOL(migrate_device_range);
> > >  
> > > +/**
> > > + * migrate_device_prepopulated_range() - migrate device private pfns to normal memory.
> > > + * @src_pfns: pre-populated array of source device private pfns to migrate.
> > > + * @npages: number of pages to migrate.
> > > + *
> > > + * Similar to migrate_device_range() but supports non-contiguous pre-populated
> > > + * array of device pages to migrate.
> > > + */
> > > +int migrate_device_prepopulated_range(unsigned long *src_pfns, unsigned long npages)
> > 
> > I don't love the name, I think because it is quite long and makes me
> > think of something complicated like prefaulting. Perhaps
> > migrate_device_pfns()?
> > 
> 
> Sure.
> 
> > > +{
> > > +	unsigned long i;
> > > +
> > > +	for (i = 0; i < npages; i++) {
> > > +		struct page *page = pfn_to_page(src_pfns[i]);
> > > +
> > > +		if (!get_page_unless_zero(page)) {
> > > +			src_pfns[i] = 0;
> > > +			continue;
> > > +		}
> > > +
> > > +		if (!trylock_page(page)) {
> > > +			src_pfns[i] = 0;
> > > +			put_page(page);
> > > +			continue;
> > > +		}
> > > +
> > > +		src_pfns[i] = migrate_pfn(src_pfns[i]) | MIGRATE_PFN_MIGRATE;
> > 
> > This needs to be converted to use a folio like
> > migrate_device_range(). But more importantly this should be split out as
> > a function that both migrate_device_range() and this function can call
> > given this bit is identical.
> > 
> 
> Missed the folio conversion and agree a helper shared between this
> function and migrate_device_range would be a good idea. Let me add that.
> 

Alistair,

Ok, I think I now want to go in a slightly different direction here to give
GPUSVM a bit more control over several eviction scenarios.

What if I exported the helper discussed above, e.g.,

unsigned long migrate_device_pfn_lock(unsigned long pfn)
{
	struct folio *folio;

	folio = folio_get_nontail_page(pfn_to_page(pfn));
	if (!folio)
		return 0;

	if (!folio_trylock(folio)) {
		folio_put(folio);
		return 0;
	}

	return migrate_pfn(pfn) | MIGRATE_PFN_MIGRATE;
}
EXPORT_SYMBOL(migrate_device_pfn_lock);

And then also export migrate_device_unmap.

The usage here would be to let a driver collect the device pages in a
virtual address range via hmm_range_fault, lock the device pages under the
notifier lock (ensuring the device pages are valid), drop the notifier lock,
and call migrate_device_unmap. Sima has strongly suggested avoiding a CPU VMA
lookup during eviction cases, and this would let me fix up
drm_gpusvm_range_evict in [1] to avoid this.

It would also make the function exported in this patch unnecessary,
as non-contiguous pfns can be set up on the driver side via
migrate_device_pfn_lock and then migrate_device_unmap can be called.
There is also another eviction usage in GPUSVM, see drm_gpusvm_evict_to_ram
in [1].
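
To make the driver-side step above concrete, here is a rough userspace
model of it. This is only a sketch and not kernel code: struct model_page,
model_pfn_lock(), model_lock_pfns() and the MODEL_* flag values are all
hypothetical stand-ins for struct folio, the proposed
migrate_device_pfn_lock(), the driver loop and the MIGRATE_PFN_* encoding.
It only shows that a non-contiguous pfn array can be converted in place,
with entries that cannot be referenced or locked zeroed out, before the
array would be handed to migrate_device_unmap.

```c
/* Userspace model only: none of these types or values are the kernel's. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PFN_VALID   (1UL << 0)  /* stand-in for MIGRATE_PFN_VALID */
#define MODEL_PFN_MIGRATE (1UL << 1)  /* stand-in for MIGRATE_PFN_MIGRATE */
#define MODEL_PFN_SHIFT   6           /* stand-in for MIGRATE_PFN_SHIFT */
#define MODEL_NPAGES      16

struct model_page {
	int refcount;  /* 0 models a page that is already being freed */
	bool locked;   /* true models a page lock held by someone else */
};

static struct model_page model_pages[MODEL_NPAGES];

/* Encode a raw pfn as a migrate entry, in the spirit of migrate_pfn(). */
static unsigned long model_migrate_pfn(unsigned long pfn)
{
	return (pfn << MODEL_PFN_SHIFT) | MODEL_PFN_VALID;
}

/* Same shape as the proposed helper: take a reference, then trylock. */
static unsigned long model_pfn_lock(unsigned long pfn)
{
	struct model_page *page;

	assert(pfn < MODEL_NPAGES);
	page = &model_pages[pfn];

	if (page->refcount == 0)
		return 0;               /* could not take a reference */
	page->refcount++;

	if (page->locked) {
		page->refcount--;       /* drop the reference we just took */
		return 0;
	}
	page->locked = true;

	return model_migrate_pfn(pfn) | MODEL_PFN_MIGRATE;
}

/*
 * Driver-side step: convert raw, possibly non-contiguous pfns in place.
 * Returns how many entries were locked; failed entries are left as 0.
 */
static size_t model_lock_pfns(unsigned long *src_pfns, size_t npages)
{
	size_t i, nlocked = 0;

	for (i = 0; i < npages; i++) {
		src_pfns[i] = model_pfn_lock(src_pfns[i]);
		if (src_pfns[i])
			nlocked++;
	}

	return nlocked;
}
```

In the real flow this loop would run under the notifier lock, and only
after dropping it would the driver call migrate_device_unmap on the array.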

Do you see an issue with exporting migrate_device_pfn_lock and
migrate_device_unmap?

Matt

[1] https://patchwork.freedesktop.org/patch/619809/?series=137870&rev=2

> Matt
> 
> > > +	}
> > > +
> > > +	migrate_device_unmap(src_pfns, npages, NULL);
> > > +
> > > +	return 0;
> > > +}
> > > +EXPORT_SYMBOL(migrate_device_prepopulated_range);
> > > +
> > >  /*
> > >   * Migrate a device coherent folio back to normal memory. The caller should have
> > >   * a reference on folio which will be copied to the new folio if migration is
> > 

