[PATCH v2 02/29] mm/migrate: Add migrate_device_prepopulated_range
Matthew Brost
matthew.brost at intel.com
Thu Oct 17 00:56:09 UTC 2024
On Wed, Oct 16, 2024 at 04:46:52AM +0000, Matthew Brost wrote:
> On Wed, Oct 16, 2024 at 03:04:06PM +1100, Alistair Popple wrote:
> >
> > Matthew Brost <matthew.brost at intel.com> writes:
> >
> > > Add migrate_device_prepoluated_range which prepares an array of
> > > pre-populated device pages for migration.
> >
> > It would be nice if the commit message could also include an explanation
> > of why the existing migrate_device_range() is inadequate for your needs.
> >
>
> Yea, my bad. It should explain this is required for non-contiguous
> device pages. I suppose I could always iterate for contiguous regions
> with migrate_device_range too if you think that is better.
>
> > > v2:
> > > - s/migrate_device_vma_range/migrate_device_prepopulated_range
> > > - Drop extra mmu invalidation (Vetter)
> > >
> > > Cc: Andrew Morton <akpm at linux-foundation.org>
> > > Signed-off-by: Matthew Brost <matthew.brost at intel.com>
> > > ---
> > > include/linux/migrate.h | 1 +
> > > mm/migrate_device.c | 35 +++++++++++++++++++++++++++++++++++
> > > 2 files changed, 36 insertions(+)
> > >
> > > diff --git a/include/linux/migrate.h b/include/linux/migrate.h
> > > index 002e49b2ebd9..9146ed39a2a3 100644
> > > --- a/include/linux/migrate.h
> > > +++ b/include/linux/migrate.h
> > > @@ -229,6 +229,7 @@ void migrate_vma_pages(struct migrate_vma *migrate);
> > > void migrate_vma_finalize(struct migrate_vma *migrate);
> > > int migrate_device_range(unsigned long *src_pfns, unsigned long start,
> > > unsigned long npages);
> > > +int migrate_device_prepopulated_range(unsigned long *src_pfns, unsigned long npages);
> > > void migrate_device_pages(unsigned long *src_pfns, unsigned long *dst_pfns,
> > > unsigned long npages);
> > > void migrate_device_finalize(unsigned long *src_pfns,
> > > diff --git a/mm/migrate_device.c b/mm/migrate_device.c
> > > index 9cf26592ac93..f163c2131022 100644
> > > --- a/mm/migrate_device.c
> > > +++ b/mm/migrate_device.c
> > > @@ -924,6 +924,41 @@ int migrate_device_range(unsigned long *src_pfns, unsigned long start,
> > > }
> > > EXPORT_SYMBOL(migrate_device_range);
> > >
> > > +/**
> > > + * migrate_device_prepopulated_range() - migrate device private pfns to normal memory.
> > > + * @src_pfns: pre-popluated array of source device private pfns to migrate.
> > > + * @npages: number of pages to migrate.
> > > + *
> > > + * Similar to migrate_device_range() but supports non-contiguous pre-popluated
> > > + * array of device pages to migrate.
> > > + */
> > > +int migrate_device_prepopulated_range(unsigned long *src_pfns, unsigned long npages)
> >
> > I don't love the name, I think because it is quite long and makes me
> > think of something complicated like prefaulting. Perhaps
> > migrate_device_pfns()?
> >
>
> Sure.
>
> > > +{
> > > + unsigned long i;
> > > +
> > > + for (i = 0; i < npages; i++) {
> > > + struct page *page = pfn_to_page(src_pfns[i]);
> > > +
> > > + if (!get_page_unless_zero(page)) {
> > > + src_pfns[i] = 0;
> > > + continue;
> > > + }
> > > +
> > > + if (!trylock_page(page)) {
> > > + src_pfns[i] = 0;
> > > + put_page(page);
> > > + continue;
> > > + }
> > > +
> > > + src_pfns[i] = migrate_pfn(src_pfns[i]) | MIGRATE_PFN_MIGRATE;
> >
> > This needs to be converted to use a folio like
> > migrate_device_range(). But more importantly this should be split out as
> > a function that both migrate_device_range() and this function can call
> > given this bit is identical.
> >
>
> Missed the folio conversion and agree a helper shared between this
> function and migrate_device_range would be a good idea. Let add that.
>
Alistair,
Ok, I think now I want to go slightly different direction here to give
GPUSVM a bit more control over several eviction scenarios.
What if I exported the helper discussed above, e.g.,
905 unsigned long migrate_device_pfn_lock(unsigned long pfn)
906 {
907 struct folio *folio;
908
909 folio = folio_get_nontail_page(pfn_to_page(pfn));
910 if (!folio)
911 return 0;
912
913 if (!folio_trylock(folio)) {
914 folio_put(folio);
915 return 0;
916 }
917
918 return migrate_pfn(pfn) | MIGRATE_PFN_MIGRATE;
919 }
920 EXPORT_SYMBOL(migrate_device_pfn_lock);
And then also export migrate_device_unmap.
The usage here would be let a driver collect the device pages in virtual
address range via hmm_range_fault, lock device pages under notifier
lock ensuring device pages are valid, drop the notifier lock and call
migrate_device_unmap. Sima has strongly suggested avoiding a CPUVMA
lookup during eviction cases and this would let me fixup
drm_gpusvm_range_evict in [1] to avoid this.
It would also make the function exported in this patch unnecessary too
as non-contiguous pfns can be setup on driver side via
migrate_device_pfn_lock and then migrate_device_unmap can be called.
This also another eviction usage in GPUSVM, see drm_gpusvm_evict_to_ram
in [1].
Do you see an issue exporting migrate_device_pfn_lock,
migrate_device_unmap?
Matt
[1] https://patchwork.freedesktop.org/patch/619809/?series=137870&rev=2
> Matt
>
> > > + }
> > > +
> > > + migrate_device_unmap(src_pfns, npages, NULL);
> > > +
> > > + return 0;
> > > +}
> > > +EXPORT_SYMBOL(migrate_device_prepopulated_range);
> > > +
> > > /*
> > > * Migrate a device coherent folio back to normal memory. The caller should have
> > > * a reference on folio which will be copied to the new folio if migration is
> >
More information about the Intel-xe
mailing list