[PATCH v2 02/29] mm/migrate: Add migrate_device_prepopulated_range
Alistair Popple
apopple at nvidia.com
Thu Oct 17 03:21:13 UTC 2024
Matthew Brost <matthew.brost at intel.com> writes:
> On Thu, Oct 17, 2024 at 12:49:55PM +1100, Alistair Popple wrote:
>>
>> Matthew Brost <matthew.brost at intel.com> writes:
>>
>> > On Wed, Oct 16, 2024 at 04:46:52AM +0000, Matthew Brost wrote:
>> >> On Wed, Oct 16, 2024 at 03:04:06PM +1100, Alistair Popple wrote:
>>
>> [...]
>>
>> >> > > +{
>> >> > > + unsigned long i;
>> >> > > +
>> >> > > + for (i = 0; i < npages; i++) {
>> >> > > + struct page *page = pfn_to_page(src_pfns[i]);
>> >> > > +
>> >> > > + if (!get_page_unless_zero(page)) {
>> >> > > + src_pfns[i] = 0;
>> >> > > + continue;
>> >> > > + }
>> >> > > +
>> >> > > + if (!trylock_page(page)) {
>> >> > > + src_pfns[i] = 0;
>> >> > > + put_page(page);
>> >> > > + continue;
>> >> > > + }
>> >> > > +
>> >> > > + src_pfns[i] = migrate_pfn(src_pfns[i]) | MIGRATE_PFN_MIGRATE;
>> >> >
>> >> > This needs to be converted to use a folio like
>> >> > migrate_device_range(). But more importantly this should be split out as
>> >> > a function that both migrate_device_range() and this function can call
>> >> > given this bit is identical.
>> >> >
>> >>
>> >> Missed the folio conversion and agree a helper shared between this
>> >> function and migrate_device_range would be a good idea. Let add that.
>> >>
>> >
>> > Alistair,
>> >
>> > Ok, I think now I want to go slightly different direction here to give
>> > GPUSVM a bit more control over several eviction scenarios.
>> >
>> > What if I exported the helper discussed above, e.g.,
>> >
>> > 905 unsigned long migrate_device_pfn_lock(unsigned long pfn)
>> > 906 {
>> > 907 struct folio *folio;
>> > 908
>> > 909 folio = folio_get_nontail_page(pfn_to_page(pfn));
>> > 910 if (!folio)
>> > 911 return 0;
>> > 912
>> > 913 if (!folio_trylock(folio)) {
>> > 914 folio_put(folio);
>> > 915 return 0;
>> > 916 }
>> > 917
>> > 918 return migrate_pfn(pfn) | MIGRATE_PFN_MIGRATE;
>> > 919 }
>> > 920 EXPORT_SYMBOL(migrate_device_pfn_lock);
>> >
>> > And then also export migrate_device_unmap.
>> >
>> > The usage here would be let a driver collect the device pages in virtual
>> > address range via hmm_range_fault, lock device pages under notifier
>> > lock ensuring device pages are valid, drop the notifier lock and call
>> > migrate_device_unmap.
>>
>> I'm still working through this series but that seems a bit dubious, the
>> locking here is pretty subtle and easy to get wrong so seeing some code
>> would help me a lot in understanding what you're suggesting.
>>
>
> For sure locking in tricky, my mistake on not working through this
> before sending out the next rev but it came to mind after sending +
> regarding some late feedback from Thomas about using hmm for eviction
> [2]. His suggestion of using hmm_range_fault to trigger migration
> doesn't work for coherent pages, while something like below does.
>
> [2] https://patchwork.freedesktop.org/patch/610957/?series=137870&rev=1#comment_1125461
>
> Here is a snippet I have locally which seems to work.
>
> 2024 retry:
> 2025 hmm_range.notifier_seq = mmu_interval_read_begin(notifier);
> 2026 hmm_range.hmm_pfns = src;
> 2027
> 2028 while (true) {
> 2029 mmap_read_lock(mm);
> 2030 err = hmm_range_fault(&hmm_range);
> 2031 mmap_read_unlock(mm);
> 2032 if (err == -EBUSY) {
> 2033 if (time_after(jiffies, timeout))
> 2034 break;
> 2035
> 2036 hmm_range.notifier_seq = mmu_interval_read_begin(notifier);
> 2037 continue;
> 2038 }
> 2039 break;
> 2040 }
> 2041 if (err)
> 2042 goto err_put;
> 2043
> 2044 drm_gpusvm_notifier_lock(gpusvm);
> 2045 if (mmu_interval_read_retry(notifier, hmm_range.notifier_seq)) {
> 2046 drm_gpusvm_notifier_unlock(gpusvm);
> 2047 memset(src, 0, sizeof(*src) * npages);
> 2048 goto retry;
> 2049 }
> 2050 for (i = 0; i < npages; ++i) {
> 2051 struct page *page = hmm_pfn_to_page(src[i]);
> 2052
> 2053 if (page && (is_device_private_page(page) ||
> 2054 is_device_coherent_page(page)) && page->zone_device_data)
> 2055 src[i] = src[i] & ~HMM_PFN_FLAGS;
> 2056 else
> 2057 src[i] = 0;
> 2058 if (src[i])
> 2059 src[i] = migrate_device_pfn_lock(src[i]);
> 2060 }
> 2061 drm_gpusvm_notifier_unlock(gpusvm);
Practically for eviction isn't this much the same as calling
migrate_vma_setup()? And also for eviction as Sima mentioned you
probably shouldn't be looking at mm/vma structs.
> 2063 migrate_device_unmap(src, npages, NULL);
> ...
> 2101 migrate_device_pages(src, dst, npages);
> 2102 migrate_device_finalize(src, dst, npages);
>
>
>> > Sima has strongly suggested avoiding a CPUVMA
>> > lookup during eviction cases and this would let me fixup
>> > drm_gpusvm_range_evict in [1] to avoid this.
>>
>> That sounds reasonable but for context do you have a link to the
>> comments/discussion on this? I couldn't readily find it, but I may have
>> just missed it.
>>
>
> See in [4], search for '2. eviction' comment from sima.
Thanks for pointing that out. For reference here's Sima's comment:
> 2. eviction
>
> Requirements much like migrate_to_ram, because otherwise we break the
> migration gurantee:
>
> - Only looking at physical memory datastructures and locks, no looking at
> mm/vma structs or relying on those being locked. We rely entirely on
> reverse maps from try_to_migrate to find all the mappings on both cpu
> and gpu side (cpu only zone device swap or migration pte entries ofc).
I also very much agree with this. That's basically why I added
migrate_device_range(), so that we can forcibly evict pages when the
driver needs them freed (eg. driver unload, low memory, etc.). In
general it is impossible to guarantee eviction og all pages using just
hmm_range_fault().
> [3] https://patchwork.freedesktop.org/patch/610957/?series=137870&rev=1#comment_1110726
> [4] https://lore.kernel.org/all/BYAPR11MB3159A304925168D8B6B4671292692@BYAPR11MB3159.namprd11.prod.outlook.com/T/#m89cd6a37778ba5271d5381ebeb03e1f963856a78
>
>> > It would also make the function exported in this patch unnecessary too
>> > as non-contiguous pfns can be setup on driver side via
>> > migrate_device_pfn_lock and then migrate_device_unmap can be called.
>> > This also another eviction usage in GPUSVM, see drm_gpusvm_evict_to_ram
>> > in [1].
>> >
>> > Do you see an issue exporting migrate_device_pfn_lock,
>> > migrate_device_unmap?
>>
>> If there is a good justification for it I can't see a problem with
>> exporting it. That said I don't really understand why you would
>> want/need to split those steps up but I'll wait to see the code.
>>
>
> It is so the device pages returned from hmm_range_fault, which are only
> guaranteed to be valid under the notifier lock + a seqno check, to be
> locked and ref taken for migration. migrate_device_unmap() can trigger a
> MMU invalidation which takes the notifier lock thus calling the function
> which combines migrate_device_pfn_lock + migrate_device_unmap deadlocks.
>
> I think this flow makes sense and agree in general this likely better
> than looking at a CPUVMA.
I'm still a bit confused about what is better with this flow if you are
still calling hmm_range_fault(). How is it better than just calling
migrate_vma_setup()? Obviously it will fault the pages in, but it seems
we're talking about eviction here so I don't understand why that would
be relevant. And hmm_range_fault() still requires the VMA, although I
need to look at the patches more closely, probably CPUVMA is a DRM
specific concept?
Thanks.
- Alistair
> Matt
>
>> - Alistair
>>
>> > Matt
>> >
>> > [1] https://patchwork.freedesktop.org/patch/619809/?series=137870&rev=2
>> >
>> >> Matt
>> >>
>> >> > > + }
>> >> > > +
>> >> > > + migrate_device_unmap(src_pfns, npages, NULL);
>> >> > > +
>> >> > > + return 0;
>> >> > > +}
>> >> > > +EXPORT_SYMBOL(migrate_device_prepopulated_range);
>> >> > > +
>> >> > > /*
>> >> > > * Migrate a device coherent folio back to normal memory. The caller should have
>> >> > > * a reference on folio which will be copied to the new folio if migration is
>> >> >
>>
More information about the Intel-xe
mailing list