[PATCH v1 04/15] mm: add device coherent checker to remove migration pte

Alistair Popple apopple at nvidia.com
Thu May 12 02:39:16 UTC 2022


"Sierra Guiza, Alejandro (Alex)" <Alex.Sierra at amd.com> writes:

> @apopple at nvidia.com Could you please check this patch? It's somehow related to migrate_device_page() for long term device coherent pages.
>
> Regards,
> Alex Sierra
>> -----Original Message-----
>> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of Alex
>> Sierra
>> Sent: Thursday, May 5, 2022 4:34 PM
>> To: jgg at nvidia.com
>> Cc: rcampbell at nvidia.com; willy at infradead.org; david at redhat.com;
>> Kuehling, Felix <Felix.Kuehling at amd.com>; apopple at nvidia.com; amd-
>> gfx at lists.freedesktop.org; linux-xfs at vger.kernel.org; linux-mm at kvack.org;
>> jglisse at redhat.com; dri-devel at lists.freedesktop.org; akpm at linux-
>> foundation.org; linux-ext4 at vger.kernel.org; hch at lst.de
>> Subject: [PATCH v1 04/15] mm: add device coherent checker to remove
>> migration pte
>>
>> During remove_migration_pte(), entries for device coherent type pages that
>> were not created through special migration ptes, ignore _PAGE_RW flag. This
>> path can be found at migrate_device_page(), where valid vma is not
>> required. In this case, migrate_vma_collect_pmd() is not called and special
>> migration ptes are not set.

It's true that we don't call migrate_vma_collect_pmd() for
migrate_device_page(), but this doesn't imply migration entries are not
created. We still call migrate_vma_unmap() which calls try_to_migrate()
to install migration entries.

When we have a vma, migrate_vma_collect_pmd() is a fast path for the
common case where a page is only mapped once. So migrate_vma_collect_pmd()
should fairly closely match try_to_migrate_one(). I did experiment
locally with removing the fast path to simplify the code, but the fast
path does provide a meaningful performance improvement, so I abandoned
that experiment.

I think you're running into the problem addressed by
https://lkml.kernel.org/r/20211018045247.3128058-1-apopple@nvidia.com
but for DEVICE_COHERENT pages.

Based on that I think the approach below is wrong. You should update
try_to_migrate_one() to deal with DEVICE_COHERENT pages. It would make
sense to do that as part of patch 1 in this series.

The problem is that try_to_migrate_one() assumes folio_is_zone_device()
implies it is a DEVICE_PRIVATE page due to the check in
try_to_migrate().
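
To illustrate the point above, here is a toy userspace model in C. It is
not kernel code, and every name in it (toy_pte, toy_pte_make,
entry_writable_zone_device_check, entry_writable_private_check) is
hypothetical. It assumes a DEVICE_PRIVATE page is mapped by a non-present
swap-style entry that carries its own writable flag, while a
DEVICE_COHERENT page is mapped by an ordinary present pte. Under that
assumption, keying the decision on "is zone device" loses the write
permission of a coherent page, whereas keying it on "is device private"
keeps it.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: these are NOT the kernel's types or helpers. */
enum page_kind { PAGE_NORMAL, PAGE_DEVICE_PRIVATE, PAGE_DEVICE_COHERENT };

struct toy_pte {
	bool present;  /* true: ordinary present pte; false: swap-style entry */
	bool writable; /* write permission recorded in either form */
};

struct toy_pte toy_pte_make(bool present, bool writable)
{
	struct toy_pte pte = { .present = present, .writable = writable };
	return pte;
}

bool toy_is_zone_device(enum page_kind kind)
{
	return kind != PAGE_NORMAL;
}

/*
 * Models the assumption described above: every zone device page is
 * treated as device private, so only a non-present writable entry
 * yields a writable migration entry.
 */
bool entry_writable_zone_device_check(enum page_kind kind, struct toy_pte pte)
{
	if (toy_is_zone_device(kind))
		return !pte.present && pte.writable;
	return pte.present && pte.writable;
}

/*
 * Models the suggested direction: only device private pages take the
 * swap-style path; coherent pages are handled like normal present ptes.
 */
bool entry_writable_private_check(enum page_kind kind, struct toy_pte pte)
{
	if (kind == PAGE_DEVICE_PRIVATE)
		return !pte.present && pte.writable;
	return pte.present && pte.writable;
}

/* Small self-check of the two behaviours for a writable coherent page. */
void toy_self_check(void)
{
	struct toy_pte coherent_rw = toy_pte_make(true, true);

	assert(!entry_writable_zone_device_check(PAGE_DEVICE_COHERENT, coherent_rw));
	assert(entry_writable_private_check(PAGE_DEVICE_COHERENT, coherent_rw));
}
```

In this model a writable coherent page ends up with a read-only migration
entry under the first check, which would match the symptom the patch
tries to paper over in remove_migration_pte().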

>> Signed-off-by: Alex Sierra <alex.sierra at amd.com>
>> ---
>>  mm/migrate.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index 6c31ee1e1c9b..e18ddee56f37 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -206,7 +206,8 @@ static bool remove_migration_pte(struct folio *folio,
>>  		 * Recheck VMA as permissions can change since migration started
>>  		 */
>>  		entry = pte_to_swp_entry(*pvmw.pte);
>> -		if (is_writable_migration_entry(entry))
>> +		if (is_writable_migration_entry(entry) ||
>> +		    is_device_coherent_page(pfn_to_page(pvmw.pfn)))
>>  			pte = maybe_mkwrite(pte, vma);
>>  		else if (pte_swp_uffd_wp(*pvmw.pte))
>>  			pte = pte_mkuffd_wp(pte);
>> --
>> 2.32.0

