[RFC 07/11] mm/memremap: Add folio_split support
Balbir Singh
balbirs at nvidia.com
Wed Jul 9 23:34:45 UTC 2025
On 7/9/25 00:31, David Hildenbrand wrote:
> On 06.03.25 05:42, Balbir Singh wrote:
>> When a zone device page is split (via huge pmd folio split), the
>> driver callback for folio_split is invoked to let the device driver
>> know that the folio has been split into a smaller order.
>>
>> The HMM test driver has been updated to handle the split. Since the
>> test driver uses backing pages, it requires a mechanism for reorganizing
>> the backing pages (backing pages are used to create a mirror device)
>> into pages of the right order. This is supported by exporting
>> prep_compound_page().
>>
>> Signed-off-by: Balbir Singh <balbirs at nvidia.com>
>> ---
>> include/linux/memremap.h | 7 +++++++
>> include/linux/mm.h | 1 +
>> lib/test_hmm.c | 35 +++++++++++++++++++++++++++++++++++
>> mm/huge_memory.c | 5 +++++
>> mm/page_alloc.c | 1 +
>> 5 files changed, 49 insertions(+)
>>
>> diff --git a/include/linux/memremap.h b/include/linux/memremap.h
>> index 11d586dd8ef1..2091b754f1da 100644
>> --- a/include/linux/memremap.h
>> +++ b/include/linux/memremap.h
>> @@ -100,6 +100,13 @@ struct dev_pagemap_ops {
>> */
>> int (*memory_failure)(struct dev_pagemap *pgmap, unsigned long pfn,
>> unsigned long nr_pages, int mf_flags);
>> +
>> + /*
>> + * Used for private (un-addressable) device memory only.
>> + * This callback is used when a folio is split into
>> + * a smaller folio
>
> Confusing. When a folio is split, it is split into multiple folios.
>
> So when will this be invoked?
>
It is invoked when a folio is split in mm/huge_memory.c. This allows the device
driver to update any metadata it is tracking for the original folio in
zone_device_data.
>> + */
>> + void (*folio_split)(struct folio *head, struct folio *tail);
>
> head and tail are really suboptimal terminology. They refer to head and tail pages, which is not really the case with folios (in the long run).
>
Will rename them to original_folio and new_folio if that helps with readability.
>> };
>> #define PGMAP_ALTMAP_VALID (1 << 0)
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index 98a67488b5fe..3d0e91e0a923 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -1415,6 +1415,7 @@ static inline struct folio *virt_to_folio(const void *x)
>> void __folio_put(struct folio *folio);
>> void split_page(struct page *page, unsigned int order);
>> +void prep_compound_page(struct page *page, unsigned int order);
>> void folio_copy(struct folio *dst, struct folio *src);
>> int folio_mc_copy(struct folio *dst, struct folio *src);
>> diff --git a/lib/test_hmm.c b/lib/test_hmm.c
>> index a81d2f8a0426..18b6a7b061d7 100644
>> --- a/lib/test_hmm.c
>> +++ b/lib/test_hmm.c
>> @@ -1640,10 +1640,45 @@ static vm_fault_t dmirror_devmem_fault(struct vm_fault *vmf)
>> return ret;
>> }
>> +
>> +static void dmirror_devmem_folio_split(struct folio *head, struct folio *tail)
>> +{
>> + struct page *rpage = BACKING_PAGE(folio_page(head, 0));
>> + struct folio *new_rfolio;
>> + struct folio *rfolio;
>> + unsigned long offset = 0;
>> +
>> + if (!rpage) {
>> + folio_page(tail, 0)->zone_device_data = NULL;
>> + return;
>> + }
>> +
>> + offset = folio_pfn(tail) - folio_pfn(head);
>> + rfolio = page_folio(rpage);
>> + new_rfolio = page_folio(folio_page(rfolio, offset));
>> +
>> + folio_page(tail, 0)->zone_device_data = folio_page(new_rfolio, 0);
>> +
>> + if (folio_pfn(tail) - folio_pfn(head) == 1) {
>> + if (folio_order(head))
>> + prep_compound_page(folio_page(rfolio, 0),
>> + folio_order(head));
>> + folio_set_count(rfolio, 1);
>> + }
>> + clear_compound_head(folio_page(new_rfolio, 0));
>> + if (folio_order(tail))
>> + prep_compound_page(folio_page(new_rfolio, 0),
>> + folio_order(tail));
>> + folio_set_count(new_rfolio, 1);
>> + folio_page(new_rfolio, 0)->mapping = folio_page(rfolio, 0)->mapping;
>> + tail->pgmap = head->pgmap;
>
> Most of this doesn't look like it should be the responsibility of this callback.
>
> Setting up a new folio after the split (messing with compound pages etc) really should not be the responsibility of this callback.
>
> So no, this looks misplaced.
>
We do need a callback for drivers to do the right thing. In this case, if you look at
lib/test_hmm.c, device pages are emulated via backing pages (real folios allocated from
system memory), which is why all of these changes are here. I can try to simplify this
going forward.
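
To make the backing-page bookkeeping concrete, here is a minimal userspace
sketch of the relinking step only. It is not the test_hmm.c code: struct
page_model and relink_backing_page() are hypothetical stand-ins, and it
assumes the backing pages sit in one contiguous array so that plain pointer
arithmetic can play the role of folio_page(rfolio, offset).

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-in for struct page; NOT the kernel type. */
struct page_model {
	unsigned long pfn;
	void *zone_device_data;	/* device page -> backing page link */
};

/*
 * Hypothetical helper: after a split, point the first page of the new
 * device folio at the backing page that sits at the same offset from
 * the start of the original backing folio.
 *
 * Assumption: the backing pages are contiguous in memory, so
 * "backing + offset" models folio_page(rfolio, offset).
 */
static void relink_backing_page(struct page_model *orig_dev,
				struct page_model *new_dev)
{
	struct page_model *backing = orig_dev->zone_device_data;
	unsigned long offset;

	/* No backing page for the original: nothing to link the new one to. */
	if (!backing) {
		new_dev->zone_device_data = NULL;
		return;
	}

	/* The new folio always starts at or after the original one. */
	assert(new_dev->pfn >= orig_dev->pfn);
	offset = new_dev->pfn - orig_dev->pfn;
	new_dev->zone_device_data = backing + offset;
}
```

This covers only the zone_device_data update. The compound-page setup on
the backing side (the prep_compound_page() calls) is deliberately left out
of the sketch.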
>> +}
>> +
>> static const struct dev_pagemap_ops dmirror_devmem_ops = {
>> .page_free = dmirror_devmem_free,
>> .migrate_to_ram = dmirror_devmem_fault,
>> .page_free = dmirror_devmem_free,
>> + .folio_split = dmirror_devmem_folio_split,
>> };
>> static int dmirror_device_init(struct dmirror_device *mdevice, int id)
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 995ac8be5709..518a70d1b58a 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -3655,6 +3655,11 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
>> MTHP_STAT_NR_ANON, 1);
>> }
>> + if (folio_is_device_private(origin_folio) &&
>> + origin_folio->pgmap->ops->folio_split)
>> + origin_folio->pgmap->ops->folio_split(
>> + origin_folio, release);
>
> Absolutely ugly. I think we need a wrapper for the
>
Will do
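
For illustration, a wrapper along these lines is roughly what I have in mind.
This is a userspace sketch under stated assumptions, not the kernel code: the
struct definitions are simplified stand-ins, the device_private flag models
folio_is_device_private(), and the names zone_device_private_split_cb() and
demo_folio_split() are hypothetical. The parameter names follow the
original_folio/new_folio rename mentioned earlier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified userspace stand-ins; NOT the kernel definitions. */
struct folio;

struct dev_pagemap_ops {
	void (*folio_split)(struct folio *original_folio,
			    struct folio *new_folio);
};

struct dev_pagemap {
	const struct dev_pagemap_ops *ops;
};

struct folio {
	bool device_private;		/* models folio_is_device_private() */
	struct dev_pagemap *pgmap;
	int split_notifications;	/* demo-only counter */
};

/*
 * Hypothetical wrapper: hides the pgmap->ops->folio_split chain from
 * the split path and does nothing for folios without the callback.
 */
static inline void zone_device_private_split_cb(struct folio *original_folio,
						struct folio *new_folio)
{
	if (!original_folio->device_private)
		return;
	if (!original_folio->pgmap || !original_folio->pgmap->ops ||
	    !original_folio->pgmap->ops->folio_split)
		return;

	original_folio->pgmap->ops->folio_split(original_folio, new_folio);
}

/* Demo driver callback: propagate pgmap and count the notification. */
static void demo_folio_split(struct folio *original_folio,
			     struct folio *new_folio)
{
	assert(original_folio && new_folio);
	new_folio->pgmap = original_folio->pgmap;
	original_folio->split_notifications++;
}
```

With something like this, the call site in the split path would shrink to a
single call taking the original folio and the newly created folio.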
>> +
>> /*
>> * Unfreeze refcount first. Additional reference from
>> * page cache.
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 17ea8fb27cbf..563f7e39aa79 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -573,6 +573,7 @@ void prep_compound_page(struct page *page, unsigned int order)
>> prep_compound_head(page, order);
>> }
>> +EXPORT_SYMBOL_GPL(prep_compound_page);
>
> Hmmm, that is questionable. We don't want arbitrary modules to make use of that.
>
> Another sign that you are exposing the wrong functionality/interface (folio_split) to modules.
>
prep_compound_page() is required for generic THP support. In our case, the driver
lib/test_hmm.c has no real device pages, just actual folios backing them. When the split
occurs, we need to ensure that the pgmap entries are correct, the mapping is right, and
the backing folio is set to the right order. I tried copying the pages to new folios, but
I can't allocate in the split context. I'll see if I can get rid of this requirement.
Thanks,
Balbir Singh