[PATCH v1 5/5] drm/pagemap: Allocate folios when possible
Matthew Brost
matthew.brost at intel.com
Sun Jul 20 20:53:29 UTC 2025
On Thu, Jul 17, 2025 at 10:49:48PM -0700, Matthew Brost wrote:
> On Thu, Jul 17, 2025 at 09:41:23PM -0700, Matthew Brost wrote:
> > On Thu, Jul 17, 2025 at 03:38:27PM +0200, Francois Dugast wrote:
> > > If the order is greater than zero, allocate a folio when populating the
> > > RAM PFNs instead of allocating individual pages one after the other. For
> > > example if 2MB folios are used instead of 4KB pages, this reduces the
> > > number of calls to the allocation API by 512.
> > >
> > > Signed-off-by: Francois Dugast <francois.dugast at intel.com>
> > > Cc: Matthew Brost <matthew.brost at intel.com>
> > > ---
> > > drivers/gpu/drm/drm_pagemap.c | 33 ++++++++++++++++++++++-----------
> > > 1 file changed, 22 insertions(+), 11 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/drm_pagemap.c b/drivers/gpu/drm/drm_pagemap.c
> > > index de15d96f6393..4f67c6173ee5 100644
> > > --- a/drivers/gpu/drm/drm_pagemap.c
> > > +++ b/drivers/gpu/drm/drm_pagemap.c
> > > @@ -438,6 +438,7 @@ EXPORT_SYMBOL_GPL(drm_pagemap_migrate_to_devmem);
> > > * @src_mpfn: Source array of migrate PFNs
> > > * @mpfn: Array of migrate PFNs to populate
> > > * @addr: Start address for PFN allocation
> > > + * @order: Page order
> > > *
> > > * This function populates the RAM migrate page frame numbers (PFNs) for the
> > > * specified VM area structure. It allocates and locks pages in the VM area for
> > > @@ -452,35 +453,45 @@ static int drm_pagemap_migrate_populate_ram_pfn(struct vm_area_struct *vas,
> > > unsigned long *mpages,
> > > unsigned long *src_mpfn,
> > > unsigned long *mpfn,
> > > - unsigned long addr)
> > > + unsigned long addr,
> > > + unsigned int order)
> >
> > I don't think an order argument is needed. A better approach would be to
> > look at the order of the src_mpfn (device) page and allocate based on
> > that. This would maintain congruence between the initial GPU fault—which
> > creates device pages either as THP or not—and the migration path back,
> > where we'd insert a THP or not accordingly. In other words, it would
> > preserve consistency throughout the entire flow.
> >
> > Also, if you look at the migrate_vma_* functions for THP, it's never
> > allowed to upgrade from non-THP to THP—only a downgrade from THP to
> > non-THP is permitted.
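Rough idea of what I mean, untested (assumes the device side allocated a
folio for the THP case, so a non-THP device page just reports order 0):

	/* Derive the RAM-side order from the source device page */
	order = src_page ? folio_order(page_folio(src_page)) : 0;
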
> >
> > > {
> > > unsigned long i;
> > >
> > > - for (i = 0; i < npages; ++i, addr += PAGE_SIZE) {
> > > + for (i = 0; i < npages;) {
> > > struct page *page, *src_page;
> > >
> > > if (!(src_mpfn[i] & MIGRATE_PFN_MIGRATE))
> > > - continue;
> > > + goto next;
> > >
> > > src_page = migrate_pfn_to_page(src_mpfn[i]);
> > > if (!src_page)
> > > - continue;
> > > + goto next;
> > >
> > > if (fault_page) {
> > > if (src_page->zone_device_data !=
> > > fault_page->zone_device_data)
> > > - continue;
> > > + goto next;
> > > }
> > >
> > > - if (vas)
> > > - page = alloc_page_vma(GFP_HIGHUSER, vas, addr);
> > > - else
> > > - page = alloc_page(GFP_HIGHUSER);
> > > + if (order) {
> > > + page = folio_page(vma_alloc_folio(GFP_HIGHUSER | __GFP_ZERO,
> > > + order, vas, addr), 0);
> >
> > if (vas)
> > page = folio_page(vma_alloc_folio(GFP_HIGHUSER,
> > order, vas, addr), 0);
> > else
> > page = alloc_pages(GFP_HIGHUSER, order);
s/alloc_pages/folio_alloc/ actually.

Also, calling folio_page() before checking the return value of
vma_alloc_folio() is dangerous, as vma_alloc_folio() can return NULL on
failure.
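i.e., something like this (untested):

	struct folio *folio;

	if (vas)
		folio = vma_alloc_folio(GFP_HIGHUSER, order, vas, addr);
	else
		folio = folio_alloc(GFP_HIGHUSER, order);

	if (!folio)
		goto free_pages;

	page = folio_page(folio, 0);
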
Matt
> >
> > We may also want to consider a downgrade path: for example, if THP
> > allocation fails here, we could fall back to allocating single pages.
> > That would complicate things across GPU SVM and Xe, so maybe we table it
> > for now. But eventually we'll need to handle this; as per Nvidia's
> > comments, THP allocation failure seems possible.
> >
> > Maybe add a comment indicating that, something like:
> >
> > /* TODO: Support fallback to single pages if THP allocation fails */
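If/when we do support the fallback, the allocation side could look roughly
like this (untested sketch, ignoring the rest of the plumbing):

	unsigned int chunk_order = order;

	if (chunk_order) {
		struct folio *folio = vas ?
			vma_alloc_folio(GFP_HIGHUSER, chunk_order, vas, addr) :
			folio_alloc(GFP_HIGHUSER, chunk_order);

		if (folio)
			page = folio_page(folio, 0);
		else
			chunk_order = 0;	/* THP failed, try single pages */
	}

	if (!chunk_order) {
		if (vas)
			page = alloc_page_vma(GFP_HIGHUSER, vas, addr);
		else
			page = alloc_page(GFP_HIGHUSER);
	}
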
> >
> > > + } else {
> > > + if (vas)
> > > + page = alloc_page_vma(GFP_HIGHUSER, vas, addr);
> > > + else
> > > + page = alloc_page(GFP_HIGHUSER);
> > > + }
> > >
> > > if (!page)
> > > goto free_pages;
> > >
> > > mpfn[i] = migrate_pfn(page_to_pfn(page));
> > > +
> > > +next:
> > > + i += 0x1 << order;
> > > + addr += page_size(page);
> > > }
> > >
> >
> > The loops below need to be updated to loop based on order too.
> >
>
> Also the mpages return based on order.
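i.e., whatever the follow-up loops do per page needs to step by the chunk
size, and mpages needs to account a full chunk, roughly (untested):

	for (i = 0; i < npages; i += 0x1 << order) {
		if (!migrate_pfn_to_page(mpfn[i]))
			continue;

		*mpages += 0x1 << order;
	}
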
>
> Matt
>
> > Matt
> >
> > > for (i = 0; i < npages; ++i) {
> > > @@ -554,7 +565,7 @@ int drm_pagemap_evict_to_ram(struct drm_pagemap_devmem *devmem_allocation)
> > > goto err_free;
> > >
> > > err = drm_pagemap_migrate_populate_ram_pfn(NULL, NULL, npages, &mpages,
> > > - src, dst, 0);
> > > + src, dst, 0, 0);
> > > if (err || !mpages)
> > > goto err_finalize;
> > >
> > > @@ -690,7 +701,7 @@ static int __drm_pagemap_migrate_to_ram(struct vm_area_struct *vas,
> > >
> > > err = drm_pagemap_migrate_populate_ram_pfn(vas, page, npages, &mpages,
> > > migrate.src, migrate.dst,
> > > - start);
> > > + start, 0);
> > > if (err)
> > > goto err_finalize;
> > >
> > > --
> > > 2.43.0
> > >