[PATCH v1 5/5] drm/pagemap: Allocate folios when possible

Matthew Brost matthew.brost at intel.com
Fri Jul 18 05:49:48 UTC 2025


On Thu, Jul 17, 2025 at 09:41:23PM -0700, Matthew Brost wrote:
> On Thu, Jul 17, 2025 at 03:38:27PM +0200, Francois Dugast wrote:
> > If the order is greater than zero, allocate a folio when populating the
> > RAM PFNs instead of allocating individual pages one after the other. For
> > example, if 2MB folios are used instead of 4KB pages, this reduces the
> > number of calls to the allocation API by a factor of 512.
> > 
> > Signed-off-by: Francois Dugast <francois.dugast at intel.com>
> > Cc: Matthew Brost <matthew.brost at intel.com>
> > ---
> >  drivers/gpu/drm/drm_pagemap.c | 33 ++++++++++++++++++++++-----------
> >  1 file changed, 22 insertions(+), 11 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/drm_pagemap.c b/drivers/gpu/drm/drm_pagemap.c
> > index de15d96f6393..4f67c6173ee5 100644
> > --- a/drivers/gpu/drm/drm_pagemap.c
> > +++ b/drivers/gpu/drm/drm_pagemap.c
> > @@ -438,6 +438,7 @@ EXPORT_SYMBOL_GPL(drm_pagemap_migrate_to_devmem);
> >   * @src_mpfn: Source array of migrate PFNs
> >   * @mpfn: Array of migrate PFNs to populate
> >   * @addr: Start address for PFN allocation
> > + * @order: Page order
> >   *
> >   * This function populates the RAM migrate page frame numbers (PFNs) for the
> >   * specified VM area structure. It allocates and locks pages in the VM area for
> > @@ -452,35 +453,45 @@ static int drm_pagemap_migrate_populate_ram_pfn(struct vm_area_struct *vas,
> >  						unsigned long *mpages,
> >  						unsigned long *src_mpfn,
> >  						unsigned long *mpfn,
> > -						unsigned long addr)
> > +						unsigned long addr,
> > +						unsigned int order)
> 
> I don't think an order argument is needed. A better approach would be to
> look at the order of the src_mpfn (device) page and allocate based on
> that. This would maintain congruence between the initial GPU fault—which
> creates device pages either as THP or not—and the migration path back,
> where we'd insert a THP or not accordingly. In other words, it would
> preserve consistency throughout the entire flow.
> 
> Also, if you look at the migrate_vma_* functions for THP, it's never
> allowed to upgrade from non-THP to THP—only a downgrade from THP to
> non-THP is permitted.
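> 
> A rough sketch of what I mean (untested; assumes the device side was
> allocated as a folio, so folio_order() on the source page reflects how
> the GPU fault created it):
> 
>         src_page = migrate_pfn_to_page(src_mpfn[i]);
>         order = src_page ? folio_order(page_folio(src_page)) : 0;
>         /* allocate the RAM destination at the same order as the device source */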
> 
> >  {
> >  	unsigned long i;
> >  
> > -	for (i = 0; i < npages; ++i, addr += PAGE_SIZE) {
> > +	for (i = 0; i < npages;) {
> >  		struct page *page, *src_page;
> >  
> >  		if (!(src_mpfn[i] & MIGRATE_PFN_MIGRATE))
> > -			continue;
> > +			goto next;
> >  
> >  		src_page = migrate_pfn_to_page(src_mpfn[i]);
> >  		if (!src_page)
> > -			continue;
> > +			goto next;
> >  
> >  		if (fault_page) {
> >  			if (src_page->zone_device_data !=
> >  			    fault_page->zone_device_data)
> > -				continue;
> > +				goto next;
> >  		}
> >  
> > -		if (vas)
> > -			page = alloc_page_vma(GFP_HIGHUSER, vas, addr);
> > -		else
> > -			page = alloc_page(GFP_HIGHUSER);
> > +		if (order) {
> > +			page = folio_page(vma_alloc_folio(GFP_HIGHUSER | __GFP_ZERO,
> > +							  order, vas, addr), 0);
> 
>                         if (vas)
>                                 page = folio_page(vma_alloc_folio(GFP_HIGHUSER,
>                                                                   order, vas, addr), 0);
>                         else
>                                 page = alloc_pages(GFP_HIGHUSER, order);
> 
> We may also want to consider a downgrade path—for example, if THP
> allocation fails here, we could fall back to allocating single pages.
> That would complicate things across GPU SVM and Xe, so maybe we table it
> for now. But eventually we'll need to handle this; per Nvidia's
> comments, THP allocation failure seems possible.
> 
> Maybe add a comment indicating that, something like:
> 
> /* TODO: Support fallback to single pages if THP allocation fails */
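> 
> For the vas case, such a fallback could look roughly like this (untested
> sketch, just to illustrate the downgrade; folio is a local I'm
> introducing here):
> 
>         folio = vma_alloc_folio(GFP_HIGHUSER, order, vas, addr);
>         if (!folio && order)
>                 /* THP allocation failed, downgrade to a single page */
>                 folio = vma_alloc_folio(GFP_HIGHUSER, 0, vas, addr);
>         page = folio ? folio_page(folio, 0) : NULL;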
> 
> > +		} else {
> > +			if (vas)
> > +				page = alloc_page_vma(GFP_HIGHUSER, vas, addr);
> > +			else
> > +				page = alloc_page(GFP_HIGHUSER);
> > +		}
> >  
> >  		if (!page)
> >  			goto free_pages;
> >  
> >  		mpfn[i] = migrate_pfn(page_to_pfn(page));
> > +
> > +next:
> > +		i += 0x1 << order;
> > +		addr += page_size(page);
> >  	}
> >  
> 
> The loops below need to be updated to loop based on order too.
> 

The mpages return value also needs to be updated based on order.
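
Something like this for the loops that walk the populated PFNs (untested
sketch; assumes mpages should still count individual PFNs, so a folio
contributes 1 << order of them):

        for (i = 0; i < npages;) {
                struct page *page = migrate_pfn_to_page(mpfn[i]);
                unsigned int order = page ? folio_order(page_folio(page)) : 0;

                /* existing per-page work (locking etc.) stays here */

                if (page)
                        *mpages += 1UL << order;
                i += 1UL << order;
        }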

Matt

> Matt
> 
> >  	for (i = 0; i < npages; ++i) {
> > @@ -554,7 +565,7 @@ int drm_pagemap_evict_to_ram(struct drm_pagemap_devmem *devmem_allocation)
> >  		goto err_free;
> >  
> >  	err = drm_pagemap_migrate_populate_ram_pfn(NULL, NULL, npages, &mpages,
> > -						   src, dst, 0);
> > +						   src, dst, 0, 0);
> >  	if (err || !mpages)
> >  		goto err_finalize;
> >  
> > @@ -690,7 +701,7 @@ static int __drm_pagemap_migrate_to_ram(struct vm_area_struct *vas,
> >  
> >  	err = drm_pagemap_migrate_populate_ram_pfn(vas, page, npages, &mpages,
> >  						   migrate.src, migrate.dst,
> > -						   start);
> > +						   start, 0);
> >  	if (err)
> >  		goto err_finalize;
> >  
> > -- 
> > 2.43.0
> > 

