[PATCH] mm/gup: Introduce pin_user_pages_fd() for pinning shmem/hugetlbfs file pages (v3)

David Hildenbrand david at redhat.com
Tue Nov 14 09:23:57 UTC 2023


On 14.11.23 08:00, Vivek Kasireddy wrote:
> For drivers that would like to longterm-pin the pages associated
> with a file, the pin_user_pages_fd() API provides an option to
> not only pin the pages via FOLL_PIN but also to check and migrate
> them if they reside in movable zone or CMA block. This API
> currently works with files that belong to either shmem or hugetlbfs.
> Files belonging to other filesystems are rejected for now.
> 
> The pages need to be located first before pinning them via FOLL_PIN.
> If they are found in the page cache, they can be immediately pinned.
> Otherwise, they need to be allocated using the filesystem specific
> APIs and then pinned.
> 
> v2:
> - Drop gup_flags and improve comments and commit message (David)
> - Allocate a page if we cannot find in page cache for the hugetlbfs
>    case as well (David)
> - Don't unpin pages if there is a migration related failure (David)
> - Drop the unnecessary nr_pages <= 0 check (Jason)
> - Have the caller of the API pass in file * instead of fd (Jason)
> 
> v3: (David)
> - Enclose the huge page allocation code with #ifdef CONFIG_HUGETLB_PAGE
>    (Build error reported by kernel test robot <lkp at intel.com>)
> - Don't forget memalloc_pin_restore() on non-migration related errors
> - Improve the readability of the cleanup code associated with
>    non-migration related errors
> - Augment the comments by describing FOLL_LONGTERM like behavior
> - Include the R-b tag from Jason
> 
> Cc: David Hildenbrand <david at redhat.com>
> Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
> Cc: Mike Kravetz <mike.kravetz at oracle.com>
> Cc: Hugh Dickins <hughd at google.com>
> Cc: Peter Xu <peterx at redhat.com>
> Cc: Gerd Hoffmann <kraxel at redhat.com>
> Cc: Dongwon Kim <dongwon.kim at intel.com>
> Cc: Junxiao Chang <junxiao.chang at intel.com>
> Suggested-by: Jason Gunthorpe <jgg at nvidia.com>
> Reviewed-by: Jason Gunthorpe <jgg at nvidia.com> (v2)
> Signed-off-by: Vivek Kasireddy <vivek.kasireddy at intel.com>
> ---
>   include/linux/mm.h |   2 +
>   mm/gup.c           | 109 +++++++++++++++++++++++++++++++++++++++++++++
>   2 files changed, 111 insertions(+)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 418d26608ece..1b675fa35059 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2472,6 +2472,8 @@ long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
>   		    struct page **pages, unsigned int gup_flags);
>   long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
>   		    struct page **pages, unsigned int gup_flags);
> +long pin_user_pages_fd(struct file *file, pgoff_t start,
> +		       unsigned long nr_pages, struct page **pages);
>   
>   int get_user_pages_fast(unsigned long start, int nr_pages,
>   			unsigned int gup_flags, struct page **pages);
> diff --git a/mm/gup.c b/mm/gup.c
> index 231711efa390..b3af967cdff1 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -3410,3 +3410,112 @@ long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
>   				     &locked, gup_flags);
>   }
>   EXPORT_SYMBOL(pin_user_pages_unlocked);
> +
> +static struct page *alloc_file_page(struct file *file, pgoff_t idx)
> +{
> +#ifdef CONFIG_HUGETLB_PAGE
> +	struct page *page = ERR_PTR(-ENOMEM);
> +	struct folio *folio;
> +	int err;
> +
> +	if (is_file_hugepages(file)) {
> +		folio = alloc_hugetlb_folio_nodemask(hstate_file(file),
> +						     NUMA_NO_NODE,
> +						     NULL,
> +						     GFP_USER);
> +		if (folio && folio_try_get(folio)) {
> +			page = &folio->page;
> +			err = hugetlb_add_to_page_cache(folio,
> +							file->f_mapping,
> +							idx);
> +			if (err) {
> +				folio_put(folio);
> +				free_huge_folio(folio);
> +				page = ERR_PTR(err);
> +			}
> +		}
> +		return page;

You could drop the "page" variable entirely by using three return
statements.
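
Something along these lines, as a rough userspace sketch of the control flow only. The `_stub` helpers, the two `stub_*` knobs and the `alloc_file_page_sketch()` name are hypothetical stand-ins so the snippet compiles outside the kernel; they are not the real hugetlb helpers used in the patch, and the error handling just mirrors what the quoted hunk already does:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins for the kernel types. */
struct page { int dummy; };
struct folio { struct page page; int refcount; };

/* Minimal models of the kernel's ERR_PTR()/PTR_ERR()/IS_ERR(). */
static inline void *ERR_PTR(long err) { return (void *)(intptr_t)err; }
static inline long PTR_ERR(const void *p) { return (long)(intptr_t)p; }
static inline bool IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-4095;
}

/* Knobs to simulate allocation and page cache insertion results. */
static struct folio *stub_folio;	/* what the allocator hands back */
static int stub_add_err;		/* what the page cache insert returns */

static struct folio *alloc_folio_stub(void) { return stub_folio; }
static bool folio_try_get_stub(struct folio *f) { f->refcount++; return true; }
static void folio_put_stub(struct folio *f) { f->refcount--; }
static int add_to_page_cache_stub(struct folio *f) { (void)f; return stub_add_err; }
static void free_folio_stub(struct folio *f) { (void)f; }

/* Same logic as the hugetlb branch above, without the "page" variable. */
static struct page *alloc_file_page_sketch(void)
{
	struct folio *folio = alloc_folio_stub();
	int err;

	/* Return 1: allocation failed or no reference could be taken. */
	if (!folio || !folio_try_get_stub(folio))
		return ERR_PTR(-ENOMEM);

	err = add_to_page_cache_stub(folio);
	if (err) {
		/* Return 2: page cache insertion failed, undo and report. */
		folio_put_stub(folio);
		free_folio_stub(folio);
		return ERR_PTR(err);
	}

	/* Return 3: success. */
	return &folio->page;
}
```

Each exit then states its own result, and the ERR_PTR(-ENOMEM) default no longer has to be carried through the function.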

LGTM, thanks

Reviewed-by: David Hildenbrand <david at redhat.com>

-- 
Cheers,

David / dhildenb
