[PATCH v2 1/3] mm/gup: Introduce pin_user_pages_fd() for pinning shmem/hugetlbfs file pages (v2)

David Hildenbrand david at redhat.com
Mon Nov 13 10:33:43 UTC 2023


On 06.11.23 07:15, Vivek Kasireddy wrote:
> For drivers that would like to longterm-pin the pages associated
> with a file, the pin_user_pages_fd() API provides an option to
> not only pin the pages via FOLL_PIN but also to check and migrate
> them if they reside in a movable zone or a CMA block. This API
> currently works with files that belong to either shmem or hugetlbfs.
> Files belonging to other filesystems are rejected for now.
> 
> The pages need to be located first before pinning them via FOLL_PIN.
> If they are found in the page cache, they can be immediately pinned.
> Otherwise, they need to be allocated using the filesystem specific
> APIs and then pinned.
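The locate-then-pin sequence can be made concrete with a small userspace toy model. This is not kernel code. `toy_page`, `toy_mapping`, `toy_find_get()`, `toy_alloc_and_add()` and `toy_pin_range()` are hypothetical names, and keeping separate refcount/pincount fields simplifies how the kernel actually accounts FOLL_PIN. The model only mirrors the reference pattern of the patch: the lookup or allocation hands back a reference, the page is pinned, and the lookup reference is then dropped.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_NR_SLOTS 8 /* hypothetical size of the toy page cache */

struct toy_page {
	int refcount; /* ordinary references (get/put) */
	int pincount; /* FOLL_PIN-style pins, tracked separately for clarity */
};

struct toy_mapping {
	struct toy_page *slots[TOY_NR_SLOTS]; /* one slot per page index */
};

/* Page-cache lookup: returns the page with an extra reference, or NULL. */
static struct toy_page *toy_find_get(struct toy_mapping *m, size_t idx)
{
	struct toy_page *p = m->slots[idx];

	if (p)
		p->refcount++;
	return p;
}

/* Allocate and insert a page: the cache keeps one ref, the caller gets one. */
static struct toy_page *toy_alloc_and_add(struct toy_mapping *m, size_t idx)
{
	struct toy_page *p = calloc(1, sizeof(*p));

	if (!p)
		return NULL;
	p->refcount = 2;
	m->slots[idx] = p;
	return p;
}

/* Locate or allocate each page, pin it, then drop the lookup reference. */
static long toy_pin_range(struct toy_mapping *m, size_t start, size_t nr,
			  struct toy_page **pages)
{
	size_t i;

	if (start > TOY_NR_SLOTS || nr > TOY_NR_SLOTS - start)
		return -EINVAL;

	for (i = 0; i < nr; i++) {
		struct toy_page *p = toy_find_get(m, start + i);

		if (!p)
			p = toy_alloc_and_add(m, start + i);
		if (!p)
			goto err;
		p->pincount++;	/* like try_grab_page(page, FOLL_PIN) */
		p->refcount--;	/* like the put_page() that follows it */
		pages[i] = p;
	}
	return (long)nr;
err:
	while (i-- > 0)		/* undo the pins taken so far */
		pages[i]->pincount--;
	return -ENOMEM;
}
```

Pages already present in the toy cache are reused; missing indices are allocated and stay in the cache, which is why the allocation path starts with two references.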
> 
> v2:
> - Drop gup_flags and improve comments and commit message (David)
> - Allocate a page if we cannot find one in the page cache for the
>    hugetlbfs case as well (David)
> - Don't unpin pages if there is a migration related failure (David)
> - Drop the unnecessary nr_pages <= 0 check (Jason)
> - Have the caller of the API pass in file * instead of fd (Jason)
> 
> Cc: David Hildenbrand <david at redhat.com>
> Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
> Cc: Mike Kravetz <mike.kravetz at oracle.com>
> Cc: Hugh Dickins <hughd at google.com>
> Cc: Peter Xu <peterx at redhat.com>
> Cc: Gerd Hoffmann <kraxel at redhat.com>
> Cc: Dongwon Kim <dongwon.kim at intel.com>
> Cc: Junxiao Chang <junxiao.chang at intel.com>
> Suggested-by: Jason Gunthorpe <jgg at nvidia.com>
> Signed-off-by: Vivek Kasireddy <vivek.kasireddy at intel.com>
> ---
>   include/linux/mm.h |  2 +
>   mm/gup.c           | 99 ++++++++++++++++++++++++++++++++++++++++++++++
>   2 files changed, 101 insertions(+)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index bf5d0b1b16f4..f6cc17b14653 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2457,6 +2457,8 @@ long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
>   		    struct page **pages, unsigned int gup_flags);
>   long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
>   		    struct page **pages, unsigned int gup_flags);
> +long pin_user_pages_fd(struct file *file, pgoff_t start,
> +		       unsigned long nr_pages, struct page **pages);
>   
>   int get_user_pages_fast(unsigned long start, int nr_pages,
>   			unsigned int gup_flags, struct page **pages);
> diff --git a/mm/gup.c b/mm/gup.c
> index 2f8a2d89fde1..d30b9dfebbb6 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -3400,3 +3400,102 @@ long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
>   				     &locked, gup_flags);
>   }
>   EXPORT_SYMBOL(pin_user_pages_unlocked);
> +
> +static struct page *alloc_file_page(struct file *file, pgoff_t idx)
> +{
> +	struct page *page = ERR_PTR(-ENOMEM);
> +	struct folio *folio;
> +	int err;
> +
> +	if (shmem_file(file))
> +		return shmem_read_mapping_page(file->f_mapping, idx);
> +

As the build reports indicate, the hugetlb allocation path below might have to be fenced with

#ifdef CONFIG_HUGETLB_PAGE
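For illustration only, the fencing pattern can be modelled in userspace. The `ERR_PTR()`/`IS_ERR()`/`PTR_ERR()` helpers below are local re-implementations of the kernel idiom, `struct toy_page` and `toy_alloc_file_page()` are hypothetical, and returning `-EOPNOTSUPP` from the `#else` branch is an assumption made for the sketch. The point is simply that the hugetlb-only code is not compiled at all when the option is off.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() idiom. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long err) { return (void *)(intptr_t)err; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct toy_page { int id; };

static struct toy_page toy_shmem_page = { 1 };
#ifdef CONFIG_HUGETLB_PAGE
static struct toy_page toy_hugetlb_page = { 2 };
#endif

/* Hypothetical allocator: the hugetlb path only exists when configured in. */
static struct toy_page *toy_alloc_file_page(bool is_shmem)
{
	if (is_shmem)
		return &toy_shmem_page;

#ifdef CONFIG_HUGETLB_PAGE
	return &toy_hugetlb_page;
#else
	return ERR_PTR(-EOPNOTSUPP); /* assumption: fail if compiled out */
#endif
}
```

Built without `-DCONFIG_HUGETLB_PAGE`, the non-shmem request returns an encoded `-EOPNOTSUPP`; built with it, the second static page is returned instead.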

> +	folio = alloc_hugetlb_folio_nodemask(hstate_file(file),
> +					     NUMA_NO_NODE,
> +					     NULL,
> +					     GFP_USER);
> +	if (folio && folio_try_get(folio)) {
> +		page = &folio->page;
> +		err = hugetlb_add_to_page_cache(folio, file->f_mapping, idx);
> +		if (err) {
> +			folio_put(folio);
> +			free_huge_folio(folio);
> +			page = ERR_PTR(err);
> +		}
> +	}
> +
> +	return page;
> +}
> +
> +/**
> + * pin_user_pages_fd() - pin user pages associated with a file
> + * @file:       the file whose pages are to be pinned
> + * @start:      starting file offset
> + * @nr_pages:   number of pages from start to pin
> + * @pages:      array that receives pointers to the pages pinned.
> + *              Should be at least nr_pages long.
> + *
> + * Attempt to pin pages associated with a file that belongs to either shmem
> + * or hugetlbfs. The pages are either found in the page cache or allocated

nit: s/hugetlbfs/hugetlb/

> + * if necessary. Once the pages are located, they are all pinned via FOLL_PIN.
> + * And, these pinned pages need to be released using unpin_user_pages() or
> + * unpin_user_page().
> + *

It might be reasonable to add that the behavior mimics FOLL_LONGTERM
semantics: the pages may be held for an indefinite time period, _often_
under userspace control.
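Because such pins can be long-lived, the patch re-runs its pinning loop whenever `check_and_migrate_movable_pages()` returns `-EAGAIN`. A minimal userspace sketch of that retry shape follows. `struct toy_page`, its `movable` flag (standing in for ZONE_MOVABLE or CMA placement) and `toy_check_and_migrate()` are hypothetical. The sketch assumes, as the retry loop in the patch implies, that the `-EAGAIN` path has already dropped the pins and migrated the offending pages, and it simplifies migration to clearing a flag in place rather than installing a new page.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_page {
	bool movable;	/* stands in for ZONE_MOVABLE / CMA placement */
	int pincount;
};

/*
 * Hypothetical stand-in for check_and_migrate_movable_pages(): if any page
 * is movable, "migrate" it, drop every pin and ask the caller to retry.
 */
static int toy_check_and_migrate(struct toy_page *pages, size_t nr)
{
	bool migrated = false;
	size_t i;

	for (i = 0; i < nr; i++) {
		if (pages[i].movable) {
			pages[i].movable = false;	/* simplified migration */
			migrated = true;
		}
	}
	if (!migrated)
		return 0;

	for (i = 0; i < nr; i++)
		pages[i].pincount--;		/* pins dropped before retry */
	return -EAGAIN;
}

/* Pin everything, then retry as long as migration was required. */
static long toy_pin_longterm(struct toy_page *pages, size_t nr, int *passes)
{
	int ret;

	*passes = 0;
	do {
		size_t i;

		(*passes)++;
		for (i = 0; i < nr; i++)
			pages[i].pincount++;
		ret = toy_check_and_migrate(pages, nr);
	} while (ret == -EAGAIN);

	return ret ? ret : (long)nr;
}
```

With one movable page in the input, the first pass migrates it and drops the pins, and the second pass pins everything and succeeds.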

> + * Returns number of pages pinned. This would be equal to the number of
> + * pages requested. If no pages were pinned, it returns -errno.
> + */
> +long pin_user_pages_fd(struct file *file, pgoff_t start,
> +		       unsigned long nr_pages, struct page **pages)
> +{
> +	struct page *page;
> +	unsigned int flags, i;
> +	long ret;
> +
> +	if (start < 0)
> +		return -EINVAL;
> +
> +	if (!file)
> +	    return -EINVAL;
> +
> +	if (!shmem_file(file) && !is_file_hugepages(file))
> +	    return -EINVAL;
> +
> +	flags = memalloc_pin_save();
> +	do {
> +		for (i = 0; i < nr_pages; i++) {
> +			/*
> + 			 * In most cases, we should be able to find the page
> +			 * in the page cache. If we cannot find it, we try to
> +			 * allocate one and add it to the page cache.
> +			 */
> +			page = find_get_page_flags(file->f_mapping,
> +						   start + i,
> +						   FGP_ACCESSED);
> +			if (!page) {
> +				page = alloc_file_page(file, start + i);
> +				if (IS_ERR(page)) {
> +					ret = PTR_ERR(page);
> +					goto err;
> +				}
> +			}
> +			ret = try_grab_page(page, FOLL_PIN);
> +			if (unlikely(ret))
> +				goto err;
> +
> +			pages[i] = page;
> +			put_page(pages[i]);
> +		}
> +
> +		ret = check_and_migrate_movable_pages(nr_pages, pages);
> +	} while (ret == -EAGAIN);
> +
> +	memalloc_pin_restore(flags);
> +	return ret ? ret : nr_pages;
> +err:

Missing memalloc_pin_restore() on this error path?
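A minimal userspace sketch of the expected pairing, where every exit path restores what was saved. `toy_current_flags`, `TOY_PF_MEMALLOC_PIN`, `toy_memalloc_pin_save()`, `toy_memalloc_pin_restore()` and `toy_pin()` are hypothetical stand-ins for `current->flags` and the kernel helpers, and the failure is simulated by index. Saving the previous state, rather than clearing the bit unconditionally, keeps an outer caller's setting intact.

```c
#include <assert.h>
#include <errno.h>

#define TOY_PF_MEMALLOC_PIN 0x1u	/* hypothetical flag bit */

static unsigned int toy_current_flags;	/* stands in for current->flags */

static unsigned int toy_memalloc_pin_save(void)
{
	unsigned int old = toy_current_flags & TOY_PF_MEMALLOC_PIN;

	toy_current_flags |= TOY_PF_MEMALLOC_PIN;
	return old;
}

static void toy_memalloc_pin_restore(unsigned int old)
{
	toy_current_flags = (toy_current_flags & ~TOY_PF_MEMALLOC_PIN) | old;
}

/* fail_at < 0 means "no failure"; otherwise index fail_at fails. */
static long toy_pin(unsigned int nr, int fail_at)
{
	unsigned int flags = toy_memalloc_pin_save();
	long ret;

	for (unsigned int i = 0; i < nr; i++) {
		if ((int)i == fail_at) {	/* simulated lookup/pin failure */
			ret = -ENOMEM;
			goto err;
		}
	}

	toy_memalloc_pin_restore(flags);
	return nr;
err:
	/* The unpin loop for the pages pinned so far would go here. */
	toy_memalloc_pin_restore(flags);	/* restore on the error path too */
	return ret;
}
```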

> +	while (i > 0 && pages[--i])
> +		unpin_user_page(pages[i]);

So if any pages[] entry were NULL, we would stop unpinning completely?
Shouldn't this be something like:

while (i-- > 0)
	if (pages[i])
		unpin_user_page(pages[i]);
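The difference can be checked with a small userspace model. `struct toy_page`, `toy_unpin()`, `unwind_patch()` and `unwind_suggested()` are hypothetical; the two unwind functions copy the loop shapes from the patch and from the suggestion above, and the test array deliberately contains a NULL hole to show the early exit.

```c
#include <assert.h>
#include <stddef.h>

struct toy_page { int pincount; };

static void toy_unpin(struct toy_page *p) { p->pincount--; }

/* Loop shape from the patch: stops at the first NULL entry. */
static unsigned int unwind_patch(struct toy_page **pages, unsigned int i)
{
	unsigned int n = 0;

	while (i > 0 && pages[--i]) {
		toy_unpin(pages[i]);
		n++;
	}
	return n;
}

/* Loop shape from the review: visits all i slots, skipping NULL entries. */
static unsigned int unwind_suggested(struct toy_page **pages, unsigned int i)
{
	unsigned int n = 0;

	while (i-- > 0)
		if (pages[i]) {
			toy_unpin(pages[i]);
			n++;
		}
	return n;
}
```

With `{ &a, NULL, &c }`, the patch's loop unpins only `c` and leaks the pin on `a`, while the suggested loop unpins both.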

-- 
Cheers,

David / dhildenb


