[PATCH v1 1/3] mm/gup: Introduce pin_user_pages_fd() for pinning shmem/hugetlbfs file pages
Kasireddy, Vivek
vivek.kasireddy at intel.com
Tue Oct 17 07:39:41 UTC 2023
Hi David,
> > For drivers that would like to longterm-pin the pages associated
> > with a file, the pin_user_pages_fd() API provides an option to
> > not only FOLL_PIN the pages but also to check and migrate them
> > if they reside in movable zone or CMA block. For now, this API
> > can only work with files belonging to shmem or hugetlbfs given
> > that the udmabuf driver is the only user.
>
> Maybe add "Other files are rejected.". Wasn't clear to me before I
> looked into the code.
Ok, will add it in v2.
>
> >
> > It must be noted that the pages associated with hugetlbfs files
> > are expected to be found in the page cache. An error is returned
> > if they are not found. However, shmem pages can be swapped in or
> > allocated if they are not present in the page cache.
> >
> > Cc: David Hildenbrand <david at redhat.com>
> > Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
> > Cc: Mike Kravetz <mike.kravetz at oracle.com>
> > Cc: Hugh Dickins <hughd at google.com>
> > Cc: Peter Xu <peterx at redhat.com>
> > Cc: Gerd Hoffmann <kraxel at redhat.com>
> > Cc: Dongwon Kim <dongwon.kim at intel.com>
> > Cc: Junxiao Chang <junxiao.chang at intel.com>
> > Suggested-by: Jason Gunthorpe <jgg at nvidia.com>
> > Signed-off-by: Vivek Kasireddy <vivek.kasireddy at intel.com>
> > ---
> > include/linux/mm.h | 2 ++
> > mm/gup.c | 87
> ++++++++++++++++++++++++++++++++++++++++++++++
> > 2 files changed, 89 insertions(+)
> >
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index bf5d0b1b16f4..af2121fb8101 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -2457,6 +2457,8 @@ long get_user_pages_unlocked(unsigned long
> start, unsigned long nr_pages,
> > struct page **pages, unsigned int gup_flags);
> > long pin_user_pages_unlocked(unsigned long start, unsigned long
> nr_pages,
> > struct page **pages, unsigned int gup_flags);
> > +long pin_user_pages_fd(int fd, pgoff_t start, unsigned long nr_pages,
> > + unsigned int gup_flags, struct page **pages);
> >
> > int get_user_pages_fast(unsigned long start, int nr_pages,
> > unsigned int gup_flags, struct page **pages);
> > diff --git a/mm/gup.c b/mm/gup.c
> > index 2f8a2d89fde1..e34b77a15fa8 100644
> > --- a/mm/gup.c
> > +++ b/mm/gup.c
> > @@ -3400,3 +3400,90 @@ long pin_user_pages_unlocked(unsigned long
> start, unsigned long nr_pages,
> > &locked, gup_flags);
> > }
> > EXPORT_SYMBOL(pin_user_pages_unlocked);
> > +
>
> This does look quite neat, nice! Let's take a closer look ...
>
> > +/**
> > + * pin_user_pages_fd() - pin user pages associated with a file
> > + * @fd: the fd whose pages are to be pinned
> > + * @start: starting file offset
> > + * @nr_pages: number of pages from start to pin
> > + * @gup_flags: flags modifying pin behaviour
>
> ^ I assume we should drop that. At least for now the flags are
> completely unused. And most likely we would want a different set of
> flags later (GUPFD_ ...).
Right now, FOLL_LONGTERM is the only accepted value for gup_flags but
yes, as you suggest, this can be made implicit by dropping gup_flags.
>
> > + * @pages: array that receives pointers to the pages pinned.
> > + * Should be at least nr_pages long.
> > + *
> > + * Attempt to pin (and migrate) pages associated with a file belonging to
>
> I'd drop the "and migrate" part, it's more of an implementation detail.
>
> > + * either shmem or hugetlbfs. An error is returned if pages associated with
> > + * hugetlbfs files are not present in the page cache. However, shmem
> pages
> > + * are swapped in or allocated if they are not present in the page cache.
>
> Why don't we do the same for hugetlbfs? Would make the interface more
> streamlined.
I am going off of what Mike has stated previously:
"It may not matter to your users, but the semantics for hugetlb and shmem
pages is different. hugetlb requires the pages exist in the page cache
while shmem will create/add pages to the cache if necessary."
However, if we were to allocate a hugepage (assuming one is not present in the
page cache at a given index), what needs to be done in addition to calling these APIs?
folio = alloc_hugetlb_folio_nodemask(h, NUMA_NO_NODE, NULL, GFP_USER)
hugetlb_add_to_page_cache(folio, mapping, idx)
>
> Certainly add that pinned pages have to be released using
> unpin_user_pages().
Sure, will include that in v2.
>
> > + *
> > + * Returns number of pages pinned. This would be equal to the number of
> > + * pages requested.
> > + * If nr_pages is 0 or negative, returns 0. If no pages were pinned, returns
> > + * -errno.
> > + */
> > +long pin_user_pages_fd(int fd, pgoff_t start, unsigned long nr_pages,
> > + unsigned int gup_flags, struct page **pages)
> > +{
> > + struct page *page;
> > + struct file *filep;
> > + unsigned int flags, i;
> > + long ret;
> > +
> > + if (nr_pages <= 0)
> > + return 0;
>
> I think we should just forbid that and use a WARN_ON_ONCE() here /
> return -EINVAL. So we'll never end up returning 0.
I think I'll drop this check in v2 as Jason suggested.
>
> > + if (!is_valid_gup_args(pages, NULL, &gup_flags, FOLL_PIN))
> > + return 0;
> > +
> > + if (start < 0)
> > + return -EINVAL;
> > +
> > + filep = fget(fd);
> > + if (!filep)
> > + return -EINVAL;
> > +
> > + if (!shmem_file(filep) && !is_file_hugepages(filep))
> > + return -EINVAL;
> > +
> > + flags = memalloc_pin_save();
> > + do {
> > + for (i = 0; i < nr_pages; i++) {
> > + if (shmem_mapping(filep->f_mapping)) {
> > + page = shmem_read_mapping_page(filep-
> >f_mapping,
> > + start + i);
> > + if (IS_ERR(page)) {
> > + ret = PTR_ERR(page);
> > + goto err;
> > + }
> > + } else {
> > + page = find_get_page_flags(filep->f_mapping,
> > + start + i,
> > + FGP_ACCESSED);
> > + if (!page) {
> > + ret = -EINVAL;
> > + goto err;
> > + }
> > + }
> > + ret = try_grab_page(page, FOLL_PIN);
> > + if (unlikely(ret))
> > + goto err;
> > +
> > + pages[i] = page;
> > + put_page(pages[i]);
> > + }
> > +
> > + ret = check_and_migrate_movable_pages(nr_pages, pages);
> > + } while (ret == -EAGAIN);
> > +
> > +err:
> > + memalloc_pin_restore(flags);
> > + fput(filep);
> > + if (!ret)
> > + return nr_pages;
> > +
> > + while (i > 0 && pages[--i]) {
> > + unpin_user_page(pages[i]);
> > + pages[i] = NULL;
>
> If migrate_longterm_unpinnable_pages() failed, say with -ENOMEM, the
> pages were already unpinned, but pages[i] has not been cleared, no?
You are right; the above while should not be executed in that case. I added
this chunk to cleanup after any errors thrown in the for loop above. I guess
I need to add a new error label to cleanup after errors thrown by
check_and_migrate_movable_pages().
Thanks,
Vivek
>
> > + }
> > + return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(pin_user_pages_fd);
> > +
>
> --
> Cheers,
>
> David / dhildenb
More information about the dri-devel
mailing list