[PATCH v1 0/2] udmabuf: Add back support for mapping hugetlb pages

Kasireddy, Vivek vivek.kasireddy at intel.com
Fri Jun 23 06:13:02 UTC 2023


Hi David,

> > The first patch ensures that the mappings needed for handling the
> > mmap operation are managed using pfns instead of struct page. The
> > second patch restores support for mapping hugetlb pages; subpages
> > of a hugepage are no longer used directly (the main reason for the
> > earlier revert). Instead, the hugetlb pages and the relevant
> > offsets are used to populate the scatterlist for dma-buf export
> > and for the mmap operation.
> >
> > Testcase: default_hugepagesz=2M hugepagesz=2M hugepages=2500
> > options were passed to the Host kernel, and Qemu was launched with
> > these relevant options: qemu-system-x86_64 -m 4096m....
> > -device virtio-gpu-pci,max_outputs=1,blob=true,xres=1920,yres=1080
> > -display gtk,gl=on
> > -object memory-backend-memfd,hugetlb=on,id=mem1,size=4096M
> > -machine memory-backend=mem1
> >
> > Replacing -display gtk,gl=on with -display gtk,gl=off above would
> > exercise the mmap handler.
> >
> 
> While I think the VM_PFNMAP approach is much better and should fix the
> issue at hand, I thought more about the missing memlock support and
> realized that we might have to fix something else. So I'm going to
> raise the issue here.
> 
> I think udmabuf chose the wrong interface to do what it's doing, which
> makes it harder to fix eventually.
> 
> Instead of accepting a range in a memfd, it should just have accepted a
> user space address range and then used
> pin_user_pages(FOLL_WRITE|FOLL_LONGTERM) to longterm-pin the pages
> "officially".
udmabuf indeed started off by using a user space address range and GUP,
but the dma-buf subsystem maintainer had concerns with that approach in
v2. That version also had mlock support. Here are v2 and the relevant
conversation:
https://patchwork.freedesktop.org/patch/210992/?series=39879&rev=2
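
For reference, the interface you describe would look roughly like the
sketch below; udmabuf_pin_user_range() is a hypothetical helper, not
something in the current driver:

static int udmabuf_pin_user_range(unsigned long start, int npages,
                                  struct page **pages)
{
        int pinned;

        /* FOLL_LONGTERM migrates ZONE_MOVABLE/CMA pages before pinning */
        pinned = pin_user_pages_fast(start, npages,
                                     FOLL_WRITE | FOLL_LONGTERM, pages);
        if (pinned < 0)
                return pinned;
        if (pinned != npages) {
                unpin_user_pages(pages, pinned);
                return -EFAULT;
        }
        return 0;
}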

> 
> So what's the issue? udmabuf effectively pins pages longterm
> ("possibly forever") simply by grabbing a reference on them. These
> pages might easily reside in ZONE_MOVABLE or in MIGRATE_CMA
> pageblocks.
> 
> So what udmabuf does is break memory hotunplug and CMA, because it
> turns pages that have to remain movable unmovable.
> 
> In the pin_user_pages(FOLL_LONGTERM) case we make sure to migrate
> these pages. See mm/gup.c:check_and_migrate_movable_pages() and
> especially folio_is_longterm_pinnable(). We'd probably have to
> implement something similar for udmabuf, where we detect such
> unpinnable pages and migrate them.
The pages udmabuf pins are only those associated with Guest (GPU
driver/virtio-gpu) resources (or buffers allocated and pinned from
shmem via drm GEM). Some resources are short-lived, some are
long-lived, and whenever a resource gets destroyed, its pages are
unpinned. Not all resources have their pages pinned, either. The
resource that stays pinned the longest is the framebuffer (FB),
because the Guest GPU driver updates it every ~16 ms (assuming
1920x1080@60). We could certainly pin/unpin the FB around each access
on the Host as a workaround, but I suspect that would not be very
efficient given the amount of churn it would create.
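
As an aside, my reading of the check you are referring to is that it
boils down to something like the following; this is a simplified sketch
of the mm/gup.c logic, not the exact upstream code:

static bool can_pin_longterm(struct folio *folio)
{
        /* ZONE_MOVABLE pages must remain movable */
        if (folio_zonenum(folio) == ZONE_MOVABLE)
                return false;
#ifdef CONFIG_CMA
        /* CMA pageblocks are reserved for the contiguous allocator */
        if (get_pageblock_migratetype(&folio->page) == MIGRATE_CMA)
                return false;
#endif
        return true;
}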

Also, as far as migration or S3/S4 is concerned, my understanding is
that all the Guest resources are destroyed and recreated. So, wouldn't
something similar happen during memory hotunplug?

> 
> 
> For example, pairing udmabuf with vfio (which pins pages using
> pin_user_pages(FOLL_LONGTERM)) in QEMU will most probably not work in
> all cases: if udmabuf longterm-pinned the pages "the wrong way", vfio
> will fail to migrate them during its FOLL_LONGTERM pin and will
> consequently fail pin_user_pages(). As long as udmabuf holds a
> reference on these pages, that will never succeed.
Dma-buf rules (for exporters) indicate that the pages only need to be
pinned during the map_attachment phase (and until unmap_attachment
happens); in other words, only while the sg_table created by udmabuf is
in use. I guess one option would be to not hold any references during
UDMABUF_CREATE and only grab references to the pages (as and when they
get used) during this step. Would this help?
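
Roughly, what I have in mind is something like the sketch below, where
udmabuf_pin_pages()/udmabuf_unpin_pages() are hypothetical helpers and
get_sg_table() is udmabuf's existing internal helper:

static struct sg_table *map_udmabuf(struct dma_buf_attachment *at,
                                    enum dma_data_direction direction)
{
        struct udmabuf *ubuf = at->dmabuf->priv;
        struct sg_table *sg;
        int ret;

        /* Defer pinning from UDMABUF_CREATE to attachment-map time */
        ret = udmabuf_pin_pages(ubuf);          /* hypothetical */
        if (ret)
                return ERR_PTR(ret);

        sg = get_sg_table(at->dev, at->dmabuf, direction);
        if (IS_ERR(sg))
                udmabuf_unpin_pages(ubuf);      /* hypothetical */
        return sg;
}

The unmap_dma_buf side would then drop the page references after
freeing the sg_table.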

> 
> 
> There are *probably* more issues on the QEMU side when udmabuf is
> paired with things like MADV_DONTNEED/FALLOC_FL_PUNCH_HOLE used for
> virtio-balloon, virtio-mem, postcopy live migration, ... For example,
> in the vfio/vdpa case we make sure that we disallow most of these,
> because otherwise there can be an accidental "disconnect" between the
> pages mapped into the VM (guest view) and the pages mapped into the
> IOMMU (device view), for example, after a reboot.
OK; I am not sure I can find an acceptable way to address all of these
issues, but given the current constraints associated with udmabuf, what
do you suggest as the most reasonable way to deal with the problems you
have identified?

Thanks,
Vivek

> 
> --
> Cheers,
> 
> David / dhildenb


