[PATCH 10/10] vfio/pci: Add dma-buf export support for MMIO regions
Leon Romanovsky
leon at kernel.org
Thu Jul 24 05:44:43 UTC 2025
On Thu, Jul 24, 2025 at 05:13:49AM +0000, Kasireddy, Vivek wrote:
> Hi Leon,
>
> > Subject: [PATCH 10/10] vfio/pci: Add dma-buf export support for MMIO regions
> >
> > From: Leon Romanovsky <leonro at nvidia.com>
> >
> > Add support for exporting PCI device MMIO regions through dma-buf,
> > enabling safe sharing of non-struct page memory with controlled
> > lifetime management. This allows RDMA and other subsystems to import
> > dma-buf FDs and build them into memory regions for PCI P2P operations.
> >
> > The implementation provides a revocable attachment mechanism using
> > dma-buf move operations. MMIO regions are normally pinned as BARs
> > don't change physical addresses, but access is revoked when the VFIO
> > device is closed or a PCI reset is issued. This ensures kernel
> > self-defense against potentially hostile userspace.
> >
> > Signed-off-by: Jason Gunthorpe <jgg at nvidia.com>
> > Signed-off-by: Vivek Kasireddy <vivek.kasireddy at intel.com>
> > Signed-off-by: Leon Romanovsky <leonro at nvidia.com>
> > ---
> > drivers/vfio/pci/Kconfig | 20 ++
> > drivers/vfio/pci/Makefile | 2 +
> > drivers/vfio/pci/vfio_pci_config.c | 22 +-
> > drivers/vfio/pci/vfio_pci_core.c | 25 ++-
> > drivers/vfio/pci/vfio_pci_dmabuf.c | 321 +++++++++++++++++++++++++++++
> > drivers/vfio/pci/vfio_pci_priv.h | 23 +++
> > include/linux/dma-buf.h | 1 +
> > include/linux/vfio_pci_core.h | 3 +
> > include/uapi/linux/vfio.h | 19 ++
> > 9 files changed, 431 insertions(+), 5 deletions(-)
> > create mode 100644 drivers/vfio/pci/vfio_pci_dmabuf.c
<...>
> > +static int validate_dmabuf_input(struct vfio_pci_core_device *vdev,
> > +				 struct vfio_device_feature_dma_buf *dma_buf)
> > +{
> > +	struct pci_dev *pdev = vdev->pdev;
> > +	u32 bar = dma_buf->region_index;
> > +	u64 offset = dma_buf->offset;
> > +	u64 len = dma_buf->length;
> > +	resource_size_t bar_size;
> > +	u64 sum;
> > +
> > +	/*
> > +	 * For PCI the region_index is the BAR number like everything else.
> > +	 */
> > +	if (bar >= VFIO_PCI_ROM_REGION_INDEX)
> > +		return -ENODEV;
<...>
> > +/**
> > + * Upon VFIO_DEVICE_FEATURE_GET create a dma_buf fd for the
> > + * regions selected.
> > + *
> > + * open_flags are the typical flags passed to open(2), eg O_RDWR, O_CLOEXEC,
> > + * etc. offset/length specify a slice of the region to create the dmabuf from.
> > + * nr_ranges is the total number of (P2P DMA) ranges that comprise the dmabuf.
> Any particular reason why you dropped the option (nr_ranges) of creating a
> single dmabuf from multiple ranges of an MMIO region?
I did it for two reasons. First, I wanted to simplify the code in order
to speed up discussion of the patchset itself. Second, I failed to find
a justification for the need for multiple ranges, as the number of BARs
is limited by VFIO_PCI_ROM_REGION_INDEX (6) and the same functionality
can be achieved by multiple calls to DMABUF import.
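
To make the "multiple calls" point concrete, here is a rough, untested
userspace sketch against the uAPI proposed in this patch. The
export_bar_chunk() helper name is made up for illustration, and it assumes
a linux/vfio.h that already carries VFIO_DEVICE_FEATURE_DMA_BUF and
struct vfio_device_feature_dma_buf from this series:

#include <fcntl.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Export one contiguous slice of a BAR as a dmabuf fd. */
static int export_bar_chunk(int device_fd, __u32 bar, __u64 offset, __u64 len)
{
	size_t argsz = sizeof(struct vfio_device_feature) +
		       sizeof(struct vfio_device_feature_dma_buf);
	struct vfio_device_feature *feat = calloc(1, argsz);
	struct vfio_device_feature_dma_buf *get;
	int fd;

	if (!feat)
		return -1;
	get = (void *)feat->data;

	feat->argsz = argsz;
	feat->flags = VFIO_DEVICE_FEATURE_GET | VFIO_DEVICE_FEATURE_DMA_BUF;
	get->region_index = bar;
	get->open_flags = O_CLOEXEC;
	get->offset = offset;
	get->length = len;

	/* Per the uAPI comment: returns the dmabuf fd, or -1 with errno set. */
	fd = ioctl(device_fd, VFIO_DEVICE_FEATURE, feat);
	free(feat);
	return fd;
}

A buffer scattered across N chunks of a BAR would call this once per chunk,
and the importer would build its mapping from the N resulting fds.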
>
> Restricting the dmabuf to a single range (or having to create multiple dmabufs
> to represent multiple regions/ranges associated with a single scattered buffer)
> would be very limiting and may not work in all cases. For instance, in my use case,
> I am trying to share a large (4k mode) framebuffer (FB) located in a GPU's VRAM
> between two (p2p compatible) GPU devices. And this would probably not work,
> given that allocating a large contiguous FB (nr_ranges = 1) in VRAM may not be
> feasible when there is memory pressure.
Can you please point me to the place in the code where this can fail?
I'm probably missing something basic, as there are no large allocations
in the current patchset.
>
> Furthermore, since you are adding a new UAPI with this patch/feature, as you know,
> we cannot go back and tweak it (to add support for nr_ranges > 1) should there
> be a need in the future, but you can always use nr_ranges = 1 anytime. Therefore,
> I think it makes sense to be flexible in terms of the number of ranges to include
> while creating a dmabuf instead of restricting ourselves to one range.
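Just to make sure we are talking about the same thing: I assume the
multi-range variant you have in mind is laid out roughly like the sketch
below (illustrative only; the struct and field names here are made up and
this is not what the patch proposes), and that the stray nr_ranges mention
in the quoted comment above is a leftover from such a variant.

/*
 * Illustrative only -- not part of this patch.  One way a multi-range
 * request could be laid out, with nr_ranges entries trailing the header.
 */
struct vfio_device_feature_dma_buf_range {
	__u64 offset;	/* offset into the selected region */
	__u64 length;	/* length of this chunk */
};

struct vfio_device_feature_dma_buf_multi {
	__u32 region_index;
	__u32 open_flags;
	__u32 flags;
	__u32 nr_ranges;
	struct vfio_device_feature_dma_buf_range ranges[];
};
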
I'm not a big fan of over-engineering. Let's first understand if this
case is needed.
Thanks
>
> Thanks,
> Vivek
>
> > + *
> > + * Return: The fd number on success, -1 and errno is set on failure.
> > + */
> > +#define VFIO_DEVICE_FEATURE_DMA_BUF 11
> > +
> > +struct vfio_device_feature_dma_buf {
> > +	__u32 region_index;
> > +	__u32 open_flags;
> > +	__u64 offset;
> > +	__u64 length;
> > +};
> > +
> > /* -------- API for Type1 VFIO IOMMU -------- */
> >
> > /**
> > --
> > 2.50.1
>