[PATCH v1 2/2] vfio/pci: Allow MMIO regions to be exported through dma-buf

Jason Gunthorpe jgg at nvidia.com
Wed May 8 00:31:53 UTC 2024


On Thu, May 02, 2024 at 07:50:36AM +0000, Kasireddy, Vivek wrote:
> Hi Jason,
> 
> > 
> > On Tue, Apr 30, 2024 at 04:24:50PM -0600, Alex Williamson wrote:
> > > > +static vm_fault_t vfio_pci_dma_buf_fault(struct vm_fault *vmf)
> > > > +{
> > > > +	struct vm_area_struct *vma = vmf->vma;
> > > > +	struct vfio_pci_dma_buf *priv = vma->vm_private_data;
> > > > +	pgoff_t pgoff = vmf->pgoff;
> > > > +
> > > > +	if (pgoff >= priv->nr_pages)
> > > > +		return VM_FAULT_SIGBUS;
> > > > +
> > > > +	return vmf_insert_pfn(vma, vmf->address,
> > > > +			      page_to_pfn(priv->pages[pgoff]));
> > > > +}
> > >
> > > How does this prevent the MMIO space from being mmap'd when disabled
> > > at the device?  How is the mmap revoked when the MMIO becomes disabled?
> > > Is it part of the move protocol?
> In this case, I think the importers that mmap'd the dmabuf need to be tracked
> separately and their VMA PTEs need to be zapped when MMIO access is revoked.

Which, as we know, is quite hard.
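
To be concrete: if the dmabuf kept its own mmap, revoke would have to
zap every importer mapping made through the dmabuf file, something
like this sketch (priv->revoked and the dmabuf back-pointer are
hypothetical, not in the patch):

	static void vfio_pci_dma_buf_revoke(struct vfio_pci_dma_buf *priv)
	{
		priv->revoked = true;
		/* Drop all PTEs installed via the dmabuf's file mapping
		 * so the next CPU access goes back to the fault handler. */
		unmap_mapping_range(priv->dmabuf->file->f_mapping, 0, 0, true);
	}

The fault handler above would then have to test priv->revoked under a
lock and return VM_FAULT_SIGBUS, and getting that locking right
against concurrent faults is where it gets painful.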

> > Yes, we should not have a mmap handler for dmabuf. vfio memory must be
> > mmapped in the normal way.
> Although optional, I think most dmabuf exporters (drm ones) provide a mmap
> handler. Otherwise, there is no easy way to provide CPU access (backup slow path)
> to the dmabuf for the importer.

Here we should not; there is no reason to, since VFIO already provides
a mmap mechanism itself. Anything using this API should just call the
native VFIO mmap instead of trying to mmap the DMABUF. Yes, it will be
inconvenient for the scatterlist case you have, but the kernel side
implementation is much easier.


> > > > +/**
> > > > + * Upon VFIO_DEVICE_FEATURE_GET create a dma_buf fd for the
> > > > + * region selected.
> > > > + *
> > > > + * open_flags are the typical flags passed to open(2), eg O_RDWR,
> > > > + * O_CLOEXEC, etc. offset/length specify a slice of the region to
> > > > + * create the dmabuf from. If both are 0 then the whole region is used.
> > > > + *
> > > > + * Return: The fd number on success, -1 and errno is set on failure.
> > > > + */
> > > > +#define VFIO_DEVICE_FEATURE_DMA_BUF 11
> > > > +
> > > > +struct vfio_region_p2p_area {
> > > > +	__u32	region_index;
> > > > +	__u32	__pad;
> > > > +	__u64	offset;
> > > > +	__u64	length;
> > > > +};
> > > > +
> > > > +struct vfio_device_feature_dma_buf {
> > > > +	__u32	open_flags;
> > > > +	__u32	nr_areas;
> > > > +	struct vfio_region_p2p_area p2p_areas[];
> > > > +};
> > 
> > I still have no clue what these p2p areas are. You want to create a
> > dmabuf out of a scatterlist? Why??

> Because the data associated with a buffer that needs to be shared can
> come from multiple ranges. I probably should have used the terms ranges
> or slices or chunks to make it more clear instead of p2p areas.

Yes, please pick a better name.
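
Something like this would be clearer (purely illustrative, pick
whatever names you like):

	struct vfio_region_dma_range {
		__u32	region_index;
		__u32	__pad;
		__u64	offset;
		__u64	length;
	};

	struct vfio_device_feature_dma_buf {
		__u32	open_flags;
		__u32	nr_ranges;
		struct vfio_region_dma_range dma_ranges[];
	};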

> > I'm also not sure of the use of the pci_p2pdma family of functions; it
> > is a bold step to make struct pages, and that isn't going to work in quite

> I guess things may have changed since the last discussion on this topic or
> maybe I misunderstood but I thought Christoph's suggestion was to use
> struct pages to populate the scatterlist instead of using DMA addresses
> and, I figured pci_p2pdma APIs can easily help with that.

It was, but it doesn't really work for VFIO. You can only create
struct pages for large MMIO spaces (they come from ZONE_DEVICE via
devm_memremap_pages(), which has size/alignment requirements), and
only on some architectures.

Requiring them will make VFIO unusable in a lot of places. Requiring
them just for DMABUF will make that feature unusable.

Creating them on first use, and then ignoring how broken that is, is
perhaps reasonable for now, but I'm unhappy to see the feature made
so poorly usable.
> 
> > a lot of cases. It is really hacky/wrong to immediately call
> > pci_alloc_p2pmem() to defeat the internal genalloc.

> In my use-case, I need to use all the pages from the pool and I don't see any
> better way to do it.

You have to fix the P2P API to work properly for this kind of use
case.

There should be no genalloc at all.
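
For reference, the pattern being objected to boils down to this
sketch (not the exact patch code): the whole BAR goes into the pool
and is immediately taken back out, so the allocator manages nothing:

	/* hand the entire BAR to the p2pdma genalloc pool... */
	ret = pci_p2pdma_add_resource(pdev, bar, size, 0);
	if (ret)
		return ret;

	/* ...then immediately allocate all of it back, defeating the
	 * point of having an allocator there at all */
	vaddr = pci_alloc_p2pmem(pdev, size);
	if (!vaddr)
		return -ENOMEM;

What is needed is a way to get the struct pages for a BAR range
directly, with no pool in the middle.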

> > I'd rather we stick with the original design. Leon is working on DMA
> > API changes that should address half the issue.
>
> Ok, I'll keep an eye out for Leon's work.

It saves you from messing with the P2P stuff, but you get the other
issues back.

Jason

