[PATCH v2 4/4] vfio/pci: Allow MMIO regions to be exported through dma-buf
Jason Gunthorpe
jgg at nvidia.com
Wed Sep 7 16:12:52 UTC 2022
On Wed, Sep 07, 2022 at 08:32:23AM -0700, Christoph Hellwig wrote:
> On Wed, Sep 07, 2022 at 12:23:28PM -0300, Jason Gunthorpe wrote:
> > 2) DMABUF abuses dma_map_resource() for P2P and thus doesn't work in
> > certain special cases.
>
> Not just certain special cases, but one of the main use cases.
> Basically P2P can happen in two ways:
>
> a) through a PCIe switch, or
> b) through connected root ports
Yes, we tested both, both work.
> The open coded version here only supports a), and only supports it when
> there is no offset between the 'physical' address of the BAR as seen by
> the PCIe endpoint and the Linux way. x86 usually (always?) doesn't have an
> offset there, but other architectures often do.
The PCI offset is some embedded thing - I've never seen it in a server
platform.
Frankly, it is just bad SOC design and there is good reason why a
non-zero offset needs to be avoided. As soon as you create aliases between
the address spaces you invite trouble. IIRC a SOC I used once put the
memory at 0 -> 4G and then put its only PCI aperture at 4G ->
4G+N. However, this design requires 64 bit PCI support, which the
platform didn't have at the time. So they used a PCI offset to hackily
alias the aperture over the DDR. I don't remember if they threw out a
bit of DDR to resolve the alias, or if they just didn't support PCI
switches.
In any case, it is a complete mess. You either drastically limit your
BAR size, don't support PCI switches, or lose a lot of DDR.
I also seem to remember that the IOMMU and PCI offset don't play nicely
together - so for the VFIO use case, where the IOMMU is present, I'm
pretty sure we can very safely assume a 0 offset. That seems to be
confirmed by the fact that VFIO has never handled the PCI offset in its
own P2P path, and P2P works fine in VMs across a wide range of platforms.
That said, I agree we should really have APIs that support this
properly, and dma_map_resource is certainly technically wrong.
So, would you be OK with this series if I try to make a dma_map_p2p()
that resolves the offset issue?
> Last but not least I don't really see how the code would even work
> when an IOMMU is used, as dma_map_resource will return an IOVA that
> is only understood by the IOMMU itself, and not the other endpoint.
I don't understand this.
__iommu_dma_map() will put the given phys into the iommu_domain
associated with 'dev' and return the IOVA it picked.
Here 'dev' is the importing device, i.e. the device that will issue
the DMA:
+	dma_addr = dma_map_resource(
+		attachment->dev,
+		pci_resource_start(priv->vdev->pdev, priv->index) +
+			priv->offset,
+		priv->dmabuf->size, dir, DMA_ATTR_SKIP_CPU_SYNC);
e.g. attachment->dev is the PCI device of the RDMA device, not the VFIO
device.
'phys' is the CPU physical address of the PCI BAR page, which with a 0
PCI offset is the right thing to program into the IO page table.
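If it helps, the zero-offset assumption could even be checked explicitly
at export time - again only a sketch, the helper below is not part of
this series:

#include <linux/pci.h>

/*
 * Sketch only, not in this series: check that the CPU physical address
 * of a BAR matches what a peer device must emit on the bus, i.e. that
 * the PCI offset is zero.
 */
static bool vfio_pci_bar_has_zero_offset(struct pci_dev *pdev, int bar)
{
	/*
	 * pci_resource_start() is the CPU physical address that
	 * dma_map_resource() programs into the IO page table;
	 * pci_bus_address() is what the initiating endpoint actually
	 * puts on the bus. The mapping is only correct if they match.
	 */
	return pci_bus_address(pdev, bar) ==
	       (pci_bus_addr_t)pci_resource_start(pdev, bar);
}

The exporter could then refuse to create the dma-buf when this returns
false, until the DMA API grows proper support.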
> How was this code even tested?
It was tested on a few platforms, like I said above. The cases where
it doesn't work are special, largely embedded, and not anything we
have in our labs - AFAIK.
Jason