[RFC PATCH 01/12] dma-buf: Introduce dma_buf_get_pfn_unlocked() kAPI

Christian König christian.koenig at amd.com
Wed Jan 15 16:34:23 UTC 2025


Am 15.01.25 um 16:10 schrieb Jason Gunthorpe:
> On Wed, Jan 15, 2025 at 03:30:47PM +0100, Christian König wrote:
>
>>> Those rules are not something we came up with because of some limitation
>>> of the DMA-API, but rather from experience working with different device
>>> drivers and especially their developers.
> I would say it stems from the use of scatter list. You do not have
> enough information exchanged between exporter and importer to
> implement something sane and correct. At that point being restrictive
> is a reasonable path.
>
> Because of scatterlist developers don't have APIs that correctly solve
> the problems they want to solve, so of course things get into a mess.

Well, I completely agree that scatterlists have many, many problems. And 
at least some of the stuff you note here sounds like a good way to 
tackle those problems.

But I'm trying to explain the restrictions and requirements we 
previously found necessary. And I strongly think that any new approach 
needs to respect those restrictions as well, or we will just repeat 
history.

>>> Applying and enforcing those restrictions is an absolute must have
>>> for extending DMA-buf.
> You said to come to the maintainers with the problems, here are the
> problems. Your answer is don't use dmabuf.
>
> That doesn't make the problems go away :(

Yeah, that's why I'm desperately trying to understand your use case.

>>>> I really don't want to make a dmabuf2 - everyone would have to
>>>> implement it, including all the GPU drivers if they want to work with
>>>> RDMA. I don't think this makes any sense compared to incrementally
>>>> evolving dmabuf with more optional capabilities.
>>> The point is that a dmabuf2 would most likely be rejected as well or
>>> otherwise run into the same issues we have seen before.
> You'd need to be much more concrete and technical in your objections
> to cause a rejection. "We tried something else before and it didn't
> work" won't cut it.

Granted, let me try to improve this.

Here is a real world example of one of the issues we ran into and why 
CPU mappings of importers are redirected to the exporter.

We have quite a few different exporters that track the CPU mappings 
of their backing store using address_space objects in one way or another 
and then use unmap_mapping_range() to invalidate those CPU mappings.

But when importers get the PFNs of the backing store they can look 
behind the curtain and directly insert those PFNs into their own CPU 
page tables.

We had literally tons of cases like this where driver developers caused 
access-after-free issues because the importer created CPU mappings on 
its own without the exporter knowing about it.
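
Just to illustrate the pattern (purely a hypothetical sketch, the 
exporter_bo/importer_obj structures below are made up and not taken 
from any real driver):

#include <linux/fs.h>
#include <linux/mm.h>

struct exporter_bo {		/* hypothetical exporter object */
	struct address_space *mapping;
	loff_t size;
};

struct importer_obj {		/* hypothetical importer object */
	unsigned long pfn;
};

/* Exporter side: invalidates all the CPU mappings it knows about
 * before freeing or moving the backing store. */
static void exporter_move_backing_store(struct exporter_bo *bo)
{
	/* Only zaps PTEs in VMAs that are backed by bo->mapping ... */
	unmap_mapping_range(bo->mapping, 0, bo->size, 1);
	/* ... afterwards the old pages get freed or migrated. */
}

/* Importer side: looks behind the curtain and maps the PFN itself. */
static vm_fault_t importer_vm_fault(struct vm_fault *vmf)
{
	struct importer_obj *obj = vmf->vma->vm_private_data;

	/* This PTE lives in the importer's own mapping, so the
	 * exporter's unmap_mapping_range() above never reaches it:
	 * a classic access after free once the backing store moves. */
	return vmf_insert_pfn(vmf->vma, vmf->address, obj->pfn);
}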

This is just one example of what we ran into. In addition to that, 
basically the whole synchronization between drivers was overhauled as 
well because we found that we can't trust importers to always do the 
right thing.

> There is a very simple problem statement here, we need a FD handle for
> various kinds of memory, with a lifetime model that fits a couple of
> different use cases. The exporter and importer need to understand what
> type of memory it is and what rules apply to working with it. The
> required importers are more general than just simple PCI DMA.
>
> I feel like this is already exactly DMABUF's mission.
>
> Besides, you have been saying to go do this in TEE or whatever, how is
> that any different from dmabuf2?

You can already turn both a TEE allocated buffer as well as a memfd into 
a DMA-buf. So basically TEE and memfd already provide different 
interfaces which go beyond what DMA-buf does and allows.

In other words, if you want to do things like direct I/O to block or 
network devices you can mmap() your memfd and do that, while at the same 
time sending your memfd as a DMA-buf to your GPU, V4L or neural accelerator.
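
For the memfd case something along these lines should already work 
today with the existing udmabuf driver (just a rough userspace sketch, 
error handling omitted):

#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <linux/udmabuf.h>

/* Sketch: wrap a memfd into a DMA-buf fd via /dev/udmabuf. */
static int memfd_to_dmabuf(int memfd, size_t size)
{
	struct udmabuf_create create = {
		.memfd  = memfd,
		.flags  = UDMABUF_FLAGS_CLOEXEC,
		.offset = 0,
		.size   = size,		/* must be page aligned */
	};
	int devfd = open("/dev/udmabuf", O_RDWR);
	int dmabuf_fd = ioctl(devfd, UDMABUF_CREATE, &create);

	close(devfd);
	return dmabuf_fd;	/* hand this fd to the GPU/V4L importer */
}

int main(void)
{
	size_t size = 2 * 1024 * 1024;
	int memfd = memfd_create("buffer", MFD_ALLOW_SEALING);

	ftruncate(memfd, size);
	/* udmabuf requires the memfd to be sealed against shrinking */
	fcntl(memfd, F_ADD_SEALS, F_SEAL_SHRINK);

	/* CPU mapping of the same pages, usable for direct I/O etc. */
	void *cpu = mmap(NULL, size, PROT_READ | PROT_WRITE,
			 MAP_SHARED, memfd, 0);

	int dmabuf_fd = memfd_to_dmabuf(memfd, size);
	/* ... read()/write() via cpu, import dmabuf_fd into the GPU ... */

	munmap(cpu, size);
	close(dmabuf_fd);
	close(memfd);
	return 0;
}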

Would that be an approach you could work with as well? E.g. you have a 
separate file descriptor representing the private MMIO which iommufd and 
KVM use, but you can turn it into a DMA-buf whenever you need to hand it 
to a DMA-buf importer?

>>>>>>>> That sounds more something for the TEE driver instead of anything DMA-buf
>>>>>>>> should be dealing with.
>>>>>>> Has nothing to do with TEE.
>>>>>> Why?
>>>> The Linux TEE framework is not used as part of confidential compute.
>>>>
>>>> CC already has guest memfd for holding its private CPU memory.
>>> Where is that coming from and how it is used?
> What do you mean? guest memfd is the result of years of negotiation in
> the mm and x86 arch subsystems :( It is used like a normal memfd, and
> we now have APIs in KVM and iommufd to directly intake and map from a
> memfd. I expect guestmemfd will soon grow some more generic
> dmabuf-like lifetime callbacks to avoid pinning - it already has some
> KVM specific APIs IIRC.
>
> But it is 100% exclusively focused on CPU memory and nothing else.

I have seen patches for that flying by on mailing lists and have a 
high-level understanding of what it's supposed to do, but never really 
looked more deeply into the code.

>>>> This is about confidential MMIO memory.
>>> Who is the exporter and who is the importer of the DMA-buf in this use
>>> case?
> In this case Xu is exporting MMIO from VFIO and importing to KVM and
> iommufd.

So basically a portion of a PCIe BAR is imported into iommufd?

>>> This is also not just about the KVM side, the VM side also has issues
>>> with DMABUF and CC - only co-operating devices can interact with the
>>> VM side "encrypted" memory and there needs to be a negotiation as part
>>> of all buffer setup what the mutual capability is. :\ swiotlb hides
>> some of this sometimes, but confidential P2P is currently unsolved.
>> Yes and it is documented by now how that is supposed to happen with
>> DMA-buf.
> I doubt that. It is complex and not fully solved in the core code
> today. Many scenarios do not work correctly, devices don't even exist
> yet that can exercise the hard paths. This is a future problem :(

Let's just say that both the ARM guys as well as the GPU people already 
have some pretty "interesting" ways of doing digital rights management 
and content protection.

Regards,
Christian.

>
> Jason