<!DOCTYPE html><html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
Am 15.01.25 um 16:10 schrieb Jason Gunthorpe:<br>
<blockquote type="cite" cite="mid:20250115151056.GS5556@nvidia.com">
<pre class="moz-quote-pre" wrap="">On Wed, Jan 15, 2025 at 03:30:47PM +0100, Christian König wrote:
</pre>
<blockquote type="cite">
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">Those rules are not something we cam up with because of some limitation
of the DMA-API, but rather from experience working with different device
driver and especially their developers.
</pre>
</blockquote>
</blockquote>
<pre class="moz-quote-pre" wrap="">
I would say it stems from the use of scatter list. You do not have
enough information exchanged between exporter and importer to
implement something sane and correct. At that point being restrictive
is a reasonable path.
Because of scatterlist, developers don't have APIs that correctly solve
the problems they want to solve, so of course things get into a mess.</pre>
</blockquote>
<br>
Well, I completely agree that scatterlists have many, many problems.
And at least some of the things you note here sound like good ideas
for tackling those problems.<br>
<br>
But I'm trying to explain the restrictions and requirements we
previously found to be necessary, and I strongly think that any new
approach needs to respect those restrictions as well, or we will just
repeat history.<br>
<br>
<span style="white-space: pre-wrap">
</span>
<blockquote type="cite" cite="mid:20250115151056.GS5556@nvidia.com">
<blockquote type="cite">
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">Applying and enforcing those restrictions is absolutely mandatory must
have for extending DMA-buf.
</pre>
</blockquote>
</blockquote>
<pre class="moz-quote-pre" wrap="">
You said to come to the maintainers with the problems, here are the
problems. Your answer is don't use dmabuf.
That doesn't make the problems go away :(</pre>
</blockquote>
<br>
Yeah, that's why I'm desperately trying to understand your use case.<br>
<br>
<span style="white-space: pre-wrap">
</span>
<blockquote type="cite" cite="mid:20250115151056.GS5556@nvidia.com">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">I really don't want to make a dmabuf2 - everyone would have to
implement it, including all the GPU drivers if they want to work with
RDMA. I don't think this makes any sense compared to incrementally
evolving dmabuf with more optional capabilities.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
The point is that a dmabuf2 would most likely be rejected as well or
otherwise run into the same issues we have seen before.
</pre>
</blockquote>
</blockquote>
<pre class="moz-quote-pre" wrap="">
You'd need to be much more concrete and technical in your objections
to cause a rejection. "We tried something else before and it didn't
work" won't cut it.</pre>
</blockquote>
<br>
Granted, let me try to improve this.<br>
<br>
Here is a real-world example of one of the issues we ran into, and
why CPU mappings created by importers are redirected to the exporter.<br>
<br>
We have quite a few different exporters which track the CPU
mappings of their backing store using address_space objects in one
way or another and then use unmap_mapping_range() to invalidate
those CPU mappings.<br>
<br>
But when importers get the PFNs of the backing store, they can look
behind the curtain and insert those PFNs directly into the CPU page
tables.<br>
<br>
We had literally tons of cases like this where driver developers
caused use-after-free issues because the importer created CPU
mappings on its own without the exporter knowing about it.<br>
<br>
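To make the failure mode concrete, here is a minimal sketch of the two
patterns; the importer-side helper names are invented for illustration,
while remap_pfn_range(), unmap_mapping_range() and dma_buf_mmap() are
the actual kernel interfaces involved:<br>
<pre>
#include &lt;linux/dma-buf.h&gt;
#include &lt;linux/mm.h&gt;

/*
 * BROKEN: the importer builds its own CPU mapping from a raw PFN it
 * got from the exporter.  The exporter tracks CPU mappings in its
 * address_space and revokes them with unmap_mapping_range(), but it
 * never sees this PTE, so the mapping survives eviction of the
 * backing store and becomes a use after free.
 */
static int importer_mmap_broken(struct vm_area_struct *vma, unsigned long pfn)
{
	return remap_pfn_range(vma, vma->vm_start, pfn,
			       vma->vm_end - vma->vm_start,
			       vma->vm_page_prot);
}

/*
 * OK: the CPU mapping is redirected to the exporter, which installs it
 * through its own mmap callback and address_space and can therefore
 * invalidate it when the backing store moves.
 */
static int importer_mmap_ok(struct dma_buf *dmabuf, struct vm_area_struct *vma)
{
	return dma_buf_mmap(dmabuf, vma, 0);
}
</pre>
<br>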
This is just one example of what we ran into. In addition to that,
basically the whole synchronization between drivers was overhauled
as well, because we found that we can't trust importers to always do
the right thing.<br>
<br>
<blockquote type="cite" cite="mid:20250115151056.GS5556@nvidia.com">
<pre class="moz-quote-pre" wrap="">There is a very simple problem statement here, we need a FD handle for
various kinds of memory, with a lifetime model that fits a couple of
different use cases. The exporter and importer need to understand what
type of memory it is and what rules apply to working with it. The
required importers are more general than just simple PCI DMA.
I feel like this is already exactly DMABUF's mission.
Besides, you have been saying to go do this in TEE or whatever, how is
that any different from dmabuf2?</pre>
</blockquote>
<br>
You can already turn both a TEE-allocated buffer and a memfd
into a DMA-buf. So TEE and memfd already provide
different interfaces which go beyond what DMA-buf does and allows.<br>
<br>
In other words, if you want to do things like direct I/O to block or
network devices, you can mmap() your memfd and do that while at the
same time handing the memfd out as a DMA-buf to your GPU, V4L or
neural accelerator.<br>
<br>
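As a rough userspace sketch of that memfd path, using the existing
udmabuf driver as the memfd-to-DMA-buf bridge (error handling omitted,
size assumed page-aligned, and /dev/udmabuf assumed to be available):<br>
<pre>
#define _GNU_SOURCE
#include &lt;fcntl.h&gt;
#include &lt;sys/ioctl.h&gt;
#include &lt;sys/mman.h&gt;
#include &lt;unistd.h&gt;
#include &lt;linux/udmabuf.h&gt;

/* Returns a DMA-buf fd backed by the same pages as *cpu_map. */
int memfd_to_dmabuf(size_t size, void **cpu_map)
{
	int memfd = memfd_create("buffer", MFD_ALLOW_SEALING);

	ftruncate(memfd, size);
	/* udmabuf requires the memfd to be sealed against shrinking */
	fcntl(memfd, F_ADD_SEALS, F_SEAL_SHRINK);

	/* CPU mapping, e.g. as the target of direct I/O */
	*cpu_map = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
			memfd, 0);

	/* The very same pages exported as a DMA-buf for a GPU/V4L importer */
	struct udmabuf_create create = {
		.memfd  = memfd,
		.flags  = UDMABUF_FLAGS_CLOEXEC,
		.offset = 0,
		.size   = size,
	};
	int devfd = open("/dev/udmabuf", O_RDWR);
	int dmabuf_fd = ioctl(devfd, UDMABUF_CREATE, &amp;create);

	close(devfd);
	return dmabuf_fd;
}
</pre>
<br>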
Would that be an approach you could work with as well? E.g. you have
your separate file descriptor representing the private MMIO which
iommufd and KVM use, but you can turn it into a DMA-buf whenever you
need to give it to a DMA-buf importer?<br>
<br>
<span style="white-space: pre-wrap">
</span>
<blockquote type="cite" cite="mid:20250115151056.GS5556@nvidia.com">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">That sounds more something for the TEE driver instead of anything DMA-buf
should be dealing with.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">Has nothing to do with TEE.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">Why?
</pre>
</blockquote>
</blockquote>
<pre class="moz-quote-pre" wrap="">The Linux TEE framework is not used as part of confidential compute.
CC already has guest memfd for holding its private CPU memory.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
Where is that coming from and how it is used?
</pre>
</blockquote>
</blockquote>
<pre class="moz-quote-pre" wrap="">
What do you mean? guest memfd is the result of years of negotiation in
the mm and x86 arch subsystems :( It is used like a normal memfd, and
we now have APIs in KVM and iommufd to directly intake and map from a
memfd. I expect guestmemfd will soon grow some more generic
dmabuf-like lifetime callbacks to avoid pinning - it already has some
KVM specific APIs IIRC.
But it is 100% exclusively focused on CPU memory and nothing else.</pre>
</blockquote>
<br>
I have seen patches for that fly by on mailing lists and have a
high-level understanding of what it's supposed to do, but I never
really looked more deeply into the code.<br>
<br>
<span style="white-space: pre-wrap">
</span>
<blockquote type="cite" cite="mid:20250115151056.GS5556@nvidia.com">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">This is about confidential MMIO memory.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
Who is the exporter and who is the importer of the DMA-buf in this use
case?
</pre>
</blockquote>
</blockquote>
<pre class="moz-quote-pre" wrap="">
In this case Xu is exporting MMIO from VFIO and importing to KVM and
iommufd.</pre>
</blockquote>
<br>
So basically a portion of a PCIe BAR is imported into iommufd?<br>
<br>
<span style="white-space: pre-wrap">
</span>
<blockquote type="cite" cite="mid:20250115151056.GS5556@nvidia.com">
<blockquote type="cite">
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">This is also not just about the KVM side, the VM side also has issues
with DMABUF and CC - only co-operating devices can interact with the
VM side "encrypted" memory and there needs to be a negotiation as part
of all buffer setup what the mutual capability is. :\ swiotlb hides
some of this some times, but confidential P2P is currently unsolved.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
Yes and it is documented by now how that is supposed to happen with
DMA-buf.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
I doubt that. It is complex and not fully solved in the core code
today. Many scenarios do not work correctly, devices don't even exist
yet that can exercise the hard paths. This is a future problem :(</pre>
</blockquote>
<br>
Let's just say that both the ARM folks and the GPU people
already have some pretty "interesting" ways of doing digital rights
management and content protection.<br>
<br>
Regards,<br>
Christian.<br>
<br>
<blockquote type="cite" cite="mid:20250115151056.GS5556@nvidia.com">
<pre class="moz-quote-pre" wrap="">
Jason
</pre>
</blockquote>
<br>
</body>
</html>