<!DOCTYPE html><html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
Am 15.01.25 um 16:10 schrieb Jason Gunthorpe:<br>
<blockquote type="cite" cite="mid:20250115151056.GS5556@nvidia.com">
<pre class="moz-quote-pre" wrap="">On Wed, Jan 15, 2025 at 03:30:47PM +0100, Christian König wrote:
</pre>
<blockquote type="cite">
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">Those rules are not something we cam up with because of some limitation
of the DMA-API, but rather from experience working with different device
driver and especially their developers.
</pre>
</blockquote>
</blockquote>
<pre class="moz-quote-pre" wrap="">
I would say it stems from the use of scatter list. You do not have
enough information exchanged between exporter and importer to
implement something sane and correct. At that point being restrictive
is a reasonable path.
Because of scatterlist, developers don't have APIs that correctly solve
the problems they want to solve, so of course things get into a mess.</pre>
</blockquote>
<br>
Well, I completely agree that scatterlists have many, many problems.
And at least some of the things you note here sound like good ideas
for tackling those problems.<br>
<br>
But I'm trying to explain the restrictions and requirements we
previously found to be necessary, and I strongly think that any new
approach needs to respect those restrictions as well, or we will just
repeat history.<br>
<br>
<span style="white-space: pre-wrap">
</span>
<blockquote type="cite" cite="mid:20250115151056.GS5556@nvidia.com">
<blockquote type="cite">
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">Applying and enforcing those restrictions is absolutely mandatory must
have for extending DMA-buf.
</pre>
</blockquote>
</blockquote>
<pre class="moz-quote-pre" wrap="">
You said to come to the maintainers with the problems, here are the
problems. Your answer is don't use dmabuf.
That doesn't make the problems go away :(</pre>
</blockquote>
<br>
Yeah, that's why I'm desperately trying to understand your use case.<br>
<br>
<span style="white-space: pre-wrap">
</span>
<blockquote type="cite" cite="mid:20250115151056.GS5556@nvidia.com">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">I really don't want to make a dmabuf2 - everyone would have to
implement it, including all the GPU drivers if they want to work with
RDMA. I don't think this makes any sense compared to incrementally
evolving dmabuf with more optional capabilities.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
The point is that a dmabuf2 would most likely be rejected as well or
otherwise run into the same issues we have seen before.
</pre>
</blockquote>
</blockquote>
<pre class="moz-quote-pre" wrap="">
You'd need to be much more concrete and technical in your objections
to cause a rejection. "We tried something else before and it didn't
work" won't cut it.</pre>
</blockquote>
<br>
Granted, let me try to improve this.<br>
<br>
Here is a real-world example of one of the issues we ran into, and
why CPU mappings created by importers are redirected to the exporter.<br>
<br>
We have quite a few different exporters which track the CPU
mappings of their backing store using address_space objects in one
way or another and then use unmap_mapping_range() to invalidate
those CPU mappings.<br>
<br>
But when importers get the PFNs of the backing store, they can look
behind the curtain and insert those PFNs directly into the CPU page
tables.<br>
<br>
We had literally tons of cases like this where driver developers
caused use-after-free issues because the importer created CPU
mappings on its own without the exporter knowing about it.<br>
<br>
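To make the failure mode concrete, here is a minimal sketch of the two
patterns; the importer-side helper names are invented for illustration,
while remap_pfn_range(), unmap_mapping_range() and dma_buf_mmap() are
the actual kernel interfaces involved:<br>
<pre>
#include &lt;linux/dma-buf.h&gt;
#include &lt;linux/mm.h&gt;

/*
 * BROKEN: the importer builds its own CPU mapping from a raw PFN it
 * got from the exporter.  The exporter tracks CPU mappings in its
 * address_space and revokes them with unmap_mapping_range(), but it
 * never sees this PTE, so the mapping survives eviction of the
 * backing store and becomes a use after free.
 */
static int importer_mmap_broken(struct vm_area_struct *vma, unsigned long pfn)
{
	return remap_pfn_range(vma, vma->vm_start, pfn,
			       vma->vm_end - vma->vm_start,
			       vma->vm_page_prot);
}

/*
 * OK: the CPU mapping is redirected to the exporter, which installs it
 * through its own mmap callback and address_space and can therefore
 * invalidate it when the backing store moves.
 */
static int importer_mmap_ok(struct dma_buf *dmabuf, struct vm_area_struct *vma)
{
	return dma_buf_mmap(dmabuf, vma, 0);
}
</pre>
<br>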
This is just one example of what we ran into. In addition to that,
basically the whole synchronization between drivers was overhauled
as well, because we found that we can't trust importers to always do
the right thing.<br>
<br>
<blockquote type="cite" cite="mid:20250115151056.GS5556@nvidia.com">
<pre class="moz-quote-pre" wrap="">There is a very simple problem statement here, we need a FD handle for
various kinds of memory, with a lifetime model that fits a couple of
different use cases. The exporter and importer need to understand what
type of memory it is and what rules apply to working with it. The
required importers are more general than just simple PCI DMA.
I feel like this is already exactly DMABUF's mission.
Besides, you have been saying to go do this in TEE or whatever, how is
that any different from dmabuf2?</pre>
</blockquote>
<br>
You can already turn both a TEE-allocated buffer and a memfd
into a DMA-buf. So TEE and memfd already provide
different interfaces which go beyond what DMA-buf does and allows.<br>
<br>
In other words, if you want to do things like direct I/O to block or
network devices, you can mmap() your memfd and do that while at the
same time handing the memfd out as a DMA-buf to your GPU, V4L or
neural accelerator.<br>
<br>
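As a rough userspace sketch of that memfd path, using the existing
udmabuf driver as the memfd-to-DMA-buf bridge (error handling omitted,
size assumed page-aligned, and /dev/udmabuf assumed to be available):<br>
<pre>
#define _GNU_SOURCE
#include &lt;fcntl.h&gt;
#include &lt;sys/ioctl.h&gt;
#include &lt;sys/mman.h&gt;
#include &lt;unistd.h&gt;
#include &lt;linux/udmabuf.h&gt;

/* Returns a DMA-buf fd backed by the same pages as *cpu_map. */
int memfd_to_dmabuf(size_t size, void **cpu_map)
{
	int memfd = memfd_create("buffer", MFD_ALLOW_SEALING);

	ftruncate(memfd, size);
	/* udmabuf requires the memfd to be sealed against shrinking */
	fcntl(memfd, F_ADD_SEALS, F_SEAL_SHRINK);

	/* CPU mapping, e.g. as the target of direct I/O */
	*cpu_map = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
			memfd, 0);

	/* The very same pages exported as a DMA-buf for a GPU/V4L importer */
	struct udmabuf_create create = {
		.memfd  = memfd,
		.flags  = UDMABUF_FLAGS_CLOEXEC,
		.offset = 0,
		.size   = size,
	};
	int devfd = open("/dev/udmabuf", O_RDWR);
	int dmabuf_fd = ioctl(devfd, UDMABUF_CREATE, &amp;create);

	close(devfd);
	return dmabuf_fd;
}
</pre>
<br>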
Would that be an approach you could work with as well? E.g. you have
your separate file descriptor representing the private MMIO which
iommufd and KVM use, but you can turn it into a DMA-buf whenever you
need to give it to a DMA-buf importer?<br>
<br>
<span style="white-space: pre-wrap">
</span>
<blockquote type="cite" cite="mid:20250115151056.GS5556@nvidia.com">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">That sounds more something for the TEE driver instead of anything DMA-buf
should be dealing with.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">Has nothing to do with TEE.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">Why?
</pre>
</blockquote>
</blockquote>
<pre class="moz-quote-pre" wrap="">The Linux TEE framework is not used as part of confidential compute.
CC already has guest memfd for holding its private CPU memory.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
Where is that coming from and how it is used?
</pre>
</blockquote>
</blockquote>
<pre class="moz-quote-pre" wrap="">
What do you mean? guest memfd is the result of years of negotiation in
the mm and x86 arch subsystems :( It is used like a normal memfd, and
we now have APIs in KVM and iommufd to directly intake and map from a
memfd. I expect guestmemfd will soon grow some more generic
dmabuf-like lifetime callbacks to avoid pinning - it already has some
KVM specific APIs IIRC.
But it is 100% exclusively focused on CPU memory and nothing else.</pre>
</blockquote>
<br>
I have seen patches for that fly by on mailing lists and have a
high-level understanding of what it's supposed to do, but I never
really looked more deeply into the code.<br>
<br>
<span style="white-space: pre-wrap">
</span>
<blockquote type="cite" cite="mid:20250115151056.GS5556@nvidia.com">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">This is about confidential MMIO memory.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
Who is the exporter and who is the importer of the DMA-buf in this use
case?
</pre>
</blockquote>
</blockquote>
<pre class="moz-quote-pre" wrap="">
In this case Xu is exporting MMIO from VFIO and importing to KVM and
iommufd.</pre>
</blockquote>
<br>
So basically a portion of a PCIe BAR is imported into iommufd?<br>
<br>
<span style="white-space: pre-wrap">
</span>
<blockquote type="cite" cite="mid:20250115151056.GS5556@nvidia.com">
<blockquote type="cite">
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">This is also not just about the KVM side, the VM side also has issues
with DMABUF and CC - only co-operating devices can interact with the
VM side "encrypted" memory and there needs to be a negotiation as part
of all buffer setup what the mutual capability is. :\ swiotlb hides
some of this some times, but confidential P2P is currently unsolved.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
Yes and it is documented by now how that is supposed to happen with
DMA-buf.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
I doubt that. It is complex and not fully solved in the core code
today. Many scenarios do not work correctly, devices don't even exist
yet that can exercise the hard paths. This is a future problem :(</pre>
</blockquote>
<br>
Let's just say that both the ARM folks and the GPU people
already have some pretty "interesting" ways of doing digital rights
management and content protection.<br>
<br>
Regards,<br>
Christian.<br>
<br>
<blockquote type="cite" cite="mid:20250115151056.GS5556@nvidia.com">
<pre class="moz-quote-pre" wrap="">
Jason
</pre>
</blockquote>
<br>
</body>
</html>