<!DOCTYPE html><html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
On 08.01.25 at 20:22, Xu Yilun wrote:<br>
<blockquote type="cite" cite="mid:Z37QaIDUgiygLh74@yilunxu-OptiPlex-7050">
<pre class="moz-quote-pre" wrap="">On Wed, Jan 08, 2025 at 07:44:54PM +0100, Simona Vetter wrote:
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">On Wed, Jan 08, 2025 at 12:22:27PM -0400, Jason Gunthorpe wrote:
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">On Wed, Jan 08, 2025 at 04:25:54PM +0100, Christian König wrote:
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">Am 08.01.25 um 15:58 schrieb Jason Gunthorpe:
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">I have imagined a staged approach were DMABUF gets a new API that
works with the new DMA API to do importer mapping with "P2P source
information" and a gradual conversion.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
To make it clear as maintainer of that subsystem I would reject such a step
with all I have.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
This is unexpected, so you want to just leave dmabuf broken? Do you
have any plan to fix it, to fix the misuse of the DMA API, and all
the problems I listed below? This is a big deal, it is causing real
problems today.
If it is going to be like this I think we will stop trying to use dmabuf
and do something simpler for vfio/kvm/iommufd :(
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
As the gal who help edit the og dma-buf spec 13 years ago, I think adding
pfn isn't a terrible idea. By design, dma-buf is the "everything is
optional" interface. And in the beginning, even consistent locking was
optional, but we've managed to fix that by now :-/</pre>
</blockquote>
</blockquote>
<br>
Well, you were also the person who mangled the struct page pointers
in the scatterlist because people were abusing them and getting a
bloody nose :)<br>
<br>
<blockquote type="cite" cite="mid:Z37QaIDUgiygLh74@yilunxu-OptiPlex-7050">
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">
Where I do agree with Christian is that stuffing pfn support into the
dma_buf_attachment interfaces feels a bit wrong.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
So it could be a dmabuf interface like mmap/vmap()? I was also wondering
about that. But I finally started using the dma_buf_attachment interface
because it leverages the existing buffer pin and move_notify.</pre>
</blockquote>
<br>
That's exactly the point: sharing PFNs doesn't work with the pin and
move_notify interfaces because of the MMU notifier approach Sima
mentioned.<br>
<br>
<span style="white-space: pre-wrap">
</span>
<blockquote type="cite" cite="mid:Z37QaIDUgiygLh74@yilunxu-OptiPlex-7050">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">We have already gone down that road and it didn't worked at all and
was a really big pain to pull people back from it.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
Nobody has really seriously tried to improve the DMA API before, so I
don't think this is true at all.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
Aside, I really hope this finally happens!</pre>
</blockquote>
</blockquote>
<br>
Sorry, my fault. I was not talking about the DMA API, but rather about
people trying to look behind the curtain of DMA-buf backing stores.<br>
<br>
In other words, all the fun we had with scatterlists and people trying
to modify the struct pages inside of them.<br>
<br>
Improving the DMA API is something I really really hope for as well.<br>
<br>
<span style="white-space: pre-wrap">
</span>
<blockquote type="cite" cite="mid:Z37QaIDUgiygLh74@yilunxu-OptiPlex-7050">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">3) Importing devices need to know if they are working with PCI P2P
addresses during mapping because they need to do things like turn on
ATS on their DMA. As for multi-path we have the same hacks inside mlx5
today that assume DMABUFs are always P2P because we cannot determine
if things are P2P or not after being DMA mapped.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
Why would you need ATS on PCI P2P and not for system memory accesses?
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
ATS has a significant performance cost. It is mandatory for PCI P2P,
but ideally should be avoided for CPU memory.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
Huh, I didn't know that. And yeah kinda means we've butchered the pci p2p
stuff a bit I guess ...</pre>
</blockquote>
</blockquote>
<br>
Huh? Why should ATS be mandatory for PCI P2P?<br>
<br>
We have tons of production systems using PCI P2P without ATS, and
this is the first time I've heard that.<br>
<br>
<span style="white-space: pre-wrap">
</span>
<blockquote type="cite" cite="mid:Z37QaIDUgiygLh74@yilunxu-OptiPlex-7050">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">5) iommufd and kvm are both using CPU addresses without DMA. No
exporter mapping is possible
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
We have customers using both KVM and XEN with DMA-buf, so I can clearly
confirm that this isn't true.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
Today they are mmaping the dma-buf into a VMA and then using KVM's
follow_pfn() flow to extract the CPU pfn from the PTE. Any mmapable
dma-buf must have a CPU PFN.
Here Xu implements basically the same path, except without the VMA
indirection, and it is suddenly not OK? Illogical.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
So the big difference is that for follow_pfn() you need mmu_notifier since
the mmap might move around, whereas with pfn smashed into
dma_buf_attachment you need dma_resv_lock rules, and the move_notify
callback if you go dynamic.
So I guess my first question is, which locking rules do you want here for
pfn importers?
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
follow_pfn() is unwanted for private MMIO, so dma_resv_lock.</pre>
</blockquote>
<br>
As Sima explained, you either have follow_pfn() and mmu_notifier, or
you have DMA addresses and the dma_resv lock / dma_fence.<br>
<br>
Just giving out PFNs without some lifetime associated with them is
one of the major problems we faced before and really not something
you can do.<br>
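<br>
To illustrate the first flavour: the mmu_notifier side gives PFNs a lifetime
roughly like this on the consumer side. Again only a minimal sketch using the
existing mmu_interval_notifier API, with all my_* names hypothetical:<br>
<pre>
/* Sketch of the mmu_notifier flavour of PFN lifetime: a PFN taken from a
 * VMA is only used while the interval notifier sequence is unchanged and
 * is re-resolved after an invalidation.  The my_* names are hypothetical;
 * the mmu_interval_notifier API is the existing one. */
#include &lt;linux/mmu_notifier.h&gt;
#include &lt;linux/sched.h&gt;

struct my_range {
	struct mmu_interval_notifier notifier;
	unsigned long pfn;	/* only valid while the sequence is unchanged */
	bool valid;
};

static bool my_range_invalidate(struct mmu_interval_notifier *mni,
				const struct mmu_notifier_range *range,
				unsigned long cur_seq)
{
	struct my_range *r = container_of(mni, struct my_range, notifier);

	/* The mapping behind the PFN is going away or moving: bump the
	 * sequence so every consumer has to re-resolve the PFN. */
	mmu_interval_set_seq(mni, cur_seq);
	WRITE_ONCE(r-&gt;valid, false);
	return true;
}

static const struct mmu_interval_notifier_ops my_range_ops = {
	.invalidate = my_range_invalidate,
};

static int my_range_track(struct my_range *r, unsigned long start,
			  unsigned long length)
{
	return mmu_interval_notifier_insert(&amp;r-&gt;notifier, current-&gt;mm,
					    start, length, &amp;my_range_ops);
}
</pre>
Every use of r-&gt;pfn is then bracketed by mmu_interval_read_begin() and
mmu_interval_read_retry(), which is essentially what KVM's follow_pfn()
flow does today.<br>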
<br>
<blockquote type="cite" cite="mid:Z37QaIDUgiygLh74@yilunxu-OptiPlex-7050">
<pre class="moz-quote-pre" wrap="">
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">
If mmu notifiers are fine, then I think the current approach of follow_pfn
should be ok. But if you instead want dma_resv_lock rules (or the cpu mmap
itself is somehow an issue), then I think the clean design is to create a new
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
cpu mmap() is an issue; this series aims to eliminate userspace
mapping for private MMIO resources.</pre>
</blockquote>
<br>
Why?<br>
<br>
<span style="white-space: pre-wrap">
</span>
<blockquote type="cite" cite="mid:Z37QaIDUgiygLh74@yilunxu-OptiPlex-7050">
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">separate access mechanism just for that. It would be the 5th or so (kernel
vmap, userspace mmap, dma_buf_attach and driver private stuff like
virtio_dma_buf.c where you access your buffer with a uuid), so really not
a big deal.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
OK, will think more about that.</pre>
</blockquote>
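<br>
Just to make the idea quoted above concrete: a separate pfn access mechanism
with dma_resv_lock rules and its own move_notify could look roughly like the
sketch below. None of these types or functions exist today; every name here
is hypothetical:<br>
<pre>
/* Purely illustrative sketch of a pfn-based access mechanism separate from
 * dma_buf_attachment.  Nothing in this block exists in the kernel today. */
#include &lt;linux/dma-buf.h&gt;
#include &lt;linux/dma-resv.h&gt;
#include &lt;linux/types.h&gt;

struct dma_buf_pfn_attachment;

struct dma_buf_pfn_attach_ops {
	/* Called with the dma_resv lock held when the backing store moves;
	 * all previously returned PFNs become invalid. */
	void (*move_notify)(struct dma_buf_pfn_attachment *attach);
};

struct dma_buf_pfn_attachment {
	struct dma_buf *dmabuf;
	const struct dma_buf_pfn_attach_ops *ops;
	void *importer_priv;
	/* "P2P source information" instead of a struct device would live
	 * here, so the importer can tell P2P from system memory. */
};

/*
 * Return the PFN backing @offset.  Only valid while the caller holds the
 * reservation lock and until the next move_notify().
 */
static unsigned long dma_buf_pfn_attachment_lookup(struct dma_buf_pfn_attachment *attach,
						   pgoff_t offset)
{
	dma_resv_assert_held(attach-&gt;dmabuf-&gt;resv);
	/* exporter-specific lookup would go here */
	return 0;
}
</pre>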
<br>
Please note that we have follow_pfn() + mmu_notifier working for
KVM/XEN with MMIO mappings and P2P. And that required exactly zero
DMA-buf changes :)<br>
<br>
I don't fully understand your use case, but I think it's quite
likely that we already have that working.<br>
<br>
Regards,<br>
Christian.<br>
<br>
<blockquote type="cite" cite="mid:Z37QaIDUgiygLh74@yilunxu-OptiPlex-7050">
<pre class="moz-quote-pre" wrap="">
Thanks,
Yilun
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">
And for non-contrived exporters we might be able to implement the other
access methods in terms of the pfn method generically, so this wouldn't
even be a terrible maintenance burden going forward. And meanwhile all the
contrived exporters just keep working as-is.
The other part is that cpu mmap is optional, and there's plenty of strange
exporters who don't implement it. But you can dma map the attachment into
plenty of devices. This tends to mostly be a thing on SoC devices with some
very funky memory. But I guess you don't care about these use-case, so
should be ok.
I couldn't come up with a good name for these pfn users, maybe
dma_buf_pfn_attachment? This does _not_ have a struct device, but maybe
some of these new p2p source specifiers (or a list of those which are
allowed, no idea how this would need to fit into the new dma api).
Cheers, Sima
--
Simona Vetter
Software Engineer, Intel Corporation
<a class="moz-txt-link-freetext" href="http://blog.ffwll.ch">http://blog.ffwll.ch</a>
</pre>
</blockquote>
</blockquote>
<br>
</body>
</html>