[RFC PATCH 01/12] dma-buf: Introduce dma_buf_get_pfn_unlocked() kAPI

Christian König christian.koenig at amd.com
Thu Jan 9 08:09:46 UTC 2025


Answering on my reply once more as pure text mail.

I love AMDs mail servers.

Cheers,
Christian.

Am 09.01.25 um 09:04 schrieb Christian König:
> Am 08.01.25 um 20:22 schrieb Xu Yilun:
>> On Wed, Jan 08, 2025 at 07:44:54PM +0100, Simona Vetter wrote:
>>> On Wed, Jan 08, 2025 at 12:22:27PM -0400, Jason Gunthorpe wrote:
>>>> On Wed, Jan 08, 2025 at 04:25:54PM +0100, Christian König wrote:
>>>>> Am 08.01.25 um 15:58 schrieb Jason Gunthorpe:
>>>>>> I have imagined a staged approach were DMABUF gets a new API that
>>>>>> works with the new DMA API to do importer mapping with "P2P source
>>>>>> information" and a gradual conversion.
>>>>> To make it clear as maintainer of that subsystem I would reject such a step
>>>>> with all I have.
>>>> This is unexpected, so you want to just leave dmabuf broken? Do you
>>>> have any plan to fix it, to fix the misuse of the DMA API, and all
>>>> the problems I listed below? This is a big deal, it is causing real
>>>> problems today.
>>>>
>>>> If it going to be like this I think we will stop trying to use dmabuf
>>>> and do something simpler for vfio/kvm/iommufd :(
>>> As the gal who help edit the og dma-buf spec 13 years ago, I think adding
>>> pfn isn't a terrible idea. By design, dma-buf is the "everything is
>>> optional" interface. And in the beginning, even consistent locking was
>>> optional, but we've managed to fix that by now :-/
>
> Well you were also the person who mangled the struct page pointers in 
> the scatterlist because people were abusing this and getting a bloody 
> nose :)
>
>>> Where I do agree with Christian is that stuffing pfn support into the
>>> dma_buf_attachment interfaces feels a bit much wrong.
>> So it could a dmabuf interface like mmap/vmap()? I was also wondering
>> about that. But finally I start to use dma_buf_attachment interface
>> because of leveraging existing buffer pin and move_notify.
>
> Exactly that's the point, sharing pfn doesn't work with the pin and 
> move_notify interfaces because of the MMU notifier approach Sima 
> mentioned.
>
>>>>> We have already gone down that road and it didn't worked at all and
>>>>> was a really big pain to pull people back from it.
>>>> Nobody has really seriously tried to improve the DMA API before, so I
>>>> don't think this is true at all.
>>> Aside, I really hope this finally happens!
>
> Sorry my fault. I was not talking about the DMA API, but rather that 
> people tried to look behind the curtain of DMA-buf backing stores.
>
> In other words all the fun we had with scatterlists and that people 
> try to modify the struct pages inside of them.
>
> Improving the DMA API is something I really really hope for as well.
>
>>>>>> 3) Importing devices need to know if they are working with PCI P2P
>>>>>> addresses during mapping because they need to do things like turn on
>>>>>> ATS on their DMA. As for multi-path we have the same hacks inside mlx5
>>>>>> today that assume DMABUFs are always P2P because we cannot determine
>>>>>> if things are P2P or not after being DMA mapped.
>>>>> Why would you need ATS on PCI P2P and not for system memory accesses?
>>>> ATS has a significant performance cost. It is mandatory for PCI P2P,
>>>> but ideally should be avoided for CPU memory.
>>> Huh, I didn't know that. And yeah kinda means we've butchered the pci p2p
>>> stuff a bit I guess ...
>
> Hui? Why should ATS be mandatory for PCI P2P?
>
> We have tons of production systems using PCI P2P without ATS. And it's 
> the first time I hear that.
>
>>>>>> 5) iommufd and kvm are both using CPU addresses without DMA. No
>>>>>> exporter mapping is possible
>>>>> We have customers using both KVM and XEN with DMA-buf, so I can clearly
>>>>> confirm that this isn't true.
>>>> Today they are mmaping the dma-buf into a VMA and then using KVM's
>>>> follow_pfn() flow to extract the CPU pfn from the PTE. Any mmapable
>>>> dma-buf must have a CPU PFN.
>>>>
>>>> Here Xu implements basically the same path, except without the VMA
>>>> indirection, and it suddenly not OK? Illogical.
>>> So the big difference is that for follow_pfn() you need mmu_notifier since
>>> the mmap might move around, whereas with pfn smashed into
>>> dma_buf_attachment you need dma_resv_lock rules, and the move_notify
>>> callback if you go dynamic.
>>>
>>> So I guess my first question is, which locking rules do you want here for
>>> pfn importers?
>> follow_pfn() is unwanted for private MMIO, so dma_resv_lock.
>
> As Sima explained you either have follow_pfn() and mmu_notifier or you 
> have DMA addresses and dma_resv lock / dma_fence.
>
> Just giving out PFNs without some lifetime associated with them is one 
> of the major problems we faced before and really not something you can do.
>
>>> If mmu notifiers is fine, then I think the current approach of follow_pfn
>>> should be ok. But if you instead dma_resv_lock rules (or the cpu mmap
>>> somehow is an issue itself), then I think the clean design is create a new
>> cpu mmap() is an issue, this series is aimed to eliminate userspace
>> mapping for private MMIO resources.
>
> Why?
>
>>> separate access mechanism just for that. It would be the 5th or so (kernel
>>> vmap, userspace mmap, dma_buf_attach and driver private stuff like
>>> virtio_dma_buf.c where you access your buffer with a uuid), so really not
>>> a big deal.
>> OK, will think more about that.
>
> Please note that we have follow_pfn() + mmu_notifier working for 
> KVM/XEN with MMIO mappings and P2P. And that required exactly zero 
> DMA-buf changes :)
>
> I don't fully understand your use case, but I think it's quite likely 
> that we already have that working.
>
> Regards,
> Christian.
>
>> Thanks,
>> Yilun
>>
>>> And for non-contrived exporters we might be able to implement the other
>>> access methods in terms of the pfn method generically, so this wouldn't
>>> even be a terrible maintenance burden going forward. And meanwhile all the
>>> contrived exporters just keep working as-is.
>>>
>>> The other part is that cpu mmap is optional, and there's plenty of strange
>>> exporters who don't implement. But you can dma map the attachment into
>>> plenty devices. This tends to mostly be a thing on SoC devices with some
>>> very funky memory. But I guess you don't care about these use-case, so
>>> should be ok.
>>>
>>> I couldn't come up with a good name for these pfn users, maybe
>>> dma_buf_pfn_attachment? This does _not_ have a struct device, but maybe
>>> some of these new p2p source specifiers (or a list of those which are
>>> allowed, no idea how this would need to fit into the new dma api).
>>>
>>> Cheers, Sima
>>> -- 
>>> Simona Vetter
>>> Software Engineer, Intel Corporation
>>> http://blog.ffwll.ch
>



More information about the dri-devel mailing list