[RFC PATCH v2 00/11] Device Memory TCP

Christian König christian.koenig at amd.com
Fri Aug 11 11:02:50 UTC 2023


Am 10.08.23 um 20:44 schrieb Mina Almasry:
> On Thu, Aug 10, 2023 at 3:29 AM Christian König
> <christian.koenig at amd.com> wrote:
>> Am 10.08.23 um 03:57 schrieb Mina Almasry:
>>> Changes in RFC v2:
>>> ------------------
>>>
>>> The sticking point in RFC v1[1] was the dma-buf pages approach we used to
>>> deliver the device memory to the TCP stack. RFC v2 is a proof-of-concept
>>> that attempts to resolve this by implementing scatterlist support in the
>>> networking stack, such that we can import the dma-buf scatterlist
>>> directly.
>> Impressive work, I didn't thought that this would be possible that "easily".
>>
>> Please note that we have considered replacing scatterlists with simple
>> arrays of DMA-addresses in the DMA-buf framework to avoid people trying
>> to access the struct page inside the scatterlist.
>>
> FWIW, I'm not doing anything with the struct pages inside the
> scatterlist. All I need from the scatterlist are the
> sg_dma_address(sg) and the sg_dma_len(sg), and I'm guessing the array
> you're describing will provide exactly those, but let me know if I
> misunderstood.

Your understanding is perfectly correct.

>
>> It might be a good idea to push for that first before this here is
>> finally implemented.
>>
>> GPU drivers already convert the scatterlist used to arrays of
>> DMA-addresses as soon as they get them. This leaves RDMA and V4L as the
>> other two main users which would need to be converted.
>>
>>>    This is the approach proposed at a high level here[2].
>>>
>>> Detailed changes:
>>> 1. Replaced dma-buf pages approach with importing scatterlist into the
>>>      page pool.
>>> 2. Replace the dma-buf pages centric API with a netlink API.
>>> 3. Removed the TX path implementation - there is no issue with
>>>      implementing the TX path with scatterlist approach, but leaving
>>>      out the TX path makes it easier to review.
>>> 4. Functionality is tested with this proposal, but I have not conducted
>>>      perf testing yet. I'm not sure there are regressions, but I removed
>>>      perf claims from the cover letter until they can be re-confirmed.
>>> 5. Added Signed-off-by: contributors to the implementation.
>>> 6. Fixed some bugs with the RX path since RFC v1.
>>>
>>> Any feedback welcome, but specifically the biggest pending questions
>>> needing feedback IMO are:
>>>
>>> 1. Feedback on the scatterlist-based approach in general.
>> As far as I can see this sounds like the right thing to do in general.
>>
>> Question is rather if we should stick with scatterlist, use array of
>> DMA-addresses or maybe even come up with a completely new structure.
>>
> As far as I can tell, it should be trivial to switch this device
> memory TCP implementation to anything that provides:
>
> 1. DMA-addresses (sg_dma_address() equivalent)
> 2. lengths (sg_dma_len() equivalent)
>
> if you go that route. Specifically, I think it will be pretty much a
> localized change to netdev_bind_dmabuf_to_queue() implemented in this
> patch:
> https://lore.kernel.org/netdev/ZNULIDzuVVyfyMq2@ziepe.ca/T/#m2d344b08f54562cc9155c3f5b018cbfaed96036f

Thanks, that's exactly what I wanted to hear.

>
>>> 2. Netlink API (Patch 1 & 2).
>> How does netlink manage the lifetime of objects?
>>
> Netlink itself doesn't handle the lifetime of the binding. However,
> the API I implemented unbinds the dma-buf when the netlink socket is
> destroyed. I do this so that even if the user process crashes or
> forgets to unbind, the dma-buf will still be unbound once the netlink
> socket is closed on the process exit. Details in this patch:
> https://lore.kernel.org/netdev/ZNULIDzuVVyfyMq2@ziepe.ca/T/#m2d344b08f54562cc9155c3f5b018cbfaed96036f

I need to double check the details, but at least of hand that sounds 
sufficient for the lifetime requirements of DMA-buf.

Thanks,
Christian.

>
> On Thu, Aug 10, 2023 at 9:07 AM Jason Gunthorpe <jgg at ziepe.ca> wrote:
>> On Thu, Aug 10, 2023 at 12:29:08PM +0200, Christian König wrote:
>>> Am 10.08.23 um 03:57 schrieb Mina Almasry:
>>>> Changes in RFC v2:
>>>> ------------------
>>>>
>>>> The sticking point in RFC v1[1] was the dma-buf pages approach we used to
>>>> deliver the device memory to the TCP stack. RFC v2 is a proof-of-concept
>>>> that attempts to resolve this by implementing scatterlist support in the
>>>> networking stack, such that we can import the dma-buf scatterlist
>>>> directly.
>>> Impressive work, I didn't thought that this would be possible that "easily".
>>>
>>> Please note that we have considered replacing scatterlists with simple
>>> arrays of DMA-addresses in the DMA-buf framework to avoid people trying to
>>> access the struct page inside the scatterlist.
>>>
>>> It might be a good idea to push for that first before this here is finally
>>> implemented.
>>>
>>> GPU drivers already convert the scatterlist used to arrays of DMA-addresses
>>> as soon as they get them. This leaves RDMA and V4L as the other two main
>>> users which would need to be converted.
>> Oh that would be a nightmare for RDMA.
>>
>> We need a standard based way to have scalable lists of DMA addresses :(
>>
>>>> 2. Netlink API (Patch 1 & 2).
>>> How does netlink manage the lifetime of objects?
>> And access control..
>>
> Someone will correct me if I'm wrong but I'm not sure netlink itself
> will do (sufficient) access control. However I meant for the netlink
> API to bind dma-bufs to be a CAP_NET_ADMIN API, and I forgot to add
> this check in this proof-of-concept, sorry. I'll add a CAP_NET_ADMIN
> check in netdev_bind_dmabuf_to_queue() in the next iteration.



More information about the dri-devel mailing list