[RFC PATCH v3 10/12] tcp: RX path for devmem TCP

Pavel Begunkov asml.silence at gmail.com
Fri Dec 8 20:28:15 UTC 2023


On 11/6/23 22:55, Willem de Bruijn wrote:
> On Mon, Nov 6, 2023 at 2:34 PM Stanislav Fomichev <sdf at google.com> wrote:
>>
>> On 11/06, Willem de Bruijn wrote:
>>>>> IMHO, we need a better UAPI to receive the tokens and give them back to
>>>>> the kernel. CMSG + setsockopt(SO_DEVMEM_DONTNEED) get the job done,
>>>>> but look dated and hacky :-(
>>>>>
>>>>> We should either do some kind of user/kernel shared memory queue to
>>>>> receive/return the tokens (similar to what Jonathan was doing in his
>>>>> proposal?)
>>>>
>>>> I'll take a look at Jonathan's proposal, sorry, I'm not immediately
>>>> familiar but I wanted to respond :-) But is the suggestion here to
>>>> build a new kernel-user communication channel primitive for the
>>>> purpose of passing the information in the devmem cmsg? IMHO that seems
>>>> like an overkill. Why add 100-200 lines of code to the kernel to add
>>>> something that can already be done with existing primitives? I don't
>>>> see anything concretely wrong with cmsg & setsockopt approach, and if
>>>> we switch to something I'd prefer to switch to an existing primitive
>>>> for simplicity?
>>>>
>>>> The only other existing primitive to pass data outside of the linear
>>>> buffer is the MSG_ERRQUEUE that is used for zerocopy. Is that
>>>> preferred? Any other suggestions or existing primitives I'm not aware
>>>> of?
>>>>
>>>>> or bite the bullet and switch to io_uring.
>>>>>
>>>>
>>>> IMO io_uring & socket support are orthogonal, and one doesn't preclude
>>>> the other. As you know we like to use sockets and I believe there are
>>>> issues with io_uring adoption at Google that I'm not familiar with
>>>> (and could be wrong). I'm interested in exploring io_uring support as
>>>> a follow up but I think David Wei will be interested in io_uring
>>>> support as well anyway.
>>>
>>> I also disagree that we need to replace a standard socket interface
>>> with something "faster", in quotes.
>>>
>>> This interface is not the bottleneck to the target workload.
>>>
>>> Replacing the synchronous sockets interface with something more
>>> performant for workloads where it is, is an orthogonal challenge.
>>> However we do that, I think that traditional sockets should continue
>>> to be supported.
>>>
>>> The feature may already even work with io_uring, as both recvmsg with
>>> cmsg and setsockopt have io_uring support now.
>>
>> I'm not really concerned with faster. I would prefer something cleaner :-)
>>
>> Or maybe we should just have it documented. With some kind of path
>> towards beautiful world where we can create dynamic queues..
> 
> I suppose we just disagree on the elegance of the API.
> 
> The concise notification API returns tokens as a range for
> compression, encoding as two 32-bit unsigned integers start + length.
> It allows for even further batching by returning multiple such ranges
> in a single call.

FWIW, nothing prevents io_uring from compressing ranges. The io_uring
zc RFC returns {offset, size} as well, though at the moment the would
lie in the same page.

> This is analogous to the MSG_ZEROCOPY notification mechanism from
> kernel to user.
> 
> The synchronous socket syscall interface can be replaced by something
> asynchronous like io_uring. This already works today? Whatever

If you mean async io_uring recv, it does work. In short, internally
it polls the socket and then calls sock_recvmsg(). There is also a
feature that would make it return back to polling after sock_recvmsg()
and loop like this.

> asynchronous ring-based API would be selected, io_uring or otherwise,
> I think the concise notification encoding would remain as is.
> 
> Since this is an operation on a socket, I find a setsockopt the
> fitting interface.

-- 
Pavel Begunkov


More information about the dri-devel mailing list