[PATCH] client: Allow send error recovery without an abort

Lloyd Pique lpique at google.com
Tue Jun 19 19:10:36 UTC 2018


Hi Pekka,

On Tue, Jun 19, 2018 at 5:12 AM Pekka Paalanen <ppaalanen at gmail.com> wrote:

> I think the aim should be to remove the abort(). I do not see any
> reason to leave it in even as an option.
>

Thank you. Will proceed in that direction.


> The soft-buffer in libwayland-client is quite small, the kernel socket
> buffers are usually much larger in bytes. EAGAIN is triggered only when
> the flush to the kernel fails, so I don't think that would happen
> needlessly.
>

Ack.


> If I'm reading the code right, MAX_FDS_OUT is only a limit for a single
> sendmsg() call. The fds_out buffer is capable of holding 1k fds.
>

> We have 4kB message data buffer (what I've been calling the
> soft-buffer, as opposed to in-kernel buffers). A minimal request to
> carry an fd would be header (8B) plus fd (4B), so that's 12B. That
> makes at most 341 fds until the soft-buffer is full. That should well
> fit into the 1k fds that fds_out can hold.
>
> One could have more fds in a single request. Let's say four. The
> request size becomes 24B carrying 4 fds. That makes 680 fds for a full
> soft-buffer. Should still be fine.


The limit is both the maximum number of fds per sendmsg() call and the
maximum number that can be sent by a single send request.
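As a sanity check on the arithmetic in the quoted text, a quick sketch (the
4 kB soft-buffer, 8-byte header, and 4-byte fd slot sizes are taken from this
discussion, not read out of the wayland headers):

```c
/* Back-of-envelope check of the soft-buffer arithmetic.
 * All sizes are assumptions from this thread, not from connection.c. */
#define SOFT_BUFFER_SIZE 4096   /* assumed data soft-buffer size */
#define HEADER_SIZE      8      /* message header: object id + opcode/size */
#define FD_ARG_SIZE      4      /* each fd argument occupies 4 B of data */

/* Max fds queueable before the data soft-buffer fills, if every message
 * carries fds_per_msg fd arguments and nothing else. */
static int max_queued_fds(int fds_per_msg)
{
	int msg_size = HEADER_SIZE + fds_per_msg * FD_ARG_SIZE;

	return (SOFT_BUFFER_SIZE / msg_size) * fds_per_msg;
}
```

This reproduces the 341 (one fd per request) and 680 (four fds per request)
figures above.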

The fds are added one at a time by calls to wl_connection_put_fd(), which
forces a flush once the limit is reached. Those calls are made by
copy_fds_to_connection(), called once per message send from
wl_closure_send().

On a flush (forced or not), the fds are closed by close_fds(), which then
advances the tail to the head, emptying the ring buffer again.

That the output fd buffer is over-large at 4 kB (1024 fds) just allows much
of the ring buffer code to be reused. It will never actually hold 1024 fds
enqueued for output, though.
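Roughly, the behaviour described above looks like this. A simplified sketch,
not the real connection.c code: in particular it closes fds unconditionally on
flush, where the real code only closes them after sendmsg() succeeds:

```c
#include <unistd.h>

#define MAX_FDS_OUT 28  /* libwayland's per-sendmsg fd limit */

struct fd_ring {
	int fds[1024];  /* over-large, as noted: reuses generic ring code */
	int head, tail;
};

static int ring_count(struct fd_ring *r)
{
	return r->head - r->tail;
}

/* On flush, close the queued fds and advance the tail to the head,
 * emptying the ring again. */
static void flush_fds(struct fd_ring *r)
{
	for (int i = r->tail; i < r->head; i++)
		close(r->fds[i % 1024]);
	r->tail = r->head;
}

/* Queue one fd, forcing a flush once MAX_FDS_OUT are pending, so the
 * ring never holds more than MAX_FDS_OUT entries despite its 1024-slot
 * capacity. */
static void put_fd(struct fd_ring *r, int fd)
{
	if (ring_count(r) == MAX_FDS_OUT)
		flush_fds(r);
	r->fds[r->head % 1024] = fd;
	r->head++;
}
```

Queueing 30 fds through put_fd() leaves only 2 pending: the 29th put forces a
flush of the first 28.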

(The input fd buffer may allow 1k fds. I did not examine that carefully
enough to be sure.)

I think this limit is still a problem. I could change it to actually allow
1k fds to be queued up, but then the dup() would be more likely to fail
(more in the other reply).


> However, wl_connection_flush() attempts to write out at most
> MAX_FDS_OUT fds at a time, but all of the message data. OTOH, one
> cannot even try to flush out fds without any message data, IIRC
> sendmsg() doesn't support that. So it can run out of message data while
> fds accumulate and finally fill fds_out up, causing irrecoverable
> failure.
>
> Do your observations support this theory?
>

I haven't observed a problem case like that. Any normal individual request
will probably use only a few fds, and I don't think there is a way for a
request to carry no non-fd data at all.

I can imagine someone creating a new protocol message that takes more than
28 fds in a request, which would then fail to flush if no other normal
message data were already queued up. But that seems a very abnormal use.
wl_closure_marshal() would be the best place for a check (it already checks
for null objects -- see "goto err_null"), and on a quick look through the
code I didn't find any other bounds check on the number of fds in a request.
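For illustration, such a check could count the fd ('h') arguments in a
message signature. This is a hypothetical standalone helper, not something
wl_closure_marshal() does today:

```c
#define MAX_FDS_OUT 28  /* libwayland's per-sendmsg fd limit */

/* Return nonzero if a wayland-style signature's fd ('h') arguments fit
 * within what a single sendmsg() can carry. Hypothetical helper for the
 * bounds check discussed above; not part of libwayland. */
static int fd_args_within_limit(const char *signature)
{
	int fds = 0;

	for (const char *p = signature; *p; p++)
		if (*p == 'h')
			fds++;
	return fds <= MAX_FDS_OUT;
}
```

A request signature with 29 or more 'h' arguments would be rejected before
it could wedge the connection.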

But I'll study the code a bit more in case there is some subtlety that
escapes me.

> It seems we should somehow flush less message data so that we can call
> sendmsg() enough times to handle all the fds and avoid them piling up.
> But if we do so, we must be very careful with any potential assumptions
> on receiving side on the ordering between message data and related fds.
>

I expect that the receiving code will just pull the next fd out of the
fds_in ring buffer while turning the messages back into wl_closures,
whenever it realizes an fd is needed for an argument.

But I haven't looked at the code. Another thing I will check to be safe.
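What I expect the receive path does, sketched with made-up names (to be
confirmed against the actual demarshalling code):

```c
#include <stddef.h>

/* Illustrative stand-in for the incoming fd queue; not libwayland's
 * fds_in ring buffer. */
struct fds_in {
	const int *fds;
	size_t count, next;
};

/* Take the next queued fd, or -1 if the sender queued too few. */
static int take_fd(struct fds_in *q)
{
	if (q->next == q->count)
		return -1;
	return q->fds[q->next++];
}

/* Walk a message signature and consume one incoming fd for every 'h'
 * argument, pairing fds with messages strictly in order. Returns the
 * number of fds consumed. */
static int demarshal_fds(const char *sig, struct fds_in *q, int *out)
{
	int n = 0;

	for (const char *p = sig; *p; p++)
		if (*p == 'h')
			out[n++] = take_fd(q);
	return n;
}
```

If the receiver really works this way, ordering between message data and fds
only matters per-message, which is what makes the flush-reordering question
above worth checking.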


> Even the author of the client would usually not know. It would make a
> very hard API to use correctly.
>

Ok, I won't introduce anything like that.


> My objection is to the automatic blocking in general. We have a pretty
> nicely asynchronous protocol and we try to teach programmers to write
> their apps in a non-blocking way, so blocking inside libwayland-client
> is not nice. We have very few functions that are documented as
> potentially blocking. Extending that to all request sending functions
> is not acceptable in my mind.
>

I agree, and I appreciate that the core library is non-blocking; I really
don't want to change that. That is why my original patch made the blocking
opt-in, and even then kept it out of the core client code.

- Lloyd
