[PATCH] client: Allow send error recovery without an abort

Pekka Paalanen ppaalanen at gmail.com
Tue Jun 19 12:43:43 UTC 2018


On Mon, 18 Jun 2018 19:54:23 -0700
Lloyd Pique <lpique at google.com> wrote:

> Let me take things back a step. I was a bit too hasty in suggesting
> something that would work for me for the fact that MAX_FDS_OUT is small. In
> our client the buffer creation ends up being serialized, and so only one
> thread will be creating buffers with the linux-dmabuf protocol at a time.
> 
> That might not be true for other clients, and so checking if the fd buffer
> is almost full won't work. In fact I'm not so certain checking if it is
> half full would work either... having more than 14 threads trying to create
> such buffers at the same time does not seem unreasonable, and if more than
> one fd were involved in each request (multi-plane formats), the number of
> threads needed to hit the wl_abort anyway drops quickly. And there may be
> some other protocol I'm not aware of that is worse.
> 
> Increasing the fd buffer would alleviate that. However it would introduce
> another problem.

Hi,

as I explained in my other email, the fd buffer can already hold 1k
fds. The problem in wl_connection_flush() seems to be that it runs out
of message data to flush out all the fds, so the fds can just keep
piling up. That seems like a problem worth solving in its own right.

Once that is solved, thresholding on the message data buffer
(soft-buffer) should be enough to keep the fd buffer from growing out
of bounds. Maybe then this would work:

- ABI to query a "flush recommended" flag. This flag would be set when
  the soft-buffer is at least half-full, and cleared when it drops
  to... below half? empty?

- When a client is doing lots of request sending without returning to
  its main loop which would call wl_display_flush() anyway, it can
  query the flag to see if it needs to flush.

- If flush ever fails, stop all request sending, poll for writable and
  try again. How to do this is left to the application. Most
  importantly, the application could set some state and return to its
  main event loop to do other stuff in the meantime.
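
To make the three points above concrete, here is a rough, self-contained
sketch. It is a model only, not libwayland code: struct soft_buf, the
4096-byte size, and the helpers soft_buf_put(), soft_buf_flush(),
wait_writable() and send_burst() are all hypothetical names made up for
illustration. It also picks "cleared when empty" for the flag, which is
just one of the open options, and uses plain write() on an fd where the
real connection code would do its own socket I/O.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical model of the outgoing message buffer ("soft-buffer").
 * The size is an assumption chosen for illustration. */
#define SOFT_BUF_SIZE 4096

struct soft_buf {
	char data[SOFT_BUF_SIZE];
	size_t used;
	bool flush_recommended; /* the proposed flag */
};

/* Queue one request. Returns 0 on success, -1 if it does not fit.
 * Sets the flag once the buffer is at least half full. */
static int soft_buf_put(struct soft_buf *b, const void *msg, size_t len)
{
	if (len > SOFT_BUF_SIZE - b->used)
		return -1;
	memcpy(b->data + b->used, msg, len);
	b->used += len;
	if (b->used >= SOFT_BUF_SIZE / 2)
		b->flush_recommended = true;
	return 0;
}

/* Write out queued data. Returns 0 once the buffer is empty (and clears
 * the flag), or -1 with errno set. On a non-blocking fd, EAGAIN means
 * "poll for writable and call this again"; unsent data stays queued. */
static int soft_buf_flush(struct soft_buf *b, int fd)
{
	assert(b->used <= SOFT_BUF_SIZE);
	while (b->used > 0) {
		ssize_t n = write(fd, b->data, b->used);
		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		memmove(b->data, b->data + n, b->used - (size_t)n);
		b->used -= (size_t)n;
	}
	b->flush_recommended = false;
	return 0;
}

/* Wait until fd is writable. Returns 0 if writable, -1 on timeout/error.
 * A real application would add the fd to its main loop's poll set
 * instead of blocking here. */
static int wait_writable(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLOUT };
	int r;

	do {
		r = poll(&pfd, 1, timeout_ms);
	} while (r < 0 && errno == EINTR);
	return (r > 0 && (pfd.revents & POLLOUT)) ? 0 : -1;
}

/* Send up to `count` requests without returning to the main loop,
 * checking the flag before each one. Stops at the first failed flush or
 * full buffer and returns how many requests were queued, so the caller
 * can remember where it stopped and resume later. */
static int send_burst(struct soft_buf *b, int fd,
		      const void *msg, size_t len, int count)
{
	int sent = 0;

	while (sent < count) {
		if (b->flush_recommended && soft_buf_flush(b, fd) < 0)
			break;
		if (soft_buf_put(b, msg, len) < 0)
			break;
		sent++;
	}
	return sent;
}
```

In a real client the queueing and flushing would of course be done by
libwayland itself; the application would only see the flag query (which
does not exist today), wl_display_flush() failing with EAGAIN, and the
fd from wl_display_get_fd() to poll on.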

You're right that this wouldn't help an application that sends requests
from multiple threads a lot. They would need to be checking the flag
practically for every few requests, but at least that would be cheaper
than calling wl_display_flush() outright.

> We also have wl_abort()'s from the call to wl_os_dupfd_cloexec() in
> wl_closure_marshal() failing. Not very often, and we are willing to pass on
> finding a fix for it for now, but increasing the number of fd's being held
> for the transfer is definitely going to make that worse.

True. Can you think of any way to recover from dupfd failure without
disconnecting?

At the very least, it could be made to disconnect instead of aborting.

> Perhaps making the writes be blocking is the only reasonable way after all?
> 
> What do you think?

In my mind that would be a huge regression, so I wouldn't like it. If
we exhaust all other options, we could see about that.


Thanks,
pq