[PATCH] server: Fix crash when accessing client which is already freed

Mon Nov 21 13:30:41 UTC 2016

On Mon, 21 Nov 2016 12:19:43 +0000
Daniel Stone <daniel at fooishbar.org> wrote:

> Hi Hyun Kook,
> 
> On 23 September 2016 at 00:40, Hyun Kook Khang <hyunkook.khang at lge.com>
> wrote:
> 
> > in wl_client_connection_data(),
> > there is no guarantee that client would never be destroyed in the process
> > of handling the pending input.
> >
> > For example,
> > 1. It comes to wl_client_connection_data()
> > 2. wl_closure_invoke() would be invoked, and then wl_display_flush_clients
> > would be invoked anyhow later in the same routine. (wl_closure_invoke() ->
> > ? -> wl_display_flush_clients())
> >     Here, client will be destroyed if wl_connection_flush() returns
> > negative value and errno is not EAGAIN.
> >  
> 
> Out of interest, what is the '?' here? I can't find any examples of this
> happening in Wayland, Weston, or Mesa. If you could share your usecase,
> this would be quite interesting. I believe the idea behind deferring the
> flush was to minimise the number of system calls made, and also somewhat
> decrease the latency when multiple clients are involved, by batching all
> the flushes (potentially time-consuming) to a point where we do not have
> any more requests to process (so are not sensitive to latency).
> 
> In general, Wayland compositors should always be responsive: client request
> handling should be lightweight and not time-consuming. Wayland clients
> should also be completely asynchronous, rather than following a
> RPC/method-return pattern of { send_request(); block_for_reply(); }. Given
> that, the only reason to flush inside request processing would be because
> you are sending a lot of data, but again the Wayland protocol itself should
> not be carrying large amounts of data. So I'm not sure what usecase there
> is for this.
> 
> 
> > 3. It will come back to the while loop in wl_client_connection_data().
> >     If wl_connection_pending_input() returns any values which is bigger
> > than “size of p”; it will go into the while loop again, and then it will do
> > something with the client which is already freed.
> >
> >
> > The simplest way to solve this is checking the source(wl_event_source)’
> > fd, but for now there is no way to access wl_event_source in
> > wayland-server.c. (wl_event_source will be freed a while later)
> >  
> 
> Hm, this is racy though.
> 
> I'm would suggest we document that invoking wl_display_flush_clients() /
> wl_connection_flush() from a request handler may cause undefined behaviour,
> and that it must only be called outside of dispatch.

Hi,

FWIW, I agree with Daniel here. There are several things that are not
quite defined or expected to be done inside dispatch calls, and
flushing all clients is one of them.

I could see, IIRC, how sending lots of events from inside dispatch
could cause the one client to be flushed. I think flush is done
automatically if the send buffer fills up. If the flush at that point
fails, there is no way to signal back that "oops, the compositor should
put that thing aside and do something else" until the connection
unblocks. We don't want to be checking and handling every send-event
possibly failing, so killing the client can happen. ISTR we don't
dynamically grow the send buffers, do we? But even the growing would
need to stop somewhere.

Before we can review a fix, we need to understand exactly when and why
the problem happens.

Is this a case of the compositor explicitly calling flush, or is it an
automatic flush? What triggers it?

Thanks,
pq
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 801 bytes
Desc: OpenPGP digital signature
URL: <https://lists.freedesktop.org/archives/wayland-devel/attachments/20161121/69300964/attachment.sig>