[PATCH wayland 8/8] client: Use read preparation API in wl_display_dispatch_queue()

Fri Oct 9 02:50:24 PDT 2015

On Fri, 9 Oct 2015 16:42:04 +0800
Jonas Ådahl <jadahl at gmail.com> wrote:

> On Thu, Oct 08, 2015 at 04:29:03PM +0300, Pekka Paalanen wrote:
> > On Fri,  2 Oct 2015 17:32:59 +0800
> > Jonas Ådahl <jadahl at gmail.com> wrote:
> > 
> > > Instead of doing things that do the equivalent of using
> > > wl_display_prepare_read() and friends, just use the public API. The
> > > only semantical difference is that we will now unlock and lock the mutex
> > > more times compared to before.
> > > 
> > > Signed-off-by: Jonas Ådahl <jadahl at gmail.com>
> > > ---
> > > 
> > > There is no real need to merge this patch, it only makes it easier to
> > > understand that the _dispatch(_queue)() functions are compatible with the
> > > prepare_read() family.
> > 
> > Hi Jonas,
> > 
> > I think this would be a good improvement on maintainability. However, I
> > don't think this patch is correct. It also points out a flaw in
> > wl_display_flush().
> > 
> > >  src/wayland-client.c | 35 ++++++++---------------------------
> > >  1 file changed, 8 insertions(+), 27 deletions(-)
> > > 
> > > diff --git a/src/wayland-client.c b/src/wayland-client.c
> > > index ca2495f..8f53f6f 100644
> > > --- a/src/wayland-client.c
> > > +++ b/src/wayland-client.c
> > > @@ -1507,15 +1507,9 @@ wl_display_dispatch_queue(struct wl_display *display,
> > >  	struct pollfd pfd[2];
> > >  	int ret;
> > >  
> > > -	pthread_mutex_lock(&display->mutex);
> > > -
> > > -	ret = dispatch_queue(display, queue);
> > > -	if (ret == -1)
> > > -		goto err_unlock;
> > > -	if (ret > 0) {
> > > -		pthread_mutex_unlock(&display->mutex);
> > > +	ret = wl_display_dispatch_queue_pending(display, queue);
> > > +	if (ret != 0)
> > >  		return ret;
> > > -	}
> > 
> > As the mutex is not locked and reader_count has not been bumped yet,
> > another thread may be queueing events in our queue. We could lose that
> > race.
> 
> Will comment on this further below.
> 
> > 
> > >  
> > >  	/* We ignore EPIPE here, so that we try to read events before
> > >  	 * returning an error.  When the compositor sends an error it
> > > @@ -1524,12 +1518,11 @@ wl_display_dispatch_queue(struct wl_display *display,
> > >  	ret = wl_connection_flush(display->connection);
> > >  	if (ret < 0 && errno != EAGAIN && errno != EPIPE) {
> > 
> > I believe wl_connection_flush() requires the display to be locked,
> > which it no longer is. This should be replaced with a call to
> > wl_display_flush().
> > 
> > But, wl_display_flush() will raise a fatal error on EPIPE, which makes
> > the wl_display irreversibly unusable.
> > 
> > I think this raises an issue with wl_display_flush(), the same what the
> > comment about EPIPE talks about. If an error has been raised on the
> > wl_display, it won't be possible to read events or dispatch queues at
> > all anymore. The comment implies that this causes prepare-read API
> > using clients to be often incapable of reporting error events.
> > 
> > I believe the race is lost, when the client is busy doing something
> > instead of waiting in poll(), and the compositor sends an error. When
> > the client returns to its main loop and prepares to poll() again,
> > wl_display_flush() will get EPIPE.
> > 
> > I think this would be a nice subject for a new test (since the existing
> > test suite on neither Wayland nor Weston fails with this patch) that
> > opens a connetion, sends a request that causes a protocol error, sleeps
> > a bit to let the server react, and then resumes the normal event loop.
> > The test would be checking that it gets the correct error event
> > delivered.
> > 
> > I did a quick test:
> > 
> > I had this patch applied in wayland, and I hacked the EPIPE special
> > case away. In Weston, I hacked the bad-buffer as follows:
> > 
> > 	wl_surface_attach(surface, bad_buffer, 0, 0);
> >  	wl_surface_damage(surface, 0, 0, 200, 200);
> >  	frame_callback_set(surface, &frame);
> >  	wl_surface_commit(surface);
> > +	wl_display_flush(client->wl_display);
> > +	sleep(1);
> > +	wl_display_sync(client->wl_display);
> >  	frame_callback_wait_nofail(client, &frame);
> >  
> >  	expect_protocol_error(client, &wl_buffer_interface,
> >  			      WL_SHM_ERROR_INVALID_FD);
> > 
> > Flush ensures the bad request gets sent. Sleep lets the compositor
> > reply. Then we need to have something in the send buffer, so I used
> > wl_display_sync. Finally, frame_callback_wait_nofail() calls
> > wl_display_dispatch(). The test fails because the expected error is
> > never dispatched.
> > 
> > I believe this case is even more confusing to a person debugging a
> > protocol error, because I don't think WAYLAND_DEBUG=client will show
> > the error event. Debug printing happens during dispatch.
> > 
> > If I put the EPIPE special case back, the test succeeds again.
> 
> For the purpose of documenting the discussion we had on IRC regarding
> this; a potential solution to this is to change the behaviour of
> wl_display_flush() to not set the fatal display error on EPIPE. The idea
> is that we'd sooner or later set the fatal display error anyway when
> trying to read. It's a bit of a semantical change, but we might just get
> away with it, and without it, we wont be able to reliably read the error
> when using the read preparation API.

Yeah.

> > 
> > >  		display_fatal_error(display, errno);
> > > -		goto err_unlock;
> > > +		return -1;
> > >  	}
> > >  
> > > -	display->reader_count++;
> > > -
> > > -	pthread_mutex_unlock(&display->mutex);
> > > +	if (wl_display_prepare_read_queue(display, queue) != 0)
> > > +		return -1;
> > 
> > This changes the behaviour of this function. Previously, when the mutex
> > was locked over both dispatch_queue() and reader_count++, here was no
> > race failure mode.
> > 
> > Instead, we should really have the loop as documented in
> > wl_display_prepare_read_queue(), in full with all error checking. This
> > will actually give us a proper, simple, working example of how to
> > use the prepare-read API correctly. We've been lacking that for too
> > long.
> 
> The problem with this is that the semantics of this function is
> different from the normal usage of the preparation API, because this
> function is expected to not block if there were already events on the
> queue. Not that it is possible for the client to know whether there are
> any though, so maybe it is fine.

Ah, yes indeed.

> I don't think it's even possible to implement the
> wl_display_dispatch(_queue)() semantics with the current read preparation
> API, because the intended usage (the loop dispatch_pending() until
> prepare_read() succeeds) always assumes we want to eventually read. We'd
> need something like an atomic "dispatch_pending_or_prepare_read()" to
> avoid this race without changing the semantics.

How about this:

if (wl_display_prepare_read_queue() < 0)
	/* there is something in the queue, so dispatch once and return success */
	return wl_display_dispatch_queue();

/* now we are registered to read, no-one can queue events until actual read */
wl_display_flush();
poll();
read...

Thanks,
pq
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 811 bytes
Desc: OpenPGP digital signature
URL: <http://lists.freedesktop.org/archives/wayland-devel/attachments/20151009/97099d2a/attachment.sig>