[PATCH wayland 8/8] client: Use read preparation API in wl_display_dispatch_queue()

Thu Oct 8 06:29:03 PDT 2015

On Fri,  2 Oct 2015 17:32:59 +0800
Jonas Ådahl <jadahl at gmail.com> wrote:

> Instead of doing things that do the equivalent of using
> wl_display_prepare_read() and friends, just use the public API. The
> only semantical difference is that we will now unlock and lock the mutex
> more times compared to before.
> 
> Signed-off-by: Jonas Ådahl <jadahl at gmail.com>
> ---
> 
> There is no real need to merge this patch, it only makes it easier to
> understand that the _dispatch(_queue)() functions are compatible with the
> prepare_read() family.

Hi Jonas,

I think this would be a good improvement on maintainability. However, I
don't think this patch is correct. It also points out a flaw in
wl_display_flush().

>  src/wayland-client.c | 35 ++++++++---------------------------
>  1 file changed, 8 insertions(+), 27 deletions(-)
> 
> diff --git a/src/wayland-client.c b/src/wayland-client.c
> index ca2495f..8f53f6f 100644
> --- a/src/wayland-client.c
> +++ b/src/wayland-client.c
> @@ -1507,15 +1507,9 @@ wl_display_dispatch_queue(struct wl_display *display,
>  	struct pollfd pfd[2];
>  	int ret;
>  
> -	pthread_mutex_lock(&display->mutex);
> -
> -	ret = dispatch_queue(display, queue);
> -	if (ret == -1)
> -		goto err_unlock;
> -	if (ret > 0) {
> -		pthread_mutex_unlock(&display->mutex);
> +	ret = wl_display_dispatch_queue_pending(display, queue);
> +	if (ret != 0)
>  		return ret;
> -	}

As the mutex is not locked and reader_count has not been bumped yet,
another thread may be queueing events in our queue. We could lose that
race.

>  
>  	/* We ignore EPIPE here, so that we try to read events before
>  	 * returning an error.  When the compositor sends an error it
> @@ -1524,12 +1518,11 @@ wl_display_dispatch_queue(struct wl_display *display,
>  	ret = wl_connection_flush(display->connection);
>  	if (ret < 0 && errno != EAGAIN && errno != EPIPE) {

I believe wl_connection_flush() requires the display to be locked,
which it no longer is. This should be replaced with a call to
wl_display_flush().

But, wl_display_flush() will raise a fatal error on EPIPE, which makes
the wl_display irreversibly unusable.

I think this raises an issue with wl_display_flush(), the same what the
comment about EPIPE talks about. If an error has been raised on the
wl_display, it won't be possible to read events or dispatch queues at
all anymore. The comment implies that this causes prepare-read API
using clients to be often incapable of reporting error events.

I believe the race is lost, when the client is busy doing something
instead of waiting in poll(), and the compositor sends an error. When
the client returns to its main loop and prepares to poll() again,
wl_display_flush() will get EPIPE.

I think this would be a nice subject for a new test (since the existing
test suite on neither Wayland nor Weston fails with this patch) that
opens a connetion, sends a request that causes a protocol error, sleeps
a bit to let the server react, and then resumes the normal event loop.
The test would be checking that it gets the correct error event
delivered.

I did a quick test:

I had this patch applied in wayland, and I hacked the EPIPE special
case away. In Weston, I hacked the bad-buffer as follows:

	wl_surface_attach(surface, bad_buffer, 0, 0);
 	wl_surface_damage(surface, 0, 0, 200, 200);
 	frame_callback_set(surface, &frame);
 	wl_surface_commit(surface);
+	wl_display_flush(client->wl_display);
+	sleep(1);
+	wl_display_sync(client->wl_display);
 	frame_callback_wait_nofail(client, &frame);

 	expect_protocol_error(client, &wl_buffer_interface,
 			      WL_SHM_ERROR_INVALID_FD);

Flush ensures the bad request gets sent. Sleep lets the compositor
reply. Then we need to have something in the send buffer, so I used
wl_display_sync. Finally, frame_callback_wait_nofail() calls
wl_display_dispatch(). The test fails because the expected error is
never dispatched.

I believe this case is even more confusing to a person debugging a
protocol error, because I don't think WAYLAND_DEBUG=client will show
the error event. Debug printing happens during dispatch.

If I put the EPIPE special case back, the test succeeds again.

>  		display_fatal_error(display, errno);
> -		goto err_unlock;
> +		return -1;
>  	}
>  
> -	display->reader_count++;
> -
> -	pthread_mutex_unlock(&display->mutex);
> +	if (wl_display_prepare_read_queue(display, queue) != 0)
> +		return -1;

This changes the behaviour of this function. Previously, when the mutex
was locked over both dispatch_queue() and reader_count++, here was no
race failure mode.

Instead, we should really have the loop as documented in
wl_display_prepare_read_queue(), in full with all error checking. This
will actually give us a proper, simple, working example of how to
use the prepare-read API correctly. We've been lacking that for too
long.

>  
>  	pfd[0].fd = display->fd;
>  	pfd[0].events = POLLIN;
> @@ -1542,22 +1535,10 @@ wl_display_dispatch_queue(struct wl_display *display,
>  		return -1;
>  	}
>  
> -	pthread_mutex_lock(&display->mutex);
> -
> -	if (read_events(display) == -1)
> -		goto err_unlock;
> -
> -	ret = dispatch_queue(display, queue);
> -	if (ret == -1)
> -		goto err_unlock;
> -
> -	pthread_mutex_unlock(&display->mutex);
> -
> -	return ret;
> +	if (wl_display_read_events(display) == -1)
> +		return -1;
>  
> - err_unlock:
> -	pthread_mutex_unlock(&display->mutex);
> -	return -1;
> +	return wl_display_dispatch_queue_pending(display, queue);
>  }
>  
>  /** Dispatch pending events in an event queue

The rest looks fine to me.

Thanks,
pq
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 811 bytes
Desc: OpenPGP digital signature
URL: <http://lists.freedesktop.org/archives/wayland-devel/attachments/20151008/5038c6d3/attachment.sig>