[PATCH wayland 8/8] client: Use read preparation API in wl_display_dispatch_queue()
Pekka Paalanen
ppaalanen at gmail.com
Fri Oct 9 02:50:24 PDT 2015
On Fri, 9 Oct 2015 16:42:04 +0800
Jonas Ådahl <jadahl at gmail.com> wrote:
> On Thu, Oct 08, 2015 at 04:29:03PM +0300, Pekka Paalanen wrote:
> > On Fri, 2 Oct 2015 17:32:59 +0800
> > Jonas Ådahl <jadahl at gmail.com> wrote:
> >
> > > Instead of doing things that do the equivalent of using
> > > wl_display_prepare_read() and friends, just use the public API. The
> > > only semantical difference is that we will now unlock and lock the mutex
> > > more times compared to before.
> > >
> > > Signed-off-by: Jonas Ådahl <jadahl at gmail.com>
> > > ---
> > >
> > > There is no real need to merge this patch, it only makes it easier to
> > > understand that the _dispatch(_queue)() functions are compatible with the
> > > prepare_read() family.
> >
> > Hi Jonas,
> >
> > I think this would be a good improvement on maintainability. However, I
> > don't think this patch is correct. It also points out a flaw in
> > wl_display_flush().
> >
> > > src/wayland-client.c | 35 ++++++++---------------------------
> > > 1 file changed, 8 insertions(+), 27 deletions(-)
> > >
> > > diff --git a/src/wayland-client.c b/src/wayland-client.c
> > > index ca2495f..8f53f6f 100644
> > > --- a/src/wayland-client.c
> > > +++ b/src/wayland-client.c
> > > @@ -1507,15 +1507,9 @@ wl_display_dispatch_queue(struct wl_display *display,
> > > struct pollfd pfd[2];
> > > int ret;
> > >
> > > - pthread_mutex_lock(&display->mutex);
> > > -
> > > - ret = dispatch_queue(display, queue);
> > > - if (ret == -1)
> > > - goto err_unlock;
> > > - if (ret > 0) {
> > > - pthread_mutex_unlock(&display->mutex);
> > > + ret = wl_display_dispatch_queue_pending(display, queue);
> > > + if (ret != 0)
> > > return ret;
> > > - }
> >
> > As the mutex is not locked and reader_count has not been bumped yet,
> > another thread may be queueing events in our queue. We could lose that
> > race.
>
> Will comment on this further below.
>
> >
> > >
> > > /* We ignore EPIPE here, so that we try to read events before
> > > * returning an error. When the compositor sends an error it
> > > @@ -1524,12 +1518,11 @@ wl_display_dispatch_queue(struct wl_display *display,
> > > ret = wl_connection_flush(display->connection);
> > > if (ret < 0 && errno != EAGAIN && errno != EPIPE) {
> >
> > I believe wl_connection_flush() requires the display to be locked,
> > which it no longer is. This should be replaced with a call to
> > wl_display_flush().
> >
> > But, wl_display_flush() will raise a fatal error on EPIPE, which makes
> > the wl_display irreversibly unusable.
> >
> > I think this raises an issue with wl_display_flush(), the same what the
> > comment about EPIPE talks about. If an error has been raised on the
> > wl_display, it won't be possible to read events or dispatch queues at
> > all anymore. The comment implies that this causes prepare-read API
> > using clients to be often incapable of reporting error events.
> >
> > I believe the race is lost, when the client is busy doing something
> > instead of waiting in poll(), and the compositor sends an error. When
> > the client returns to its main loop and prepares to poll() again,
> > wl_display_flush() will get EPIPE.
> >
> > I think this would be a nice subject for a new test (since the existing
> > test suite on neither Wayland nor Weston fails with this patch) that
> > opens a connetion, sends a request that causes a protocol error, sleeps
> > a bit to let the server react, and then resumes the normal event loop.
> > The test would be checking that it gets the correct error event
> > delivered.
> >
> > I did a quick test:
> >
> > I had this patch applied in wayland, and I hacked the EPIPE special
> > case away. In Weston, I hacked the bad-buffer as follows:
> >
> > wl_surface_attach(surface, bad_buffer, 0, 0);
> > wl_surface_damage(surface, 0, 0, 200, 200);
> > frame_callback_set(surface, &frame);
> > wl_surface_commit(surface);
> > + wl_display_flush(client->wl_display);
> > + sleep(1);
> > + wl_display_sync(client->wl_display);
> > frame_callback_wait_nofail(client, &frame);
> >
> > expect_protocol_error(client, &wl_buffer_interface,
> > WL_SHM_ERROR_INVALID_FD);
> >
> > Flush ensures the bad request gets sent. Sleep lets the compositor
> > reply. Then we need to have something in the send buffer, so I used
> > wl_display_sync. Finally, frame_callback_wait_nofail() calls
> > wl_display_dispatch(). The test fails because the expected error is
> > never dispatched.
> >
> > I believe this case is even more confusing to a person debugging a
> > protocol error, because I don't think WAYLAND_DEBUG=client will show
> > the error event. Debug printing happens during dispatch.
> >
> > If I put the EPIPE special case back, the test succeeds again.
>
> For the purpose of documenting the discussion we had on IRC regarding
> this; a potential solution to this is to change the behaviour of
> wl_display_flush() to not set the fatal display error on EPIPE. The idea
> is that we'd sooner or later set the fatal display error anyway when
> trying to read. It's a bit of a semantical change, but we might just get
> away with it, and without it, we wont be able to reliably read the error
> when using the read preparation API.
Yeah.
> >
> > > display_fatal_error(display, errno);
> > > - goto err_unlock;
> > > + return -1;
> > > }
> > >
> > > - display->reader_count++;
> > > -
> > > - pthread_mutex_unlock(&display->mutex);
> > > + if (wl_display_prepare_read_queue(display, queue) != 0)
> > > + return -1;
> >
> > This changes the behaviour of this function. Previously, when the mutex
> > was locked over both dispatch_queue() and reader_count++, here was no
> > race failure mode.
> >
> > Instead, we should really have the loop as documented in
> > wl_display_prepare_read_queue(), in full with all error checking. This
> > will actually give us a proper, simple, working example of how to
> > use the prepare-read API correctly. We've been lacking that for too
> > long.
>
> The problem with this is that the semantics of this function is
> different from the normal usage of the preparation API, because this
> function is expected to not block if there were already events on the
> queue. Not that it is possible for the client to know whether there are
> any though, so maybe it is fine.
Ah, yes indeed.
> I don't think it's even possible to implement the
> wl_display_dispatch(_queue)() semantics with the current read preparation
> API, because the intended usage (the loop dispatch_pending() until
> prepare_read() succeeds) always assumes we want to eventually read. We'd
> need something like an atomic "dispatch_pending_or_prepare_read()" to
> avoid this race without changing the semantics.
How about this:
if (wl_display_prepare_read_queue() < 0)
/* there is something in the queue, so dispatch once and return success */
return wl_display_dispatch_queue();
/* now we are registered to read, no-one can queue events until actual read */
wl_display_flush();
poll();
read...
Thanks,
pq
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 811 bytes
Desc: OpenPGP digital signature
URL: <http://lists.freedesktop.org/archives/wayland-devel/attachments/20151009/97099d2a/attachment.sig>
More information about the wayland-devel
mailing list