[PATCH wayland 6/8] client: Correct documentation regarding thread safeness

Thu Oct 8 04:25:31 PDT 2015

On Fri,  2 Oct 2015 17:32:57 +0800
Jonas Ådahl <jadahl at gmail.com> wrote:

> The current documentation about wl_display_dispatch() states one may not
> mix wl_display_dispatch(_queue)() with wl_display_prepare_read() and
> friends, but this is a misconception about how
> wl_display_dispatch(_queue)() works. The fact is that the dispatch
> functions does the equivalent of what the preparation API does
> internally, and it is safe to use together.
> 
> What is not safe is to dispatch using the wl_display_dispatch(_queue)()
> functions while being prepared to read using wl_display_read_events().
> 
> This patch rewrites the documentation to correctly state when the
> various API's are thread safe and how they may not be used.
> 
> https://bugs.freedesktop.org/show_bug.cgi?id=91767
> 
> Signed-off-by: Jonas Ådahl <jadahl at gmail.com>
> ---
>  src/wayland-client.c | 161 +++++++++++++++++++--------------------------------
>  1 file changed, 59 insertions(+), 102 deletions(-)

Hi,

adding Marek Chalupa to CC. Comments below.

> 
> diff --git a/src/wayland-client.c b/src/wayland-client.c
> index c472f52..e43d19d 100644
> --- a/src/wayland-client.c
> +++ b/src/wayland-client.c
> @@ -977,14 +977,13 @@ static const struct wl_callback_listener sync_listener = {
>   * requests by sending a request to the display server and waiting for a
>   * reply before returning.
>   *
> + * This function uses wl_display_dispatch_queue() internally. It is not allowed
> + * to call this function while the thread is being prepared for reading events,
> + * and doing so will cause a dead lock.
> + *
>   * \note This function may dispatch other events being received on the given
>   * queue.
>   *
> - * \note This function uses wl_display_dispatch_queue() internally. If you
> - * are using wl_display_read_events() from more threads, don't use this function
> - * (or make sure that calling wl_display_roundtrip_queue() doesn't interfere
> - * with calling wl_display_prepare_read() and wl_display_read_events())
> - *
>   * \sa wl_display_roundtrip()
>   * \memberof wl_event_queue
>   */
> @@ -1018,14 +1017,13 @@ wl_display_roundtrip_queue(struct wl_display *display, struct wl_event_queue *qu
>   * requests by sending a request to the display server and waiting for a reply
>   * before returning.
>   *
> + * This function uses wl_display_dispatch_queue() internally. It is not allowed
> + * to call this function while the thread is being prepared for reading events,
> + * and doing so will cause a dead lock.
> + *
>   * \note This function may dispatch other events being received on the default
>   * queue.
>   *
> - * \note This function uses wl_display_dispatch_queue() internally. If you
> - * are using wl_display_read_events() from more threads, don't use this function
> - * (or make sure that calling wl_display_roundtrip() doesn't interfere
> - * with calling wl_display_prepare_read() and wl_display_read_events())
> - *
>   * \memberof wl_display
>   */
>  WL_EXPORT int
> @@ -1269,58 +1267,27 @@ cancel_read(struct wl_display *display)
>   * \return 0 on success or -1 on error.  In case of error errno will
>   * be set accordingly
>   *
> - * This will read events from the file descriptor for the display.
> - * This function does not dispatch events, it only reads and queues
> - * events into their corresponding event queues. If no data is
> - * available on the file descriptor, wl_display_read_events() returns
> - * immediately. To dispatch events that may have been queued, call
> - * wl_display_dispatch_pending() or wl_display_dispatch_queue_pending().
> - *
> - * Before calling this function, wl_display_prepare_read() must be
> - * called first. When running in more threads (which is the usual
> - * case, since we'd use wl_display_dispatch() otherwise), every thread
> - * must call wl_display_prepare_read() before calling this function.
> + * Calling this function will result in data available on the display file
> + * descriptor being read and read events will be queued on their corresponding
> + * event queues.
>   *
> - * After calling wl_display_prepare_read() there can be some extra code
> - * before calling wl_display_read_events(), for example poll() or alike.
> - * Example of code in a thread:
> - *
> - * \code
> + * Before calling this function, depending on what thread it is to be called
> + * from, wl_display_prepare_read_queue() or wl_display_prepare_read() needs to
> + * be called. See wl_display_prepare_read_queue() for more details.
>   *
> - *   while (wl_display_prepare_read(display) < 0)
> - *           wl_display_dispatch_pending(display);
> - *   wl_display_flush(display);
> + * When being called at a point where other threads have been prepared to read
> + * (using wl_display_prepare_read_queue() or wl_display_prepare_read()) this
> + * function will sleep until all other prepared threads have either been
> + * cancelled (using wl_display_cancel_read()) or them self entered this
> + * function. The last thread that calls this function will then read and queue
> + * events on their corresponding event queues, and finally wake up all other
> + * wl_display_read() calls causing them to return.

wl_display_read() should be wl_display_read_events(), right?

>   *
> - *   ... some code ...
> + * If the last reader thread was cancelled, some other reading thread will do
> + * the actual reading and queueing in the same way as described above.

This does not seem to be exactly true, not without all threads
returning to the event loops first.

When I read read_events(), if the last thread that causes reader_count
to go to zero cancelled instead of calling wl_display_read_events(),
then everyone will just exit wl_display_read_events() with no-one
having actually read.

If no-one ends up reading, I don't think it should cause any problems.
The event loop in the caller will just spin once more, the wayland fd
still polls as readable, and we should end up in
wl_display_read_events() soon again.

I have a hunch that fixing this to guarantee a read if at least one
thread does not cancel might end up ugly with all the corner cases of
multiple cancels and possible error conditions. I'm not sure it is
worth fixing even. Cancelling a read should be fairly exceptional
anyway.

We talked in IRC, pondering between documenting the current behaviour
vs. leaving this undocumented. I think I might lean on documenting the
current behaviour and not changing the behaviour.

>   *
> - *   fds[0].fd = wl_display_get_fd(display);
> - *   fds[0].events = POLLIN;
> - *   poll(fds, 1, -1);
> - *
> - *   if (!everything_ok()) {
> - *	wl_display_cancel_read(display);
> - *	handle_error();
> - *   }
> - *
> - *   if (wl_display_read_events(display) < 0)
> - *	handle_error();
> - *
> - *   ...
> - * \endcode
> - *
> - * After wl_display_prepare_read() succeeds, other threads that enter
> - * wl_display_read_events() will sleep until the very last thread enters
> - * it too or cancels. Therefore when the display fd becomes (or already
> - * is) readable, wl_display_read_events() should be called as soon as
> - * possible to unblock all threads. If wl_display_read_events() will not
> - * be called, then wl_display_cancel_read() must be called instead to let
> - * the other threads continue.
> - *
> - * This function must not be called simultaneously with wl_display_dispatch().
> - * It may lead to deadlock. If programmer wants, for some reason, use
> - * wl_display_dispatch() in one thread and wl_display_prepare_read() with
> - * wl_display_read_events() in another, extra care must be taken to serialize
> - * these calls, i. e. use mutexes or similar (on whole prepare + read sequence)
> + * To dispatch events that may have been queued, call
> + * wl_display_dispatch_pending() or wl_display_dispatch_queue_pending().
>   *
>   * \sa wl_display_prepare_read(), wl_display_cancel_read(),
>   * wl_display_dispatch_pending(), wl_display_dispatch()
> @@ -1422,28 +1389,6 @@ err:
>   * to wl_display_prepare_read_queue() fails, we dispatch the pending events and
>   * try again until we're successful.
>   *
> - * When using wl_display_dispatch() we'd have something like:
> - *
> - * \code
> - * wl_display_dispatch_pending(display);
> - * wl_display_flush(display);
> - * poll(fds, nfds, -1);
> - * wl_display_dispatch(display);
> - * \endcode
> - *
> - * This sequence in not thread-safe. The race is immediately after poll(),
> - * where one thread could preempt and read events before the other thread calls
> - * wl_display_dispatch(). This call now blocks and starves the other
> - * fds in the event loop.
> - *
> - * Another race would be when using more event queues.
> - * When one thread calls wl_display_dispatch(_queue)(), then it
> - * reads all events from display's fd and queues them in appropriate
> - * queues. Then it dispatches only its own queue and the other events
> - * are sitting in their queues, waiting for dispatching. If that happens
> - * before the other thread managed to call poll(), it will
> - * block with events queued.
> - *
>   * wl_display_prepare_read_queue() function doesn't acquire exclusive access to
>   * the display's fd. It only registers that the thread calling this function has
>   * has intention to read from fd. When all registered readers call
> @@ -1522,19 +1467,27 @@ wl_display_cancel_read(struct wl_display *display)
>   * \param queue The event queue to dispatch
>   * \return The number of dispatched events on success or -1 on failure
>   *
> - * Dispatch all incoming events for objects assigned to the given
> - * event queue. On failure -1 is returned and errno set appropriately.
> + * Dispatch events on the given event queue.
> + *
> + * If the given event queue is empty, this function blocks until there are
> + * events to be read from the display fd. Events are read and queued on
> + * the appropriate event queues. Finally, events on given event queue are
> + * dispatched. On failure -1 is returned and errno set appropriately.
>   *
> - * The behaviour of this function is exactly the same as the behaviour of
> - * wl_display_dispatch(), but it dispatches events on given queue,
> - * not on the default queue.
> + * In a multi threaded enviroment, do not manually wait using poll() (or
> + * equivalent) before calling this function, as doing so might cause a dead
> + * lock. If external reliance on poll() (or equivalent) is required, see
> + * wl_display_prepare_read_queue() of how to do so. \n\n

\n\n? :-)

>   *
> - * This function blocks if there are no events to dispatch (if there are,
> - * it only dispatches these events and returns immediately).
> - * When this function returns after blocking, it means that it read events
> - * from display's fd and queued them to appropriate queues.
> - * If among the incoming events were some events assigned to the given queue,
> - * they are dispatched by this moment.
> + * This function is thread safe as long as it dispatches the right queue on the
> + * right thread. It is also compatible with the multi thread event reading
> + * preparation API (see wl_display_prepare_read_queue()), and uses the
> + * equivalent functionality internally. It is not allowed to call this function
> + * while the thread is being prepared for reading events, and doing so will
> + * cause a dead lock.

"Will" deadlock here, "might" deadlock above. Very good.

> + *
> + * It can be used as a helper function to ease the procedure of reading and
> + * dispatching events.
>   *
>   * \note Since Wayland 1.5 the display has an extra queue
>   * for its own events (i. e. delete_id). This queue is dispatched always,
> @@ -1542,11 +1495,8 @@ wl_display_cancel_read(struct wl_display *display)
>   * That means that this function can return non-0 value even when it
>   * haven't dispatched any event for the given queue.
>   *
> - * This function has the same constrains for using in multi-threaded apps
> - * as \ref wl_display_dispatch().
> - *
>   * \sa wl_display_dispatch(), wl_display_dispatch_pending(),
> - * wl_display_dispatch_queue_pending()
> + * wl_display_dispatch_queue_pending(), wl_display_prepare_read_queue()
>   *
>   * \memberof wl_event_queue
>   */
> @@ -1643,17 +1593,24 @@ wl_display_dispatch_queue_pending(struct wl_display *display,
>   * \param display The display context object
>   * \return The number of dispatched events on success or -1 on failure
>   *
> - * Dispatch the display's default event queue.
> + * Dispatch events on the default event queue.
>   *
>   * If the default event queue is empty, this function blocks until there are
>   * events to be read from the display fd. Events are read and queued on
>   * the appropriate event queues. Finally, events on the default event queue
> - * are dispatched.
> - *
> - * In multi-threaded environment, programmer may want to use
> - * wl_display_read_events(). However, use of wl_display_read_events()
> - * must not be mixed with wl_display_dispatch(). See wl_display_read_events()
> - * and wl_display_prepare_read() for more details.
> + * are dispatched. On failure -1 is returned and errno set appropriately.
> + *
> + * In a multi threaded enviroment, do not manually wait using poll() (or
> + * equivalent) before calling this function, as doing so might cause a dead
> + * lock. If external reliance on poll() (or equivalent) is required, see
> + * wl_display_prepare_read_queue() of how to do so. \n\n

Another \n\n, is it a thing?

> + *
> + * This function is thread safe as long as it dispatches the right queue on the
> + * right thread. It is also compatible with the multi thread event reading
> + * preparation API (see wl_display_prepare_read_queue()), and uses the
> + * equivalent functionality internally. It is not allowed to call this function
> + * while the thread is being prepared for reading events, and doing so will
> + * cause a dead lock.
>   *
>   * \note It is not possible to check if there are events on the queue
>   * or not. For dispatching default queue events without blocking, see \ref

Ok, yes, this patch looks good to me except the few nitpicks above.

I read through the implementation of wl_display_dispatch_queue() and
compared it to the corresponding parts of the prepare-read API, and I
think the new docs are true. At least I couldn't think of any case
where things would fail.

Thanks,
pq
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 811 bytes
Desc: OpenPGP digital signature
URL: <http://lists.freedesktop.org/archives/wayland-devel/attachments/20151008/5479cf63/attachment.sig>