race between (say) wl_display_sync and wl_callback_add_listener?

Tue Sep 26 10:43:46 UTC 2023

On Tue, Sep 26, 2023 at 10:08:55AM +0100, John Cox wrote:
> Hi
> 
> Many thanks for your comprehensive answer
> 
> >On Mon, Sep 25, 2023 at 05:10:30PM +0100, John Cox wrote:
> >> Hi
> >
> >Hi,
> >
> >> Assuming I have a separate poll/read_events/dispatch_queue thread to my
> >> main thread. If I go wl_display_sync/wl_callback_add_listener on the
> >> main thread is there a race condition between those two calls where the
> >> other thread can read & dispatch the callback before the listener is
> >> added or is there some magic s.t. adding the listener will always work
> >> as expected?
> >
> >This is indeed a racy interaction, in more ways than one:
> >
> >1. There is an inherent data race setting and using the
> >   listener/user_data concurrently from different threads.
> >   Neither adding a listener/user_data (see wl_proxy_add_listener()),
> >   nor the actual dispatching operation in libwayland (which uses the
> >   listener, see wayland-client.c:dispatch_event()) is protected by
> >   the internal "display lock".
> 
> So are all interactions from another thread than the dispatch thread
> unsafe e.g. buffer creation, setup & assignment to a surface or is it
> just the obviously racy subset?

From a low-level libwayland API perspective, creating objects or making
requests from another thread is generally fine. It's this particular
case of requests that create objects and cause events to be emitted
immediately for them that's problematic.

Many requests that fall in this category are in fact global binds (e.g.,
wl_shm, wl_seat, wl_output), which are typically handled in the context of the
wl_registry.global handler, thus in the dispatch thread, in which case no
special consideration is required.

There are also some requests that at first glance seem problematic,
but in fact they are not (if things are done properly, and assuming
correct compositor behavior). For example:

  struct wl_callback *wl_surface_frame(struct wl_surface *surface);

seems suspiciously similar to wl_display_sync(). However, this request
takes effect (i.e., the callback will receive events) only after the next
wl_surface_commit(), so the following is safe:

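  cb = wl_surface_frame(surface);
  /* As long as the listener is set before the commit we are good.
   * frame_listener and data are placeholders for the application's own
   * wl_callback_listener and user data. */
  wl_callback_add_listener(cb, &frame_listener, data);
  wl_surface_commit(surface);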

Of course, on top of all this, the typical higher-level multithreading
synchronization considerations still apply.

<snip>

> >Finally, you might also want to consider a different design more in line
> >with the libwayland API expectations, if possible.
> 
> Not really possible - there are some things I need done (buffer reclaim
> mostly) async as soon as possible and I don't have control over the main
> loop.

Buffer reclaiming (if by that you mean handling wl_buffer.release and
making the buffer available for drawing again) is a good case for a
separate queue, since wl_buffers are "independent" objects, as long as you
can dispatch on demand: when you need a new buffer and don't have any you
can dispatch the queue to check if the compositor has released one.

> There may be a document that sets out everything you've said above and
> gives the exact expectations but I failed to find it. In general the
> individual call documentation is great but how interactions between
> calls are managed is harder to find. I started from an (incorrect)
> assumption that everything was fully async and could be called from any
> thread (my natural programming style) and so there must be magic to
> enable that and have only slowly been corrected by reality.

I wouldn't mind an official Wayland API/technical FAQ myself :)

Thanks,
Alexandros