Wayland client library thread safety

Fri Apr 20 13:06:19 PDT 2012

On Thu, Apr 19, 2012 at 03:38:39PM +0200, Arnaud Vrac wrote:
> Hello everyone,
> 
> I am hitting a bug when using Qt5 and wayland on an embedded platform,
> for which I have written a custom EGL backend. The problem is that Qt5
> (QtQuick2 actually) renders in a separate thread, when the wayland
> display has been created in the main thread. I know there is a patch
> for "thread affinity" in qtwayland to solve this, but it is wrong and
> breaks other clients.
> 
> Since it is required from the EGL spec to be able to render in
> multiple threads, how can this be solved ? Does the wayland backend
> for mesa support this properly ? I don't see how it can work since
> even writes in the connection queue are not protected.

Hi Arnaud,

It's a problem that's been around for a while and we're aware of it.
I've been working with Jørgen Lind from Nokia's Qt team on it and we're
looking at two different approaches for solving this.

The initial assumption for wl_display and related objects was that they
would not be thread safe, and that you have to lock access to the wl API
yourself if you're going to use it from multiple threads.  It's a
valid assumption and it avoids fine grained API level locking, and in
many cases, wl would be integrated into an existing mainloop that
solves locking.  Of course, EGL makes that kind of design impossible.
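
As a rough illustration of the "lock it yourself" model, here is a
minimal sketch using pthreads.  The struct fake_display type and
fake_display_queue_request() are hypothetical stand-ins for a display
connection and a request-writing call; they are not part of the
wayland API and only model unprotected shared state guarded by an
application-owned mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a connection's outgoing request queue.
 * This is NOT the real wl_display; it only models shared state. */
struct fake_display {
    pthread_mutex_t lock;   /* application-owned lock, not part of the wl API */
    size_t queued;          /* number of requests written so far */
};

/* Every thread goes through this wrapper instead of touching the queue. */
static void fake_display_queue_request(struct fake_display *d)
{
    pthread_mutex_lock(&d->lock);
    d->queued++;            /* stands in for writing a request to the buffer */
    pthread_mutex_unlock(&d->lock);
}

#define REQUESTS_PER_THREAD 1000
#define THREADS 4

static void *worker(void *arg)
{
    struct fake_display *d = arg;

    for (int i = 0; i < REQUESTS_PER_THREAD; i++)
        fake_display_queue_request(d);
    return NULL;
}

/* Runs THREADS writers against one display and returns the final count. */
static size_t run_locked_writers(void)
{
    struct fake_display d = { .queued = 0 };
    pthread_t tids[THREADS];

    pthread_mutex_init(&d.lock, NULL);
    for (int i = 0; i < THREADS; i++)
        pthread_create(&tids[i], NULL, worker, &d);
    for (int i = 0; i < THREADS; i++)
        pthread_join(tids[i], NULL);
    pthread_mutex_destroy(&d.lock);
    return d.queued;
}
```

With the lock in place no increments are lost.  The catch is that
every caller has to agree on the same lock, which is exactly what an
application and an EGL implementation cannot easily do.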

So the two options we're looking at are

1) Toughen up and add locking around all access to wl_display.  It's
   not as much code as it sounds; there are actually only a few entry
   points that need locking.  Another problem that we need to solve is
   how the wl_display_iterate() inside EGL may dispatch events, from a
   rendering thread and (in case of the mesa implementation) with the
   EGL lock held.  The solution there is to add thread affinity to
   objects such that event callbacks from an object are always handled
   in the thread that owns the object.

   In case of single-threaded EGL applications we also need to make
   sure we don't call out to random application callbacks while
   blocking inside eglSwapBuffers (or other blocking EGL calls).  The
   idea here is to introduce a wl_display_iterate_for_object() entry
   point that will only call out for events addressed to that object
   and queue up the rest.  This makes eventloop integration more
   complicated, since now the question of "does this wl_display have
   events?" isn't just a matter of whether or not the fd polls
   readable, there may also be events queued up in the wl_display.
   That's a part of Xlib API I've tried hard to avoid inflicting on
   wayland users, but thanks to Xlib working this way, it's something
   most event loops support.  We could also do some trickery with
   epoll and eventfd to get back the "events pending" iff "fd readable"
   semantics.

   Jørgen has been working on this approach; his patch is attached
   (still work in progress).

2) Add a new wl_display for the other thread.  This gives EGL its own
   wl_display, complete with a separate socket to the server.  This is
   a more radical idea perhaps, but much less invasive.  We don't have
   to change the assumption that wl_display is only usable in one
   thread and we don't have to add locking.  We do need a mechanism to
   share an object between the threads (typically the wl_surface), but
   that's doable.  The bigger problem is how to synchronize the two
   command streams.  Right now we can call eglSwapBuffers, know that it
   attaches a new wl_buffer to the surface, and then do something that
   relies on that having happened.  If the two happen on different
   connections, we need a mechanism to synchronize the two protocol
   streams.  Of course, if you want to do something like that, you
   still need to synchronize between the two threads to make sure the
   eglSwapBuffers call happened in one thread before you do something
   that depends on it in another thread.

   The biggest drawback with this approach is that even single
   threaded EGL applications now need two connections to the server.
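
The thread-level half of the synchronization mentioned under option 2
can be sketched with a condition variable.  This is only an
illustration under assumptions: struct swap_sync, render_thread() and
wait_for_swap() are hypothetical names, and the value 42 is an
arbitrary marker standing in for the buffer a swap would attach.  It
orders the two threads only; it does nothing to order the two protocol
streams on the server side, which remains the open problem.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical state shared by a render thread and the main thread. */
struct swap_sync {
    pthread_mutex_t lock;
    pthread_cond_t swapped;
    bool swap_done;
    int attached_buffer;    /* stands in for the buffer attached by the swap */
};

static void *render_thread(void *arg)
{
    struct swap_sync *s = arg;

    /* ... rendering and the buffer swap would happen here ... */
    pthread_mutex_lock(&s->lock);
    s->attached_buffer = 42;    /* arbitrary marker value */
    s->swap_done = true;
    pthread_cond_signal(&s->swapped);
    pthread_mutex_unlock(&s->lock);
    return NULL;
}

/* Blocks until the render thread reports that its swap has happened. */
static int wait_for_swap(struct swap_sync *s)
{
    int buffer;

    pthread_mutex_lock(&s->lock);
    while (!s->swap_done)       /* loop guards against spurious wakeups */
        pthread_cond_wait(&s->swapped, &s->lock);
    buffer = s->attached_buffer;
    pthread_mutex_unlock(&s->lock);
    return buffer;
}

/* Starts the render thread, waits for its swap, returns the marker. */
static int run_swap_handshake(void)
{
    struct swap_sync s = { .swap_done = false, .attached_buffer = 0 };
    pthread_t tid;
    int buffer;

    pthread_mutex_init(&s.lock, NULL);
    pthread_cond_init(&s.swapped, NULL);
    pthread_create(&tid, NULL, render_thread, &s);
    buffer = wait_for_swap(&s);
    pthread_join(tid, NULL);
    pthread_cond_destroy(&s.swapped);
    pthread_mutex_destroy(&s.lock);
    return buffer;
}
```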

Anyway, that sums up the work we've done in this area.  I'd say that
option 1) is the preferred solution at the moment.
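
To make the per-object dispatch idea from option 1 concrete, here is a
small self-contained model.  It is a sketch under assumptions, not the
proposed implementation: struct event, struct event_queue,
dispatch_for_object() and has_pending() are hypothetical names, and
events are plain structs rather than wire messages.  It shows events
for one object being delivered while the rest stay queued in order,
and why an event loop then has to check the queue as well as the fd.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_EVENTS 16

/* Hypothetical decoded event: which object it targets plus some data. */
struct event {
    int object_id;
    int payload;
};

/* Events already read from the socket but not yet dispatched. */
struct event_queue {
    struct event events[MAX_EVENTS];
    size_t count;
};

typedef void (*event_handler)(const struct event *ev, void *data);

/* Appends an event; returns false if the queue is full. */
static bool queue_push(struct event_queue *q, struct event ev)
{
    if (q->count == MAX_EVENTS)
        return false;
    q->events[q->count++] = ev;
    return true;
}

/* Dispatches only the events addressed to object_id and keeps all
 * other events queued in their original order.  Returns the number of
 * events dispatched.  The handler must not push to the same queue. */
static size_t dispatch_for_object(struct event_queue *q, int object_id,
                                  event_handler handler, void *data)
{
    size_t kept = 0, dispatched = 0;
    size_t n = q->count;

    for (size_t i = 0; i < n; i++) {
        if (q->events[i].object_id == object_id) {
            struct event ev = q->events[i];  /* copy before calling out */
            handler(&ev, data);
            dispatched++;
        } else {
            q->events[kept++] = q->events[i];
        }
    }
    q->count = kept;
    return dispatched;
}

/* What an event loop has to ask once events can be held back:
 * readable fd OR leftover queued events. */
static bool has_pending(const struct event_queue *q, bool fd_readable)
{
    return fd_readable || q->count > 0;
}

/* Example handler: adds each delivered payload to an int accumulator. */
static void sum_payload(const struct event *ev, void *data)
{
    *(int *)data += ev->payload;
}
```

After dispatching for one object, has_pending() can return true even
though the fd is not readable, which is the event loop complication
described under option 1.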

Kristian

> Here is a crude extract from my EGL backends where wayland can fail
> because of thread concurrency:
> 
> - SwapBuffers:
> 
> while (!swap_done) {
>   wl_display_flush()
>   // wait frame callback
>   wl_display_iterate(WL_DISPLAY_READABLE)
> }
> 
> wl_surface_frame()
> wl_surface_attach()
> wl_surface_damage()
> lock_back_buffer()
> swap_buffers()
> 
> - GetBuffers (called on makeCurrent or at the first gl call after swap)
> 
> while (no_buffer_unlocked) {
>   wl_display_flush()
>   // wait for a buffer to be released
>   wl_display_iterate(WL_DISPLAY_READABLE)
> }
> 
> Thanks for your help,
> 
> -- 
> rawoul