Wayland client library thread safety

Tue Apr 24 09:13:38 PDT 2012

On Fri, Apr 20, 2012 at 10:06 PM, Kristian Hoegsberg
<hoegsberg at gmail.com> wrote:
> On Thu, Apr 19, 2012 at 03:38:39PM +0200, Arnaud Vrac wrote:
>> Hello everyone,
>>
>> I am hitting a bug when using Qt5 and wayland on an embedded platform,
>> for which I have written a custom EGL backend. The problem is that Qt5
>> (QtQuick2 actually) renders in a separate thread, when the wayland
>> display has been created in the main thread. I know there is a patch
>> for "thread affinity" in qtwayland to solve this, but it is wrong and
>> breaks other clients.
>>
>> Since the EGL spec requires being able to render in multiple
>> threads, how can this be solved? Does the wayland backend for mesa
>> support this properly? I don't see how it can work, since even
>> writes to the connection queue are not protected.
>
> Hi Arnaud,
>
> It's a problem that's been around for a while and we're aware of it.
> I've been working with Jørgen Lind from Nokia's Qt team on it and we're
> looking at two different approaches for solving this.
>
> Initial assumption for wl_display and related objects was that it is
> not going to be thread safe, and you have to lock access to the wl API
> yourself if you're going to use it from multiple threads.  It's a
> valid assumption and it avoids fine grained API level locking, and in
> many cases, wl would be integrated into an existing mainloop that
> solves locking.  Of course, EGL makes that kind of design impossible.
>
> So the two options we're looking at are
>
> 1) Toughen up and add locking around all access to wl_display.  It's
>   not as much code as it sounds, there are actually only a few entry
>   points that need locking.  Another problem that we need to solve is
>   how the wl_display_iterate() inside EGL may dispatch events, from a
>   rendering thread and (in case of the mesa implementation) with the
>   EGL lock held.  The solution there is to add thread affinity to
>   objects such that event callbacks from an object are always handled
>   in the thread that owns the object.
>
>   In case of single-threaded EGL applications we also need to make
>   sure we don't call out to random application callbacks while
>   blocking inside eglSwapBuffer (or other blocking EGL calls).  The
>   idea here is to introduce a wl_display_iterate_for_object() entry
>   point that will only call out for events addressed to that object
>   and queue up the rest.  This makes eventloop integration more
>   complicated, since now the question of "does this wl_display have
>   events?" isn't just a matter of whether or not the fd polls
>   readable, there may also be events queued up in the wl_display.
>   That's a part of Xlib API I've tried hard to avoid inflicting on
>   wayland users, but thanks to Xlib working this way, it's something
>   most event loops support.  We could also do some trickery with
>   epoll and eventfd to get the "events pending" iff "fd readable"
>   semantics again.
>
>   Jørgen has been working on this approach; his patch is attached
>   (still work in progress).
>
> 2) Add a new wl_display for the other thread.  This gives EGL its own
>   wl_display, complete with a separate socket to the server.  This is
>   a more radical idea perhaps, but much less invasive.  We don't have
>   to change the assumption that wl_display is only usable in one
>   thread and we don't have to add locking.  We do need a mechanism to
>   share an object between the threads (typically the wl_surface), but
>   that's doable.  The bigger problem is how to synchronize the two
>   command streams.  Right now we can say eglSwapBuffer and then know
>   that that attaches a new wl_buffer to the surface and do something
>   that relies on that having happened, but if it happens in different
>   connections we need a mechanism to synchronize the two protocol
>   streams.  Of course, if you want to do something like that, you
>   still need to synchronize between the two threads to make sure the
>   eglSwapBuffer happened in one thread before you do something that
>   depends on that in another thread.
>
>   The biggest drawback with this approach is that even single
>   threaded EGL applications now need two connections to the server.
>
> Anyway, that sums up the work we've done in this area.  I'd say that
> option 1) is the preferred solution at the moment.
>

Thanks for your answer Kristian.

Solution 1 should work, however it requires some changes to the
wayland client API to be able to do asynchronous writes (the epoll fd is
only pollable for reading). The proposed patch also adds a lot of
system calls to send or receive messages.
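
To make the epoll/eventfd trick from option 1 concrete, here is a
minimal Linux-only sketch. It uses only plain epoll(7) and eventfd(2),
no wayland API; struct pending_fd and the pending_fd_* helpers are
hypothetical names for illustration. The idea is that the event loop
polls a single epoll fd, which turns readable either when the socket
has data or when the library marks events as queued:

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical helper type, not part of libwayland. */
struct pending_fd {
    int epfd; /* the one fd the event loop polls for reading */
    int evfd; /* non-zero counter means "events are queued" */
};

/* sock_fd may be -1 to watch only the "queued events" flag. */
static int pending_fd_init(struct pending_fd *p, int sock_fd)
{
    struct epoll_event ev = { .events = EPOLLIN };

    assert(p != NULL);
    p->evfd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
    if (p->evfd < 0)
        return -1;
    p->epfd = epoll_create1(EPOLL_CLOEXEC);
    if (p->epfd < 0) {
        close(p->evfd);
        return -1;
    }
    ev.data.fd = p->evfd;
    if (epoll_ctl(p->epfd, EPOLL_CTL_ADD, p->evfd, &ev) < 0)
        goto err;
    if (sock_fd >= 0) {
        ev.data.fd = sock_fd;
        if (epoll_ctl(p->epfd, EPOLL_CTL_ADD, sock_fd, &ev) < 0)
            goto err;
    }
    return 0;
err:
    close(p->epfd);
    close(p->evfd);
    return -1;
}

/* Called when events were queued instead of dispatched. */
static int pending_fd_mark(struct pending_fd *p)
{
    uint64_t one = 1;

    return write(p->evfd, &one, sizeof one) == (ssize_t)sizeof one ? 0 : -1;
}

/* Called once the queue has been drained. */
static int pending_fd_clear(struct pending_fd *p)
{
    uint64_t count;

    if (read(p->evfd, &count, sizeof count) == (ssize_t)sizeof count)
        return 0;
    return errno == EAGAIN ? 0 : -1; /* counter was already zero */
}

/* Non-blocking check, standing in for what an event loop would do. */
static int pending_fd_readable(struct pending_fd *p)
{
    struct pollfd pfd = { .fd = p->epfd, .events = POLLIN };

    return poll(&pfd, 1, 0) > 0 && (pfd.revents & POLLIN);
}
```

Note that this only restores the read side. It says nothing about
when the socket becomes writable, which is the limitation I mention
above.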

Solution 2 would not work for EGL as you describe it, since you should
be able to render in multiple threads with the same EGL display. There
would also be a wl_client on the server for each thread, which is not
wanted.

Maybe we could find a solution by mixing solution 1 and 2:

Still keep a single wl_display with proper locking, and add a
wl_connection for each thread. For the wl_display thread, the
connection would still be made through the unix socket, while for the
other thread you would prepare a socketpair on the client side and
send the peer fd through the main connection. This way you only need
one fd for each thread which can be polled for read and write, like
for the main thread. This also means that objects are implicitly
shared between threads, like in solution 1.
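
A rough sketch of the client-side setup, in plain POSIX C. The
function names (send_fd, recv_fd, thread_connection_create) are made
up for illustration and are not part of libwayland, and how the server
would associate the received fd with the existing client is left out.
It creates a socketpair, keeps one end for the thread, and sends the
peer fd over the main connection as SCM_RIGHTS ancillary data:

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one fd. */
union fd_cmsg {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Send one fd over a unix socket, with a single dummy payload byte. */
static int send_fd(int sock, int fd)
{
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_cmsg u;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&msg, 0, sizeof msg);
    memset(&u, 0, sizeof u);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof u.buf;

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd sent by send_fd(); returns the new fd or -1. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_cmsg u;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd = -1;

    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof u.buf;

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}

/* Create a per-thread connection: returns the local end and passes
 * the peer end to whoever sits on the other side of main_conn. */
static int thread_connection_create(int main_conn)
{
    int sv[2];

    assert(main_conn >= 0);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    if (send_fd(main_conn, sv[1]) < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    close(sv[1]); /* the in-flight message keeps the peer end alive */
    return sv[0];
}
```

The fd returned by thread_connection_create() is an ordinary stream
socket, so the thread can poll it for both read and write.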

We can aggregate all connections on the server in a single client.

There is a synchronisation problem: at least we need to make sure
objects are created on the server before they are used from another
thread, but that can be solved.

I'm also not sure it would be easy to move objects from one thread to
another, since some events might be queued in the old thread.

-- 
rawoul