additional race in the wayland protocols?

Sun Mar 24 21:26:57 UTC 2019

Hi all,

There is a race condition that produces the following error:

not a valid new object id (4278190081), message data_offer(n)
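
For reference, the id in the message can be decoded with a small sketch
like the one below. It assumes the server-allocated id range starts at
0xff000000 (WL_SERVER_ID_START in libwayland's private header); the
helper names are hypothetical and not part of any wayland API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: mirrors WL_SERVER_ID_START from libwayland's private header. */
#define SERVER_ID_START 0xff000000u

/* Hypothetical helper: true if the id comes from the server-allocated range. */
static bool id_is_server_allocated(uint32_t id)
{
        return id >= SERVER_ID_START;
}

/* Hypothetical helper: position of the id inside the server-allocated range. */
static uint32_t server_id_index(uint32_t id)
{
        return id - SERVER_ID_START;
}
```

Under that assumption, 4278190081 is 0xff000001, i.e. a server-allocated
id at index 1 of the server-side range.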

Use case:
1. Multi-threaded client: several threads dispatch and flush the
wayland connection.
2. Server-side objects are used in the protocol (e.g. buffers created
via linux-dmabuf-unstable-v1.xml).
3. Server-side objects are destroyed and created quite frequently (not
required for the error, but it makes it easier to reproduce).

To understand why the error occurs, we just need to look at the
generated code for the buffer destroy request:

static inline void
wl_buffer_destroy(struct wl_buffer *wl_buffer)
{
        wl_proxy_marshal((struct wl_proxy *) wl_buffer,
                WL_BUFFER_DESTROY);
        wl_proxy_destroy((struct wl_proxy *) wl_buffer);
}

Suppose the thread (T1) executing wl_buffer_destroy is preempted by the
scheduler between the wl_proxy_marshal and wl_proxy_destroy calls, and
another thread (T2) dispatches the connection. The buffer destroy
request is then sent to the compositor, and if a request to create a
new buffer is also in the queue, the server can reuse the id of the
destroyed object right away. If the event carrying the new object id is
dispatched on the client side before T1 has had a chance to run, the
slot for that id in the client-side map of server objects is still not
NULL, because wl_proxy_destroy has not been executed yet.
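
The interleaving can be replayed deterministically on a toy model. This
is only a sketch of the sequence described above, not libwayland code:
toy_map, toy_accept_new_id, toy_proxy_destroy and toy_replay are
hypothetical names, and the two threads are modelled as sequential steps
in one function.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAP_SIZE 4

/* Toy stand-in for the client-side map of server-allocated ids. */
struct toy_map {
        const void *slot[TOY_MAP_SIZE];
};

/* Models the check done when an event announces a new object id:
 * the slot for that id must be empty, otherwise the id is rejected. */
static bool toy_accept_new_id(struct toy_map *m, size_t idx, const void *proxy)
{
        if (m->slot[idx] != NULL)
                return false;   /* "not a valid new object id" */
        m->slot[idx] = proxy;
        return true;
}

/* Models the part of wl_proxy_destroy that empties the slot. */
static void toy_proxy_destroy(struct toy_map *m, size_t idx)
{
        m->slot[idx] = NULL;
}

/* Replays the sequence; returns true if the new id is accepted. */
static bool toy_replay(bool t1_preempted)
{
        static const int old_buffer = 0, new_object = 0;
        struct toy_map m = { { NULL } };

        m.slot[1] = &old_buffer;        /* the buffer lives at index 1 */

        /* T1: the destroy request has been marshalled at this point. */
        if (!t1_preempted)
                toy_proxy_destroy(&m, 1);       /* T1 finishes in time */

        /* T2: the request is sent, the server reuses the id, and T2
         * dispatches the event that carries the reused id. */
        bool accepted = toy_accept_new_id(&m, 1, &new_object);

        if (t1_preempted)
                toy_proxy_destroy(&m, 1);       /* T1 resumes too late */

        return accepted;
}
```

In this model toy_replay(false) accepts the reused id, while
toy_replay(true) rejects it, which corresponds to the error above.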

This race is quite rare, but we hit it in a real project.

To fix this, we could first create a proxy wrapper from the object to
be destroyed and then swap the order of the wl_proxy_destroy and
wl_proxy_marshal calls: destroy the original proxy, call
wl_proxy_marshal on the wrapper object, and finally destroy the wrapper
object.
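
The effect of the reordering can be checked on a toy model of the
ordering alone. This is a sketch under stated assumptions: toy_state,
toy_hazard and toy_destroy_has_window are hypothetical names, the
comments only indicate which real call each step stands for, and the
model does not show that libwayland allows marshalling through a
wrapper after the original proxy has been destroyed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy state: one slot in the client-side map plus a flag that says the
 * destroy request is in the output buffer and any thread may flush it. */
struct toy_state {
        const void *slot;
        bool request_queued;
};

/* Hazard: the request can reach the server (which may then reuse the
 * id) while the client-side slot is still occupied. */
static bool toy_hazard(const struct toy_state *s)
{
        return s->request_queued && s->slot != NULL;
}

/* Walks the destroy sequence step by step and reports whether the
 * hazard is visible at any point where T1 could be preempted. */
static bool toy_destroy_has_window(bool proposed_order)
{
        static const int buffer = 0;
        struct toy_state s = { &buffer, false };
        bool hazard = false;

        if (proposed_order) {
                /* create the wrapper: the map is left untouched */
                hazard |= toy_hazard(&s);
                s.slot = NULL;                  /* wl_proxy_destroy(original) */
                hazard |= toy_hazard(&s);
                s.request_queued = true;        /* wl_proxy_marshal(wrapper) */
                hazard |= toy_hazard(&s);
                /* destroy the wrapper: the map is left untouched */
        } else {
                s.request_queued = true;        /* wl_proxy_marshal(original) */
                hazard |= toy_hazard(&s);       /* T1 may be preempted here */
                s.slot = NULL;                  /* wl_proxy_destroy(original) */
                hazard |= toy_hazard(&s);
        }
        return hazard;
}
```

In this model the current order exposes a window and the proposed order
does not, because the slot is already empty by the time the request can
be flushed.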

What do you think?