[Mesa-dev] [Xcb] [PATCH] fix deadlock with libX11

Uli Schlachter psychon at znc.in
Wed May 8 14:30:48 PDT 2013


Hi,

why is mesa-dev CC'd to this mail? I guess that mesa run into this problem and
there is a thread on mesa-dev which could be provided in the commit message as
an extra reference? (This also means that this mail will end up in mesa-dev's
moderation queue...)

On 08.05.2013 14:58, Christian König wrote:
> libX11 acquires the display lock before calling "xcb_take_socket" and inside
> the "return_socket" callback, this can lead to a classic deadlock situation
> with the XCB iolock. [...]

I fail to see the classic deadlock situation here. In one place, mutex A is
acquired before mutex B and another code path does it the other way around, right?

However, where is this? XCB releases its iolock before calling the callback and
no there are no other callbacks, so it is impossible to first acquire the iolock
and the Display*'s lock, thus it should be impossible to deadlock.

Could you add the relevant backtraces and some more information about the
deadlock to the commit message?

Ah, after writing the rest of this mail, I just had this idea:

- X11 has taken the socket
- Thread A has locked the display.
- Thread B does xcb_no_operation() and thus ends up in libX11's return_socket(),
waiting for the display lock.
- Thread A calls e.g. xcb_no_operation(), too, ends up in return_socket() and
because socket_moving == 1, ends up waiting for thread B

=> Deadlock

Is this the issue you are seeing and which should be described better in the
commit message? What are the actual functions involved in this? How mad would
you be if I blamed libX11 for this? ;-)

> This patch removes the unnecessary extra signaling and just documents that
> "return_socket" might be called by multiple threads at the same time (which
> works fine with how libX11 is using the interface).

Well, this breaks the API, because not only libX11 uses xcb_take_socket(). It is
the reason that this function exists, but e.g. cairo's xcb backend also used
this function (before version 1.11.2; to simulate XLockDisplay()). I am not sure
if it would break.

> Signed-off-by: Christian König <christian.koenig at amd.com>
> ---
>  src/xcb_out.c |   36 +++++++++++++++++++-----------------
>  src/xcbext.h  |    6 ++++--
>  src/xcbint.h  |    7 ++++---
>  3 files changed, 27 insertions(+), 22 deletions(-)
[...]
> diff --git a/src/xcbext.h b/src/xcbext.h
> index 98b3c93..f8520a0 100644
> --- a/src/xcbext.h
> +++ b/src/xcbext.h
> @@ -66,8 +66,10 @@ unsigned int xcb_send_request(xcb_connection_t *c, int flags, struct iovec *vect
>   * callback which XCB can call when it wants the write side of the
>   * socket back to make a request. This callback synchronizes with the
>   * external socket owner and flushes any output queues if appropriate.
> - * If you are sending requests which won't cause a reply, please note the
> - * comment for xcb_writev which explains some sequence number wrap issues.
> + * It is called without any locks held and so might be called from different
> + * threads at the same time. If you are sending requests which won't
> + * cause a reply, please note the comment for xcb_writev which explains
> + * some sequence number wrap issues.
>   * */

Instead of "It", "The callback function is called without any locks held".
However, that was true before, too. Also, this is no reason for "might be called
by different threads at the same time", because previously XCB did this, too.

>  int xcb_take_socket(xcb_connection_t *c, void (*return_socket)(void *closure), void *closure, int flags, uint64_t *sent);
>  
> diff --git a/src/xcbint.h b/src/xcbint.h
> index f9e5a52..c68b2fa 100644
> --- a/src/xcbint.h
> +++ b/src/xcbint.h
> @@ -79,14 +79,15 @@ void *_xcb_map_remove(_xcb_map *q, unsigned int key);
>  
>  /* xcb_out.c */
>  
> +typedef void (*xcb_return_socket_func_t)(void *closure);

Uhm, why?

>  typedef struct _xcb_out {
>      pthread_cond_t cond;
>      int writing;
>  
> -    pthread_cond_t socket_cond;
> -    void (*return_socket)(void *closure);
> +    xcb_return_socket_func_t return_socket;
>      void *socket_closure;
> -    int socket_moving;
> +    int socket_seq;

unsigned int please, overflow of signed integers is unspecified.

>      char queue[XCB_QUEUE_BUFFER_SIZE];
>      int queue_len;

Cheers,
Uli
-- 
Q: Because it reverses the logical flow of conversation.
A: Why is putting a reply at the top of the message frowned upon?


More information about the mesa-dev mailing list