[Xcb] [PATCH] fix deadlock with xcb_take_socket/return_socket v2

Jamey Sharp jamey at minilop.net
Wed May 15 15:05:20 PDT 2013


Hi Christian! I'm still trying to understand your concern. Thread-safety
is complicated and I'm totally willing to believe I'm reasoning about it
wrong, but I still can't see how.

On Wed, May 15, 2013 at 10:47:38AM +0200, Christian König wrote:
> Am 14.05.2013 21:45, schrieb Jamey Sharp:
> >libX11 carefully drops the Display lock around the xcb_generate_id calls
> >that could result in XCB trying to write, and as far as I can tell it
> >doesn't ask XCB to send requests any other time.

You have two responses to this point, which I'll respond to separately:

> Additional to that even libX11 calls xcb_take_socket with the
> displaylock held (which makes perfectly sense, cause otherwise
> another thread might directly steal the socket again), so in this
> code path you have displaylock->iolock, while in the return code
> path you have iolock->displaylock. The only correct solution I can
> see to this is to drop the iolock while calling the return code
> path.

libX11 doesn't call xcb_take_socket if it already owns the socket, as
signaled by (dpy->bufmax != dpy->buffer). If this Display does not own
the socket, then calling xcb_take_socket is safe, because that can't
possibly call return_socket on this Display. So there is no potential
for deadlock there.

Did I miss something?

> Yeah but nobody prevents other XCB users (like the OpenGL and/or
> VDPAU implementation in mesa) to be called with the display-lock
> held.

Nothing prevents it except the fact that the callers won't work if they
try it. If you want to call XCB-based API, make sure Xlib is not holding
the internal Display lock on the current thread.

That's the rule for xcb_take_socket. If you want a different rule, we
should have a different discussion. :-)

Note that although there are lots of mentions of GL and VDPAU in #20708,
none of the test programs call anything outside core libX11. I don't see
how #20708 is about whatever bug you're seeing.

> Bugreport https://bugs.freedesktop.org/show_bug.cgi?id=20708 and
> bugreport https://bugs.freedesktop.org/show_bug.cgi?id=30450 are
> about two completely different bugs!!! So setting bug 20708 as a
> duplicate of bug 30450 was actually incorrect.
> 
> The bug originally described in 20708 (and also what the original
> test program clearly exercise) is a deadlock because of acquiring
> locking primitives in different orders.

I've just re-read the comments and all three test programs in #20708,
and I don't see how any of them are locking-order deadlocks. You say
this is "clear"; could you demonstrate how, please?

The second test program demonstrated that we'd broken Xlib's crazy
nested-lock implementation. Uli and I both wrote equivalent patches to
fix that, and I closed the bug.

Reinhard re-opened with a different bug, which most likely was being
masked by the earlier bug. Although that bug's stack traces aren't
clearly a result of #30450, it still seems like the most plausible
candidate. Please help me understand why that's not true.

> Currently my testcase only triggers the deadlock fixed with this
> patch (pretty complicated stuff involving XBMC, VDPAU and mesas
> OpenGL implementation). But I'm pretty sure that once I get this one
> upstream we are going to run into the other one also.

Do you have a minimal test case you can share?

Thanks for taking the time to investigate this,
Jamey
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 198 bytes
Desc: Digital signature
URL: <http://lists.freedesktop.org/archives/xcb/attachments/20130515/7ad43ce6/attachment.pgp>


More information about the Xcb mailing list