[Mesa-dev] [Xcb] [PATCH] fix deadlock with libX11

Thu May 9 05:58:43 PDT 2013

Hi Uli,

> why is mesa-dev CC'd to this mail? I guess that mesa run into this problem and
> there is a thread on mesa-dev which could be provided in the commit message as
> an extra reference? (This also means that this mail will end up in mesa-dev's
> moderation queue...)

I was working on the vdpau mesa state tracker when I discovered this 
bug, and thought it might be a good idea if anybody is having similar 
issues that they take notice of this mail.

> Ah, after writing the rest of this mail, I just had this idea:
>
> - X11 has taken the socket
> - Thread A has locked the display.
> - Thread B does xcb_no_operation() and thus ends up in libX11's return_socket(),
> waiting for the display lock.
> - Thread A calls e.g. xcb_no_operation(), too, ends up in return_socket() and
> because socket_moving == 1, ends up waiting for thread B
>
> => Deadlock
>
> Is this the issue you are seeing and which should be described better in the
> commit message? What are the actual functions involved in this? How mad would
> you be if I blamed libX11 for this? ;-)

Yes that's a good example of what I'm trying to fix. More general 
speaking it seems to me like a design flaw in the xcb_take_socket interface.

To prevent different threads from stealing the socket from each other 
the caller of "xcb_take_socket" must hold a lock that is also acquired 
in "return_socket". With this trying to prevent calling "return_socket" 
from different threads like the current implementation does will always 
result in some kind of deadlock situation. So the problem is not really 
libX11 specific.

> Attached is a small test program to test this theory and this does indeed cause
> a deadlock. However, I still wouldn't call this situation "classic", because
> this involves waiting on a condition variable.
>
> Is this the problem you are trying to fix?

Yes indeed looks like it, but my use case was a bit more complicated. 
Well if it's a conditional or a mutex doesn't really matter, cause it 
clearly the same effect. We acquire synchronizing primitives in 
different order, and that's what I call a classic deadlock problem.

> It would be nice if the commit somehow referenced this so that, if this patch
> ever causes any problems, we can test new patches against it.

Well I can refer to this mail in the V2 of the patch.

On the other hand we have quite a bunch of bugreports about such 
problems, I just didn't refer to them cause this fix is only part of the 
solution.

For example for this one 
https://bugs.freedesktop.org/show_bug.cgi?id=20708, we need to fix this 
bug and the one described in 
https://bugs.freedesktop.org/show_bug.cgi?id=30450.

> [...]
>>> Signed-off-by: Christian König <christian.koenig at amd.com>
> [...]
>
> I noticed that the patch is from a different mail address than your mail. Is
> this intentional? Why?

That's just how I usually work, sending patches from a different address 
keeps the amount of spam low on the work address.

Christian.