[Mesa-dev] [Xcb] [PATCH] fix deadlock with libX11
Christian König
deathsimple at vodafone.de
Thu May 9 05:58:43 PDT 2013
Hi Uli,
> why is mesa-dev CC'd to this mail? I guess that mesa run into this problem and
> there is a thread on mesa-dev which could be provided in the commit message as
> an extra reference? (This also means that this mail will end up in mesa-dev's
> moderation queue...)
I was working on the vdpau mesa state tracker when I discovered this
bug, and thought it might be a good idea if anybody is having similar
issues that they take notice of this mail.
> Ah, after writing the rest of this mail, I just had this idea:
>
> - X11 has taken the socket
> - Thread A has locked the display.
> - Thread B does xcb_no_operation() and thus ends up in libX11's return_socket(),
> waiting for the display lock.
> - Thread A calls e.g. xcb_no_operation(), too, ends up in return_socket() and
> because socket_moving == 1, ends up waiting for thread B
>
> => Deadlock
>
> Is this the issue you are seeing and which should be described better in the
> commit message? What are the actual functions involved in this? How mad would
> you be if I blamed libX11 for this? ;-)
Yes that's a good example of what I'm trying to fix. More general
speaking it seems to me like a design flaw in the xcb_take_socket interface.
To prevent different threads from stealing the socket from each other
the caller of "xcb_take_socket" must hold a lock that is also acquired
in "return_socket". With this trying to prevent calling "return_socket"
from different threads like the current implementation does will always
result in some kind of deadlock situation. So the problem is not really
libX11 specific.
> Attached is a small test program to test this theory and this does indeed cause
> a deadlock. However, I still wouldn't call this situation "classic", because
> this involves waiting on a condition variable.
>
> Is this the problem you are trying to fix?
Yes indeed looks like it, but my use case was a bit more complicated.
Well if it's a conditional or a mutex doesn't really matter, cause it
clearly the same effect. We acquire synchronizing primitives in
different order, and that's what I call a classic deadlock problem.
> It would be nice if the commit somehow referenced this so that, if this patch
> ever causes any problems, we can test new patches against it.
Well I can refer to this mail in the V2 of the patch.
On the other hand we have quite a bunch of bugreports about such
problems, I just didn't refer to them cause this fix is only part of the
solution.
For example for this one
https://bugs.freedesktop.org/show_bug.cgi?id=20708, we need to fix this
bug and the one described in
https://bugs.freedesktop.org/show_bug.cgi?id=30450.
> [...]
>>> Signed-off-by: Christian König <christian.koenig at amd.com>
> [...]
>
> I noticed that the patch is from a different mail address than your mail. Is
> this intentional? Why?
That's just how I usually work, sending patches from a different address
keeps the amount of spam low on the work address.
Christian.
More information about the mesa-dev
mailing list