[Xcb] deadlock with xlib/xcb

Christoph Pfister christophpfister at gmail.com
Sun Oct 28 06:31:45 PDT 2007


Hi Jamey,

2007/10/27, Jamey Sharp <jamey at minilop.net>:
> Thanks for reminding me about this thread, Christoph. You're right, I've
> ignored it much too long.

Nice to see that you're working on it :)

> Turns out I can trivially get a similar-looking hang and backtrace with
> `ico -threads 2`, by waiting about ten seconds. That should make
> debugging and testing easier. I'd noticed quite a while ago that
> multithreaded ico hangs, but it's a dodgy enough use of Xlib to begin
> with that I wrote it off as "probably not my fault". Your report has
> made me rethink that.

Right, ico seems to be a very good test case (I'm also able to
reproduce it over here).

> On 8/9/07, Christoph Pfister <christophpfister at gmail.com> wrote:
> > However Thread 16 had dpy->lock->locking_level == 1 ...
>
> I believe that means that xine's use of XLockDisplay played a role in
> this case?
>
> However, the ico case has locking_level == 0 when the hang occurs, so I
> don't think this is relevant.
>
> > I have no idea where the actual bug is, but I see something like three
> > possible conditions which would avoid this:
> > - xlib.lock has to be released before calling xcb_wait_for_reply
> > - xlib.lock must not be released before calling xcb_wait_for_reply
> > - xcb has to deal with that situation internally
>
> I tested option 2 (don't release the xcb-xlib lock around
> xcb_wait_for_reply) because I'm not sure why I was dropping that lock
> there in the first place. (I think it may have been a performance
> optimization for multi-threaded Xlib-using applications.) Unfortunately,
> the ico hang still occurs.

Lemme summarize the issue a bit more precisely (I found out more stuff
because of the test case):

- one thread calls an xcb function (xlib lock not held, for one of two
possible reasons*)
- some stuff happens inside xcb
- the thread (briefly) unlocks the display lock
- another thread calls an xcb function (xlib lock held)
- this thread locks the display lock and waits for the first thread
- the first one can't proceed because the second one is waiting:

void _xcb_lock_io(xcb_connection_t *c)
{
    pthread_mutex_lock(&c->iolock);
    while(c->xlib.lock) // <-- this condition is always true
    {
        if(pthread_equal(c->xlib.thread, pthread_self())) // <-- this condition is always false
            break;
        pthread_cond_wait(&c->xlib.cond, &c->iolock);
    }
}

* about the reasons: Either because the app uses xcb directly or
because of some xlib functions which release the lock before the call.
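
To make the loop condition above easier to reason about, here is a small
model of it. This is only a sketch and not xcb code: struct conn_model,
thread_id, take_xlib_lock and lock_io_would_wait are hypothetical names,
and plain integers stand in for pthread_t so the logic can be checked
without real threads.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for pthread_t, so the model needs no real threads. */
typedef int thread_id;

/* Hypothetical model of the two fields the loop in _xcb_lock_io looks at. */
struct conn_model {
    bool xlib_lock;        /* models c->xlib.lock */
    thread_id xlib_thread; /* models c->xlib.thread (the lock's owner) */
};

/* Models a thread taking the xcb-xlib lock. */
static void take_xlib_lock(struct conn_model *c, thread_id self)
{
    assert(!c->xlib_lock); /* the model allows only one owner at a time */
    c->xlib_lock = true;
    c->xlib_thread = self;
}

/*
 * Mirrors the loop condition: the caller has to wait as long as the
 * xlib lock is held by some thread other than itself.
 */
static bool lock_io_would_wait(const struct conn_model *c, thread_id self)
{
    if (!c->xlib_lock)
        return false;
    return c->xlib_thread != self;
}
```

With thread 2 as the owner of the xlib lock, lock_io_would_wait() is true
for thread 1 and false for thread 2. If thread 2 is at the same time
blocked waiting for thread 1, nothing ever clears the flag, which matches
the "always true" / "always false" comments in the snippet.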

So in this case option 2 _improved_ the situation (it took longer to
reproduce the hang), but it doesn't solve it at all.
Why? Because there are other functions that behave similarly (not holding
the lock during the invocation), for example "wait_or_poll_for_event".
But more importantly, people mixing xcb and xlib calls will be hit by
this issue, and you can't solve that with option 1 or 2.

So there are two possibilities left:

- Don't use that xlib lock at all and find a way to replace
xcb_get_request_sent (because that's the only purpose of the lock as far
as I understand).
- (option 3): Some magic to avoid that deadlock - but imho the result
of this could be quite hackish.

The first possibility also fixes the "java" bug (imho not really java's
bug: it's a regression that extensions compiled without XTHREADS set
don't work anymore).

> I'm out of time for today, unfortunately, but I wanted to at least share
> what I've discovered so far.

Thanks; hopefully this helps you further (and I'm on irc if you want
to discuss it directly).

> Jamey

Christoph

