[Xcb] deadlock with xlib/xcb
Christoph Pfister
christophpfister at gmail.com
Thu Aug 9 13:53:50 PDT 2007
Hi,
The following hang was discovered by Darren Salt:
Thread 7 (process 7297):
#0 0x00002ad88209f756 in pthread_cond_wait@@GLIBC_2.3.2 ()
from /lib/libpthread.so.0
#1 0x00002ad8847b699e in _xcb_conn_wait (c=0xf195f0, cond=0x43805df4,
vector=0x0, count=0xffffffffffffffff) at xcb_conn.c:296
#2 0x00002ad8847b8405 in xcb_wait_for_reply (c=0xf195f0, request=623,
e=0x43805e88) at xcb_in.c:344
#3 0x00002ad881540e7b in _XReply (dpy=0xf0b600, rep=0x43805ed0, extra=0,
discard=1) at ../../src/xcb_io.c:364
#4 0x00002ad8815358da in XSync (dpy=0xf0b600, discard=0)
at ../../src/Sync.c:48
<snip>
Thread 16 (process 7285):
#0 0x00002ad88209f756 in pthread_cond_wait@@GLIBC_2.3.2 ()
from /lib/libpthread.so.0
#1 0x00002ad8847b684b in _xcb_lock_io (c=0xf195f0) at xcb_conn.c:279
#2 0x00002ad8847b69ac in _xcb_conn_wait (c=0xf195f0,
cond=<value optimized out>, vector=0x0, count=0x0) at xcb_conn.c:320
#3 0x00002ad8847b8405 in xcb_wait_for_reply (c=0xf195f0, request=621,
e=0x7fff2ac5b638) at xcb_in.c:344
#4 0x00002ad881540e7b in _XReply (dpy=0xf0b600, rep=0x7fff2ac5b680, extra=0,
discard=1) at ../../src/xcb_io.c:364
#5 0x00002ad881536e84 in XTranslateCoordinates (dpy=0xf0b600,
src_win=39845891, dest_win=77, src_x=0, src_y=0, dst_x=0x7fff2ac5b854,
dst_y=0x7fff2ac5b850, child=0x7fff2ac5b848) at ../../src/TrCoords.c:53
<snip>
Concretely, the situation looks like this:
288 int _xcb_conn_wait(xcb_connection_t *c, pthread_cond_t *cond,
struct iovec **vector, int *count)
289 {
290 int ret;
291 fd_set rfds, wfds;
292
293 /* If the thing I should be doing is already being done, wait for it. */
294 if(count ? c->out.writing : c->in.reading)
295 {
296 pthread_cond_wait(cond, &c->iolock); // <--- Thread 16
297 return 1;
298 }
299
300 FD_ZERO(&rfds);
301 FD_SET(c->fd, &rfds);
302 ++c->in.reading;
303
304 FD_ZERO(&wfds);
305 if(count)
306 {
307 FD_SET(c->fd, &wfds);
308 ++c->out.writing;
309 }
310
311 _xcb_unlock_io(c);
312 do {
313 ret = select(c->fd + 1, &rfds, &wfds, 0, 0);
314 } while (ret == -1 && errno == EINTR);
315 if (ret < 0)
316 {
317 _xcb_conn_shutdown(c);
318 ret = 0;
319 }
320 _xcb_lock_io(c); // <--- Thread 7
What happens: Thread 7 is running normally (c->xlib.lock == 0) and
blocks in select() at line 313. Meanwhile thread 16 is scheduled
(c->xlib.lock == 1) and waits at line 296 for thread 7 to complete its
operation. When thread 7 reaches line 320 it can't take the lock
because c->xlib.lock == 1 and c->xlib.thread != pthread_self(), so
neither thread can make progress:
272 void _xcb_lock_io(xcb_connection_t *c)
273 {
274 pthread_mutex_lock(&c->iolock);
275 while(c->xlib.lock)
276 {
277 if(pthread_equal(c->xlib.thread, pthread_self()))
278 break;
279 pthread_cond_wait(&c->xlib.cond, &c->iolock);
280 }
281 }
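To make the circular wait explicit, here is a minimal model of the two
blocking conditions in C. It is only a sketch: thread ids are plain
ints rather than pthread_t, and struct xlib_state, lock_io_would_block,
conn_wait_would_block, is_deadlocked and model_self_check are
hypothetical names for illustration, not part of xcb.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the c->xlib fields used by _xcb_lock_io.
   Thread ids are plain ints here, not pthread_t. */
struct xlib_state {
    int lock;   /* models c->xlib.lock */
    int owner;  /* models c->xlib.thread */
};

/* Models the loop at lines 275-280: the caller sleeps on xlib.cond
   while xlib.lock is set and another thread owns it. */
static bool lock_io_would_block(const struct xlib_state *x, int self)
{
    return x->lock != 0 && x->owner != self;
}

/* Models the test at line 294 for a caller with count == NULL:
   it sleeps at line 296 while another thread is already reading. */
static bool conn_wait_would_block(int in_reading)
{
    return in_reading != 0;
}

/* Circular wait: `reader` has come back from select() and wants the
   I/O lock at line 320, while `holder` owns xlib.lock and sleeps at
   line 296 until the reader is done. */
static bool is_deadlocked(const struct xlib_state *x, int reader,
                          int holder, int in_reading)
{
    return x->owner == holder
        && conn_wait_would_block(in_reading)
        && lock_io_would_block(x, reader);
}

/* Usage: the situation described above versus two harmless ones. */
static void model_self_check(void)
{
    struct xlib_state held     = { 1, 16 };  /* thread 16 holds xlib.lock */
    struct xlib_state released = { 0, 0 };   /* nobody holds xlib.lock */

    assert(is_deadlocked(&held, 7, 16, 1));      /* 7 reads, 16 holds */
    assert(!is_deadlocked(&released, 7, 16, 1)); /* lock was released */
    assert(!lock_io_would_block(&held, 16));     /* owner passes the gate */
}
```

In this model the hang disappears as soon as xlib.lock is clear or the
thread returning from select() is itself the owner.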
So the next question was why this can happen at all. Let's take a look
at _XReply:
<snip>
355 /* Internals of UnlockDisplay done by hand here, so that we can
356 insert_pending_request *after* we _XPutXCBBuffer, but before we
357 unlock the display. */
358 _XPutXCBBuffer(dpy);
359 current = insert_pending_request(dpy);
360 if(!dpy->lock || dpy->lock->locking_level == 0)
361 xcb_xlib_unlock(dpy->xcb->connection); // <--- XXX
362 if(dpy->xcb->lock_fns.unlock_display)
363 dpy->xcb->lock_fns.unlock_display(dpy);
364 reply = xcb_wait_for_reply(c, current->sequence, &error);
365 LockDisplay(dpy);
Line 361 must have been executed in thread 7 (I couldn't check this,
but it seems to be the only explanation), so c->xlib.lock became 0
before xcb_wait_for_reply was called. However, thread 16 had
dpy->lock->locking_level == 1 (this time verified with gdb and a
coredump), so the "lock" wasn't released, which caused part of the
trouble.
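The condition at line 360 can be looked at in isolation. The following
is a sketch in C, assuming that only dpy->lock and its locking_level
matter for the decision; struct lock_info, struct display_model,
xreply_releases_xlib_lock and release_self_check are hypothetical
names, not Xlib API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the Display fields tested at line 360. */
struct lock_info {
    int locking_level;        /* models dpy->lock->locking_level */
};

struct display_model {
    struct lock_info *lock;   /* models dpy->lock, may be NULL */
};

/* Mirrors the test at line 360: true if _XReply would call
   xcb_xlib_unlock before xcb_wait_for_reply. */
static bool xreply_releases_xlib_lock(const struct display_model *dpy)
{
    return dpy->lock == NULL || dpy->lock->locking_level == 0;
}

/* Usage: the cases that line 360 distinguishes. */
static void release_self_check(void)
{
    struct lock_info zero = { 0 };
    struct lock_info one  = { 1 };
    struct display_model level_zero = { &zero };
    struct display_model level_one  = { &one };
    struct display_model no_lock    = { NULL };

    assert(xreply_releases_xlib_lock(&level_zero)); /* line 361 runs */
    assert(!xreply_releases_xlib_lock(&level_one)); /* xlib.lock stays 1 */
    assert(xreply_releases_xlib_lock(&no_lock));    /* line 361 runs */
}
```

So with locking_level == 1 the caller enters xcb_wait_for_reply while
c->xlib.lock is still set.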
I have no idea where the actual bug is, but I see three possible
conditions, any one of which would avoid this:
- xlib.lock always has to be released before calling xcb_wait_for_reply
- xlib.lock must never be released before calling xcb_wait_for_reply
- xcb has to deal with that situation internally
Hopefully you can follow my thoughts and have some nice ideas to fix this =)
Christoph
PS: Hints to reproduce the issue (note that I didn't try personally):
libx11-6 1.1.3-1, libxcb* 1.0-3 (Debian)
gxine dev, xine-lib 1.2 dev; gxine built --without-xcb
vdr 1.4.5, vdr-xine 0.7.9 dev (local builds)
Command: ./src/gxine vdr://tmp/vdr-xine/stream#demux:mpeg_pes
vdr tuned to BBC News 24 (which is 16:9)
http://zap.tartarus.org/~ds/gxine-0.5.900-dev.tar.bz2
http://zap.tartarus.org/~ds/xine-lib-1.1.90hg.tar.bz2