[Xcb] [PATCH v2 2/2] Also signal special event waiters in _xcb_in_wake_up_next_reader

Thu Jun 11 09:50:02 PDT 2015

Hi,

Am 11.06.2015 um 07:54 schrieb Michel Dänzer:
[...]
>>> As described in the commit log, the test case for this patch is a
>>> different one: The Mesa demos program called glthreads, which creates
>>> several threads rendering with OpenGL (to one separate window each)
>>> concurrently.
>>>
>>> The problem fixed by this patch is that glthreads often hangs when
>>> trying to quit by pressing the Escape key.
>>
>> Is there any easy way to get glthreads on debian that doesn't involve building
>> something "big" like all of mesa?
> 
> git clone git://anongit.freedesktop.org/mesa/demos
> cd demos
> ./autogen.sh
> make -C src/xdemos glthreads

Thanks, that was indeed relatively easy (and only required 7MiB of additional
packages).

Sadly, I cannot reproduce this hang, but I can easily reproduce the Xlib problem
of "only progresses when an event is received" / "XNextEvent() in one threads
blocks other threads from receiving a reply". I made it easier to work around
this locally by adding PointerMotionMask to the window's event_mask. That allows
me to just move the mouse pointer around to have the window update. :-) Not once
did it hang after pressing escape at random times. The only problem I saw was
with arguments "-t" once when changing textures:

(gdb) run -t
Starting program: /tmp/demos/src/xdemos/glthreads -t
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
glthreads: No explict locking.
glthreads: Single display connection.
XInitThreads() returned 1 (success)
glthreads: creating windows
glthreads: creating threads
[New Thread 0x7fffefa3c700 (LWP 371)]
glthreads: Created thread 0x7fffefa3c700
[New Thread 0x7fffef23b700 (LWP 372)]
glthreads: Created thread 0x7fffef23b700
glthreads: 0: GL_RENDERER = Mesa DRI Mobile Intel® GM45 Express Chipset
glthreads: 1: GL_RENDERER = Mesa DRI Mobile Intel® GM45 Express Chipset
glthreads: malloc.c:3987: _int_free: Assertion `p->fd_nextsize->bk_nextsize ==
p' failed.

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffef23b700 (LWP 372)]
0x00007ffff6865107 in __GI_raise (sig=sig at entry=6) at
../nptl/sysdeps/unix/sysv/linux/raise.c:56
56	../nptl/sysdeps/unix/sysv/linux/raise.c: Datei oder Verzeichnis nicht gefunden.
(gdb) bt
#0  0x00007ffff6865107 in __GI_raise (sig=sig at entry=6) at
../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007ffff68664e8 in __GI_abort () at abort.c:89
#2  0x00007ffff68a80fd in __malloc_assert
(assertion=assertion at entry=0x7ffff6996028 "p->fd_nextsize->bk_nextsize == p",
file=file at entry=0x7ffff6992005 "malloc.c",
    line=line at entry=3987, function=function at entry=0x7ffff69923a3
<__func__.11561> "_int_free") at malloc.c:293
#3  0x00007ffff68a9b1e in _int_free (av=0x7ffff6bd3620 <main_arena>,
p=<optimized out>, have_lock=0) at malloc.c:3987
#4  0x00007ffff29e7eee in ?? () from /usr/lib/x86_64-linux-gnu/dri/i965_dri.so
#5  0x00007ffff29cbc69 in ?? () from /usr/lib/x86_64-linux-gnu/dri/i965_dri.so
#6  0x00007ffff29cc590 in ?? () from /usr/lib/x86_64-linux-gnu/dri/i965_dri.so
#7  0x00007ffff2adac1b in ?? () from /usr/lib/x86_64-linux-gnu/dri/i965_dri.so
#8  0x00000000004025a7 in draw_loop (wt=0x6045f8 <WinThreads+56>) at glthreads.c:275
#9  thread_function (p=0x6045f8 <WinThreads+56>) at glthreads.c:526
#10 0x00007ffff6be10a4 in start_thread (arg=0x7fffef23b700) at pthread_create.c:309
#11 0x00007ffff691604d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

I'll just ignore this.

>> Does this hang have the same backtrace as what the example program demonstrates
>> and that is fixed with patch 1: One thread stuck in _xcb_conn_wait() ->
>> pthread_cond_wait and no other threads inside of libxcb?
> 
> No, here's the backtrace of the hanging threads:
> 
> Thread 2 (Thread 0x7f3bfa91f700 (LWP 13505)):
> #0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
> #1  0x00007f3c0058c219 in _xcb_conn_wait (c=0x24641d0, cond=<optimized out>, vector=0x0, count=0x0) at ../../src/xcb_conn.c:447
> #2  0x00007f3c0058dbd9 in xcb_wait_for_special_event (c=0x7f3bf400130c, c at entry=0x24641d0, se=0x80) at ../../src/xcb_in.c:761
> #3  0x00007f3c03458e3f in dri3_find_back (priv=0x7f3bf40008c0, c=0x24641d0) at ../../../src/glx/dri3_glx.c:1268
> #4  dri3_get_buffer (driDrawable=<optimized out>, loaderPrivate=0x7f3bf40008c0, buffer_type=dri3_buffer_back, format=4098) at ../../../src/glx/dri3_glx.c:1294
> #5  dri3_get_buffers (driDrawable=<optimized out>, format=4098, stamp=<optimized out>, loaderPrivate=0x7f3bf40008c0, buffer_mask=<optimized out>, buffers=0x7f3bfa91eb40) at ../../../src/glx/dri3_glx.c:1473
> #6  0x00007f3bfe127d60 in dri_image_drawable_get_buffers (statts_count=<optimized out>, statts=<optimized out>, images=<optimized out>, drawable=<optimized out>) at ../../../../../src/gallium/state_trackers/dri/dri2.c:279
> #7  dri2_allocate_textures (ctx=0x24cbd60, drawable=0x7f3bf40009f0, statts=0x7f3bf4001048, statts_count=2) at ../../../../../src/gallium/state_trackers/dri/dri2.c:402
> #8  0x00007f3bfe123dbd in dri_st_framebuffer_validate (stctx=<optimized out>, stfbi=<optimized out>, statts=0x7f3bf4001048, count=2, out=0x7f3bfa91eca0) at ../../../../../src/gallium/state_trackers/dri/dri_drawable.c:83
> #9  0x00007f3bfe00dfde in st_framebuffer_validate (stfb=stfb at entry=0x7f3bf4000c00, st=st at entry=0x2527a10) at ../../../src/mesa/state_tracker/st_manager.c:200
> #10 0x00007f3bfe00fbc5 in st_manager_validate_framebuffers (st=st at entry=0x2527a10) at ../../../src/mesa/state_tracker/st_manager.c:863
> #11 0x00007f3bfdfc8e16 in st_validate_state (st=st at entry=0x2527a10) at ../../../src/mesa/state_tracker/st_atom.c:181
> #12 0x00007f3bfdfd2821 in st_Clear (ctx=0x7f3c03cd7010, mask=18) at ../../../src/mesa/state_tracker/st_cb_clear.c:463
> #13 0x00000000004025c7 in draw_loop (wt=0x604580 <WinThreads>) at glthreads.c:275
> #14 thread_function (p=0x604580 <WinThreads>) at glthreads.c:526
> #15 0x00007f3c029880a4 in start_thread (arg=0x7f3bfa91f700) at pthread_create.c:309
> #16 0x00007f3c026b604d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
> 
> Thread 1 (Thread 0x7f3c03d847c0 (LWP 13503)):
> #0  0x00007f3c029894db in pthread_join (threadid=139895583667968, thread_return=thread_return at entry=0x0) at pthread_join.c:92
> #1  0x0000000000401f4a in clean_up () at glthreads.c:541
> #2  main (argc=<optimized out>, argv=<optimized out>) at glthreads.c:686
[...]

Well, yeah. One thread sitting in _xcb_conn_wait() waiting to be woken up by
some other thread which "used to be" in _xcb_conn_wait() and is supposed to wake
up this thread once its done using the fd.

Thanks to this I finally figured out why we have this complicated wake up logic. :-)

Multiple threads are waiting for things "coming from the X11 server". Only one
thread can recv() from the socket at the same time (else we can't wake up other
threads if we just received what they are waiting for).

When a reply/error comes in, the thread waiting for this (if it exists) is woken
up directly from read_packet() so that it can gets "its stuff" from our linked
lists. For special events this is done by event_special(). So this is the "you
can return to your caller now"-case. This case is working fine.

When the current thread received what it wanted and is done, some other thread
has to take over reading from the socket. This is done by
_xcb_in_wake_up_next_reader(), but currently this doesn't wake up threads
waiting for special events. This is the bug that is left (and which I cannot
reproduce currently). (Also, your first bug was related to this code as well)

However, since only one thread is allowed to recv() from the socket at any time,
waking up "all" of them just so that most of them go back to sleep without
changing anything isn't really effective. Instead, we'd need a linked list of
threads waiting for special events, similar like we have with wait_for_reply() /
insert_reader() / remove_reader() right now. Then, _xcb_in_wake_up_next_reader()
could wake up exactly one thread waiting for a special event that will then take
over reading from the socket.

This all sounds relatively complicated and non-trivial. Should we instead just
live with the "easy case" of waking up too many threads and let them go to sleep
again? That'd be the patch that was originally proposed in this thread (and
whose commit message might be enhanced with some explanation of the above).

What do others think?

Cheers,
Uli
-- 
“Cold. Cold. Cold. Cold. Cold. Cold. Cold. Cold. Cold. Cold.” – Anna