[Xcb] [PATCH v2 2/2] Also signal special event waiters in _xcb_in_wake_up_next_reader

Uli Schlachter psychon at znc.in
Wed Jun 10 03:56:50 PDT 2015


Am 10.06.2015 um 12:00 schrieb Michel Dänzer:
> On 10.06.2015 16:32, Uli Schlachter wrote:
>> Am 01.06.2015 um 04:09 schrieb Michel Dänzer:
>>> On 30.05.2015 21:22, Uli Schlachter wrote:
[...]
>>>> Hrm. This is basically a pthread_cond_broadcast() waking up all threads, even
>>>> those for which we do not have a special event.
>>>>
>>>> Can you explain why this is necessary at all? The function event_special()
>>>> should be handling this. This is called on all special events. It appends the
>>>> event to the xcb_special_event_t's event queue and then calls
>>>> pthread_cond_signal() on exactly this condition variable.
>>>>
>>>> In my understanding, that should be enough. Why would we need a "wake up all
>>>> waiting threads and let them figure out if they have to do something themselves"?
>>>
>>> I can't explain why, but what you're saying applies to "normal" events
>>> as well, doesn't it? _xcb_in_wake_up_next_reader() signals
>>> c->in.event_cond, even though read_packet() already does the same. This
>>> change is mirroring that for special events.
>>
>> Can you try changing your above pthread_cond_signal() into a
>> pthread_cond_broadcast?
> 
> Not that in contrast to what you wrote above, the code I'm adding isn't
> the same as pthread_cond_broadcast: My code calls pthread_cond_signal
> for each special event condition variable, waking up one random thread
> (or none) waiting for each of them. pthread_cond_broadcast wakes up all
> threads waiting for a condition variable.
> 
> So, I'm confused why you're now asking me to try waking up even more
> threads, when you wrote above that waking any additional threads
> shouldn't be necessary.

I got myself confused, sorry. The above shouldn't make a difference, but in the
following...

>> Alternatively, change the pthread_cond_signal() in event_special()
>> to pthread_cond_broadcast() and see if that helps.
>>
>> (The idea being: There are multiple threads in xcb_wait_for_special_event()
>> concurrently.)

... the idea was for a case where multiple threads are stuck in
xcb_wait_for_special_event(). My current understanding is that
_xcb_in_wake_up_next_reader() is needed for a case where multiple threads are
waiting for events, replies or special events. To you know if Mesa ever does
this? For the same kind of event or for different ones?

> That doesn't fix the first problem described below.

I understand this as "this doesn't fix any problem", but just to be sure: Do you
mean "this fixes some problem, but not all"?

>> I tried the reproducing program from [1] and while I can also reproduce the hang
>> that your first patch fixes (and I can confirm that the patch fixes the
>> problem), I cannot reproduce any hang that this patch would have to fix.
> 
> As described in the commit log, the test case for this patch is a
> different one: The Mesa demos program called glthreads, which creates
> several threads rendering with OpenGL (to one separate window each)
> concurrently.
> 
> The problem fixed by this patch is that glthreads often hangs when
> trying to quit by pressing the Escape key.

Is there any easy way to get glthreads on debian that doesn't involve building
something "big" like all of mesa? There doesn't seem to be a debian package for
it.

Does this hang have the same backtrace as what the example program demonstrates
and that is fixed with patch 1: One thread stuck in _xcb_conn_wait() ->
pthread_cond_wait and no other threads inside of libxcb?

> Note that glthreads also exhibits another problem, which none of the
> changes above fix: Sometimes when starting it, the windows stay black
> until I press any key. I'm including the backtrace corresponding to this
> hang at the end of this message.
> 
> 
> Another problem which I suspect might be related, but which isn't fixed
> by these changes, is https://bugs.freedesktop.org/show_bug.cgi?id=90798

I have no idea how XCB could have anything to do with this, nor do I know how
this could happen at all, sorry.

> All of these problems only occur when the OpenGL implementation uses
> the DRI3/Present extensions, which rely on special events.
> 
> 
> Thread 2 (Thread 0x7ff7ba9ce700 (LWP 5359)):
> #0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
> #1  0x00007ff7c36ab3d5 in _XReply () from /usr/lib/x86_64-linux-gnu/libX11.so.6
> #2  0x00007ff7c36a6cad in XSync () from /usr/lib/x86_64-linux-gnu/libX11.so.6
> #3  0x00007ff7c3d089bb in dri3_update_drawable (driDrawable=<optimized out>, loaderPrivate=0x7ff7ac0008c0) at ../../../src/glx/dri3_glx.c:1072
> #4  dri3_get_buffers (driDrawable=<optimized out>, format=4098, stamp=<optimized out>, loaderPrivate=0x7ff7ac0008c0, buffer_mask=1, buffers=0x7ff7ba9cdb70) at ../../../src/glx/dri3_glx.c:1438
> #5  0x00007ff7be9d7d60 in dri_image_drawable_get_buffers (statts_count=<optimized out>, statts=<optimized out>, images=<optimized out>, drawable=<optimized out>) at ../../../../../src/gallium/state_trackers/dri/dri2.c:279
> #6  dri2_allocate_textures (ctx=0x10b8270, drawable=0x7ff7ac0009f0, statts=0x7ff7ac001048, statts_count=2) at ../../../../../src/gallium/state_trackers/dri/dri2.c:402
> #7  0x00007ff7be9d3dbd in dri_st_framebuffer_validate (stctx=<optimized out>, stfbi=<optimized out>, statts=0x7ff7ac001048, count=2, out=0x7ff7ba9cdcd0) at ../../../../../src/gallium/state_trackers/dri/dri_drawable.c:83
> #8  0x00007ff7be8bdfde in st_framebuffer_validate (stfb=0x7ff7ac000c00, st=st at entry=0x111cfb0) at ../../../src/mesa/state_tracker/st_manager.c:200
> #9  0x00007ff7be8bf513 in st_api_make_current (stapi=<optimized out>, stctxi=0x111cfb0, stdrawi=0x7ff7ac0009f0, streadi=0x7ff7ac0009f0) at ../../../src/mesa/state_tracker/st_manager.c:769
> #10 0x00007ff7be9d38f1 in dri_make_current (cPriv=<optimized out>, driDrawPriv=0x7ff7ac0009b0, driReadPriv=0x7ff7ac0009b0) at ../../../../../src/gallium/state_trackers/dri/dri_context.c:234
> #11 0x00007ff7be9cf314 in driBindContext (pcp=<optimized out>, pdp=<optimized out>, prp=<optimized out>) at ../../../../../../src/mesa/drivers/dri/common/dri_util.c:538
> #12 0x00007ff7c3d063da in dri3_bind_context (context=0x10b8090, old=<optimized out>, draw=<optimized out>, read=<optimized out>) at ../../../src/glx/dri3_glx.c:145
> #13 0x00007ff7c3cd948e in MakeContextCurrent (gc_user=0x10b8090, read=39845893, draw=39845893, dpy=0xfd9c80) at ../../../src/glx/glxcurrent.c:228
> #14 glXMakeCurrent (dpy=0xfd9c80, draw=39845893, gc=0x10b8090) at ../../../src/glx/glxcurrent.c:262
> #15 0x0000000000402588 in draw_loop (wt=0x6045b8 <WinThreads+56>) at glthreads.c:243
> #16 thread_function (p=0x6045b8 <WinThreads+56>) at glthreads.c:526
> #17 0x00007ff7c32380a4 in start_thread (arg=0x7ff7ba9ce700) at pthread_create.c:309
> #18 0x00007ff7c2f6604d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
> 
> Thread 1 (Thread 0x7ff7c46377c0 (LWP 5356)):
> #0  0x00007ff7c2f5d50d in poll () at ../sysdeps/unix/syscall-template.S:81
> #1  0x00007ff7c0e3bfd2 in _xcb_conn_wait (c=0xfdb3d0, cond=<optimized out>, vector=0x0, count=0x0) at ../../src/xcb_conn.c:479
> #2  0x00007ff7c0e3db6f in xcb_wait_for_event (c=0x7ffc7b991740) at ../../src/xcb_in.c:668
> #3  0x00007ff7c36aafe8 in _XReadEvents () from /usr/lib/x86_64-linux-gnu/libX11.so.6
> #4  0x00007ff7c369a078 in XNextEvent () from /usr/lib/x86_64-linux-gnu/libX11.so.6
> #5  0x0000000000401ff5 in event_loop (dpy=<optimized out>) at glthreads.c:378
> #6  main (argc=<optimized out>, argv=<optimized out>) at glthreads.c:684

This is a bug in libX11. To quote the source for _XReply():

	/* We have the response we're looking for. Now, before
	 * letting anyone else process this sequence number, we
	 * need to process any events that should have come
	 * earlier. */

	if(dpy->xcb->event_owner == XlibOwnsEventQueue)
	{
		xcb_generic_reply_t *event;
		/* If some thread is already waiting for events,
		 * it will get the first one. That thread must
		 * process that event before we can continue. */
		/* FIXME: That event might be after this reply,
		 * and might never even come--or there might be
		 * multiple threads trying to get events. */
		while(dpy->xcb->event_waiter)
		{ /* need braces around ConditionWait */
			ConditionWait(dpy, dpy->xcb->event_notify);
		}
		while((event = poll_for_event(dpy)))
			handle_response(dpy, event, True);
	}

So in the above case, _XReply() got the response that XSync() wants, but it also
needs to handle any events that came "on the wire" before this response. XCB
pretty much hides/destroys this order.

To get it, this code wants to poll for events, but only one thread at a time can
handle events and thus this first wants to wait for the other thread waiting for
events to be done. This is also why pressing a key helps: Any event will wake up
thread 1 and make it return from XNextEvent() and "on its way out" it will wake
up thread 2 which is now allowed to make sure that no event came before this
reply "on wire".

I don't know what can be done about this problem, but it's a known threading bug
in libX11.

Cheers,
Uli
-- 
- Buck, when, exactly, did you lose your mind?
- Three months ago. I woke up one morning married to a pineapple.
  An ugly pineapple... But I loved her.


More information about the Xcb mailing list