[Xcb] [PATCH 3/3] fix pthread_cond_wait and get_socket_back dependency

Uli Schlachter psychon at znc.in
Fri Jun 7 11:07:53 PDT 2013


On 07.06.2013 18:27, Josh Triplett wrote:
> On Fri, Jun 07, 2013 at 05:11:06PM +0200, Uli Schlachter wrote:
>> Some thread A is currently writing (_xcb_conn_wait(), called via
>> _xcb_out_send()), thus c->out.writing != 0. Now thread B calls
>> xcb_send_request() and ends up waiting on c->out.cond until thread A is
>> finished. Thread C now calls xcb_take_socket() and ends up waiting for thread A,
>> too (*).
>>
>> (*): This is complicated: xcb_take_socket() -> _xcb_out_flush_to() ->
>> xcb_out_send() -> _xcb_conn_wait() -> pthread_cond_wait() on c->out.cond.
>>
>> Once thread A is done writing, _xcb_out_send() broadcasts on c->out.cond, so
>> threads B and C will both wake up and race for the iolock.
>>
>> If thread C wins this race, the socket is taken. Next, thread B will call
>> send_request() without calling get_socket_back() first. This is wrong, since
>> thread C could be calling xcb_writev() at the same time, too, and sequence
>> numbers can get confused.
> 
> I don't see how this can happen the way you've described.  With thread A
> writing, threads B and C are both waiting to acquire the iolock.  If
> thread C wins, it will take the socket before it unlocks the iolock.
> After C drop the iolock, B will take it, and then immediately after
> taking the iolock B will call get_socket_back.

Well, no. Thread A is in _xcb_conn_wait()'s poll() (or select()) call. The last
thing it did before calling poll()/select() was dropping the iolock. Thus,
thread B can take the socket, call get_socket_back() (nothing to do here), then
notices that there is a writer and waits for the writer. It won't check
get_socket_back() again, which is what the patch changes.

> That said, I do find myself staring at the later parts of
> xcb_send_request, specifically the calls to send_sync can drop the
> iolock; xcb_take_socket takes the iolock, but does not check
> c->out.writing.  Jamey?

I don't think so.

xcb_take_socket() does check c->out.writing:
Before dropping the iolock, send_request() will have done ++c->out.request.
Thus, in xcb_take_socket(), the condition "c->out.request !=
c->out.request_written" ends up being true and this will call
_xcb_out_flush_to() in a loop (thanks to Keith fixing another bug here :-) ).
_xcb_out_flush_to() will then end up waiting for the pre-existing writer.

So, I think there is no problem inside xcb_take_socket(), it will wait for the
writer. Also, the writer should(?) not have any problems, because no other
thread can continue before it finishes and when it finished it will do the next
send_request() call again before dropping the iolock.

>> The patch solves this problem by calling get_socket_back() after waiting for a
>> writer to get out of the way. Since this could drop the iolock, the whole dance
>> with waiting for writers has to be done again.
> 
> Only because of patch 2 of this series, which was not around when we
> tried to work out the correctness of xcb_take_socket. :)

I don't think so. get_socket_back() already dropped the iolock before. The only
change due to patch 2 happens when multiple threads will call get_socket_back()
"at the same time". However, in the above scenario, only thread B would ever
call get_socket_back().

>> This is my understanding of the problem. I guess it wouldn't hurt if the commit
>> message mentioned what the patch is about in more detail...
> 
> Agreed.
> 
>> P.S.: Meh. xcb_take_socket() first calls get_socket_back(), then
>> _xcb_out_flush_to(). The latter function can drop the iolock and thus another
>> thread could have successfully called xcb_take_socket() before this call. These
>> races with xcb_take_socket() seem complicated enough that someone should prove
>> that they are no problem, or at least that suitable assert()s should be added...
> 
> Yeah, as mentioned above, we need to stare at the lock-dropping cases
> more closely, both those in the xcb_take_socket path and those in the
> xcb_send_request path, to make sure they exclude each other.

Perhaps it would be best to have some wrapper functions for acquiring/releasing
the iolock (or rather: acquiring the socket?). During acquire, the wrapper would
also make sure that writers get out of the way and that the socket is not taken.
The release wrapper would just be there for consistency.

Hopefully these could be used everywhere and thus we have to worry about less
special cases. This would only leave doubts about code which continues after it
wrote to the socket (and thus possibly dropped the iolock)...

If only I had more time for this. :-(

Uli
-- 
Q: Because it reverses the logical flow of conversation.
A: Why is putting a reply at the top of the message frowned upon?


More information about the Xcb mailing list