[Xcb] [PATCH 3/3] fix pthread_cond_wait and get_socket_back dependency

Fri Jun 7 12:06:39 PDT 2013

On Fri, Jun 07, 2013 at 08:07:53PM +0200, Uli Schlachter wrote:
> On 07.06.2013 18:27, Josh Triplett wrote:
> > On Fri, Jun 07, 2013 at 05:11:06PM +0200, Uli Schlachter wrote:
> >> Some thread A is currently writing (_xcb_conn_wait(), called via
> >> _xcb_out_send()), thus c->out.writing != 0. Now thread B calls
> >> xcb_send_request() and ends up waiting on c->out.cond until thread A is
> >> finished. Thread C now calls xcb_take_socket() and ends up waiting for thread A,
> >> too (*).
> >>
> >> (*): This is complicated: xcb_take_socket() -> _xcb_out_flush_to() ->
> >> xcb_out_send() -> _xcb_conn_wait() -> pthread_cond_wait() on c->out.cond.
> >>
> >> Once thread A is done writing, _xcb_out_send() broadcasts on c->out.cond, so
> >> threads B and C will both wake up and race for the iolock.
> >>
> >> If thread C wins this race, the socket is taken. Next, thread B will call
> >> send_request() without calling get_socket_back() first. This is wrong, since
> >> thread C could be calling xcb_writev() at the same time, too, and sequence
> >> numbers can get confused.
> > 
> > I don't see how this can happen the way you've described.  With thread A
> > writing, threads B and C are both waiting to acquire the iolock.  If
> > thread C wins, it will take the socket before it unlocks the iolock.
> > After C drop the iolock, B will take it, and then immediately after
> > taking the iolock B will call get_socket_back.
> 
> Well, no. Thread A is in _xcb_conn_wait()'s poll() (or select()) call. The last
> thing it did before calling poll()/select() was dropping the iolock. Thus,
> thread B can take the socket, call get_socket_back() (nothing to do here), then
> notices that there is a writer and waits for the writer. It won't check
> get_socket_back() again, which is what the patch changes.

I see; missed that detail in the scenario you outlined.

> > That said, I do find myself staring at the later parts of
> > xcb_send_request, specifically the calls to send_sync can drop the
> > iolock; xcb_take_socket takes the iolock, but does not check
> > c->out.writing.  Jamey?
> 
> I don't think so.
> 
> xcb_take_socket() does check c->out.writing:
> Before dropping the iolock, send_request() will have done ++c->out.request.
> Thus, in xcb_take_socket(), the condition "c->out.request !=
> c->out.request_written" ends up being true and this will call
> _xcb_out_flush_to() in a loop (thanks to Keith fixing another bug here :-) ).
> _xcb_out_flush_to() will then end up waiting for the pre-existing writer.
> 
> So, I think there is no problem inside xcb_take_socket(), it will wait for the
> writer. Also, the writer should(?) not have any problems, because no other
> thread can continue before it finishes and when it finished it will do the next
> send_request() call again before dropping the iolock.

That *sounds* right...

> >> This is my understanding of the problem. I guess it wouldn't hurt if the commit
> >> message mentioned what the patch is about in more detail...
> > 
> > Agreed.
> > 
> >> P.S.: Meh. xcb_take_socket() first calls get_socket_back(), then
> >> _xcb_out_flush_to(). The latter function can drop the iolock and thus another
> >> thread could have successfully called xcb_take_socket() before this call. These
> >> races with xcb_take_socket() seem complicated enough that someone should prove
> >> that they are no problem, or at least that suitable assert()s should be added...
> > 
> > Yeah, as mentioned above, we need to stare at the lock-dropping cases
> > more closely, both those in the xcb_take_socket path and those in the
> > xcb_send_request path, to make sure they exclude each other.
> 
> Perhaps it would be best to have some wrapper functions for acquiring/releasing
> the iolock (or rather: acquiring the socket?). During acquire, the wrapper would
> also make sure that writers get out of the way and that the socket is not taken.
> The release wrapper would just be there for consistency.
> 
> Hopefully these could be used everywhere and thus we have to worry about less
> special cases. This would only leave doubts about code which continues after it
> wrote to the socket (and thus possibly dropped the iolock)...

That sounds like a good idea.  Likewise we'd need a wrapper for waiting
on a condition variable, since that drops the lock.

- Josh Triplett