[Xcb] xcb recursive write not protected in all cases?

Tue Dec 24 10:45:59 PST 2013

Uli Schlachter <psychon at znc.in> writes:

> Something like this was already noticed in May by Christian König, but back then
> was only a theoretical issue that he didn't actually manage to hit. See the
> threads starting at these mails:
>
> http://lists.freedesktop.org/archives/xcb/2013-May/008265.html
> http://lists.freedesktop.org/archives/xcb/2013-May/008272.html

Cool. I'm not quite sure those are quite correct as they appear to
require out.writing to be set before get_socket_back will be
called. However, there's a lot in common between someone calling writev
and Xlib owning the socket, so perhaps these tests should be checking
out.writing || out.return_socket?

> I am not sure if send_request() is the correct place for this. Let's look where
> get_socket_back() and out.writing are used:
>
> xcb_send_request():
>     [...]
>     /* get a sequence number and arrange for delivery. */
>     pthread_mutex_lock(&c->iolock);
>     /* wait for other writing threads to get out of my way. */
>     while(c->out.writing)
>         pthread_cond_wait(&c->out.cond, &c->iolock);
>     get_socket_back(c);
>     [...]
>
> xcb_take_socket():
> This only calls get_socket_back(). Which makes me wonder: What makes sure that
> pre-existing writers get out of the way?

xcb_take_socket also calls _xcb_out_flush_to which will wait for writing
to go to zero. However, that isn't checking for out.return_socket, so I
guess there's a race with Xlib ownership there?

> However,
> this also wants to flush the connection, thus we have three things that need to
> be done in a loop (and the current code is quite wrong).

Oh, good point, xcb_take_socket gets the socket back to XCB and then
calls _xcb_out_flush_to which may well lose the socket again. Seems like
the locking was all messed up when the Xlib socket passing was added.

> _xcb_out_send_sync():
> void _xcb_out_send_sync(xcb_connection_t *c)
> {
>     /* wait for other writing threads to get out of my way. */
>     while(c->out.writing)
>         pthread_cond_wait(&c->out.cond, &c->iolock);
>     get_socket_back(c);
>     send_sync(c);
> }
>
> _xcb_out_flush_to():
> Just makes sure that previous writers go out of the way so that
> c->out.request_written stabilizes.
>
> So I guess the first three of these four places really want to make sure that
> they can send requests which means that there are no other writers and the
> socket isn't owned externally. And all of them do it wrong.
>
> Thus, I would suggest a helper function (something like wait_for_writers() or
> prepare_writer()) that can then be used in all of the above places.

>
>
> While looking at the above I noticed that send_sync() -> send_request() ->
> _xcb_out_send() -> _xcb_conn_wait() can drop the iolock. Thus, it is impossible
> to make sure that we always keep the iolock in xcb_send_request().
>
> This means that making sure we can write in send_request(), as your patch does,
> really is a good idea. However, this might render some other locking irrelevant,
> e.g. _xcb_out_end_sync() can now just call send_sync() directly.

Right, moving the locking right into send_request seems like a good
plan.

There's one issue -- the send_sync which ensures that we don't
ever have a reply using sequence number 0 will not be atomic with the
actual request delivery and so send_sync may be invoked multiple times, and
the invariant could in theory be violated if 2**32 other requests are
generated before we get back to send our request. I think we can safely
ignore this issue and just do the sequence number fixups before ever
calling get_socket_back, and waiting for writers to go to zero.

> Next thing I notice: _xcb_out_flush_to() calls _xcb_out_send() without checking
> for existing writers. This can result in two threads trying to write at the same
> time. One of them will block on the c->out.writing condition inside of
> _xcb_conn_wait(). At this time, the same thing can happen to other threads, all
> of them waiting for out->writing. When the existing writer is done, some random
> thread will win and be allowed to write. This means that requests might be sent
> in the wrong order and thus with another sequence number than
> expected.

Oh. We need to hold writing *before* allocating the sequence number. I
had walked that code and not realized that this could happen.

> I'll have to think about this last issue a bit more, but it "smells
> fishy".

Yeah, I think we need to move the wait for reading/writing out of
_xcb_conn_wait and do that before allocating the sequence number.

Ick. We've got a lot of work to do here to fix this.

-- 
keith.packard at intel.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 827 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/xcb/attachments/20131224/3ac84513/attachment.pgp>