[Xcb] xcb recursive write not protected in all cases?

Uli Schlachter psychon at znc.in
Tue Dec 24 03:46:13 PST 2013

On 24.12.2013 06:23, Keith Packard wrote:
> Josh Triplett <josh at joshtriplett.org> writes:
>> On Mon, Dec 23, 2013 at 05:00:46PM -0800, Keith Packard wrote:
>>> I think we need to check out.writing closer to the actual use of the
>>> out.queue; perhaps it would suffice to check that down inside
>>> send_request itself? That would certainly offer the surest guarantee
>>> that iolock wasn't dropped between the check for out.writing and the use
>>> of out.queue.
>> Based on your analysis, that sounds like the best plan indeed: check
>> while holding the lock in send_request, where there's no way we can
>> subsequently drop it before sending the request.
> I ran into another issue in the same area -- while waiting for
> out.writing, we can lose the socket back to Xlib, append some data to
> the request buffer and then Xlib comes along and tries to write another
> request to the socket and *bam* another assertion failure in write_vec.
> I think we have to block until we have the socket *and* out.writing is
> false.

Something like this was already noticed in May by Christian König, but back then
was only a theoretical issue that he didn't actually manage to hit. See the
threads starting at these mails:


> Something like this:

I am not sure if send_request() is the correct place for this. Let's look where
get_socket_back() and out.writing are used:

    /* get a sequence number and arrange for delivery. */
    /* wait for other writing threads to get out of my way. */
        pthread_cond_wait(&c->out.cond, &c->iolock);

This only calls get_socket_back(). Which makes me wonder: What makes sure that
pre-existing writers get out of the way? xcb_writev() certainly doesn't. So I
guess this really wants to wait for writers to get out of the way, too. However,
this also wants to flush the connection, thus we have three things that need to
be done in a loop (and the current code is quite wrong).

void _xcb_out_send_sync(xcb_connection_t *c)
    /* wait for other writing threads to get out of my way. */
        pthread_cond_wait(&c->out.cond, &c->iolock);

Just makes sure that previous writers go out of the way so that
c->out.request_written stabilizes.

So I guess the first three of these four places really want to make sure that
they can send requests which means that there are no other writers and the
socket isn't owned externally. And all of them do it wrong.

Thus, I would suggest a helper function (something like wait_for_writers() or
prepare_writer()) that can then be used in all of the above places.

While looking at the above I noticed that send_sync() -> send_request() ->
_xcb_out_send() -> _xcb_conn_wait() can drop the iolock. Thus, it is impossible
to make sure that we always keep the iolock in xcb_send_request().

This means that making sure we can write in send_request(), as your patch does,
really is a good idea. However, this might render some other locking irrelevant,
e.g. _xcb_out_end_sync() can now just call send_sync() directly.

Next thing I notice: _xcb_out_flush_to() calls _xcb_out_send() without checking
for existing writers. This can result in two threads trying to write at the same
time. One of them will block on the c->out.writing condition inside of
_xcb_conn_wait(). At this time, the same thing can happen to other threads, all
of them waiting for out->writing. When the existing writer is done, some random
thread will win and be allowed to write. This means that requests might be sent
in the wrong order and thus with another sequence number than expected.

I'll have to think about this last issue a bit more, but it "smells fishy".

- Captain, I think I should tell you I've never
  actually landed a starship before.
- That's all right, Lieutenant, neither have I.

More information about the Xcb mailing list