[Xcb] xcb recursive write not protected in all cases?

Thu Dec 26 09:29:29 PST 2013

Hi again,

TL;DR: I think that having to call get_socket_back() and waiting for
c->out.writing to go to zero in a loop is the only issue we have due to the
iolock getting dropped. The send_sync()s that were mentioned before seem safe to me.

I tried to tackle this systematically. The functions that can drop the iolock
are get_socket_back() and _xcb_conn_wait(). When recursively looking at the
callers of these, we get the following list:

_xcb_conn_wait
_xcb_out_flush_to
_xcb_out_send
_xcb_out_send_sync
get_socket_back
send_request
send_sync
wait_for_reply
write_setup
xcb_connect_to_fd
xcb_flush
xcb_request_check
xcb_send_fd
xcb_send_request
xcb_take_socket
xcb_wait_for_event
xcb_wait_for_reply
xcb_wait_for_special_event
xcb_writev

We can safely ignore xcb_connect_to_fd() and write_setup(), because no other
thread can access the connection at this point and _xcb_conn_wait() and
get_socket_back(), because they do this on purpose. Removing trivially safe
functions (=functions that don't do anything interesting after calling one of
the functions above), the list shrinks down to:

_xcb_out_flush_to
_xcb_out_send
_xcb_out_send_sync
xcb_request_check
xcb_send_request

Let's look at them in detail:

_xcb_out_send:
This assumes that (a) no other thread can become the writing thread "from
beneath us" and thus that c->out.request is the seqno of the last request that
this thread wrote after _xcb_conn_wait() succeeds .
This should be safe because while the iolock is dropped, c->out.writing is
increased and thus all other threads wanting to write wait on c->out.writing.

xcb_request_check:
In xcb_request_check(), a sync is (possibly) sent to make sure we get a reply
with a higher sequence number than the request we are writing for. So this sync
is just send (and then flushed) for its side effects and not much can go wrong.
Also, if the iolock is released, c->out.writing becomes positive and thus no
other writing thread can interfere.

_xcb_out_send_sync:
This has the already-known problem of "another thread can become the writing
thread during get_socket_back()". Keith already has a patch for this.

xcb_send_request:
This suffers from the same problem as above. So let's rather look at the
send_sync() calls. During a send_sync() call, if the iolock is dropped,
c->out.writing will be positive and thus other threads wanting to write can NOT
continue (this also includes calls to xcb_take_socket() thanks to
_xcb_out_flush_to()). Thus, even though the iolock is dropped, we can still
guarantee atomicity. Tricky...
(Of course this assumes that all places wanting to write actually do these
checks, but I think the code does those)
For the same reason, after send_request() returns, no other thread could have
had a chance to write something yet and thus c->out.request will really be the
sequence number of the request that was just written. Everything Ok.

_xcb_out_flush_to:
I can come up with three possible scenarios for the request that this function
is supposed to write:
a) The request was already written
b) Another thread is currently writing the request
c) The request is still in the output buffer

In case (a), the request will be smaller than request_written and the function
correctly does nothing.

For (b), c->out.queue_len must be 0. If the socket is owned externally, this
condition is always true. If xcb owns the socket, only send_request increases
c->out.queue_len. However, in the call chain to send_request, there is always a
function that makes sure c->out.writing is 0 (xcb_send_request(),
_xcb_out_send_sync()). Thus, there can only be one writer and the output queue
is currently empty.
Anyway, back to _xcb_out_flush_to(): This first to "if"s in this function
evaluate to false and this now waits for the existing writer to finish. Thus,
the request that we want to flush must now be flushed, because there is nothing
left in the queue.

In case (c), _xcb_out_flush_to() just writes the request itself. Since the queue
is non-empty, this also makes sure that there is no other writing thread per the
logic in (b). Not really intuitive, but I think this should work.

So in all cases this function gets its job done.

On 24.12.2013 20:41, Keith Packard wrote:
> Keith Packard <keithp at keithp.com> writes:
[...]
> Whew. This isn't as bad as I'd feared. I think we just need to generate
> functions which prepare the socket for Xlib or XCB request writing
> reliably, instead of having ad-hoc versions across the code.
> 
> For XCB, I think the invariants we must provide are:
> 
>         Xlib doesn't own the socket (out.return_socket is NULL)
>         out.writing is zero
>         There is space to write a couple of GetInputFocus requests to out.queue
> 
> This last will make sure that any send_sync calls that are made will
> *not* drop iolock and we will call _xcb_out_send at most one time. I
> think that would be easily done by simply leaving space in out.queue for
> two GetInputFocus requests for 'normal' writes, and then having the
> send_sync calls from xcb_send_request potentially use up that space, but
> all other send_request calls leave it available.

As mentioned above, I don't think that this last item is necessary. No other
thread can write to or take the socket during a send_sync() call, because while
the iolock is released, c->out.writing is non-zero.

Of course, we still get the first two wrong and this needs to be fixed.

> For Xlib, this is done in xcb_take_socket and the invariants required
> are:
>         
>         No-one else owns the socket (out.return_socket is NULL)
>         out.writing is zero
>         out.queue_len is zero
>         
> Looking at _xcb_out_flush_to, I'm confused why that waits for
> out.writing to go to zero; the only requirement that function should
> have is to ensure that the desired request will eventually get onto the
> wire, which is satisfied by making sure that someone has called write.
[...]

Oh, you are right. This means that the assert() would need to go as well...
(However, this could cause a busy loop / live lock in xcb_take_socket() and
xcb_send_fd() which depend on the current behavior. xcb_take_socket() has to
wait for writers on its own anyway and xcb_send_fd() can be fixed.)

I wonder if it would be possible and a good idea to sprinkle some assert()s into
the code that confirm my analysis. For example, why does _xcb_conn_wait() handle
wanting to write while another writer is already active. I currently think that
this can not happen and should instead be an assert(). Another idea could be an
assert(!c->out.writing || c->out.queue_len == 0); after _xcb_conn_wait()
reacquired the iolock to make sure that other threads really do wait for writers.

Cheers,
Uli
-- 
"Every once in a while, declare peace. It confuses the hell out of your enemies"
 - 79th Rule of Acquisition