[Xcb] Remaining Xlib/Xcb problems

Wed Sep 29 16:48:05 PDT 2010

Hi Rami! Thanks for looking at this. Sorry my response is very long; I
hope it will inspire discussion.

On Wed, Sep 29, 2010 at 5:07 AM, Rami Ylimäki <rami.ylimaki at vincit.fi> wrote:
> I'd like to try solving https://bugs.freedesktop.org/show_bug.cgi?id=29561
> properly and eliminate excessive non-blocking reads done by Xlib. ...
> My understanding from Jamey's comment is that any interface
> changes are not likely to be accepted unless the changed interface can also
> be used to facilitate solving of remaining Xlib problems at least in
> principle.

Well, we can certainly talk about smaller alternatives. I'm just excited
about first-class queues because they seem to solve so many problems.

>  1. Concurrent event waiters can prevent reply waiters from advancing until
> an event arrives.
> ...
>  3. Sometimes a non-blocking read system call is done even though it
> wouldn't be necessary.

Correct.

Though, I got thinking yesterday that the Xlib multi-thread event hangs
might be fixable if we use select/poll instead of xcb_wait_for_event.

>  2. Requests that expect multiple replies without an asynchronous reply
> handler aren't thread safe. Examples of this kind of request are
> XListFontsWithInfo and XRecordEnableContext.

Exposing a first-class response queue abstraction from XCB won't fix
this particular bug. These functions were never thread-safe, and I don't
believe they can be fixed without fundamental architecture changes to
Xlib. Which, of course, is called "XCB". :-)

> Is there something else that I should be aware of?

Here are several more cases which motivated me to consider first-class
queues.

- When Chris Wilson implemented X error handling in cairo-xcb last year,
  he noticed that it's a pain to use checked error handling. Now cairo
  has a queue of void cookies so that it can call xcb_request_check on
  all of them at certain sync points, and every request it makes has to
  be wrapped up to add cookies to that queue. Libraries like cairo would
  be easier to implement with a simple API saying "just put any errors
  for the requests I issue over here".

- More generally, having checked and unchecked stub variants for every
  single X request is lame. That distinction then leaked out to most of
  the xcb-util APIs, and will probably leak further.

- Requests like https://bugs.freedesktop.org/show_bug.cgi?id=29657 come
  up from time to time. Quite a bit of event-loop integration is already
  possible with XCB today, in principle, but it isn't easy and I don't
  think anybody has tested it. I expect it would be easier if you could
  have all your events, errors, *and* replies put in the same queue.

- In those rare applications that care about raw request throughput on
  the X socket, XCB doesn't perform as well as Xlib because
  xcb_send_request is a fairly expensive operation. The expensive bits:
  it's thread-safe, it usually memcpy's requests, and it normally
  computes the length and major-opcode fields, which the caller is in a
  better position to do. (The last point was a misguided code-size
  optimization that I was very proud of at the time. Sigh.)

> Once there is an agreement of the remaining problems, we can start to
> design a response queue that would best address the issues.

Here's roughly what I have in mind: first-class request queues and
response queues.

An XCB connection has one response queue that events are always
delivered into, and one that xcb_wait_for_reply and xcb_poll_for_reply
operate on. You can get those queues and use them in new request queues.
You can also create new response queues.

When creating a request queue, you specify a response queue for its
errors and one for its replies. These can optionally be the same queue.
If we need other per-request configuration, make the request queue carry
the configuration, instead of having per-request flags: It's a lot
easier to add new request-queue constructors than to add new variants to
the stubs for every request.

Both kinds of queues must be bound to an XCB connection when you create
them: A response queue has to know which connection's socket to read
from when it is empty, and since a request queue is bound to response
queues, it's only usable on the same connection as those response
queues. Conveniently then, any function that takes a queue as a
parameter doesn't also need to be passed a connection.

A response queue can be consumed using xcb_wait_for_response or
xcb_poll_for_response, which return the first queued response. I don't
think a wait_for_reply-style API, which looks through the queue for a
particular sequence number, is either necessary or desirable. If you
want to ask for a specific reply, either use the default reply queue and
xcb_wait_for_reply, or use a different reply queue for each request that
you want to distinguish.

A queue is not safe to use from multiple threads concurrently, but
you can use different queues that are bound to the same connection
concurrently. This policy lets us move most work out of critical
sections without inconveniencing most callers, which usually either
aren't thread-safe themselves, or already have their own locks.

Request queues must be explicitly flushed to the socket. Flush must
atomically do several things: assign sequence numbers to the queued
requests, ensure they're all written consecutively, and record which
queues to use for replies and errors for those sequence numbers. Any
auto-flush implementation would have to synchronize with other writing
threads, and so would assigning sequence numbers as requests are queued,
so I want to avoid both.

Unlike xcb_send_request, the request queue API would not expose 'struct
iovec'; you'd just repeatedly add pointer/length pairs to the queue.
Since the request queue isn't thread-safe anyway, request construction
doesn't need to be atomic. These pointers would never be memcpy'd: You
must only free the buffer after you flush the request queue. For
convenience, the request queue could provide an allocator for memory
that will be automatically freed when that queue is flushed.

xcb_send_request would be reimplemented on top of a pair of request
queues, one used for checked requests and the other for unchecked
requests. These request queues would use the standard reply queue for
all responses, except the unchecked queue would use the standard event
queue for errors. To safely return a sequence number for each request,
xcb_send_request needs to either use xcb_take_socket or flush after
every request.

Open questions:

When should GetInputFocus ("sync") requests be automatically inserted?
We can't know we need one until we assign a sequence number, but
inserting syncs into the middle of a request queue at flush time would
suck. After queueing 65,535 void requests we'd definitely need to insert
a sync, and we could decide at flush time if we also needed to insert a
sync before the start of the entire queue. So what API does the request
queue need for this?

We probably need an xcb_discard_response. Does it discard the response
from a single response queue? from both queues bound in a request queue?
from all queues bound to a connection?

How should xcb_wait_for_response return the sequence number of the
response? Since this is new API we can expose 64-bit sequence numbers.
I'm inclined to leave the full sequence number out of the response
structure itself, unlike xcb_generic_event_t, and hand it back as an
out-parameter of xcb_wait_for_response instead. That way the response
structure you get back is always *exactly* what came off the wire.

In principle, XCB's current request and response functions could be
implemented using only these new queue functions. If we thought about
this in terms of designing an "XCB 2.0", and re-implementing XCB 1 as a
separate library on top, would that change how we do this? For example,
the standard reply queue might go away. Even if we don't release this as
a new library, we might choose to deprecate the old functions.

Observations:

Since the caller knows exactly when the requests are flushed, it can
modify any part of the request buffer until then. This means you can
build iteratee-style interfaces that go back and fix up the length field
after the request is constructed, or you can append more data to a
request that was already complete, like XDrawPoint and friends do.

Response queues are single-producer (since there's a lock on access to
the socket, only one thread can enqueue responses at a time) and
single-consumer (since queues aren't allowed to be used from multiple
threads), so they can be implemented without locks. For a first
implementation I'd probably just always take the I/O lock, though.

Sometimes it might be useful to prepare a request queue well before you
need it. And when you flush a request queue, that doesn't have to be the
same thing as emptying the queue, so you could submit the same queue
many times in rapid succession. You might think of request queues as
sort of like OpenGL display lists or something.

Callers should generally not use xcb_wait_for_response on the default
event or reply queues, since it isn't thread-safe and those queues are
shared between all users of that connection. Worse, on the reply queue
you might consume a response that some other part of the program
expected. However, if you can prove it's safe in your application, and
you really care about the tiny lock overhead, there's nothing stopping
you.

Extensions that might not be justified:

When you close a response queue, XCB discards all responses for that
queue, including responses that haven't arrived yet. I think you
shouldn't be allowed to close the standard event or reply queues. (What
happens if you close a queue used for replies but not the queue used for
errors, or vice versa?)

Two-stage flush: Provide an alternative flush call that queues the
requests on the XCB connection but returns without blocking on the
socket. Return some handle that can be used to find out whether that
flush is complete, so the caller knows when it's safe to free buffers.

Obviously I have a lot of thoughts about this proposal. :-) Comments?

Jamey