[Xcb] Xlib/XCB results: good!

Sat Apr 2 20:21:32 PST 2005

Hey all,

Bart, Keithp, Josh and I got together yesterday and hacked on X bits for
quite a while. End result: we fixed a number of bugs in both Xlib and
XCB, and wound up with a drop-in Xlib replacement that worked great on
every app we tested, as well as showing promising results on the X Test
Suite.

The big change was for Xlib to correctly track the sequence number of
the last response read from the server. This fixed the assertion
failures, as far as I can tell. (The assertion failures while running
the test suite don't count, as far as I'm concerned.)

Josh worked on his Debian packages of these bits while he wasn't helping
with our other troubleshooting efforts, and we got his packages to
install in a chroot on Bart's computer. This process revealed a segfault
when the .Xauthority file is unreadable, which I fixed. Then we tested
OpenOffice, Mozilla, and some other random apps, and everything worked
great. (We found a bug in OpenOffice, though.)

When I explained to Keith how Xlib/XCB handles unexpected replies these
days (by telling XCB to expect replies from all requests that Xlib
sends, tracking which replies Xlib hasn't asked for yet, and asking for
them once they've provably arrived), he seemed to think I was insane:
the implementation mallocs two or three chunks of memory for every
request that Xlib sends. He suggested two optimizations, and then Bart
made fun of us for discussing optimization before getting the code
correct.
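
To make that bookkeeping concrete, here is a minimal sketch in C. It is
not the actual Xlib/XCB code: every name in it (reply_tracker,
tracker_sent, tracker_read, tracker_next_settled) is hypothetical. It
also assumes that the server answers requests in order, so once a
response with sequence number S has been read, any reply to a request
with a sequence number below S has already arrived. It does one malloc
per request rather than two or three, and it ignores sequence number
wraparound.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical names throughout; this is not the real Xlib/XCB API. */
typedef struct pending {
    unsigned long sequence;      /* sequence number of a request Xlib sent */
    struct pending *next;
} pending;

typedef struct {
    pending *head, *tail;        /* FIFO, oldest request first */
    unsigned long last_read;     /* sequence of the last response read */
} reply_tracker;

/* Record a request that was just sent. Returns 1 on success, 0 if out
 * of memory. Sequence numbers are assumed to be strictly increasing. */
static int tracker_sent(reply_tracker *t, unsigned long sequence)
{
    pending *p = malloc(sizeof *p);
    if (!p)
        return 0;
    assert(!t->tail || t->tail->sequence < sequence);
    p->sequence = sequence;
    p->next = NULL;
    if (t->tail)
        t->tail->next = p;
    else
        t->head = p;
    t->tail = p;
    return 1;
}

/* Note that a response carrying this sequence number has been read. */
static void tracker_read(reply_tracker *t, unsigned long sequence)
{
    if (sequence > t->last_read)
        t->last_read = sequence;
}

/* Pop the oldest request whose reply, if it has one, must already have
 * arrived. Returns 1 and stores its sequence number, or 0 if there is
 * no such request yet. */
static int tracker_next_settled(reply_tracker *t, unsigned long *sequence)
{
    pending *p = t->head;
    if (!p || p->sequence >= t->last_read)
        return 0;
    *sequence = p->sequence;
    t->head = p->next;
    if (!t->head)
        t->tail = NULL;
    free(p);                     /* freeing here is what keeps memory bounded */
    return 1;
}
```

A caller would invoke tracker_sent for each request, tracker_read for
each response it reads, and then drain tracker_next_settled in a loop,
asking for the reply to each sequence number it returns.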

At the end of the day I ran callgrind on xterm for profiling purposes,
and Josh and I spent a little while studying the results. Digging for
replies took about 5% of xterm's time, suggesting that Keith was right
to be concerned. One of his suggested fixes would put me on the road to
re-implementing an X server inside Xlib, so I don't think I'll be going
there; but the other is probably correct: this mechanism exists only to
support async reply handlers, and we can tell whether there are any
handlers registered when we're enqueueing sequence numbers to watch for
later replies. If none are registered, the claim is that we don't need
to watch for any replies, which may or may not be true.
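
The shape of that second suggestion can be sketched in a few lines of
C. This is only an illustration under the assumption that the claim
holds. The names (watch_state, maybe_watch) are hypothetical and do not
come from Xlib or XCB, and a counter stands in for the real queue of
watched sequence numbers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state; not taken from the real Xlib/XCB sources. */
typedef struct {
    size_t handlers;   /* async reply handlers currently registered */
    size_t watched;    /* sequence numbers queued to watch for later */
} watch_state;

/* Called for each request sent. Returns 1 if the sequence number was
 * queued for watching, 0 if the work was skipped because no async
 * handler could care about the reply. */
static int maybe_watch(watch_state *w, unsigned long sequence)
{
    (void)sequence;    /* a real queue would store this value */
    if (w->handlers == 0)
        return 0;      /* skip the per-request allocation entirely */
    w->watched++;
    return 1;
}
```

With no handlers registered, every call returns 0 and nothing is
allocated, which is where the savings would come from if the claim
turns out to be true.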

Overnight I let x11perf run using Xlib/XCB (without Keith's proposed
performance improvement). I used Xfake for the server, and dropped my
1GHz Pentium III laptop to runlevel 1 so that nothing was running aside
from init, bash, Xfake, and x11perf. When I checked the run in the
morning, I discovered that the OOM killer had terminated x11perf.
(Aiee!) Turns out I forgot to free one of the chunks of memory allocated
per-request, and x11perf makes a *lot* of requests... So I fixed that
bug this morning, ran valgrind to see if I'd forgotten to free anything
else important, re-ran the test, and got numbers. I'll run the same test
on normal Xlib later, and provide the results.

When I ran the X Test Suite yesterday, it showed the same four assertion
failures that I've previously concluded come from the test suite doing
something dumb, plus 69 FAILs. Keith's quick glance at
the kinds of failures led him to suggest that they could be timing
problems, and therefore still the test suite's fault.

Today I experimented with a timing change to Xlib/XCB's X error handling
that resulted in three X Test Suite tests passing that didn't before,
but also caused one previously passing test to fail. Looks like I'm going to have to
spend a bunch of time trying to understand the exact timing of error
delivery and related issues in Xlib, and then I'll either be able to
declare these FAILs as test suite bugs, or hopefully eliminate a number
of test suite failures.

I'm sure I've forgotten to mention details. Somebody prod me if you
notice I missed something.

--Jamey