[Xcb] profiling and performance

Wed May 3 00:56:24 PDT 2006

On Wed, May 03, 2006 at 07:36:50AM +0200, Vincent Torri wrote:
> Hey,

Hi Vincent!

> for the evas bench, with xlib, I get 133 fps, and 131 fps with xcb.

Those are so close to equal that the difference may just be measurement noise. :-)

Donnie has covered the most important point: even if you can get XCB's
top CPU-consumers to take no time at all, you'll get at best a 0.6%
performance improvement. That's less than 1 fps for your benchmark.

Hooray! That's wonderful news! :-)

> there are other xcb functions that take more time :
> 
> write_block
> _xcb_in_read
> read_packet

First observation: you're running an unoptimized XCB. Stop that. :-)
Maybe your performance differences would go away if you let the compiler
inline functions like write_block and read_packet, and do all its other
optimizations.

Once write_block is inlined into it, I expect XCBSendRequest will be
your most significant XCB function. If you can get debugging
symbols for your libc -- for Debian, `apt-get install libc6-dbg` :-) --
I expect you'll see memcpy in there too, although it'll be interesting
to see where exactly memcpy shows up. Debugging symbols for your X
server would be nice too, but unlikely to tell us anything about XCB.

I speculate that the _xcb_in_read and read_packet costs here are due to
raster's opposition to threads. ;-) If you'd just put your event loop in
a separate thread and let the OS block it until there was something to
read, you wouldn't have to be polling for events all the time. Inlining
and other optimizations will make a big difference here, but in the end
I think syscalls will be the limiting factor for your event polling.

> it's conceivable :D How can I know that there are more requests than in
> the xlib code ?

After studying your profiling output, I don't really think this is the
case. It does look like maybe you're walking the list of pixmap formats
multiple times, though: if you're doing it once per frame, that could
have a small impact. The equivalent Xlib code may have cached something.

It's interesting that you're calling both XCBPutImage and
XCBShmPutImage. Is there a bug preventing you from always using shared
memory?

The only reply you get with any interesting frequency is for
GetInputFocus, presumably sent by XCBSync. The more XCBSync calls you can
remove without changing the behavior of your program, the higher its
throughput will be. Understanding when you can remove XCBSync calls is
hard though.

To test the hypothesis that the XCB-using code is issuing more requests
than the Xlib version, you could use Ethereal or xscope to log the
requests and responses for both programs, and compare them. Or you could
take the easy way :-) -- after every 25 frames, report the last sequence
number sent. For Xlib, call NextRequest(dpy). For XCB, pick the cookie
of the last request you've sent, and use cookie.sequence.

These numbers won't be equal between the two programs, because Xlib
issues extra requests automatically, and in both apps events will arrive
randomly and cause random requests. But the numbers should increase at
roughly the same rate in both applications.

Anyway, thanks for doing this testing, and for giving me some numbers
that make me happy. :-) Have you shared your results with raster yet? If
they don't make him happy too I'll have to have a "talk" with him. ;-)

--Jamey