I also recently discovered that this can be a pathological path, but
AFAICT the bottleneck isn't (yet) what you think it is. The problem is
that the a8 pixmap gets fully initialized by the blitter and then
immediately partly overwritten by software rendering. This is especially
bad with the current pixmap migration schemes, because it means kicking
off the blitter, waiting for it to finish, and then reading the full
pixmap contents back to system memory, which usually involves at least
one slow memcpy. This will at least partly go away when using TTM for
pixmaps. Meanwhile, I've played a little with just taking note that a
pixmap is fully covered by a solid colour and only actually initializing
its contents when and where appropriate. I haven't got it working yet,
though.
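
To make the idea concrete, here is a rough, self-contained sketch of
deferring a solid fill. It is not EXA code: LazyPixmap and the lazy_*
functions are hypothetical names made up for illustration, and it only
covers the "when" part (the whole pixmap is filled on the first write),
not the "where" part, which would need per-region tracking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical a8 pixmap (one byte per pixel); not the real PixmapRec. */
typedef struct {
    int width, height;
    uint8_t *data;         /* width * height bytes */
    bool solid_pending;    /* a solid fill was requested but not yet written */
    uint8_t solid;         /* value of the pending solid fill */
    unsigned fills_done;   /* number of real full-pixmap fills, for illustration */
} LazyPixmap;

bool lazy_init(LazyPixmap *p, int w, int h)
{
    p->width = w;
    p->height = h;
    p->data = calloc((size_t)w * (size_t)h, 1);
    p->solid_pending = false;
    p->solid = 0;
    p->fills_done = 0;
    return p->data != NULL;
}

/* Solid fill of the whole pixmap: only take note of the colour. */
void lazy_fill_solid(LazyPixmap *p, uint8_t value)
{
    p->solid = value;
    p->solid_pending = true;
}

/* Actually write the pending solid colour, if there is one. */
void lazy_materialize(LazyPixmap *p)
{
    if (!p->solid_pending)
        return;
    memset(p->data, p->solid, (size_t)p->width * (size_t)p->height);
    p->solid_pending = false;
    p->fills_done++;
}

/* Reads can be answered from the noted colour without touching data. */
uint8_t lazy_get(const LazyPixmap *p, int x, int y)
{
    assert(x >= 0 && x < p->width && y >= 0 && y < p->height);
    return p->solid_pending ? p->solid
                            : p->data[(size_t)y * (size_t)p->width + (size_t)x];
}

/* A partial write forces the pending fill to happen first. */
void lazy_put(LazyPixmap *p, int x, int y, uint8_t value)
{
    assert(x >= 0 && x < p->width && y >= 0 && y < p->height);
    lazy_materialize(p);
    p->data[(size_t)y * (size_t)p->width + (size_t)x] = value;
}

void lazy_free(LazyPixmap *p)
{
    free(p->data);
    p->data = NULL;
}
```

In this sketch, back-to-back solid fills cost nothing until something
actually writes part of the pixmap, which is the case where the real
thing would avoid the blitter round trip and the readback.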

> Along with the rendering of glyphs, these small but frequent operations are
> likely to dominate rendering time for the typical desktop.

FWIW, the last text rendering speed numbers I saw didn't seem much
worse, if at all, with EXA than with XAA.

IME using sysprof instead of oprofile helps thanks to its intuitive

