Optimising xserver (Xft text rendering improvements)

Soeren Sandmann sandmann at daimi.au.dk
Sat Mar 26 10:46:55 PST 2005


Adam Jackson <ajax at nwnk.net> writes:

> On Friday 25 March 2005 15:47, Soeren Sandmann wrote:
> > Adam Jackson <ajax at nwnk.net> writes:
> > > > Passing 12 arguments to a function really is a performance killer and
> > > > I'd like to think this could be kept in mind when further developing
> > > > xserver (or any software in general!).
> > >
> > > Yes, definitely.  There are lots of places where we do things that are
> > > stylistically fine but that don't generate good code at all.  The
> > > software Render path is the egregious offender, but there are others.
> >
> > Before getting rid of those 12 arguments, I'd really like to see
> > numbers. Usually almost all of the time in the fb layer is spent in
> > loops in the leaf functions; almost none actually calling the
> > functions.
> 
> The exception being fbcompose.c, where we call at least four functions per 
> pixel in the untransformed case, each function being pretty close to trivial.  
> This totally defeats any attempt the compiler might make at CPU pipelining.  
> And you can't just mark the leaves inline, because we call them through a 
> dispatch table.  Some judicious unrolling here would probably make software 
> Render blow a lot less.

Well sure, fbcompose.c is a bit on the slow side, to put it mildly. In
fact I consider the functions in fbcompose.c *unusable*. If your
desktop hits them, you lose. Fortunately, given the way applications
use Render, they don't actually hit those functions. The paths they do
hit were special-cased long ago in faster C code and, more recently,
in MMX code.
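
To make the difference concrete, here is a minimal sketch of the two
styles. It is not the actual fb code: every name in it is hypothetical,
and it assumes premultiplied a8r8g8b8 pixels and the OVER operator
only. The general loop makes four indirect calls per pixel, as
described above; the special-cased loop computes the same result with
no indirect calls and with shortcuts for transparent and opaque source
pixels.

```c
#include <assert.h>
#include <stdint.h>

/* All names here are hypothetical; this is not the real fbcompose.c. */
typedef uint32_t (*fetch_fn)(const uint32_t *line, int x);
typedef uint32_t (*combine_fn)(uint32_t src, uint32_t dst);
typedef void (*store_fn)(uint32_t *line, int x, uint32_t v);

static uint32_t fetch_a8r8g8b8(const uint32_t *line, int x) { return line[x]; }
static void store_a8r8g8b8(uint32_t *line, int x, uint32_t v) { line[x] = v; }

/* Multiply all four 8-bit channels of p by a/255, with rounding. */
static uint32_t mul_un8x4(uint32_t p, uint32_t a)
{
    uint32_t rb = (p & 0x00ff00ff) * a + 0x00800080;
    uint32_t ag = ((p >> 8) & 0x00ff00ff) * a + 0x00800080;

    rb = ((rb + ((rb >> 8) & 0x00ff00ff)) >> 8) & 0x00ff00ff;
    ag = (ag + ((ag >> 8) & 0x00ff00ff)) & 0xff00ff00;
    return rb | ag;
}

/* OVER for premultiplied pixels: src + dst * (255 - src_alpha) / 255. */
static uint32_t combine_over(uint32_t src, uint32_t dst)
{
    return src + mul_un8x4(dst, 255 - (src >> 24));
}

/* General style: four calls through function pointers for every pixel. */
static void composite_general(uint32_t *dst, const uint32_t *src, int width,
                              fetch_fn fetch_src, fetch_fn fetch_dst,
                              combine_fn combine, store_fn store)
{
    assert(width >= 0);
    for (int x = 0; x < width; x++) {
        uint32_t s = fetch_src(src, x);
        uint32_t d = fetch_dst(dst, x);
        store(dst, x, combine(s, d));
    }
}

/* Special-cased style: one format pair, one operator, no indirect calls. */
static void composite_over_8888_8888(uint32_t *dst, const uint32_t *src,
                                     int width)
{
    assert(width >= 0);
    for (int x = 0; x < width; x++) {
        uint32_t s = src[x];
        uint32_t a = s >> 24;

        if (a == 0xff)
            dst[x] = s;                               /* opaque: plain copy */
        else if (s != 0)
            dst[x] = s + mul_un8x4(dst[x], 255 - a);  /* blend */
        /* s == 0: fully transparent, destination is left untouched */
    }
}
```

Both loops produce the same pixels for valid premultiplied input; the
second simply gives the compiler a single tight loop to work with.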

I don't think any judicious unrolling, or
getting-rid-of-the-12-arguments, will fix fbCompositeGeneral() in
any real way. Fixes like that will give us perhaps a 50% speedup, but
to make fbCompositeGeneral() usable, what we need is more like a 1000%
speedup.

In my opinion we should just leave it alone, because with actual
applications we hit the fast paths almost exclusively. As applications
show up that don't, we can add more fast paths.

Alternatively, some brilliant student with too much time on his hands
could figure out how to make the fbcompose.c framework generate
machine code on the fly. That would fix the problem completely, but
it's probably fairly hard to do.


Søren


