Optimising xserver (Xft text rendering improvements)

Sun Mar 27 00:08:56 PST 2005

Hi,

Trolltech has now hired Zack Rusin to work on X11 (he's cc'ed on this mail). 
Optimising Render is one of the first things I hope he will be able to look 
into. If Zack agrees, we could start working on this a week from now.

On Saturday 26 March 2005 22:56, Keith Packard wrote:
> Around 19 o'clock on Mar 26, Soeren Sandmann wrote:
> > In my opinion we should just leave it alone, because with actual
> > applications we hit the fast paths almost exclusively. As applications
> > show up that don't, we can add more fast paths.

There are also ways to get decent speed from a general software 
implementation. I think you currently see no applications hitting the slow 
paths because developers throw those options away as soon as they start 
testing them. 

We actually tried using the transformations Render offers for Qt4 and found 
out that it was a factor of 5 (or more) faster to get the data over to the 
client and do it ourselves. So we have the code, but never enabled it and 
continued doing it the old way.

> There are two things I would like to get done here:
>
>  1)	Automatically construct an optimized fast-path dispatch table for
> 	all of the present and future accelerated software drawing
> 	functions.  With the huge number of variables in the rendering
> 	equation, it's really hard to tweak the existing dispatch code
> 	to fix a performance problem in a specific application.  Automating
> 	this construction would permit easy integration of new accelerated
> 	functions, which seems key to usable software performance.
> 
>  2)	Fix fbcompose to work on larger units than a single pixel.  By
> 	moving up to 8x8 'patches', we can actually get reasonable
> 	acceleration while still avoiding polynomial explosion of
> 	functions.

I had the same thought, but I think working on a whole line of the destination 
might give you better cache performance in most cases (i.e. all cases where 
you don't have a transformation on the destination). We've used similar code 
in Qt 4's client-side rendering and achieved very good results with it 
(granted, we support fewer combinations, but it should be generalizable to 
what Render needs).
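
To make the line-based idea concrete, here is a rough sketch in C of an OVER 
operator that works on one destination scanline at a time. It assumes 
premultiplied ARGB32 for both source and destination and no transformation. 
The names (mul_8x4, composite_over_scanline, composite_over_rows) are made up 
for illustration; this is neither the Qt 4 code nor anything in fbcompose.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply all four 8-bit channels of a packed ARGB32 pixel by a (0..255),
 * dividing by 255 with rounding. Two channels are handled per multiply. */
static uint32_t mul_8x4(uint32_t p, uint32_t a)
{
    uint32_t rb = (p & 0x00ff00ffu) * a + 0x00800080u;
    uint32_t ag = ((p >> 8) & 0x00ff00ffu) * a + 0x00800080u;

    rb = ((rb + ((rb >> 8) & 0x00ff00ffu)) >> 8) & 0x00ff00ffu;
    ag = (ag + ((ag >> 8) & 0x00ff00ffu)) & 0xff00ff00u;
    return rb | ag;
}

/* OVER for one scanline: dst = src + dst * (255 - src_alpha) / 255.
 * Assumes premultiplied pixels, so the per-channel sum cannot overflow. */
void composite_over_scanline(uint32_t *dst, const uint32_t *src, size_t width)
{
    for (size_t x = 0; x < width; ++x) {
        uint32_t s = src[x];
        uint32_t sa = s >> 24;

        if (sa == 0xff)
            dst[x] = s;                 /* opaque source: plain copy */
        else if (s != 0)
            dst[x] = s + mul_8x4(dst[x], 255 - sa);
        /* fully transparent source: destination is left untouched */
    }
}

/* Walk the destination line by line. Strides are given in pixels. */
void composite_over_rows(uint32_t *dst, size_t dst_stride,
                         const uint32_t *src, size_t src_stride,
                         size_t width, size_t height)
{
    for (size_t y = 0; y < height; ++y)
        composite_over_scanline(dst + y * dst_stride,
                                src + y * src_stride, width);
}
```

The point is that the inner loop touches source and destination memory 
strictly sequentially, and the format and operator are chosen once per line 
rather than once per pixel.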

>  3)	Write glyph-level text compositing code instead of travelling
> 	through the general case code.
>
> These seem orthogonal to me...

Cheers,
Lars

[PS: I'll be offline for a week from today]