render improvements
Lars Knoll
lars at trolltech.com
Tue Apr 19 00:23:18 PDT 2005
On Monday 18 April 2005 23:51, Keith Packard wrote:
> > > b) operating on scanlines in general gives us more power to use MMX to
> > > optimize the general case itself,
>
> scanlines don't deal with filters and transforms well at all; I'd like
> to see this code use square patches (8x8 or so) which seems like a good
> fit for both MMX and transforms.
There are a few reasons we used scanlines for the implementation.
The first one was that it's rather easy to implement.
The other is that we have had rather good experience with them in our
client-side painter. Even for rather large images that don't fit into the L1
cache, general affine transformations were decently fast using this approach.
As long as you can use the processor cache, the time to fetch the scanline is
not too big compared to the time the composition takes.
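
To illustrate the scanline approach described above, here is a minimal sketch in C of fetching one destination scanline of source pixels through an affine transform into a small buffer. It is not the code from the Render implementation or from the client-side painter. The names (affine_t, fetch_transformed_scanline), the 16.16 fixed-point layout, and the nearest-neighbour sampling are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16.16 fixed-point affine map from destination to source:
 *   sx = xx*dx + xy*dy + x0
 *   sy = yx*dx + yy*dy + y0 */
typedef struct {
    int32_t xx, xy, x0;
    int32_t yx, yy, y0;
} affine_t;

/* Fill line[0..width) with the source pixels that destination row dy maps to.
 * Sampling is nearest neighbour; anything outside the source is transparent. */
static void fetch_transformed_scanline(const uint32_t *src, int src_w, int src_h,
                                       const affine_t *m, int dy,
                                       uint32_t *line, int width)
{
    assert(width >= 0);

    /* Source position for dx = 0 on this row. */
    int64_t sx = (int64_t)m->xy * dy + m->x0;
    int64_t sy = (int64_t)m->yy * dy + m->y0;

    for (int dx = 0; dx < width; dx++) {
        uint32_t pixel = 0;
        if (sx >= 0 && sy >= 0) {
            int64_t x = sx >> 16;   /* drop the fractional part */
            int64_t y = sy >> 16;
            if (x < src_w && y < src_h)
                pixel = src[(size_t)y * (size_t)src_w + (size_t)x];
        }
        line[dx] = pixel;

        /* Stepping one pixel in dx is just two additions. */
        sx += m->xx;
        sy += m->yx;
    }
}
```

The output buffer is one scanline long and is written sequentially, so it stays in the cache regardless of how the transform walks through the source image.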
It might also make an implementation easier where we use DMA transfers to get
the pixmap data from the framebuffer into the processor cache, but I might be
wrong here.
We know that the biggest performance bottleneck currently is the framebuffer
access, so I think that's the place we should focus on for now. Using MMX
instructions to fetch/store a scanline from the framebuffer is a good start,
but in the long term we need DMA if we want to get any reasonable
performance.
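
As a rough sketch of the fetch/compose/store pattern discussed above: the destination scanline is copied out of the framebuffer in one bulk read, composited in a cache-resident buffer, and written back in one bulk store. Plain memcpy stands in here for the MMX or DMA transfer, and the names (composite_line, over, MAX_LINE) and the premultiplied ARGB32 format are assumptions for the example, not the actual server code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_LINE 4096   /* assumed upper bound on scanline width */

/* Rounded c * a / 255 for 8-bit values. */
static uint32_t mul_div_255(uint32_t c, uint32_t a)
{
    uint32_t t = c * a + 128;
    return (t + (t >> 8)) >> 8;
}

/* Porter-Duff "over" for one premultiplied ARGB32 pixel. */
static uint32_t over(uint32_t s, uint32_t d)
{
    uint32_t inv_alpha = 255 - (s >> 24);
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t sc = (s >> shift) & 0xff;
        uint32_t dc = (d >> shift) & 0xff;
        uint32_t c = sc + mul_div_255(dc, inv_alpha);
        if (c > 255)
            c = 255;    /* guard against non-premultiplied input */
        out |= c << shift;
    }
    return out;
}

/* Composite src_line over one framebuffer scanline of the given width. */
static void composite_line(uint32_t *fb_line, const uint32_t *src_line, int width)
{
    uint32_t tmp[MAX_LINE];
    assert(width >= 0 && width <= MAX_LINE);

    /* Fetch: one sequential read from the (slow) framebuffer. */
    memcpy(tmp, fb_line, (size_t)width * sizeof *tmp);

    /* Compose entirely in the local buffer. */
    for (int i = 0; i < width; i++)
        tmp[i] = over(src_line[i], tmp[i]);

    /* Store: one sequential write back. */
    memcpy(fb_line, tmp, (size_t)width * sizeof *tmp);
}
```

The point of the structure is that the framebuffer is touched only by the two bulk copies, which are the parts one would replace with MMX loads and stores or a DMA transfer.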
We can try a patch-based approach later on, once we have found a way to get
fast access to the framebuffer data. As long as this is not solved it IMO
doesn't make a whole lot of sense to try to improve the implementation in this
respect.
Cheers,
Lars