EXA for radeon experimental patch

Benjamin Herrenschmidt benh at kernel.crashing.org
Wed Aug 31 15:41:23 PDT 2005

> Another thing I saw is that compositing onto the framebuffer is still always
> slow. It might be a good idea if EXA always used DownloadFromScreen (if it
> exists) to copy all pixmaps for a composite call into main memory before
> attempting to use fbComposite.

DownloadFromScreen will be dead slow in many cases. In particular, you
can't really rely on DMA to AGP memory here, as a lot of chipsets have
non-working writes from the GPU to AGP :(

> I know this would give a huge speedup in some cases. Especially compositing 
> onto the framebuffer is currently extremely slow as it can't be migrated over 
> to main memory. Using DownloadFromScreen to make a copy of the framebuffer 
> area in question (and of the other two operands to composite), doing the 
> composition completely in main memory and then copying the result back into 
> the framebuffer would probably be a factor of 10-50 faster than calling
> fbComposite with something still left in video mem.

We might need to "hint" EXA about how fast DownloadFromScreen is?

> Now this is not true for shared memory architectures such as the i810, so
> we would probably need some way to find out how slow framebuffer reads are
> (and how fast DownloadFromScreen is) and decide the strategy to use based
> on this information.

BTW, another issue I'm tackling at the moment is endianness & swappers.
When falling back, composite will end up drawing directly into pixmaps
in vram which have a different bit depth than the front buffer.

This will of course not work on big endian machines as the swapper on
the PCI -> VRAM path will be configured for the front buffer.
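
To illustrate the failure, here is a small model of the problem. It
assumes a swapper that reverses the bytes inside each fixed-size word and a
GPU that reads pixels as little endian; swapper() and gpu_pixel16() are
made-up helpers for this example, not driver code. Two 16-bit pixels
written by a big endian CPU come out right through a 16-bit swapper, but
trade places when the swapper is still set for a 32-bit front buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of an aperture swapper: reverse the bytes inside each
 * `unit`-byte word (unit = 2 for 16 bpp, 4 for 32 bpp). */
static void swapper(const uint8_t *in, uint8_t *out, size_t len, size_t unit)
{
    assert(unit > 0 && len % unit == 0);
    for (size_t base = 0; base < len; base += unit)
        for (size_t i = 0; i < unit; i++)
            out[base + i] = in[base + unit - 1 - i];
}

/* Read the n-th 16-bit pixel the way a little endian GPU would. */
static uint16_t gpu_pixel16(const uint8_t *vram, size_t n)
{
    return (uint16_t)(vram[2 * n] | (vram[2 * n + 1] << 8));
}
```

Feeding the big endian byte stream 12 34 56 78 (pixels 0x1234, 0x5678)
through swapper() with unit 2 gives the GPU 0x1234 then 0x5678; with unit 4
the GPU sees 0x5678 then 0x1234, so neighbouring pixels are swapped.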

I'm about to add to EXA a couple of new hooks, PrepareAccess() and
FinishAccess(), that will wrap such direct accesses to vram. They can be
stacked, though, up to 3 times for composite. On Radeon, that is fine as
I can use the surface registers to set up different swapper settings for
the 3 pixmaps, but not all cards can do that. So I'll have a fallback
mechanism: when PrepareAccess() fails, the pixmap is downloaded
using DownloadFromScreen() and compositing will be done from memory.
DownloadFromScreen() should always work as it can save the main swapper
setting, change it to the pixmap bit depth, do the transfer, and restore
the saved setting.
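
Roughly, the fallback could be structured like the sketch below. This is
only an illustration under stated assumptions: Pixmap, Driver, Access and
all function names are hypothetical stand-ins, not the actual hook
signatures, the "surface registers" are modelled as a simple counter, and
the download is simulated with memcpy(). Uploading a modified destination
copy back to vram is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ACCESS 3   /* composite touches at most src, mask and dst */

/* Hypothetical stand-ins; none of these are real EXA types. */
typedef struct {
    uint8_t *vram;     /* pixmap storage in (simulated) video memory */
    size_t   size;     /* size in bytes */
} Pixmap;

typedef struct {
    int free_surfaces; /* swapper-capable surfaces still available */
    int depth;         /* number of currently stacked accesses */
} Driver;

typedef struct {
    uint8_t *ptr;      /* where software rendering should look */
    bool     direct;   /* true: ptr is vram, false: ptr is a copy */
} Access;

/* Stand-in for PrepareAccess(): fails when no swapper can be set up. */
static bool prepare_access(Driver *drv, const Pixmap *pix)
{
    (void)pix;
    assert(drv->depth < MAX_ACCESS);
    if (drv->free_surfaces == 0)
        return false;
    drv->free_surfaces--;
    drv->depth++;
    return true;
}

/* Stand-in for FinishAccess(): releases what prepare_access() took. */
static void finish_access(Driver *drv, const Pixmap *pix)
{
    (void)pix;
    drv->free_surfaces++;
    drv->depth--;
}

/* Stand-in for DownloadFromScreen(); a real one would save the swapper
 * setting, switch it, transfer, and restore it. */
static bool download_from_screen(const Pixmap *pix, uint8_t *dst)
{
    memcpy(dst, pix->vram, pix->size);
    return true;
}

/* Try direct access first, fall back to a copy in main memory. */
static bool acquire(Driver *drv, const Pixmap *pix, Access *acc)
{
    if (prepare_access(drv, pix)) {
        acc->ptr = pix->vram;
        acc->direct = true;
        return true;
    }
    acc->ptr = malloc(pix->size);
    if (acc->ptr == NULL)
        return false;
    if (!download_from_screen(pix, acc->ptr)) {
        free(acc->ptr);
        acc->ptr = NULL;
        return false;
    }
    acc->direct = false;
    return true;
}

/* Undo acquire(): finish the direct access or drop the copy. */
static void release(Driver *drv, const Pixmap *pix, Access *acc)
{
    if (acc->direct)
        finish_access(drv, pix);
    else
        free(acc->ptr);
    acc->ptr = NULL;
}
```

A caller would acquire() each of the up to three composite operands,
composite through the returned pointers, then release() them in reverse
order.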

What kind of mechanism does nvidia have for dealing with that issue?

