EXA for radeon experimental patch

Thu Sep 1 02:35:49 PDT 2005

On Thursday 01 September 2005 11:06, Benjamin Herrenschmidt wrote:
> On Thu, 2005-09-01 at 09:16 +0200, Lars Knoll wrote:
> > On Thursday 01 September 2005 00:41, Benjamin Herrenschmidt wrote:
> > > > Another thing I saw is that compositing onto the framebuffer is still
> > > > always slow. It might be a good idea if EXA always used
> > > > DownloadFromScreen (if it exists) to copy all pixmaps for a composite
> > > > call into main memory before attempting to use fbComposite.
> > >
> > > DownloadFromScreen will be dead slow in many cases. Especially you
> > > can't really rely on DMA to AGP memory here as a lot of chipsets have
> > > non working write from GPU to AGP :(
> >
> > What about writes from GPU to PCI? Maybe these exist.
>
> They do, though they require some kernel support to get to the physical
> address of pages and need some proper scatter/gather support on the card
> side.

Then all you need is a drm module for your card. Even if the card doesn't have
scatter/gather support, drm allows you to allocate a piece of consistent 
physical ram, and mmap it in the server. The handle you get is the physical 
address, so you should be able to use that to implement PCI dma transfers.

> > If you can't provide an implementation that is significantly faster than
> > just a series of memcpy commands it's probably best just to not implement
> > the hook, as it won't do anything else than the fallback handling from
> > EXA.
>
> Nah, it has to play with the swappers on BE architectures.
>
> > Whether the hook is implemented or not?
>
> It has to be implemented anyway for big endian because of the swapper
> problem.
>
> > > > Now this is not true for shared memory architectures such as the i810, so
> > > > we would probably need some way to find out how slow framebuffer
> > > > reads are (and how fast DownloadFromScreen is) and decide the
> > > > strategy to use based on this information.
> > >
> > > BTW. Another issue I'm tackling at the moment is endianness & swappers.
> > > When falling back, composite will end up drawing directly into pixmaps
> > > in vram which have a different bit depth than the front buffer.
> >
> > As long as the Picture has the correct format (ie. the one that is in
> > fact in VRAM) it should all just work.
>
> Nope. When writing/reading vram via MMIO, you go through the swappers on
> the PCI->VRAM path which are set for the bpp of the front buffer. If the
> picture you are up/downloading has a different bpp, you need to change
> the swapper during the access. Same problem with the fb* fallbacks in
> EXA, which is why I'm adding the Prepare/Finish hooks.

Hmmm... but fbComposite can access up to three pixmaps, all of which can have
different bit depths. I don't really see how you could make that work (as you
need different swapping behavior for all three pixmaps).
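
To make the problem concrete, here is a toy model of the swapper issue. It is
only a sketch under assumptions: the swapper is modelled as a byte-lane
permutation on CPU accesses through the aperture, the host is treated as big
endian and the GPU as little endian, and every name in it (aperture_write,
prepare_access, finish_access, the swap_mode values) is made up for
illustration and is not the radeon register interface or the EXA hook API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical swapper modes; NOT real Radeon register values. The value is
 * the XOR mask the modelled swapper applies to the byte address. */
enum swap_mode { SWAP_NONE = 0, SWAP_16 = 1, SWAP_32 = 3 };

/* One global setting, assumed here to be programmed for a 32 bpp front buffer. */
static enum swap_mode aperture_swap = SWAP_32;

/* Model of a CPU byte write through the aperture: the swapper permutes the
 * byte lanes inside each 16- or 32-bit unit. */
static void aperture_write(uint8_t *vram, size_t addr, uint8_t byte)
{
    vram[addr ^ (size_t)aperture_swap] = byte;
}

/* A big-endian host storing 16-bit pixel number i: MSB goes out first. */
static void store_pixel16_be(uint8_t *vram, size_t i, uint16_t px)
{
    aperture_write(vram, 2 * i,     (uint8_t)(px >> 8));
    aperture_write(vram, 2 * i + 1, (uint8_t)(px & 0xff));
}

/* What the little-endian GPU sees for 16-bit pixel number i. */
static uint16_t gpu_read_pixel16(const uint8_t *vram, size_t i)
{
    return (uint16_t)(vram[2 * i] | (vram[2 * i + 1] << 8));
}

/* Hypothetical Prepare/Finish pair: switch the swapper to match the pixmap
 * about to be touched, then restore the previous setting afterwards. */
static enum swap_mode prepare_access(int bpp)
{
    enum swap_mode saved = aperture_swap;

    assert(bpp == 8 || bpp == 16 || bpp == 32);
    aperture_swap = bpp == 32 ? SWAP_32 : bpp == 16 ? SWAP_16 : SWAP_NONE;
    return saved;
}

static void finish_access(enum swap_mode saved)
{
    aperture_swap = saved;
}
```

In this model, writing two 16 bpp pixels while the swapper is still set for
32 bpp makes the GPU see them in exchanged positions; wrapping the writes in
prepare_access(16) / finish_access() gives the expected result. Since there is
only one setting at a time, a source, mask and destination with three
different depths cannot all be correct during one fbComposite call, which is
the case I am worried about.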

> I don't think XAA ever accessed the VRAM with a different bpp than the
> front buffer.
>
> > How did this work in XAA? XAA did also fall back to fbComposite operating
> > directly on VRAM. The only change now is that you have more
> > freedom of what kind of format you store in VRAM.
> >
> > > What kind of mechanism does nvidia have for dealing with that issue ?
> >
> > Nvidia has an endianness flag you can set in various places which tells
> > the HW about the endianness of pixmaps etc. They are always set to host
> > endianness.
>
> What about the PCI->vram front swappers ?

No idea. I am no expert here. As far as I can see in the code there is some 
byteswapping happening in some places, but not in others. So the exa code I 
have might very well be broken on big-endian machines (though I can't test).

Lars