EXA pixmap alignments.

Lars Knoll lars at trolltech.com
Tue Sep 27 23:32:37 PDT 2005

On Wednesday 28 September 2005 00:43, Benjamin Herrenschmidt wrote:
> > Is there an EXA driver that implements system-to-framebuffer or
> > framebuffer-to-system memory copy using DMA today?
> Well, I was about to do it for Radeon and I'm wondering whether I'll face
> such issues :) The radeon scatter/gather DMA may not have the "low 2
> bits" limitation, at least the r200 spec doesn't mention it, so... I'm
> adding some support in the kernel DRM for it.
> Currently, we accelerate system->fb using host data blits from AGP. That
> is, we copy the pixmap to AGP (we use an indirect buffer) and do a host
> data blit from there to the fb. That means an additional copy, though it
> allows us to work around any pitch/alignment issue. It's still a bit
> annoying since that copy to AGP is done to non-cacheable memory which
> isn't terribly fast. I'll try to bench that vs. the pure SG DMA approach
> once I have something implemented.
> Another issue is that r300 has a "bug": it doesn't support endian
> swapping on host data blit. Thus, we have to also do endian swapping on
> big endian architectures when copying to AGP.
> I've been pondering using the 3D engine instead to do the AGP -> fb
> blit, since it seems it has an endian swapper too that might work, but I
> haven't had time to play with it yet.
> We don't accelerate fb->system on radeon yet. I think Lars Knoll's "nv"
> driver does however. I think he does it by blitting to AGP, though,
> which will have to face the issue of AGP chipsets that don't support
> writes from the card to AGP (like mine :( ...

Yes, that's what I currently do on nv. I also have an idea for how to do
fb->system using PCI scatter/gather, but haven't had time to try it out
yet. Btw, is there any way to know if an AGP chipset supports writes
from card to AGP?
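
For illustration, the staging copy described in the quoted text (copy the pixmap to an AGP buffer, byte-swapping on big-endian machines because the blit itself cannot) could look roughly like the sketch below. It assumes 32 bpp pixels and pitches that are multiples of 4 bytes; `swap32` and `copy_swap32` are hypothetical helper names, not functions from EXA, the DRM, or either driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the byte order of one 32-bit pixel. */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/*
 * Copy a 32 bpp pixmap into a staging buffer (for example an AGP
 * indirect buffer), byte-swapping each pixel on the way.
 * Pitches are in bytes and may differ, so the staging buffer can use
 * whatever pitch the blitter needs. Hypothetical helper for illustration.
 */
static void copy_swap32(void *dst, size_t dst_pitch,
                        const void *src, size_t src_pitch,
                        size_t width, size_t height)
{
    /* Assumption: rows hold at least `width` pixels and stay 4-byte aligned. */
    assert(dst_pitch >= width * 4 && src_pitch >= width * 4);
    assert(dst_pitch % 4 == 0 && src_pitch % 4 == 0);

    for (size_t y = 0; y < height; y++) {
        uint32_t *d = (uint32_t *)((uint8_t *)dst + y * dst_pitch);
        const uint32_t *s =
            (const uint32_t *)((const uint8_t *)src + y * src_pitch);
        for (size_t x = 0; x < width; x++)
            d[x] = swap32(s[x]);
    }
}
```

Doing the swap inside the copy loop avoids a second pass over the non-cacheable staging memory, which is where the cost is likely to be.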

> It's difficult at this point to figure out what is faster... a PCI
> scatter/gather DMA blit, or an AGP one with an intermediate copy to/from
> AGP. The first one has the advantage of being a single operation, but
> has a bit of overhead for setting up the SG list etc... (especially on
> 64-bit machines with an iommu or, worse, on x86_64 without an iommu, which
> may end up using bounce buffers). The second one (AGP) has the advantage
> of having a faster transfer path to AGP memory, but needs the additional
> copy (which can be much slower than a normal memory copy due to the
> non-cacheability of AGP).

Difficult to say without doing some measurements. The fastest way however 
would be to temporarily map the destination of the pixmap into the GART, then 
initiate the transfer from the framebuffer and afterwards remove the chunk of 
memory from the GART again. But that would require some kernel support from 
either agpgart or drm.

> Finally, PCIe GART and PCI GART have completely different
> characteristics. We currently don't have any infrastructure for doing
> that yet, but it would be possible to just map/unmap the system memory
> into card space to do the blits, which would be the best/fastest way.
> However, that also means that we'll have the alignment/pitch constraints
> on the system memory pixmap.
> So overall, I agree, it would be nice to have a way to impose some
> constraints to pixmap allocations in X core.

I'd think so too.
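
As a rough sketch of what such allocation constraints might amount to, the code below pads the pitch and aligns the base address of a system-memory pixmap. The function names and the idea of passing `base_align` and `pitch_align` as parameters are assumptions made for illustration; nothing like this exists in the X core as discussed here, and the sketch relies on C11 `aligned_alloc`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Round value up to the next multiple of align (must be a power of two). */
static size_t round_up(size_t value, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (value + align - 1) & ~(align - 1);
}

/* Bytes per scanline for the given width and depth, padded to pitch_align. */
static size_t pixmap_pitch(size_t width, size_t bits_per_pixel,
                           size_t pitch_align)
{
    size_t row_bytes = (width * bits_per_pixel + 7) / 8;
    return round_up(row_bytes, pitch_align);
}

/*
 * Allocate pixel storage whose base address is a multiple of base_align
 * and whose pitch is a multiple of pitch_align. Hypothetical helper;
 * the caller releases the memory with free().
 */
static void *pixmap_alloc(size_t width, size_t height, size_t bits_per_pixel,
                          size_t base_align, size_t pitch_align,
                          size_t *pitch_out)
{
    size_t pitch = pixmap_pitch(width, bits_per_pixel, pitch_align);
    /* C11 aligned_alloc wants the size to be a multiple of the alignment. */
    size_t size = round_up(pitch * height, base_align);

    *pitch_out = pitch;
    return aligned_alloc(base_align, size);
}
```

With a 64-byte pitch alignment, for example, a 100-pixel-wide 32 bpp pixmap gets a pitch of 448 bytes instead of 400, so the padding cost depends heavily on what the hardware actually requires.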

> > Furthermore, does EXA consider this a pipelined operation requiring an
> > explicit sync or is the operation expected to have finished when the
> > driver function returns?
> I think the latter but it's unclear when looking at the EXA code...

Not sure what EXA expects currently, but it won't help a lot to make this a 
pipelined operation, as the result of a fb->system transfer is usually needed 
directly afterwards anyway. I just implemented it synchronously for nv.

