EXA pixmap alignments.

Tue Sep 27 15:43:21 PDT 2005

> Is there an EXA driver that implements system-to-framebuffer or
> framebuffer-to-system memory copy using DMA today? 

Well, I was about to do it for Radeon and I'm wondering whether I'll face
such issues :) The radeon scatter/gather DMA may not have the "low 2
bits" limitation, at least the r200 spec doesn't mention it, so... I'm
adding some support in the kernel DRM for it.

Currently, we accelerate system->fb using host data blits from AGP. That
is, we copy the pixmap to AGP (we use an indirect buffer) and do a host
data blit from there to the fb. That means an additional copy, though it
allows us to work around any pitch/alignment issue. It's still a bit
annoying since that copy to AGP is done to non-cacheable memory which
isn't terribly fast. I'll try to bench that vs. the pure SG DMA approach
once I have something implemented.
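
As a rough illustration of the staging copy described above (this is
not the actual radeon driver code; copy_to_staging and its parameters
are hypothetical names), a row-by-row copy lets the source pixmap keep
whatever pitch it has while the staging buffer uses a pitch the
hardware accepts:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: copy `height` rows of `row_bytes` bytes each
 * from a system-memory pixmap (pitch `src_pitch`) into a staging
 * buffer (pitch `dst_pitch`). Because every row is copied separately,
 * the two pitches do not need to match; padding bytes in the staging
 * buffer are left untouched.
 */
static void copy_to_staging(uint8_t *dst, size_t dst_pitch,
                            const uint8_t *src, size_t src_pitch,
                            size_t row_bytes, size_t height)
{
    /* A row must fit inside both pitches. */
    assert(row_bytes <= dst_pitch && row_bytes <= src_pitch);

    for (size_t y = 0; y < height; y++)
        memcpy(dst + y * dst_pitch, src + y * src_pitch, row_bytes);
}
```

In a real driver the destination would be the AGP indirect buffer, so
each memcpy would hit non-cacheable memory; the sketch only shows the
pitch handling, not the cost.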

Another issue is that r300 has a "bug": it doesn't support endian
swapping on host data blit. Thus, we have to also do endian swapping on
big endian architectures when copying to AGP.
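
A minimal sketch of what swapping during the copy could look like,
assuming 32 bits per pixel (swap32 and copy_swap32 are hypothetical
names, not functions from the driver; a 16bpp pixmap would need a
16-bit swap instead):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the byte order of one 32-bit value. */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24) |
           ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) |
           (v << 24);
}

/*
 * Hypothetical helper: copy `count` 32-bit pixels from src to dst,
 * byte-swapping each one on the way, so the copy and the endian
 * conversion happen in a single pass over the data.
 */
static void copy_swap32(uint32_t *dst, const uint32_t *src, size_t count)
{
    assert(dst != NULL && src != NULL);

    for (size_t i = 0; i < count; i++)
        dst[i] = swap32(src[i]);
}
```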

I've been pondering using the 3D engine instead to do the AGP -> fb
blit, since it seems it has an endian swapper too that might work, but I
haven't had time to play with it yet.

We don't accelerate fb->system on radeon yet. I think Lars Knoll's "nv"
driver does however. I think he does it by blitting to AGP, though,
which will have to face the issue of AGP chipsets that don't support
writes from the card to AGP (like mine :( )...

It's difficult at this point to figure out what is faster... a PCI
scatter/gather DMA blit, or an AGP one with an intermediate copy to/from
AGP. The first one has the advantage of being a single operation, but
has a bit of overhead for setting up the SG list etc... (especially on
64-bit machines with an iommu or worse, on x86_64 without iommu which
may end up using bounce buffers). The second one (AGP) has the advantage
of having a faster transfer path to AGP memory, but needs the additional
copy (which can be much slower than a normal memory copy due to the
non-cacheability of AGP).

Finally, PCIe GART and PCI GART have completely different
characteristics. We currently don't have any infrastructure for doing
that yet, but it would be possible to just map/unmap the system memory
into card space to do the blits, which would be the best/fastest way.
However, that also means that we'll have the alignment/pitch constraints
on the system memory pixmap.

So overall, I agree, it would be nice to have a way to impose some
constraints on pixmap allocations in X core.
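
To make the kind of constraint concrete, here is a small sketch of the
two checks an allocator or driver might perform. It assumes the
alignment is a power of two (for example 4 bytes, if the "low 2 bits"
limitation applies to both address and pitch, which is an assumption
here); align_up and pixmap_is_aligned are hypothetical names, not
existing X server functions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round `value` up to the next multiple of `align` (a power of two). */
static size_t align_up(size_t value, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (value + align - 1) & ~(align - 1);
}

/*
 * Hypothetical check: returns 1 if both the pixmap's base address and
 * its pitch are multiples of `align`, 0 otherwise. A pixmap failing
 * this test would need the intermediate copy instead of a direct blit.
 */
static int pixmap_is_aligned(const void *base, size_t pitch, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)base & (align - 1)) == 0 &&
           (pitch & (align - 1)) == 0;
}
```

An allocator that imposed such a constraint would pick the pitch with
align_up(width_in_bytes, align) and allocate the base address with the
same alignment, so pixmap_is_aligned would hold by construction.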

> Furthermore, Does EXA consider this a pipelined operation requiring an
> explicit sync or is the operation expected to have finished when the
> driver function returns?

I think the latter but it's unclear when looking at the EXA code...

Ben.