[Linux-fbdev-devel] Re: radeon, apertures & memory mapping

Sun Mar 13 17:47:08 PST 2005

On Mon, 14 Mar 2005 12:05:59 +1100, Benjamin Herrenschmidt
<benh at kernel.crashing.org> wrote:
> 
> > It should be the responsibility of the memory manager. If anything wants
> > to access the memory it would call lock() and when it's done with the
> > memory it calls unlock(). That's exactly how DirectFB's memory manager
> > works.
> 
> In an ideal world ... However, since we are planning to move the memory
> manager to the kernel, that would mean a kernel access (syscall, ioctl,
> whatever...) twice per access to AGP memory. Not realistic.

I'm only suggesting this for the DRM/fbdev stack. Anything else from
user space can use a non-cached mapping.

It shouldn't hurt to have a parallel non-cached mapping being used in
conjunction with this protocol. By definition the non-cached mapping
never gets into an inconsistent state.
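
To make the protocol concrete, here is a toy user-space model in C. It
is only a sketch: struct cached_buf and the buf_* and uncached_*
functions are made-up names, not DirectFB or DRM API, and the two
arrays merely stand in for AGP memory and CPU cache lines. It assumes
lock() invalidates the cached view and unlock() flushes it.

```c
#include <assert.h>
#include <string.h>

#define BUF_SIZE 64

/* Toy model (hypothetical): `backing` is the AGP memory as seen by the
 * GPU and by any non-cached mapping; `cache` stands in for the CPU
 * cache lines behind the cached mapping. */
struct cached_buf {
    unsigned char backing[BUF_SIZE];
    unsigned char cache[BUF_SIZE];
    int locked;
};

/* lock(): invalidate, so the cached view is reloaded from memory. */
void buf_lock(struct cached_buf *b)
{
    assert(!b->locked);
    memcpy(b->cache, b->backing, BUF_SIZE);
    b->locked = 1;
}

/* unlock(): flush, so writes made through the cached view reach memory. */
void buf_unlock(struct cached_buf *b)
{
    assert(b->locked);
    memcpy(b->backing, b->cache, BUF_SIZE);
    b->locked = 0;
}

/* Cached access is only legal between lock() and unlock(). */
void cached_write(struct cached_buf *b, int off, unsigned char v)
{
    assert(b->locked && off >= 0 && off < BUF_SIZE);
    b->cache[off] = v;
}

/* The non-cached mapping always reads memory directly. */
unsigned char uncached_read(const struct cached_buf *b, int off)
{
    assert(off >= 0 && off < BUF_SIZE);
    return b->backing[off];
}
```

In this model a write made with cached_write() is not visible through
uncached_read() until buf_unlock() has run, which is the window the
lock/unlock pair is meant to bracket.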

> 
> The case of the CP ring is easy to deal with by the macros we have there
> already and it would be kernel-kernel. But it would be a hit for a lot
> of other things I suppose.

The performance trade-off is how long the invalidate takes. If the CPU
has 2MB of unflushed write data, the instruction is going to take a
while to finish. In the non-cached scheme this data is flushed in
parallel with us playing with the AGP memory. To flush 2MB takes
something like 2MB / (400MHz * 64 bytes * 2 (DDR)), which is tens of
microseconds, but it may be more like 1 microsecond on average.
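
As a rough check on that estimate, the bandwidth arithmetic can be
written out in C. This is only a sketch: flush_time_us() and its
parameter names are made up for illustration, the 400MHz, 64-byte and
DDR figures are the ones quoted above, and latency and bus contention
are ignored.

```c
#include <assert.h>

/* Estimated time, in microseconds, to write `bytes` of dirty cache
 * data out to memory. The bus is assumed to run at `clock_hz`, move
 * `bytes_per_transfer` bytes per transfer, and make
 * `transfers_per_clock` transfers per clock (2 for DDR).
 * Pure bandwidth estimate: latency and contention are ignored. */
double flush_time_us(double bytes, double clock_hz,
                     double bytes_per_transfer, double transfers_per_clock)
{
    assert(clock_hz > 0 && bytes_per_transfer > 0 && transfers_per_clock > 0);
    double bytes_per_second = clock_hz * bytes_per_transfer * transfers_per_clock;
    return bytes / bytes_per_second * 1e6;
}
```

Calling flush_time_us(2.0 * 1024 * 1024, 400e6, 64, 2) gives about 41
microseconds, the same order of magnitude as the estimate above.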

Having thought about this for a while, I don't think you can compute
which strategy is better, because everything depends on the workload
and how dirty the cache is. The best thing to do would be to code it
up and try it. But I want to get a dual head radeon driver working
first.

It may also be true that the CP Ring is better left non-cached and
only access to the graphics buffers be done with the caching scheme.

BTW, you can implement super fast texture load/unload using a similar
scheme. Start with the texture in the user space program. The program
wants to upload the texture. Flush the CPU cache. Point the GART at
the physical pages allocated to the user holding the texture. Now walk
the user's page table and mark those pages copy on write. Free the
pages the GART was originally pointing at. Reverse the scheme to get
data from the GPU. For small textures it is faster to copy them, but
if you are moving 20MB of data this is much faster.
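
The upload steps above can be modelled in a few lines of C. This is a
toy user-space sketch, not driver code: struct user_pte,
upload_by_remap() and TOY_PAGE_SIZE are made-up names, the GART is
just an array of page pointers, pages come from malloc(), and the CPU
cache flush is only a comment. It handles the steps one page at a time
rather than in separate passes.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_PAGE_SIZE 4096

/* Hypothetical stand-in for one user page table entry. */
struct user_pte {
    unsigned char *page; /* page holding part of the texture */
    int cow;             /* 1 once the page is marked copy on write */
};

/* Upload a texture by remapping instead of copying.
 * gart[i] is the page the i-th GART entry currently points at. */
void upload_by_remap(unsigned char **gart, struct user_pte *pte, size_t npages)
{
    assert(gart != NULL && pte != NULL);

    /* On real hardware the CPU cache would be flushed here, so the
     * GPU sees the texture data. The model has no cache to flush. */

    for (size_t i = 0; i < npages; i++) {
        unsigned char *old = gart[i];

        gart[i] = pte[i].page; /* point the GART at the user's page */
        pte[i].cow = 1;        /* user writes must now copy the page */
        free(old);             /* release the page the GART used before */
    }
}
```

After the call no texture data has been copied: each GART entry simply
aliases the user's page, and the copy-on-write flag is what would keep
a later write by the program from changing what the GPU sees.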

-- 
Jon Smirl
jonsmirl at gmail.com