radeon, apertures & memory mapping

Sun Mar 13 15:00:01 PST 2005

On Mon, 14 Mar 2005 09:20:01 +1100, Benjamin Herrenschmidt
<benh at kernel.crashing.org> wrote:
> On Sun, 2005-03-13 at 17:10 -0500, Jon Smirl wrote:
> > On Mon, 14 Mar 2005 08:49:13 +1100, Benjamin Herrenschmidt
> > <benh at kernel.crashing.org> wrote:
> > > > If you are doing fallback calculations in a 6MB buffer, that is 1,500
> > > > pages. Accessing all of this effectively flushes the data cache. Once
> > > > you are done with it, you probably don't want those pages in the cache
> > > > anyway.
> > >
> > > I wouldn't count on it flushing anything.
> >
> > I meant it flushes out everything except the 1,500 pages you just
> > accessed. Since you don't want those pages any more, a total cache
> > flush shouldn't make a difference; you don't want any of them
> > in the cache anyway.
> 
> I wouldn't count on it again. Not all caches have a strict PLRU
> algorithm, some caches do random replacement (or a mix of those), some
> CPUs do aggressive speculative loads and may bring stuff back into the
> cache just for fun, etc....

I'm not being clear. What I mean is:

1. Leave AGP memory as normal (cacheable) RAM.
2. The driver does its thing to the memory.
3. The driver flushes the CPU data cache.
4. After the flush, tell the GPU to access the data.

The performance hit of executing the flush is probably negligible,
since you probably didn't care about anything in the data cache; all
of those entries would be replaced by later code anyway. You will lose
some overlap parallelism later while the cache is refilled.

> 
> I agree, though, that the flushes may be fast if there is no actual hit
> in the cache. Again, that should be benchmarked.
> 
> In fact, I would _love_ to be able to mark AGP memory as cacheable on
> ppc, even if there is no performance benefit in the end. The issue is
> that currently, we end up having both a cacheable and a non-cacheable
> mapping for those pages (the kernel linear mapping still maps those
> pages cacheable, and it's almost impossible to get rid of that unless
> you are prepared to disable the large-page mapping of kernel space or
> the BATs on ppc32, which would harm kernel performance significantly).
> 
> It works, but it's illegal. That means that the CPU might well speculate
> a load from one of these pages in kernel-land just because it happens to
> be next to a page where you are iterating over an array, and may then
> bring a line from that page into the cache.

That shouldn't matter; the data brought in would be for a speculative
read and never accessed. Since it was never modified, it should just
fall out of the cache without being written back. There is only one
cacheable mapping. In this model, writes are always followed by a flush
before telling the GPU to access the memory that has just been written.

> 
> At that point, a non-cacheable access from userland to the same line
> that was brought into the cache may lead to undefined behaviour, ranging
> from "just works" to checkstopping the CPU, including cases of corrupted
> data being written, depending on the CPU.
> 
> I have yet to see the problem happen in practice, but we are
> definitely not on the safe side currently. I suspect ppc32 won't hit
> it in practice, but ppc64 will...
> 
> Ben.

-- 
Jon Smirl
jonsmirl at gmail.com