EXA for radeon experimental patch

Wed Aug 31 10:09:20 PDT 2005

On Wednesday 31 August 2005 18:48, Thomas Winischhofer wrote:
> Lars Knoll wrote:
> > On Tuesday 30 August 2005 12:37, Eric Anholt wrote:
> > [snip]
> >
> >>As a side note: The lack of render acceleration on my r300 has exposed
> >>the fact that the migration heuristics aren't working well in the
> >>absence of Render acceleration.  Anyone have a suggestion why that would
> >>be?
> >
> > One idea is that the software uses both Composite calls and regular
> > blits/solid fills on the same pixmap. I know at least Qt does this:
> > current Qt versions try to avoid using Render when possible, as it is
> > usually a lot slower.
> >
> > So what you might see is that both kinds of commands happen and the
> > pixmap gets migrated back and forth all the time. Also, a missing
> > DownloadFromScreen implementation makes moving pixmaps into main memory
> > rather slow.
> >
> > Another thing I saw is that compositing onto the framebuffer is still
> > always slow. It might be a good idea if EXA always used
> > DownloadFromScreen (if it exists) to copy all pixmaps for a composite
> > call into main memory before attempting to use fbComposite.
> >
> > I know this would give a huge speedup in some cases. Especially
> > compositing onto the framebuffer is currently extremely slow as it can't
> > be migrated over to main memory. Using DownloadFromScreen to make a copy
> > of the framebuffer area in question (and of the other two operands to
> > composite), doing the composition completely in main memory and then
> > copying the result back into the framebuffer would probably be a factor
> > of 10-50 faster than calling fbComposite with something still left in
> > video memory.
> >
> > Now this is not true for shared memory architectures such as the i810,
> > so we would probably need some way to find out how slow framebuffer
> > reads are (and how fast DownloadFromScreen is) and decide which strategy
> > to use based on that information.
>
> On my non-DMA UMA hardware (SiS), upload with SSE peaks at about
> 540 MB/s, while download peaks at about 50 MB/s (naturally, regardless of
> whether SSE or MMX or whatever is used).

That's to be expected, as the memory is mapped write-combining.

That's quite a bit better than what I get on my GeForce 6600. Without DMA I
get 2.5 MB/s download speed. Using DMA transfers through the GART this goes
up to about 80 MB/s. I can live with 80 MB/s (which can give us about 5-10
full-screen blend operations per second), but 2.5 MB/s is unusable :)
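To put those bandwidth figures in perspective, here is a back-of-the-envelope
sketch in Python. The resolutions, the 32 bpp depth, the decimal megabyte, and
the assumption that one blend has to read two full-screen operands back out of
video memory are all illustrative assumptions, not measurements from this
thread. Upload time and CPU compositing time are ignored, so the result is an
upper bound.

```python
# Rough throughput estimate for a "download, composite in system memory"
# fallback. ASSUMPTIONS (not from the thread): 1 MB = 10**6 bytes, a 32 bpp
# framebuffer, and each blend reads `operands` full-screen surfaces out of
# video memory. Upload and CPU compositing time are ignored.

def frame_bytes(width: int, height: int, bytes_per_pixel: int = 4) -> int:
    """Size in bytes of one full-screen surface."""
    return width * height * bytes_per_pixel


def blends_per_second(download_mb_s: float, width: int, height: int,
                      operands: int = 2, bytes_per_pixel: int = 4) -> float:
    """Upper bound on full-screen blends/s, limited only by download bandwidth."""
    bytes_needed = operands * frame_bytes(width, height, bytes_per_pixel)
    return download_mb_s * 1e6 / bytes_needed


if __name__ == "__main__":
    for label, mb_s in (("GART DMA", 80.0), ("non-DMA", 2.5)):
        for w, h in ((1280, 1024), (1600, 1200)):
            rate = blends_per_second(mb_s, w, h)
            print(f"{label:8} {w}x{h}: {rate:.2f} blends/s")
```

Under these assumptions 80 MB/s works out to roughly 5 to 8 full-screen blends
per second, in line with the 5-10 estimate, while 2.5 MB/s drops well below
one blend per second.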

Cheers,
Lars