EXA for radeon experimental patch

Wed Aug 31 12:00:36 PDT 2005

On Wednesday 31 August 2005 19:31, Eric Anholt wrote:
> On Wed, 2005-08-31 at 09:52 +0200, Lars Knoll wrote:
> > On Tuesday 30 August 2005 12:37, Eric Anholt wrote:
> > [snip]
> >
> > > As a side note: The lack of Render acceleration on my r300 has exposed
> > > the fact that the migration heuristics aren't working well in the
> > > absence of Render acceleration.  Anyone have a suggestion why that
> > > would be?
> >
> > One idea is that the software uses both Composite calls and regular
> > blits/solid fills on the pixmap. I know at least Qt does this. Current
> > versions try to avoid using Render when possible, as it usually is a lot
> > slower.
> >
> > So what you might see is that both commands happen and the pixmap gets
> > migrated back and forth all the time. Also, a missing DownloadFromScreen
> > implementation makes moving pixmaps into main memory rather slow.
>
> Since exapict.c will break Composite down into a core operation when
> possible, this could make sense.  However, the migration stuff was
> designed with a delay built into migration, hopefully to avoid
> this.  The exception would be if the dirty tracking were hooked up, where
> you could be uploading, doing a render, then throwing away (rather than
> downloading) the framebuffer copy when a fallback happened.
>
> > Another thing I saw is that compositing onto the framebuffer is still
> > always slow. It might be a good idea if EXA always used
> > DownloadFromScreen (if it exists) to copy all pixmaps for a composite
> > call into main memory before attempting to use fbComposite.
>
> We would have to allocate and hook up an area in system memory for the
> visible screen then.  And, given that I expect people to use compositing
> managers (which will be rendering to a back-buffer pixmap that could migrate
> normally anyway), I'm not too interested in it.

I'm not talking about migrating the whole framebuffer. Usually the composite 
operation is restricted to a window that is quite a bit smaller, and all you 
need is a temporary pixmap in main memory that contains a copy of the window. 
DownloadFromScreen makes exactly this cheap.
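A rough sketch of the window-sized temporary copy described above, assuming a 32 bpp screen with a row pitch. `download_rect` and `upload_rect` are hypothetical stand-ins for a driver's DownloadFromScreen hook and a matching upload path; they use memcpy where a real driver would use DMA. `software_op` is a placeholder for fbComposite and simply inverts bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* A 32 bpp surface whose rows are 'pitch' bytes apart.  In a driver the
 * screen surface would be in video memory; here it is ordinary memory. */
typedef struct {
    uint8_t *bits;
    int width, height;
    int pitch;                  /* bytes per row, at least width * 4 */
} surface;

/* Hypothetical stand-in for a driver's DownloadFromScreen hook: copy the
 * w x h rectangle at (x, y) into dst, whose rows are dst_pitch bytes
 * apart.  A real driver would use a DMA transfer rather than memcpy. */
static void download_rect(const surface *s, int x, int y, int w, int h,
                          uint8_t *dst, int dst_pitch)
{
    assert(x >= 0 && y >= 0 && x + w <= s->width && y + h <= s->height);
    for (int row = 0; row < h; row++)
        memcpy(dst + (size_t)row * dst_pitch,
               s->bits + (size_t)(y + row) * s->pitch + (size_t)x * 4,
               (size_t)w * 4);
}

/* Hypothetical counterpart that writes the rectangle back. */
static void upload_rect(surface *s, int x, int y, int w, int h,
                        const uint8_t *src, int src_pitch)
{
    assert(x >= 0 && y >= 0 && x + w <= s->width && y + h <= s->height);
    for (int row = 0; row < h; row++)
        memcpy(s->bits + (size_t)(y + row) * s->pitch + (size_t)x * 4,
               src + (size_t)row * src_pitch,
               (size_t)w * 4);
}

/* Placeholder for the software path (fbComposite in the thread): invert
 * every byte of the copied pixels. */
static void software_op(uint8_t *buf, int w, int h, int pitch)
{
    for (int row = 0; row < h; row++)
        for (int i = 0; i < w * 4; i++) {
            uint8_t *px = buf + (size_t)row * pitch + i;
            *px = (uint8_t)(255 - *px);
        }
}

/* Run a fallback on only the window-sized rectangle: download it into a
 * temporary system-memory buffer, work there, and upload the result. */
static int fallback_on_window(surface *screen, int x, int y, int w, int h)
{
    int tmp_pitch = w * 4;
    uint8_t *tmp = malloc((size_t)tmp_pitch * (size_t)h);
    if (tmp == NULL)
        return -1;
    download_rect(screen, x, y, w, h, tmp, tmp_pitch);
    software_op(tmp, w, h, tmp_pitch);
    upload_rect(screen, x, y, w, h, tmp, tmp_pitch);
    free(tmp);
    return 0;
}
```

Only the w x h rectangle crosses the bus twice, and the rest of the screen is never touched. This is why the temporary copy stays cheap even though the framebuffer itself is never migrated.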

> One improvement that could be useful would be to have the migration
> weighted differently for different operations, based on approximate
> costs of doing the operation either in framebuffer or system memory.  I
> bet it wouldn't be too hard to make some much better guesses of where
> pixmaps should live.

The point is that with a read speed of 2.5 MB/s from the framebuffer, migrating 
the operands of every fbComposite operation (or making temporary copies of 
them) makes a whole lot of sense if DownloadFromScreen gives you 80 MB/s or more.

With all the options you have for the composite call, you will always have 
some operations that are not HW accelerated and cause a fallback to the fb 
code. Every time this happens and the pixmap has not been moved to main 
memory, you pay an extremely heavy price for it.

Think about a 1000x1000 pixmap that you use composite operations on. 99 out of 
100 of these ops are HW accelerated and 1 runs in software (take a radial 
gradient as an example). Currently the pixmap will most probably stay in 
video memory for this one operation. But with the read speed we get from video 
memory, this one operation will take about 2 seconds if we don't use DMA 
transfers to create a temporary copy in main memory before we do the 
operation. Downloading the pixmap to main memory with DMA, applying the 
operation there, and copying it back will cut the time down to less than 
100 ms.
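The estimate above can be checked with a few lines of arithmetic. This sketch rests on several assumptions. The pixmap is taken to be 32 bpp, so 1000x1000 pixels is 4,000,000 bytes, and 1 MB is taken as 1,000,000 bytes. Only one full read or transfer of the pixmap is counted. The upload rate for the round trip is a caller-supplied guess, since only the read speeds are given in the thread.

```c
#include <assert.h>

/* Bytes of pixel data in a w x h pixmap. */
static double pixmap_bytes(int w, int h, int bytes_per_pixel)
{
    return (double)w * h * bytes_per_pixel;
}

/* Seconds to move 'bytes' at 'mb_per_s' megabytes per second,
 * taking 1 MB = 1,000,000 bytes. */
static double transfer_seconds(double bytes, double mb_per_s)
{
    assert(mb_per_s > 0.0);
    return bytes / (mb_per_s * 1e6);
}

/* Fallback done in place: the software path reads the pixmap straight
 * from video memory (one full read counted, nothing else). */
static double in_place_seconds(double bytes, double fb_read_mb_s)
{
    return transfer_seconds(bytes, fb_read_mb_s);
}

/* Fallback on a temporary copy: download, work in system memory, upload.
 * The time of the operation itself in system memory is not counted. */
static double round_trip_seconds(double bytes, double download_mb_s,
                                 double upload_mb_s)
{
    return transfer_seconds(bytes, download_mb_s) +
           transfer_seconds(bytes, upload_mb_s);
}
```

For the 1000x1000 example, in_place_seconds(b, 2.5) gives 1.6 s, roughly the "about 2 seconds" quoted. round_trip_seconds(b, 80.0, 80.0) gives 0.1 s, and it drops below 100 ms if writes to video memory run faster than the 80 MB/s download.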

Lars