[Mesa-dev] Status update of XvMC on R600

Fri Nov 12 03:48:30 PST 2010

On Thu, 2010-11-11 at 14:59 -0800, Jerome Glisse wrote:
> 2010/11/11 Keith Whitwell <keithw at vmware.com>:
> > There is still more to do there.  Currently r600g treats buffer and texture uploads separately, and I've only attempted to improve texture uploads.  Buffer uploads are just as important, however.
> >
> > The change needed is likely to be one of two:
> > a) Allow newly created vertex buffers to be in the GTT domain, where they can be mapped cached.
> > b) Provide a staging resource upload path (with the staging buffer in GTT domain).
> >
> > The latter will catch more cases and doesn't suffer from waits for the engine to go idle when accessing an in-use buffer.  The former is probably fastest for the cases where it works.
> >
> > Right now staged texture uploads use a 3d blit to copy from the staging resource to the final destination.  That probably won't work (directly at least) for buffer uploads, as buffer dimensions (e.g. 64k by 1) mean they usually can't be bound as render targets.  So we need to jump through some hoops to get a hardware upload path in the absence of a DMA engine or 1d-blit.
> >
> > Keith
> 
> I am not sure how gallium texture upload was ever supposed to be
> done, but from a memory management point of view the idea I had was to
> create all bos in GTT and let them migrate to VRAM once they are used,
> eliminating any need for a staging buffer. So it would be: allocate bo,
> memcpy the content of the texture to the bo, use the bo and set it as a
> vram bo so the kernel migrates it to vram. That way you take advantage
> of the kernel bo move, which should be faster than any blit-assisted move.

That works great for normal/static textures that are written at most
once by the CPU and from then on always used by the GPU, and is
basically the (a) path, above.

The purpose of an intermediate/staging/dma-based upload path is to cope
with textures/buffers/etc which receive incremental updates from the CPU
concurrently with being rendered from by the GPU.

This is actually pretty common for VBOs, where a lot of applications
come up with schemes for incrementally updating a small number of large
VBOs (I think ETQW did this for instance), but also any application
using TexSubImage, etc, is effectively doing this.

Doing these updates with DMAs means we don't have to wait for buffer
idle before the update, which seems to be the most obvious current
bottleneck in r600g for a lot of apps.
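
As a rough illustration of the difference, here is a toy model in C.  It
is only a sketch: none of these names are r600g or gallium API, the
"GPU copy" is just a memcpy, and "busy" is a flag rather than a real
fence.  The point is only that the direct path has to wait for idle
before the CPU may write, while the staged path writes a fresh staging
buffer and queues the copy behind the rendering already in flight.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model; not r600g or gallium code. */
enum { TOY_BUF_SIZE = 64 };

struct toy_buffer {
    unsigned char data[TOY_BUF_SIZE]; /* stands in for VRAM storage */
    bool gpu_busy;                    /* true while queued GPU work reads it */
};

struct toy_stats {
    unsigned stalls;        /* times the CPU had to wait for GPU idle */
    unsigned staged_copies; /* GPU-side copies from a staging buffer */
};

/* Pretend to block until the GPU is done with the buffer. */
static void toy_wait_idle(struct toy_buffer *buf, struct toy_stats *st)
{
    if (buf->gpu_busy) {
        st->stalls++;
        buf->gpu_busy = false;
    }
}

/* Direct path: the CPU writes the destination itself, so it must
 * first wait for any rendering that still uses the buffer. */
static void toy_update_direct(struct toy_buffer *buf, struct toy_stats *st,
                              size_t offset, const void *src, size_t size)
{
    assert(offset <= TOY_BUF_SIZE && size <= TOY_BUF_SIZE - offset);
    toy_wait_idle(buf, st);
    memcpy(buf->data + offset, src, size);
}

/* Staged path: the CPU writes a fresh staging buffer (never busy), and
 * the copy into the destination is ordered behind earlier GPU work.
 * The second memcpy stands in for that GPU/DMA copy. */
static void toy_update_staged(struct toy_buffer *buf, struct toy_stats *st,
                              size_t offset, const void *src, size_t size)
{
    unsigned char staging[TOY_BUF_SIZE];

    assert(offset <= TOY_BUF_SIZE && size <= TOY_BUF_SIZE - offset);
    memcpy(staging, src, size);                /* CPU write, no stall */
    memcpy(buf->data + offset, staging, size); /* "GPU" copy */
    st->staged_copies++;
}
```

With a buffer that is marked busy, toy_update_staged() completes without
touching the stall counter, whereas toy_update_direct() on the same
buffer records one stall before it can write.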

> Anyway this was my initial thinking when doing the code.

It's definitely the most efficient path for static textures, but for
dynamically-updated resources, and for readbacks, having a GPU-mediated
copy seems to be a win.

Keith