[Mesa-dev] Status update of XvMC on R600

Sat Nov 13 09:48:26 PST 2010

On Friday, 12.11.2010, 10:11 -0500, Younes Manton wrote:
> 2010/11/12 Christian König <deathsimple at vodafone.de>:
> > What I need for both the ycrcb texture and vertex uploads is a buffer in
> > system memory where cpu access is fast, plus a function that tells the
> > gpu to upload this buffer to vram, so the cpu doesn't need to pump the
> > data over the system bus, wait for an "in use" buffer to become idle, etc.
> 
> This is what PIPE_TRANSFER_DISCARD and PIPE_USAGE_DYNAMIC or
> PIPE_USAGE_STREAM were supposed to accomplish. DYNAMIC/STREAM for
> placing the texture in GPU-accessible system memory and DISCARD for
> allowing the driver to allocate and return new buffers if the ones
> being accessed are busy. Once upon a time they did that in nvfx, but I
> don't think it's been reimplemented since Nouveau went to TTM and
> gallium went to transfer objects.
I already suspected that this was the original intention of these flags.
We should either get them working again or move on to a better interface
(transfer objects).
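The DISCARD behaviour described above (hand out a different buffer instead of stalling when the current one is still busy) can be modelled with a small pool. This is a hypothetical sketch: sim_stream, sim_buffer and sim_map_write are illustrative names, not Gallium or driver API, and a real driver would allocate a new buffer instead of giving up when the pool is exhausted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of DISCARD-style mapping ("buffer renaming").
 * These names are illustrative only; they are not Gallium API. */

#define SIM_POOL_SIZE 3

struct sim_buffer {
    unsigned char data[256];  /* stand-in for GPU-accessible system memory */
    bool gpu_busy;            /* true while the GPU may still read it      */
};

struct sim_stream {
    struct sim_buffer pool[SIM_POOL_SIZE];
    int current;              /* buffer the CPU will write next            */
};

/* Map the current buffer for writing.
 * - Idle buffer: hand it out directly.
 * - Busy buffer, discard == false: the old contents must survive, so the
 *   caller would have to wait for the GPU; return NULL.
 * - Busy buffer, discard == true: the old contents are not needed, so
 *   switch to an idle buffer from the pool instead of stalling.
 * Returns NULL when no idle buffer exists (a driver could allocate one). */
static unsigned char *sim_map_write(struct sim_stream *s, bool discard)
{
    assert(s->current >= 0 && s->current < SIM_POOL_SIZE);
    if (!s->pool[s->current].gpu_busy)
        return s->pool[s->current].data;
    if (!discard)
        return NULL;
    for (int i = 1; i < SIM_POOL_SIZE; i++) {
        int idx = (s->current + i) % SIM_POOL_SIZE;
        if (!s->pool[idx].gpu_busy) {
            s->current = idx;
            return s->pool[idx].data;
        }
    }
    return NULL;
}
```

The point of the model is that with discard the CPU can keep writing the next frame's vertices or texels while the GPU is still consuming the previous frame, which is the stall the mail wants to avoid.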

> > What I need for the surface and ycrcb intermediate textures is a buffer
> > directly allocated in vram and never touched by the cpu.
> 
> And this was the default placement for textures that were marked static.
> 
> > So we can't just differentiate by vertex/texture buffer, but need to
> > look at their usage flags.
> 
> From what I recall both helped a lot, especially for large videos
> where the CPU had a lot of data to generate per frame and buffers
> would be busy for longer. I haven't kept up with mesa lately so I
> don't know if there are new ways to deal with this, but I did run into
> both problems before on nvfx and there is value in dealing with them.
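The idea that placement should follow the usage hint rather than the buffer type can be sketched as a small decision function. This is a hypothetical model: the enum names only mirror the intent of PIPE_USAGE_STATIC, PIPE_USAGE_DYNAMIC and PIPE_USAGE_STREAM as discussed in this thread and are not the real Gallium definitions.

```c
#include <assert.h>

/* Hypothetical usage hints mirroring the intent of PIPE_USAGE_STATIC,
 * PIPE_USAGE_DYNAMIC and PIPE_USAGE_STREAM; not the real Gallium enum. */
enum sim_usage {
    SIM_USAGE_STATIC,   /* written once or only by the GPU, read often */
    SIM_USAGE_DYNAMIC,  /* rewritten by the CPU now and then           */
    SIM_USAGE_STREAM    /* rewritten by the CPU every frame            */
};

enum sim_placement {
    SIM_PLACE_VRAM,     /* fast for the GPU, never touched by the CPU  */
    SIM_PLACE_SYSMEM    /* GPU-accessible system memory, cheap for CPU */
};

/* Pick placement from the usage hint, regardless of whether the
 * resource is a vertex buffer or a texture. */
static enum sim_placement sim_choose_placement(enum sim_usage usage)
{
    switch (usage) {
    case SIM_USAGE_DYNAMIC:
    case SIM_USAGE_STREAM:
        return SIM_PLACE_SYSMEM;
    case SIM_USAGE_STATIC:
    default:
        return SIM_PLACE_VRAM;
    }
}
```

Under this model the per-frame ycrcb texture and vertex uploads would carry the stream hint and stay in system memory, while the decode surfaces and intermediate textures would carry the static hint and land in vram.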
Today I optimized the vertex generation. First, I now use quads instead
of triangles; second, I moved the empty-block and frame-coded dct
handling completely into the shaders. In the end we only need to draw
one quad per macroblock instead of eight triangles.
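With the per-block decisions (empty blocks, frame-coded dct) moved into the shaders, the CPU side only has to emit the outer quad of each macroblock. A minimal sketch of that, assuming 16x16-pixel macroblocks and pixel coordinates; sim_macroblock_quad is a hypothetical helper and emits positions only, whereas the actual vertex format carries more attributes.

```c
#include <assert.h>

#define SIM_MB_SIZE 16  /* an MPEG-2 macroblock covers 16x16 luma pixels */

struct sim_pos { float x, y; };

/* Write the four corners of the quad covering macroblock (mbx, mby) in
 * pixel coordinates: top-left, top-right, bottom-right, bottom-left.
 * Hypothetical helper; only positions are generated here. */
static void sim_macroblock_quad(unsigned mbx, unsigned mby, struct sim_pos out[4])
{
    float x0 = (float)(mbx * SIM_MB_SIZE);
    float y0 = (float)(mby * SIM_MB_SIZE);
    float x1 = x0 + SIM_MB_SIZE;
    float y1 = y0 + SIM_MB_SIZE;
    out[0] = (struct sim_pos){ x0, y0 };
    out[1] = (struct sim_pos){ x1, y0 };
    out[2] = (struct sim_pos){ x1, y1 };
    out[3] = (struct sim_pos){ x0, y1 };
}
```

Whether a given 8x8 block is empty or how its dct coefficients are laid out is then decided per fragment in the shader, so the vertex stream no longer needs separate geometry per block.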

So we now need to transmit 4*15=60 floats (4 vertices of 15 floats each)
instead of 8*3*5=120 (8 triangles of 3 vertices with 5 floats each) per
macroblock, plus we now have everything ready for doing iDCT on the
buffers (well, nearly everything).
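To put the per-macroblock numbers into per-frame terms, here is a small worked example. It assumes 16x16 macroblocks, frame dimensions that are multiples of 16 and 4-byte floats, and it ignores index data and per-draw overhead; the 720x576 frame size is just an illustrative choice.

```c
#include <assert.h>

/* Floats per macroblock, using the counts from the mail. */
static unsigned sim_floats_old(void) { return 8u * 3u * 5u; } /* 8 triangles x 3 vertices x 5 floats */
static unsigned sim_floats_new(void) { return 4u * 15u; }     /* 1 quad x 4 vertices x 15 floats     */

/* Vertex bytes per frame for a w x h frame, assuming 16x16 macroblocks,
 * dimensions that are multiples of 16 and 4-byte floats. */
static unsigned long sim_frame_bytes(unsigned w, unsigned h, unsigned floats_per_mb)
{
    unsigned long mbs = (unsigned long)(w / 16) * (h / 16);
    return mbs * floats_per_mb * 4ul;
}
```

For a 720x576 frame (45x36 = 1620 macroblocks) this gives 777600 bytes of vertex data per frame with the old triangle layout and 388800 bytes with one quad per macroblock.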

Christian.