[Mesa-dev] Status update of XvMC on R600

Christian König deathsimple at vodafone.de
Wed Nov 10 14:12:07 PST 2010


On Wednesday, 10.11.2010, at 15:30 -0500, Younes Manton wrote:
> Keep in mind that the XvMC interface makes no guarantees about the
> order in which macroblocks are submitted. Also, we're already sorting
> macroblocks by I/P/B type in order to batch draw calls. Otherwise it
> would be possible to sort on x,y coords and use a sort that performs
> well on sorted/nearly sorted inputs to take advantage of the fact that
> most clients always submit macroblocks in the obvious order. At the
> time I chose to sort on type to batch draw calls instead, and came up
> with the zero-block scheme to cut down on per frame data that needs to
> be generated. You could use index buffers, but somehow I don't think
> that would be a win, especially on HW that doesn't actually do index
> lookup in HW. If you have any better ideas they'd be welcome.
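
Sorting by type so that each type forms one contiguous run (and therefore one draw call) can be done in linear time. The following is a minimal sketch and not necessarily what the state tracker does. It assumes a hypothetical `struct macroblock` and uses a stable counting sort on the type field.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macroblock record; the real XvMC/state tracker structs differ. */
enum mb_type { MB_TYPE_I, MB_TYPE_P, MB_TYPE_B, MB_TYPE_COUNT };

struct macroblock {
    unsigned x, y;      /* macroblock position in the frame */
    enum mb_type type;  /* I, P or B prediction type */
};

/* Stable counting sort of src[0..n) into dst[0..n) by type.
 * On return, first[t] is the index of the first block of type t in dst and
 * count[t] is how many blocks of that type there are, so each non-empty
 * type can be drawn with a single draw call over dst[first[t] .. +count[t]). */
static void batch_by_type(const struct macroblock *src, struct macroblock *dst,
                          size_t n, size_t first[MB_TYPE_COUNT],
                          size_t count[MB_TYPE_COUNT])
{
    size_t next[MB_TYPE_COUNT];
    size_t i, t;

    for (t = 0; t < MB_TYPE_COUNT; ++t)
        count[t] = 0;
    for (i = 0; i < n; ++i) {
        assert(src[i].type < MB_TYPE_COUNT);
        ++count[src[i].type];
    }

    /* Prefix sums give the start index of each type's run. */
    for (t = 0, i = 0; t < MB_TYPE_COUNT; ++t) {
        first[t] = i;
        next[t] = i;
        i += count[t];
    }

    /* Scatter; walking src in order keeps submission order within a type. */
    for (i = 0; i < n; ++i)
        dst[next[src[i].type]++] = src[i];
}
```

Because the sort is stable, blocks that a client submits in raster order stay in raster order within each type's run.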

I switched the y, cb and cr textures from 2D textures to 3D textures,
set wrap_r to PIPE_TEX_WRAP_CLAMP_TO_BORDER and the border color to
black, and then put only the z-coordinate into the vertex buffer instead
of the whole pixel position. A z-coordinate of 0.0f fetches the normal
sample, while a z-coordinate of -1.0f lies outside the texture and
therefore fetches just the black border color.
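
The effect of clamp-to-border along r can be modeled on the CPU. The following is a minimal sketch, not Gallium code. The function names are hypothetical, and the model assumes nearest filtering, a single channel, and a per-block "coded" flag available when the vertices are generated. It only mimics the behavior of wrap_r with PIPE_TEX_WRAP_CLAMP_TO_BORDER and does not call any Gallium API.

```c
#include <assert.h>
#include <stddef.h>

/* CPU model of a nearest-filtered fetch along the r axis of a 3D texture
 * whose wrap_r mode is clamp-to-border. texels_along_r[i] is the value of
 * layer i at one fixed (s, t) position. Assumed for illustration: nearest
 * filtering and a single-channel texture. */
static float fetch_r_clamp_to_border(const float *texels_along_r, int depth,
                                     float r, float border)
{
    float u;
    int layer;

    assert(depth > 0);
    u = r * (float)depth;          /* normalized r -> unnormalized layer coordinate */
    layer = (int)u;
    if ((float)layer > u)          /* (int) truncates toward zero; adjust to floor */
        --layer;
    if (layer < 0 || layer >= depth)
        return border;             /* outside the texture: return the border color */
    return texels_along_r[layer];
}

/* Hypothetical per-block vertex attribute: instead of a full pixel position,
 * emit only the r (z) coordinate, 0.0f for a coded block and -1.0f for an
 * empty one, so empty blocks sample the black border instead of the texture. */
static void gen_block_r(const unsigned char *block_is_coded, size_t n_blocks,
                        float *r_out)
{
    size_t i;
    for (i = 0; i < n_blocks; ++i)
        r_out[i] = block_is_coded[i] ? 0.0f : -1.0f;
}
```

With a one-layer texture and a black border, r = 0.0f returns the stored texel and r = -1.0f returns 0.0f. No branch is needed in the shader to select between the two cases.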

This brought the CPU time spent in gen_block_verts down from 49.7% to 39.3%.

So we can avoid transmitting the empty blocks altogether, without the
burden of a conditional texture fetch. We could even extend this idea
to just memcpy the y, cb and cr buffers into the texture buffer without
fiddling around with the texture pitch and macroblock position.
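
The memcpy idea can be illustrated by contrasting the two upload paths. This sketch rests on several assumptions. Blocks are 8x8 16-bit coefficients, matching the `short` data in XvMC block arrays. The pitch is measured in elements. The linear case uses a hypothetical texture layout in which coded blocks are stored back to back and each block's position is supplied elsewhere, for example per vertex. Both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_W 8
#define BLOCK_H 8
#define BLOCK_SIZE (BLOCK_W * BLOCK_H)

/* Pitched upload: each 8x8 block is written row by row at its (bx, by)
 * block position inside a texture whose rows are `pitch` elements apart. */
static void upload_block_pitched(short *tex, size_t pitch, unsigned bx,
                                 unsigned by, const short block[BLOCK_SIZE])
{
    unsigned row;
    assert(pitch >= (size_t)(bx + 1) * BLOCK_W);  /* block must fit in a row */
    for (row = 0; row < BLOCK_H; ++row)
        memcpy(tex + (size_t)(by * BLOCK_H + row) * pitch + (size_t)bx * BLOCK_W,
               block + row * BLOCK_W, BLOCK_W * sizeof(short));
}

/* Linear upload: with blocks stored back to back (assumed layout), all
 * n_blocks go up with a single memcpy and no pitch/position arithmetic. */
static void upload_blocks_linear(short *tex, const short *blocks, size_t n_blocks)
{
    memcpy(tex, blocks, n_blocks * BLOCK_SIZE * sizeof(short));
}
```

The pitched path issues one small copy per block row and needs each block's position. The linear path is a single bulk copy. The linear path works only if the sampling side can locate each block without a pitched layout, which the 3D texture approach is meant to make easier.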

What do you think of this idea? Are 3D textures supported widely enough
for us to rely on them?

> In the meantime, I suggest you check if your vertex buffers are in
> system memory (preferably at least WC-ed if not cached); I don't recall
> spending that much time in gen_block_verts in Nouveau.
I'm using a rather old Athlon 64 X2 Dual Core CPU in my test machine,
running 64-bit Ubuntu, and 32-bit floating-point calculations are
f... slow on this system.

I have noticed this before and assume that the CPU needs to switch
between 64-bit and 32-bit mode for this (if I remember the x64 specs
correctly, there is a special opcode prefix for doing this). I can't
tell for sure whether this also happens on other systems (OK, I could
get my 32-bit machine running again), but I would strongly suggest
avoiding any 32-bit floating-point calculations on this CPU.

Christian.


