RFH and status of XvMC on r600g

Christian König deathsimple at vodafone.de
Sat Jan 8 07:39:21 PST 2011


In the past couple of weeks I tried to optimize the shaders used for the
iDCT and MC code. Besides optimizing the TGSI code for the shaders, I
optimized the TGSI->R600 code generation in r600g quite a bit:
        * Removed the temporary register use from most instructions
        * Optimize away CF_INST_POP
        * Use special constants for 0, 1, -1, 1.0f, 0.5f etc
        * Implement output modifiers and use them to further optimize
        * Fixed TEX and VTX joining
        * Optimize away CF ALU instructions even if type doesn't match
        * Fix alu slot assignment
        * Reworked and fixed bank swizzle code
        * Implement replacing gpr with pv and ps
        * Merging of alu slots into larger groups
        * Reworked literal handling
        * Implement register remapping
        * Optimized away unneeded alu moves
        * Rearranging and merging of export instructions
        * Fully implemented barrier handling
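
To give an idea of what an item like "optimized away unneeded alu moves"
means in practice, here is a minimal sketch in C. It is not the r600g
code: the instruction struct, the opcode names, the helper functions and
the "first_output" convention (registers with that number or higher are
shader outputs) are all made up for illustration. The idea is simply to
let later readers of a MOV's destination read the MOV's source instead,
as long as neither register has been overwritten in between.

```c
#include <assert.h>

/* Hypothetical toy IR for illustration; these are not the r600g structures. */
enum op { OP_NOP, OP_MOV, OP_ADD, OP_MUL };

struct inst {
    enum op op;
    int dst;     /* destination register */
    int src[2];  /* source registers; MOV only uses src[0] */
};

static int num_srcs(enum op op)
{
    return op == OP_MOV ? 1 : (op == OP_NOP ? 0 : 2);
}

/*
 * Try to delete the MOV at index i by making later readers of its
 * destination read the MOV's source directly. Registers numbered
 * >= first_output are treated as shader outputs that must keep their value.
 * Returns 1 if the MOV was turned into a NOP, 0 if it had to stay.
 */
static int forward_mov(struct inst *code, int n, int i, int first_output)
{
    int d, s;
    int end = n, src_clobbered = 0, dst_rewritten = 0;

    assert(i >= 0 && i < n && code[i].op == OP_MOV);
    d = code[i].dst;
    s = code[i].src[0];

    if (d == s) {                   /* MOV r, r never does anything */
        code[i].op = OP_NOP;
        return 1;
    }

    /* Pass 1: check that every later read of d can safely become a read of s. */
    for (int j = i + 1; j < n; j++) {
        for (int k = 0; k < num_srcs(code[j].op); k++)
            if (code[j].src[k] == d && src_clobbered)
                return 0;           /* s no longer holds the copied value */
        if (code[j].op == OP_NOP)
            continue;
        if (code[j].dst == s)
            src_clobbered = 1;
        if (code[j].dst == d) {     /* d gets a new value, the copy is dead */
            end = j + 1;
            dst_rewritten = 1;
            break;
        }
    }
    if (!dst_rewritten && d >= first_output)
        return 0;                   /* the copy itself is the output value */

    /* Pass 2: rewrite the readers and drop the MOV. */
    for (int j = i + 1; j < end; j++)
        for (int k = 0; k < num_srcs(code[j].op); k++)
            if (code[j].src[k] == d)
                code[j].src[k] = s;
    code[i].op = OP_NOP;
    return 1;
}

/* Run the forwarding over a whole program; returns the number of MOVs removed. */
static int remove_redundant_movs(struct inst *code, int n, int first_output)
{
    int removed = 0;

    for (int i = 0; i < n; i++)
        if (code[i].op == OP_MOV)
            removed += forward_mov(code, n, i, first_output);
    return removed;
}
```

For a program like "MOV r1, r0; ADD r2, r1, r1; MOV r3, r2; MUL r4, r3, r0"
with r4 as the only output, both MOVs go away and the ADD and MUL read r0
and r2 directly. The sketch is deliberately conservative: a MOV stays if
its source is overwritten before the last read of its destination.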

The end result still looks valid and gives a nice 25% speed increase for
720x480p videos (probably a bit more, because the bottleneck is
definitely the CPU now), but for 1280x1080i and 1920x1080i the increase
is only around 7% and 5% respectively, with the CPU still quite idle.

I assume that the bottleneck for the higher resolutions is the memory
bandwidth consumed by the access patterns the iDCT and MC code uses. I
tried to enable tiling, but wasn't successful so far. All I got when
setting R600_FORCE_TILING is:

Failed to allocate :
   size      : 0 bytes
   alignment : 0 bytes

I updated the kernel and merged my branch with master on a regular
basis, but I'm still getting the same error.
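
To illustrate why tiling should help with block-based access patterns
like those of the iDCT and MC code, here is a minimal sketch in C. It
counts how many burst-sized chunks of memory are touched when reading a
small block from a row-major (linear) surface versus a tiled one. The
8x8 tile size, one byte per texel, the burst size passed in and all
function names are assumptions for illustration only; the real R600
tiling modes and memory controller behaviour are more involved.

```c
#include <assert.h>
#include <stddef.h>

#define TILE_W 8   /* assumed tile width in texels  */
#define TILE_H 8   /* assumed tile height in texels */

/* Maps a texel coordinate to a byte offset (one byte per texel assumed). */
typedef size_t (*offset_fn)(unsigned x, unsigned y, unsigned pitch);

/* Row-major layout: vertically adjacent texels are a whole pitch apart. */
static size_t linear_offset(unsigned x, unsigned y, unsigned pitch)
{
    return (size_t)y * pitch + x;
}

/* Tiled layout: each TILE_W x TILE_H tile is stored contiguously. */
static size_t tiled_offset(unsigned x, unsigned y, unsigned pitch)
{
    size_t tiles_per_row, tile;

    assert(pitch % TILE_W == 0);
    tiles_per_row = pitch / TILE_W;
    tile = (size_t)(y / TILE_H) * tiles_per_row + x / TILE_W;
    return tile * (TILE_W * TILE_H) + (size_t)(y % TILE_H) * TILE_W + x % TILE_W;
}

/* Count the distinct burst-sized chunks touched by a w x h block at (x0, y0). */
static unsigned count_bursts(offset_fn offset, unsigned x0, unsigned y0,
                             unsigned w, unsigned h,
                             unsigned pitch, unsigned burst)
{
    size_t seen[64];
    unsigned n = 0;

    assert(w * h <= 64 && burst > 0);
    for (unsigned y = y0; y < y0 + h; y++) {
        for (unsigned x = x0; x < x0 + w; x++) {
            size_t chunk = offset(x, y, pitch) / burst;
            unsigned k = 0;

            while (k < n && seen[k] != chunk)
                k++;
            if (k == n)
                seen[n++] = chunk;  /* first time this chunk is touched */
        }
    }
    return n;
}
```

With a pitch of 1920 and an assumed 64-byte burst, an aligned 8x8 block
touches 8 chunks in the linear layout but only 1 in the tiled layout. A
block at (4, 4), as motion compensation might fetch, still touches 8
chunks linearly but 4 when tiled.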

So what am I missing? Do I need to update some other component, like
libdrm for example? Is there any way to debug the memory bandwidth usage
of the GPU?

I'm currently a bit frustrated, because it looks like I'm stuck and
can't improve the speed further. Any help would be very welcome.


