Xvideo performance on Radeon 7500 vs Intel 915

Tue Dec 19 12:15:35 PST 2006

Ken Mandelberg wrote:
> Actually the "FBTexPercent" option worked! I lowered it from 50% to 
> 20% which left enough memory for xf86AllocateOffscreenLinear to 
> succeed. This code (and my diagnostics) are inside a "#ifdef 
> USE_XAA", so I'm pretty sure I'm using xaa.
Ah, right. In certain low-memory situations, lowering it may indeed
shrink the memory reserved for textures enough to make room for that
allocation.
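For reference, the option goes in the Device section of xorg.conf. Below is a minimal sketch, assuming the radeon driver. The Identifier string is a placeholder, and 20 is simply the value that worked above.

```
Section "Device"
    # Placeholder name; use whatever your ServerLayout/Screen refers to
    Identifier "Radeon 7500"
    Driver     "radeon"
    # Share of video RAM reserved for textures, in percent (was 50 above);
    # lowering it leaves more room for offscreen allocations such as Xv
    Option     "FBTexPercent" "20"
EndSection
```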

> At any rate that problem seems solved.
> 
> The next problem is that even though I just about get enough
> performance to play HD MPEGs locally, there is not enough margin
> left to handle the network overhead for streaming video (say from 
> mythbackend).
> 
> So back to performance. I presume that the reason the Intel 915 uses
> almost no CPU time in Xorg is that the client is writing directly to
> system RAM shared with the graphics chip, while in the Radeon case Xorg
> has to copy the data to video RAM.
Basically, yes. The intel driver still has to copy the data into the
shared graphics RAM, but that doesn't take much time: the 70 MB/s or so
involved is a piece of cake for a modern CPU. The radeon driver, OTOH,
copies it to the GART, and then the GPU blits it into its local RAM.
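To put a number on that figure, here is a rough sketch of the upload rate. It assumes planar 4:2:0 frames (YV12/I420, 12 bits per pixel) and a 1080p stream at 24 fps. Neither the format nor the stream parameters come from the thread, and the function names are illustrative only.

```python
def yuv420_frame_bytes(width: int, height: int) -> int:
    """Size of one planar 4:2:0 frame (e.g. YV12): a full-resolution Y
    plane plus two chroma planes at half width and half height."""
    luma = width * height
    chroma = 2 * (width // 2) * (height // 2)
    return luma + chroma


def upload_bytes_per_second(width: int, height: int, fps: int) -> int:
    """Bytes the driver must copy per second if each frame is copied once."""
    return yuv420_frame_bytes(width, height) * fps


rate = upload_bytes_per_second(1920, 1080, 24)
print(f"1080p24 upload: {rate / 1e6:.1f} MB/s")  # 1080p24 upload: 74.6 MB/s
```

That lands right around the "70 or so MB/s" mentioned above. The same frame size at 30 fps comes to about 93 MB/s.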

> Do I have that right, and if so, is there an inherent advantage for HD
> video in using shared video RAM that the client can get to?
The client can't use shared video RAM directly; the semantics of Xv
don't allow that.
The radeon driver _could_ in theory skip the copy to local RAM and run
the overlay straight from the GART, but I suspect that wouldn't work too
well. As I said already, it wouldn't save you any bandwidth in the end
anyway: the overlay reads the picture out of the buffer on every screen
refresh, which is more often than you have to copy it (once per video
frame).
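Here is a back-of-the-envelope sketch of that argument. It rests on simplifying assumptions of my own: the same 4:2:0 1080p24 stream, a 60 Hz display, and an overlay that reads the whole source image exactly once per refresh (scaling and filtering are ignored). It counts only the bytes the GPU pulls out of GART per second. The CPU copy into GART is the same in both cases, so it is left out. The function names are hypothetical.

```python
def yuv420_frame_bytes(width: int, height: int) -> int:
    """Size of one planar 4:2:0 frame: Y plane plus two quarter-size chroma planes."""
    return width * height + 2 * (width // 2) * (height // 2)


def gart_read_traffic(width: int, height: int, fps: int, refresh_hz: int) -> tuple:
    """Bytes per second the GPU reads out of GART in two hypothetical setups.

    blit_to_local: the GPU copies each new frame from GART into local RAM
    once; the overlay then scans out of local RAM.
    overlay_from_gart: the overlay scans the frame straight out of GART,
    re-reading it on every screen refresh.
    """
    frame = yuv420_frame_bytes(width, height)
    blit_to_local = frame * fps
    overlay_from_gart = frame * refresh_hz
    return blit_to_local, overlay_from_gart


local, gart = gart_read_traffic(1920, 1080, fps=24, refresh_hz=60)
print(f"blit once per frame:   {local / 1e6:.1f} MB/s")
print(f"overlay every refresh: {gart / 1e6:.1f} MB/s ({gart / local:.1f}x)")
```

Under these assumptions, scanning out of GART costs about 186.6 MB/s of bus reads versus about 74.6 MB/s for one blit per frame, i.e. refresh rate divided by frame rate (2.5x here). In the blit case the overlay still re-reads the frame every refresh, but from local RAM on the card.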
You could try inserting a usleep() into RADEONDisplayVideo(), just
before the first RADEONWaitForFifo(). Try something like usleep(1500) or
so; the exact value doesn't really matter, since scheduling time slots
are probably larger anyway. I've noticed this seems to decrease Xorg CPU
time: we wait there quite a long time until the chip has finished
copying the data, so instead of busy-waiting, if you're lucky, mplayer
now gets the additional CPU time (just don't try it without Xv DMA...).
I did think about those WaitForFifo calls, btw, but couldn't figure out
how to avoid them. We could convert the code to use the CP, but we'd
still need to wait for RADEON_REG_LD_CTL_LOCK_READBACK, so it wouldn't
actually help :-(.
Even if this nasty hack works, however, it won't do wonders.
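The trade-off behind the usleep() hack can be illustrated with a toy simulation. This is not the driver code. chip_idle(), wait_for_chip() and measure() are hypothetical stand-ins: a deadline on the monotonic clock plays the role of the chip finishing its copy, and time.sleep() stands in for usleep().

```python
import time


def chip_idle(deadline: float) -> bool:
    """Hypothetical stand-in for reading a status register: the simulated
    chip counts as busy until the monotonic clock reaches `deadline`."""
    return time.monotonic() >= deadline


def wait_for_chip(deadline: float, pre_sleep_s: float = 0.0) -> int:
    """Wait until the simulated chip is idle; return the number of polls.

    With pre_sleep_s > 0 the caller first gives up the CPU (the usleep()
    idea) and only busy-polls for whatever time is left afterwards.
    """
    if pre_sleep_s > 0:
        time.sleep(pre_sleep_s)
    polls = 0
    while True:
        polls += 1
        if chip_idle(deadline):
            return polls


def measure(pre_sleep_s: float, busy_s: float = 0.002) -> tuple:
    """Run one simulated wait; return (polls, CPU seconds spent waiting)."""
    deadline = time.monotonic() + busy_s
    cpu_start = time.process_time()
    polls = wait_for_chip(deadline, pre_sleep_s)
    return polls, time.process_time() - cpu_start


if __name__ == "__main__":
    for label, pre in (("busy-wait only", 0.0), ("sleep 1.5 ms first", 0.0015)):
        polls, cpu = measure(pre)
        print(f"{label}: {polls} polls, {cpu * 1e3:.2f} ms CPU")
```

Sleeping first trades CPU time for latency. If the sleep overshoots the point where the chip goes idle, which is likely given scheduler granularity, the wait simply returns later. That is why the exact value matters little. It is presumably also why the hack makes no sense without Xv DMA: in that case there is no asynchronous copy to sleep through.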

Roland