Xvideo performance on Radeon 7500 vs Intel 915
Roland Scheidegger
rscheidegger_lists at hispeed.ch
Mon Nov 27 12:21:45 PST 2006
Thomas Hellström wrote:
>>> Also, I tried moving down to 16 bit depth from 24 and didn't see a
>>> difference.
>>>
>> This all makes sense, as the bottleneck is probably the transfer from
>> system RAM to video RAM. Integrated chipsets actually have an advantage
>> there.
>>
>>
> I agree.
I'm not sure this is actually that much of an advantage. You may not
need to copy the data, but the chip has to read it out of main memory
multiple times, which negates any bandwidth advantage (assuming 24 fps
video and a 72 Hz display refresh, the chip's scaler needs to read each
frame three times - at least if the chip actually uses a "true" overlay
scaler and doesn't just do some sort of blit). (Note that the radeon
chips should probably be able to do video overlays from main memory too,
though I haven't tried that.)
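
To put rough numbers on the read-count argument, here is a minimal
sketch. The frame size and pixel format are assumptions picked only for
illustration (they are not from the original report), and it ignores
everything except the reads of the source frame from main memory.

```python
# Hypothetical frame size and format, chosen only for illustration.
WIDTH, HEIGHT = 720, 576
BYTES_PER_PIXEL = 2      # packed YUV 4:2:2 uses 16 bits per pixel
VIDEO_FPS = 24           # frame rate assumed in the text
REFRESH_HZ = 72          # display refresh assumed in the text

frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL

# Copy path (discrete card): each frame is read from system RAM once,
# when it is uploaded to video RAM.
copy_reads = frame_bytes * VIDEO_FPS

# Scanout path (overlay scaler reading from main memory): the current
# frame is read again on every display refresh.
scanout_reads = frame_bytes * REFRESH_HZ

reads_per_frame = REFRESH_HZ // VIDEO_FPS

print(f"copy path:    {copy_reads / 1e6:.1f} MB/s read from main memory")
print(f"scanout path: {scanout_reads / 1e6:.1f} MB/s read from main memory")
print(f"reads per video frame when scanning out: {reads_per_frame}")
```

The ratio depends only on refresh rate versus frame rate, so the factor
of three holds for any frame size under these assumptions.
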
Interestingly, I've tried both with and without dma xv on my good old
celeron 1.0A overclocked to 1.33GHz (with sdram) - xorg cpu time seemed
to be just the same. I assume that with dma it just burns cpu cycles in
some wait-for-idle loop.
Anyway, the driver really should support planar yuv natively instead of
converting to packed yuv. Not only would it be faster, it would also
need less memory (the common planar yuv formats have cr and cb
subsampled both vertically and horizontally, while packed yuv has them
subsampled only horizontally).
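
As a rough illustration of the memory argument, the sketch below
compares buffer sizes for planar 4:2:0 (YV12 layout: Y plane, then V,
then U) and packed 4:2:2 (YUY2 byte order: Y0 U Y1 V), and shows the kind
of per-pixel repacking a planar-to-packed conversion implies. It is not
the driver's code; the function names are made up for this example and
even frame dimensions are assumed.

```python
def planar_420_bytes(width, height):
    """Full-resolution Y plane plus two chroma planes at half
    resolution in both directions (12 bits per pixel)."""
    chroma = (width // 2) * (height // 2)
    return width * height + 2 * chroma


def packed_422_bytes(width, height):
    """Y for every pixel, U and V shared by each horizontal pair
    (16 bits per pixel)."""
    return width * height * 2


def yv12_to_yuy2(width, height, y, v, u):
    """Repack planar Y, V, U planes into packed YUY2 bytes.

    Every chroma row is emitted twice, because the packed format has
    no vertical chroma subsampling.
    """
    chroma_width = width // 2
    out = bytearray(width * height * 2)
    for row in range(height):
        chroma_row = row // 2
        for col in range(0, width, 2):
            c = chroma_row * chroma_width + col // 2
            o = (row * width + col) * 2
            out[o] = y[row * width + col]
            out[o + 1] = u[c]
            out[o + 2] = y[row * width + col + 1]
            out[o + 3] = v[c]
    return bytes(out)


planar = planar_420_bytes(1280, 720)
packed = packed_422_bytes(1280, 720)
print(planar, packed, planar / packed)
```

For any even frame size the planar buffer is three quarters the size of
the packed one, and the conversion loop touches every output byte, which
is work the cpu would not have to do with native planar support.
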
With the attached patch, xorg cpu time seems to be significantly lower
(roughly half here) - still not good enough for full mpeg4 hd video on
that old box, however :-). Note though that the patch is old (against
monolithic xorg no less!) so it won't apply cleanly. Also, it is quite
broken and needs fixing before it could be committed: it allocates too
much ram, may not work on big endian, and worst of all the offset
calculations are wrong when the overlay window is moved off the screen,
resulting in garbled video, corrupted pixmaps and segfaults. There are
likely issues with source videos not aligned to 32 pixels too. It might
be worth fixing though.
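
For anyone picking this up, the following is a sketch of the kind of
offset arithmetic that has to be right for a clipped planar source. It
is not taken from the patch: the function names, the choice to align
both luma and chroma pitches to the same boundary, and the clamping
policy are all assumptions made for illustration.

```python
def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def planar_offsets(src_w, src_h, clip_x, clip_y, align=32):
    """Return (y_pitch, c_pitch, y_offset, c_offset) in bytes for a
    planar 4:2:0 source whose visible region starts at (clip_x, clip_y).

    Assumption: pitches are padded to `align` bytes, and the clip
    origin is clamped into the frame and rounded down to an even
    pixel so luma and chroma stay in step.
    """
    clip_x = min(max(clip_x, 0), src_w - 1) & ~1
    clip_y = min(max(clip_y, 0), src_h - 1) & ~1

    y_pitch = align_up(src_w, align)
    c_pitch = align_up((src_w + 1) // 2, align)

    y_offset = clip_y * y_pitch + clip_x
    # Chroma planes are half resolution in both directions.
    c_offset = (clip_y // 2) * c_pitch + clip_x // 2
    return y_pitch, c_pitch, y_offset, c_offset


print(planar_offsets(720, 576, -10, 5))
```

The point of the clamp is that a negative or out-of-range clip origin
never turns into an offset outside the allocated buffer, which is the
sort of error that shows up as garbled video or a crash.
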
I'm not sure, but maybe increasing agp mode to 4 (if not already done)
could help performance too (if dma for xv is used).
A newer player (decoding library) version could help too. In any case,
if it needs 60% cpu time on a 2GHz pentium m, I would expect cpu time to
get awfully close to 100% on a 2GHz p4, even under optimal conditions.
Roland
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: radeon_video_planar2.diff
URL: <http://lists.x.org/archives/xorg/attachments/20061127/537fdc09/attachment.ksh>