xf86XVCopyPacked() and friends : why so slow ?

Stéphane Marchesin stephane.marchesin at gmail.com
Thu Jul 21 19:27:46 PDT 2011


On Thu, Jul 21, 2011 at 14:15, Matt Turner <mattst88 at gmail.com> wrote:
> On Thu, Feb 4, 2010 at 7:45 PM,  <rixed at happyleptic.org> wrote:
>> When playing some video with mplayer I noticed with oprofile that
>> half the time is spent in xf86XVCopyPacked() or xf86XVCopyYUV12ToPacked().
>>
>> Looking at the former, I wonder why a mere memcpy was not used instead
>> of "manually" copying each word. glibc's memcpy is usually optimized
>> for the target architecture, while there is little the compiler can do
>> to optimize the given code.
>> Also, for the planar to packed version, you can achieve much better
>> performance using vector instructions, but that is harder to do
>> portably.
>>
>> So I suppose there is a good reason why these functions are so slow.
>> Maybe because video drivers are supposed to provide better ones?
>> Or maybe because it's planned to use an external library like pixman
>> to do this kind of job in the future?
>>
>> More to the point, what I'm trying to find out is whether I'm supposed
>> to optimize my video driver to not use these functions, or if it's OK
>> to optimize them instead, and which path I should follow?
>
> I was digging through some old patches and came across a
> Loongson-optimized xf86XVCopyYUV12ToPacked function (attached). Do you
> know who wrote it?
>
> Did we ever come to some conclusion as to how this was supposed to be
> handled? Would optimized implementations be acceptable to put in
> hw/xfree86/common/xf86xv.c?
>
> Also, I see no reason why xf86XVCopyPacked can't be simplified by
> using memcpy (or maybe memmove?). Any reason why not?
>

I suspect that on some combinations of arches+drivers we'll be copying
to an I/O area, where a plain memcpy is not guaranteed to be safe. I
suppose we could have our own memcpy_toio-style functions and handle
that.
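
As a rough illustration only, here is a minimal sketch of what such a
helper might look like. The name copy_toio32 is hypothetical (it is not
an existing X server function), and the sketch assumes 4-byte-aligned
pointers and a length that is a multiple of 4. It writes through a
volatile pointer so the compiler emits the word stores as written
instead of substituting a library memcpy. Real I/O mappings may also
need architecture-specific barriers, which are left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (assumption, not part of xf86xv.c): copy `bytes`
 * bytes from ordinary memory to a possibly I/O-mapped destination,
 * one 32-bit word at a time.
 *
 * Assumes dst and src are 4-byte aligned and bytes is a multiple of 4.
 */
static void
copy_toio32(volatile uint32_t *dst, const uint32_t *src, size_t bytes)
{
    size_t i;
    size_t words = bytes / 4;

    assert(((uintptr_t)dst & 3) == 0);
    assert(((uintptr_t)src & 3) == 0);
    assert((bytes & 3) == 0);

    /* dst is volatile, so each word store is performed as written. */
    for (i = 0; i < words; i++)
        dst[i] = src[i];
}
```

A caller would use it like memcpy with word pointers, for example
copy_toio32(dst, src, width_in_bytes) for each line of a packed image.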

Stéphane

