[Liboil] Re: [patch] sse2 optimized sad8x8_u8_avg

Will Dyson will.dyson at gmail.com
Tue Jun 13 02:16:59 PDT 2006


On 6/13/06, Will Dyson <will.dyson at gmail.com> wrote:
> Hi,
>
> Here is an sse2 optimized version of sad8x8_u8_avg.
>
> On an Athlon64 it measures about 2.6 times the speed of the ref implementation.
> On a PentiumM laptop, it measures about 1.8 times the speed of the ref
> implementation.

Forgot to mention:

While doing this, I noticed that SSE2 has an intrinsic, _mm_avg_epu8,
that computes u8 averages 16 at a time. I tried to use it instead of
widening the values to u16 and doing the add-and-divide routine.
Unfortunately, _mm_avg_epu8 rounds halves up, (a + b + 1) >> 1, which
makes its results incompatible with the reference function.
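
To make the rounding mismatch concrete, here is a rough scalar sketch,
not the liboil code. avg_round_up models what one lane of
_mm_avg_epu8 computes. avg_truncate is my assumption of what a
truncating reference average looks like; I have not copied it from the
liboil source. All three function names are made up for illustration.

```c
#include <stdint.h>

/* Scalar model of one 8-bit lane of _mm_avg_epu8 (PAVGB):
 * the sum is incremented before the shift, so halves round up. */
static uint8_t avg_round_up(uint8_t a, uint8_t b)
{
    return (uint8_t)(((unsigned)a + (unsigned)b + 1u) >> 1);
}

/* ASSUMPTION: a truncating average, standing in for the reference
 * behavior. Halves round down here. */
static uint8_t avg_truncate(uint8_t a, uint8_t b)
{
    return (uint8_t)(((unsigned)a + (unsigned)b) >> 1);
}

/* Count how many of the 256 * 256 input pairs give different results
 * under the two rounding rules. */
static unsigned count_rounding_mismatches(void)
{
    unsigned a, b, mismatches = 0;

    for (a = 0; a < 256; a++) {
        for (b = 0; b < 256; b++) {
            if (avg_round_up((uint8_t)a, (uint8_t)b) !=
                avg_truncate((uint8_t)a, (uint8_t)b)) {
                mismatches++;
            }
        }
    }
    return mismatches;
}
```

With a = 1 and b = 2, the first function gives 2 and the second gives
1. The two only agree when a + b is even, so under this assumption the
intrinsic cannot be dropped in without changing the output.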

-- 
Will Dyson

