[Liboil] Re: [patch] sse2 optimized sad8x8_u8_avg
will.dyson at gmail.com
Tue Jun 13 02:16:59 PDT 2006
On 6/13/06, Will Dyson <will.dyson at gmail.com> wrote:
> Here is an sse2 optimized version of sad8x8_u8_avg.
> On an Athlon64 it measures about 2.6 times the speed of the ref implementation.
> On a PentiumM laptop, it measures about 1.8 times the speed of the ref
Forgot to mention:
While doing this, I noticed that sse2 has an intrinsic, "_mm_avg_epu8", that
computes u8 averages 16 at a time. I tried to use this instead of
widening the values to u16 and doing the add-and-divide routine.
Unfortunately, _mm_avg_epu8 rounds halves up (it computes (a + b + 1) >> 1),
which is incompatible with the rounding of the reference function.
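
To illustrate the mismatch, here is a minimal scalar sketch in C. It is not
code from the patch. avg_round_up models the documented per-lane behavior of
_mm_avg_epu8, which is (a + b + 1) >> 1. avg_truncate is an assumption about
the reference: it assumes the "add and divide" average truncates, i.e.
(a + b) / 2. The function names and count_mismatches are hypothetical helpers
made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Per-lane model of _mm_avg_epu8 (PAVGB): the average rounds halves up. */
uint8_t avg_round_up(uint8_t a, uint8_t b)
{
    return (uint8_t)(((unsigned)a + (unsigned)b + 1u) >> 1);
}

/* ASSUMPTION: the reference averages by add-then-divide, which truncates. */
uint8_t avg_truncate(uint8_t a, uint8_t b)
{
    return (uint8_t)(((unsigned)a + (unsigned)b) >> 1);
}

/* Count the u8 input pairs on which the two averages disagree. */
unsigned count_mismatches(void)
{
    unsigned a, b, n = 0;

    for (a = 0; a < 256; a++) {
        for (b = 0; b < 256; b++) {
            uint8_t up = avg_round_up((uint8_t)a, (uint8_t)b);
            uint8_t down = avg_truncate((uint8_t)a, (uint8_t)b);

            /* The results are either equal or differ by exactly one. */
            assert(up == down || up == down + 1);
            if (up != down)
                n++;
        }
    }
    return n;
}
```

Under that assumption the two averages differ by one whenever a + b is odd,
for example avg_round_up(1, 2) is 2 while avg_truncate(1, 2) is 1. That is
half of all input pairs, so the per-pixel differences would show up in the
final SAD value.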