[Pixman] [PATCH] mmx: add src_8888_0565

Fri Apr 20 20:28:35 PDT 2012

On Fri, Apr 20, 2012 at 3:43 PM, Matt Turner <mattst88 at gmail.com> wrote:
> On Thu, Apr 19, 2012 at 5:40 PM, Matt Turner <mattst88 at gmail.com> wrote:
>> Uses the pmadd technique described in
>> http://software.intel.com/sites/landingpage/legacy/mmx/MMX_App_24-16_Bit_Conversion.pdf
>> +static force_inline __m64
>> +pack_4xpacked565 (__m64 a, __m64 b)
>> +{
>> +    __m64 rb0 = _mm_and_si64 (a, MC (packed_565_rb));
>> +    __m64 rb1 = _mm_and_si64 (b, MC (packed_565_rb));
>> +
>> +    __m64 t0 = _mm_madd_pi16 (rb0, MC (565_pack_multiplier));
>> +    __m64 t1 = _mm_madd_pi16 (rb1, MC (565_pack_multiplier));
>> +
>> +    __m64 g0 = _mm_and_si64 (a, MC (packed_565_g));
>> +    __m64 g1 = _mm_and_si64 (b, MC (packed_565_g));
>> +
>> +    t0 = _mm_or_si64 (t0, g0);
>> +    t1 = _mm_or_si64 (t1, g1);
>> +
>> +    t0 = shift(t0, -5);
>> +    t1 = shift(t1, -5 + 16);
>> +
>> +    return _mm_shuffle_pi16 (_mm_or_si64 (t0, t1), _MM_SHUFFLE (3, 1, 2, 0));
>> +}
>
> I think the return statement can be simplified with a _mm_packs_pi32,
> but I couldn't get it to work. If someone has a chance to take a look,
> I'd be very appreciative.
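
For anyone following along, here is a rough scalar C sketch of what the
quoted pack_4xpacked565 computes. The constant values below are
assumptions (the hunk above does not show them); they are simply values
that make the arithmetic come out to r5:g6:b5. The model_* and ref_565
names are hypothetical helpers for illustration, not pixman API.

```c
#include <stdint.h>

/* Assumed mask/multiplier values; not shown in the quoted hunk. */
#define PACKED_565_RB 0x00f800f800f800f8ULL  /* top 5 bits of R and B */
#define PACKED_565_G  0x0000fc000000fc00ULL  /* top 6 bits of G */
#define MUL_LO 0x0004u  /* multiplies the low word (B) of each 32-bit lane */
#define MUL_HI 0x2000u  /* multiplies the high word (R) of each 32-bit lane */

/* One 32-bit lane of pmaddwd: lo * MUL_LO + hi * MUL_HI.
 * All operands are small and non-negative here, so unsigned math is safe. */
static uint32_t model_madd_lane (uint32_t lane)
{
    return (lane & 0xffff) * MUL_LO + (lane >> 16) * MUL_HI;
}

static uint64_t model_madd (uint64_t v)
{
    return (uint64_t) model_madd_lane ((uint32_t) v)
         | (uint64_t) model_madd_lane ((uint32_t) (v >> 32)) << 32;
}

/* a and b each hold two x8r8g8b8 pixels; returns four r5g6b5 pixels. */
static uint64_t model_pack_4xpacked565 (uint64_t a, uint64_t b)
{
    /* After madd + or, each lane holds r5 << 16 | g6 << 10 | b5 << 5. */
    uint64_t t0 = model_madd (a & PACKED_565_RB) | (a & PACKED_565_G);
    uint64_t t1 = model_madd (b & PACKED_565_RB) | (b & PACKED_565_G);

    t0 >>= 5;   /* shift (t0, -5): 565 value in the low word of each lane */
    t1 <<= 11;  /* shift (t1, -5 + 16): 565 value in the high word */

    /* Words, low to high, are now a0 b0 a1 b1. */
    uint64_t w = t0 | t1;
    uint64_t w0 = w & 0xffff, w1 = (w >> 16) & 0xffff;
    uint64_t w2 = (w >> 32) & 0xffff, w3 = w >> 48;

    /* _MM_SHUFFLE (3, 1, 2, 0) swaps the middle words: a0 a1 b0 b1. */
    return w0 | (w2 << 16) | (w1 << 32) | (w3 << 48);
}

/* Plain per-pixel conversion, for comparison. */
static uint16_t ref_565 (uint32_t p)
{
    return (uint16_t) (((p >> 8) & 0xf800) | ((p >> 5) & 0x07e0)
                       | ((p >> 3) & 0x001f));
}
```

Feeding two pixels per argument, low pixel in the low 32 bits, each
16-bit word of the result should match ref_565 of the corresponding
input pixel. The final word swap is the part a pack instruction would
replace.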

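I realized in talking with Søren on IRC that the code in the PDF
converts to 555, which allows packssdw to work: a 555 value never has
bit 15 set, so it survives the signed saturation. A 565 value can have
bit 15 set, and packssdw would clamp it to 0x7fff. We'd need packusdw
here, but it wasn't added until SSE 4.1.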
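
To illustrate the saturation issue, a small scalar sketch of what one
lane of each pack instruction does to a 32-bit value. The function names
are hypothetical; they only model the documented saturation behaviour
and are not intrinsics.

```c
#include <stdint.h>

/* One lane of packssdw: signed 32-bit -> signed 16-bit, saturating. */
static uint16_t model_packssdw_lane (int32_t v)
{
    if (v > INT16_MAX)
        v = INT16_MAX;
    if (v < INT16_MIN)
        v = INT16_MIN;
    return (uint16_t) (int16_t) v;
}

/* One lane of packusdw (SSE 4.1): signed 32-bit -> unsigned 16-bit,
 * saturating. */
static uint16_t model_packusdw_lane (int32_t v)
{
    if (v > UINT16_MAX)
        v = UINT16_MAX;
    if (v < 0)
        v = 0;
    return (uint16_t) v;
}
```

Any 555 pixel is at most 0x7fff and passes through the signed pack
unchanged. A 565 pixel with the top red bit set, say 0xf800, sits in its
32-bit lane as a positive number above INT16_MAX, so the signed pack
clamps it to 0x7fff, while the unsigned pack would leave it alone.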

It looks like the ffmpeg 888 -> 565 MMX code unpacks the input in a
way that avoids needing to repack it at the end, but I don't think
that is an improvement over an extra shuffle at the end. I'll play
with it some and see.