# [Pixman] [RFC] mmx: add and use expand_4xpacked565

Matt Turner mattst88 at gmail.com
Fri May 18 07:23:58 PDT 2012

```
On Thu, May 17, 2012 at 5:40 PM, Søren Sandmann <sandmann at cs.au.dk> wrote:
> Søren Sandmann <sandmann at cs.au.dk> writes:
>
>>> Given a pixel with only the red component of these values, the results
>>> are off-by-one.
>>>
>>> 0x03 -> 0x19 (0x18)
>>> 0x07 -> 0x3A (0x39)
>>> 0x18 -> 0xC5 (0xC6)
>>> 0x1C -> 0xE6 (0xE7)
>>>
>>> (Same for blue, and green has many more cases)
>>>
>>> It uses
>>> R8 = ( R5 * 527 + 23 ) >> 6;
>>> G8 = ( G6 * 259 + 33 ) >> 6;
>>> B8 = ( B5 * 527 + 23 ) >> 6;
>>>
>>> I don't guess there's a way to tweak this to produce the same results
>>> we get from expand565, is there?
>>
>> Maybe I'm missing something, but this certainly produces the correct
>> result:
>>
>>     r8 = (r5 * 8 + r5 / 4) = r5 * (8 + 0.25) = r5 * (32 + 1) / 4
>>        = (r5 * 33) >> 2
>
> I should maybe expand a bit more on this: Pixman uses bit replication
> when it goes from lower bit depths to higher ones. That is, a five bit
> value:
>
>        abcde
>
> is expanded to
>
>        abcdeabc
>
> which corresponds to left-shifting r5 by 3 and adding r5 right-shifted
> by 2. This is the computation that is turned into a multiplication and a
> shift in the formula above.
>
> A more correct way to expand would be
>
>        floor ((r5 / 31.0) * 255.0 + 0.5)
>
> but this is fairly expensive (although it can be done with integer
> arithmetic and without divisions, and may actually be equivalent to your
> formula -- I haven't checked).
>
>
> Søren

Right, okay. It seems that the algorithm I was using is the more
accurate but slower one. By switching to what pixman uses elsewhere I
can simplify this code significantly. I can confirm that implementing
r8 = (r5 * 33) >> 2 produces the correct results. I would imagine that
two shifts and an OR per component would be better than two shifts and
a multiply. I'll send a new patch.

Thanks for the explanation.

Matt
```
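
To make the arithmetic in the thread concrete, here is a minimal scalar sketch in C of the three expansions discussed: bit replication with two shifts and an OR, the equivalent `(r5 * 33) >> 2` form, and the rounding formula `(R5 * 527 + 23) >> 6`. The function names (`expand5_replicate`, `expand5_mul33`, `expand5_round`, `expand6_replicate`, `expand565_to_x888`) and the `0x00RRGGBB` output layout are assumptions made for illustration. This is not pixman's actual code, and it is not the MMX implementation the patch is about.

```c
#include <stdint.h>

/* Bit replication for a 5-bit channel: abcde -> abcdeabc.
 * Two shifts and an OR, as described in the thread. */
static inline uint8_t expand5_replicate(uint8_t r5)
{
    return (uint8_t)((r5 << 3) | (r5 >> 2));
}

/* The same value written as a multiply and a shift:
 * r5 * 8 + r5 / 4 == (r5 * 32 + r5) >> 2 == (r5 * 33) >> 2. */
static inline uint8_t expand5_mul33(uint8_t r5)
{
    return (uint8_t)((r5 * 33) >> 2);
}

/* The formula quoted at the top of the thread. It differs from
 * bit replication by one for the inputs listed there. */
static inline uint8_t expand5_round(uint8_t r5)
{
    return (uint8_t)((r5 * 527 + 23) >> 6);
}

/* Bit replication for the 6-bit green channel: abcdef -> abcdefab. */
static inline uint8_t expand6_replicate(uint8_t g6)
{
    return (uint8_t)((g6 << 2) | (g6 >> 4));
}

/* Hypothetical helper: expand one r5g6b5 pixel to 0x00RRGGBB
 * using bit replication on every channel. */
static inline uint32_t expand565_to_x888(uint16_t p)
{
    uint8_t r5 = (p >> 11) & 0x1f;
    uint8_t g6 = (p >> 5) & 0x3f;
    uint8_t b5 = p & 0x1f;

    return ((uint32_t)expand5_replicate(r5) << 16) |
           ((uint32_t)expand6_replicate(g6) << 8) |
            (uint32_t)expand5_replicate(b5);
}
```

For a red value of 0x03, `expand5_replicate` and `expand5_mul33` both give 0x18, while `expand5_round` gives 0x19, matching the first row of the off-by-one table in the quoted message.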