[Pixman] FPU-based implementation of the core pixel pipeline

Tue Oct 5 14:39:13 PDT 2010

> If you are using lookup tables to convert floating point to integers, I have
> found that you can use the lower bits to linearly interpolate a much smaller
> number of entries.
>
> You can also eliminate all the negative numbers and all numbers greater than
> 1 and all NAN, making the table 1/4 size.

Actually, there's a technique using a "magic constant multiply-add"
which leaves the bits we want at a predictable place in the float's
bit representation.  On PowerPC it should reduce to FMADDS, STFS, LWZ,
RLWINM/RLWIMI per component; the bottleneck would then be the
load-store units, if that's the only thing going on.  At the moment
it's FMADDS, FCTIWZ, STFS, LWZ, RLWINM/RLWIMI, so it saves a whole FPU
instruction that may be poorly optimised.
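
For illustration, here is a minimal C sketch of that trick for one 8-bit
component.  The function name is made up for this mail and is not pixman
API.  It assumes IEEE 754 single precision, the default round-to-nearest
mode, and an input already clamped to [0, 1]; the constant 2^23 is one
common choice of magic value, not necessarily the one the final code
would use.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of pixman.
 * Assumes IEEE 754 floats, round-to-nearest, and 0.0f <= x <= 1.0f. */
static inline uint8_t
float_to_u8_magic (float x)
{
    /* 2^23: at this magnitude one ULP is exactly 1.0, so the addition
     * rounds x * 255 to an integer and leaves that integer in the low
     * mantissa bits of the result. */
    const float magic = 8388608.0f;
    float biased = x * 255.0f + magic;     /* the multiply-add */
    uint32_t bits;

    memcpy (&bits, &biased, sizeof bits);  /* store as float, reload as int */
    return (uint8_t) (bits & 0xff);        /* mask out the component */
}
```

Out-of-range inputs would wrap or corrupt the low bits, so the clamp has
to happen before this point.  Ties round to even, so 0.5 maps to 128.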

The existing table is for int-to-float conversion.  For 8bpc, that's
LBZ, LFS per component; for other depths, substitute an RLWINM for the
LBZ.  The only trouble is that the table is unconditionally precomputed
for the rare 16bpc conversions (and so on) as well as the common smaller
ones, and thus consumes significant memory.
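
As a rough sketch of the table approach, and of one way to avoid paying
for depths that are never used, the 8bpc case could look like the code
below.  All names are hypothetical and the build-on-first-use logic is
my own suggestion rather than what the current code does.  Assuming
4-byte floats and one entry per value, an 8bpc table is 256 * 4 = 1 KiB,
while a 16bpc one is 65536 * 4 = 256 KiB.

```c
#include <stdint.h>

/* Hypothetical names; this is not the actual pixman table. */
static float u8_to_float_table[256];
static int   u8_to_float_ready;

static void
init_u8_to_float_table (void)
{
    int i;

    /* Map 0..255 linearly onto 0.0..1.0. */
    for (i = 0; i < 256; i++)
        u8_to_float_table[i] = (float) i / 255.0f;
    u8_to_float_ready = 1;
}

static inline float
u8_to_float (uint8_t c)
{
    if (!u8_to_float_ready)
        init_u8_to_float_table ();  /* built on first use only */
    return u8_to_float_table[c];    /* byte load, then float load */
}
```

The flag check is not thread-safe as written, and in a real inner loop
the initialisation would be hoisted out so that the per-component cost
stays at the two loads.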

 - Jonathan