[Pixman] [PATCH 0/2] 7-bit bilinear interpolation precision

Sat Jun 30 14:25:37 PDT 2012

Siarhei Siamashka <siarhei.siamashka at gmail.com> writes:

> On Tue, Jun 26, 2012 at 6:13 AM, Jeff Muizelaar <jmuizelaar at mozilla.com> wrote:
>>
>> On 2012-06-25, at 7:44 PM, Siarhei Siamashka wrote:
>>
>>> These are the test patches for switching to 7-bit bilinear
>>> interpolation precision. The first patch makes bilinear precision
>>> configurable. The second patch tweaks SSE2 bilinear scaler for better
>>> performance using PMADDWD instruction. Both should be applied after:
>>>    http://lists.freedesktop.org/archives/pixman/2012-June/002074.html
>>
>> We recently switched to using the same precision as Skia on Android. (4 bits for RGBA32 and 2 bits for RGB565)
>
> This is a good point. If using just 4 or even 2 bits of interpolation
> precision is acceptable for Skia, then maybe the current bilinear
> interpolation precision is really excessive in pixman. It would be too
> generous to give a handicap to Skia :)
>
> So should we also try 4-bit interpolation? Compared to 7-bit or 8-bit
> interpolation, it has an advantage that all the calculations need only
> 16-bit unsigned variables. This is faster for at least C, x86 SSE2 and
> ARM NEON code.

Four bits is also the precision that GdkPixbuf has used for many years
with no complaints from users that I'm aware of. If you zoom more than
16x with GdkPixbuf, banding artefacts definitely start showing up, but
zooming that far is unlikely to look good with bilinear filtering
anyway. So dropping to four bits doesn't sound too bad to me.
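
To make the 16-bit point above concrete, here is a minimal sketch of
4-bit bilinear interpolation for a single 8-bit channel. The function
name and signature are hypothetical and are not taken from pixman or
GdkPixbuf. With 4-bit weights the four coefficients sum to 16 * 16 = 256,
so the accumulator never exceeds 255 * 256 = 65280 and fits in an
unsigned 16-bit variable. With 7-bit weights the coefficients would sum
to 128 * 128 = 16384, which no longer fits once multiplied by 255.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper for illustration, not pixman's API.
 * Interpolates one 8-bit channel between four neighbouring pixels
 * using 4-bit weights: wx and wy must be in the range 0..15. */
static uint8_t
bilinear_channel_4bit (uint8_t tl, uint8_t tr, uint8_t bl, uint8_t br,
                       unsigned wx, unsigned wy)
{
    assert (wx < 16 && wy < 16);

    /* Each weight is at most 16 * 16 = 256, and the four weights
     * always sum to exactly 256. */
    uint16_t w_tl = (uint16_t) ((16 - wx) * (16 - wy));
    uint16_t w_tr = (uint16_t) (wx * (16 - wy));
    uint16_t w_bl = (uint16_t) ((16 - wx) * wy);
    uint16_t w_br = (uint16_t) (wx * wy);

    /* Worst case is 255 * 256 = 65280, which fits in 16 bits. */
    uint16_t sum = (uint16_t) (tl * w_tl + tr * w_tr +
                               bl * w_bl + br * w_br);

    /* Drop the 8 fractional bits contributed by the two weights. */
    return (uint8_t) (sum >> 8);
}
```

Because there are only 16 distinct weights per axis, destination pixels
start sharing weights once the zoom factor exceeds 16x, which is
consistent with the banding described above.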

That said, I'm a little concerned that nobody is trying separable
scaling instead of worrying about these microoptimizations. As far as I
know Skia does this as well, as does pretty much everybody who cares
about fast scaling.

That is, the two closest source lines are scaled horizontally, the
result is cached, and then the intermediate destination lines can be
generated by just interpolating vertically. This cuts down the amount of
arithmetic required quite substantially, especially for scale factors
larger than 2.
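
The scheme described above can be sketched roughly as follows. This is
only an illustration under stated assumptions, not the code in the
branch linked below: it handles a single 8-bit channel, uses 16.16
fixed-point source coordinates with 4-bit weights, and every name in it
(scale_line_horizontal, scale_separable, the two-slot cache) is made up
for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Illustrative sketch only; none of these names are pixman's. */

/* Horizontal pass: scale one source line to dst_w samples.
 * Each output sample keeps 4 fractional bits (value * 16). */
static void
scale_line_horizontal (const uint8_t *line, int src_w,
                       uint16_t *out, int dst_w, uint32_t x_step)
{
    uint32_t x = 0;

    for (int i = 0; i < dst_w; i++, x += x_step)
    {
        int x0 = (int) (x >> 16);
        int x1 = x0 + 1 < src_w ? x0 + 1 : src_w - 1;
        unsigned w = (x >> 12) & 0xf;   /* top 4 fraction bits */

        out[i] = (uint16_t) (line[x0] * (16 - w) + line[x1] * w);
    }
}

/* Scales src (src_w x src_h) into dst (dst_w x dst_h).
 * Returns the number of horizontal passes performed, or -1 if the
 * cache could not be allocated. */
static int
scale_separable (const uint8_t *src, int src_w, int src_h,
                 uint8_t *dst, int dst_w, int dst_h)
{
    /* Assumed limits so that 16.16 coordinates fit in 32 bits. */
    assert (src_w >= 1 && src_w <= 32768);
    assert (src_h >= 1 && src_h <= 32768);
    assert (dst_w >= 1 && dst_h >= 1);

    uint32_t x_step = dst_w > 1 ?
        ((uint32_t) (src_w - 1) << 16) / (uint32_t) (dst_w - 1) : 0;
    uint32_t y_step = dst_h > 1 ?
        ((uint32_t) (src_h - 1) << 16) / (uint32_t) (dst_h - 1) : 0;
    uint16_t *cache = malloc (2 * (size_t) dst_w * sizeof *cache);
    int cached_line[2] = { -1, -1 };    /* source line in each slot */
    int passes = 0;
    uint32_t y = 0;

    if (!cache)
        return -1;

    for (int j = 0; j < dst_h; j++, y += y_step)
    {
        int lines[2];
        unsigned w = (y >> 12) & 0xf;

        lines[0] = (int) (y >> 16);
        lines[1] = lines[0] + 1 < src_h ? lines[0] + 1 : src_h - 1;

        /* Adjacent lines map to different slots (line & 1), so a line
         * scaled for an earlier destination row is found here again
         * and is not scaled a second time. */
        for (int k = 0; k < 2; k++)
        {
            int slot = lines[k] & 1;

            if (cached_line[slot] != lines[k])
            {
                scale_line_horizontal (src + (size_t) lines[k] * src_w,
                                       src_w,
                                       cache + (size_t) slot * dst_w,
                                       dst_w, x_step);
                cached_line[slot] = lines[k];
                passes++;
            }
        }

        const uint16_t *top = cache + (size_t) (lines[0] & 1) * dst_w;
        const uint16_t *bot = cache + (size_t) (lines[1] & 1) * dst_w;

        /* Vertical pass: one interpolation per destination pixel. */
        for (int i = 0; i < dst_w; i++)
            dst[(size_t) j * dst_w + i] =
                (uint8_t) ((top[i] * (16 - w) + bot[i] * w) >> 8);
    }

    free (cache);
    return passes;
}
```

In this sketch, scaling a 2-line source to 5 destination rows performs
only two horizontal passes, whereas a non-separable scaler would redo
the horizontal interpolation of both source lines for each of the 5
rows. The saving grows with the vertical scale factor, since more
destination rows fall between the same pair of source lines.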

Here is an example.

  http://cgit.freedesktop.org/~sandmann/pixman/log/?h=separable-bilinear

Performance results for fishtank with SSE2 and MMX disabled:

  Before:
  [  0]    image         firefox-fishtank  603.804  603.804   0.00%

  After:
  [  0]    image         firefox-fishtank  535.321  535.321   0.00%

And this is with fishtank, which consists exclusively of downscaling
operations, so each line is reused at most once.

Søren