[Pixman] [PATCH 0/2] 7-bit bilinear interpolation precision
siarhei.siamashka at gmail.com
Thu Jul 5 00:22:35 PDT 2012
On Sun, Jul 1, 2012 at 12:25 AM, Søren Sandmann <sandmann at cs.au.dk> wrote:
> Siarhei Siamashka <siarhei.siamashka at gmail.com> writes:
>> On Tue, Jun 26, 2012 at 6:13 AM, Jeff Muizelaar
>> <jmuizelaar at mozilla.com> wrote:
>>> We recently switched to using the same precision as Skia on
>>> Android. (4 bits for RGBA32 and 2 bits for RGB565)
>> This is a good point. If using just 4 or even 2 bits of interpolation
>> precision is acceptable for Skia, then maybe the current bilinear
>> interpolation precision is really excessive in pixman. It would be
>> too generous to give a handicap to Skia :)
>> So should we also try 4-bit interpolation? Compared to 7-bit or 8-bit
>> interpolation, it has an advantage that all the calculations need
>> only 16-bit unsigned variables. This is faster for at least C, x86
>> SSE2 and ARM NEON code.
> Four bits is also the precision that GdkPixbuf has used for many years
> with no complaints from users that I'm aware of. If you zoom more than
> 16x with GdkPixbuf, banding artefacts definitely start showing up, but
> zooming that far is unlikely to look good with bilinear filtering
> anyway. So dropping to four bits doesn't sound too bad to me.
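A small sketch of why 16x is the breaking point. It assumes source positions in 16.16 fixed point whose fractional part is simply truncated to `bits` bits to form the weight, and it ignores pixel-center offsets. `distinct_weights` is a hypothetical helper, not pixman or GdkPixbuf code. Stretching one source pixel interval over `zoom` destination pixels can produce at most `1 << bits` different weights, so beyond 16x with 4 bits neighbouring destination pixels get identical values, which is what shows up as bands.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: how many distinct interpolation weights appear when
 * one source pixel interval is stretched over `zoom` destination pixels.
 * Positions are 16.16 fixed point; the fraction is truncated to `bits`. */
static unsigned distinct_weights(unsigned zoom, unsigned bits)
{
    unsigned char seen[256] = { 0 };
    unsigned count = 0;

    assert(zoom > 0 && bits >= 1 && bits <= 8);
    for (unsigned i = 0; i < zoom; i++) {
        /* Fractional source position of destination pixel i, in [0, 1). */
        uint32_t frac = (uint32_t)(((uint64_t)i << 16) / zoom);
        unsigned w = frac >> (16 - bits);   /* truncated weight, 0 .. 2^bits - 1 */
        if (!seen[w]) {
            seen[w] = 1;
            count++;
        }
    }
    return count;
}
```

With this model, `distinct_weights(32, 4)` returns 16, so every weight is shared by two adjacent destination pixels, while `distinct_weights(32, 7)` returns 32.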
OK, we'll see how much performance can be gained by going to lower
precision. There is only one way to find out: implement different variants
and benchmark them against each other.
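One such variant can be sketched as a 4-bit bilinear interpolation of a single 8-bit channel in which every intermediate value fits in 16 bits, which is what makes it attractive for 16-bit SIMD lanes. The names `bilinear_4bit` and `max_intermediate` are hypothetical; this only illustrates the arithmetic and is not pixman's implementation. With `b` weight bits the largest intermediate is 255 * 2^(2b): 65280 for b = 4, but 4177920 for b = 7, which no longer fits in 16 bits.

```c
#include <assert.h>
#include <stdint.h>

#define BITS 4                      /* hypothetical weight precision */
#define ONE  (1u << BITS)           /* weight that selects one pixel fully */

/* Bilinear interpolation of one 8-bit channel with 4-bit weights wx, wy in
 * [0, ONE]. Every intermediate value stays below 65536, so each step could
 * be done in 16-bit unsigned lanes (C itself promotes to int here). */
static uint8_t bilinear_4bit(uint8_t tl, uint8_t tr, uint8_t bl, uint8_t br,
                             unsigned wx, unsigned wy)
{
    assert(wx <= ONE && wy <= ONE);
    uint16_t top = (uint16_t)(tl * (ONE - wx) + tr * wx);   /* <= 255 * 16 */
    uint16_t bot = (uint16_t)(bl * (ONE - wx) + br * wx);   /* <= 255 * 16 */
    uint16_t sum = (uint16_t)(top * (ONE - wy) + bot * wy); /* <= 65280 */
    return (uint8_t)((sum + 128u) >> (2 * BITS));            /* round, / 256 */
}

/* Largest intermediate value for `bits` weight bits: 255 * 2^(2 * bits). */
static uint32_t max_intermediate(unsigned bits)
{
    return 255u << (2 * bits);
}
```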
> That said, I'm a little concerned that nobody is trying separable scaling
Well, you are trying. So this does not qualify as nobody ;)
> instead of worrying about these microoptimizations.
It's not instead, but in addition to. All these optimizations are
independent of each other and can be used together with some [...]
Separable scaling is a good idea, but it is not a silver bullet.
Downscaling is still a valid use case, and separable scaling would
provide no reduction in the number of arithmetic operations for it.
Also x86 SSSE3 and ARM NEON add some extra challenges:
* Using 8-bit multiplications for horizontal interpolation is difficult
as the weight factors need to be updated for each pixel. Single-pass
scaling can easily use 8-bit multiplications for vertical interpolation
as the weight factors are pre-calculated before entering the loop.
* Separable scaling needs extra load/store instructions to save
temporary data between passes.
* When we are approaching the memory speed barrier, the separation of
operations into passes may result in uneven usage of the memory subsystem.
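To put numbers on the downscaling point, here is a rough count of how many source rows need horizontal interpolation. It assumes pixel-center sampling and the two nearest source rows per output row, clamped at the edges; `rows_needed` and `rows_single_pass` are hypothetical helpers. A single-pass scaler interpolates 2 * dst_h rows horizontally, while a separable one interpolates each distinct source row once. The per-row cost is proportional to the destination width in both cases, so only the row counts are compared.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: number of distinct source rows that get horizontally
 * interpolated when scaling src_h rows to dst_h rows, assuming pixel-center
 * sampling and the two nearest source rows per output row (clamped at the
 * edges). A separable scaler interpolates each of these rows once. */
static unsigned rows_needed(unsigned src_h, unsigned dst_h)
{
    assert(src_h > 0 && dst_h > 0);
    unsigned char *used = calloc(src_h, 1);
    unsigned count = 0;

    assert(used != NULL);
    for (unsigned y = 0; y < dst_h; y++) {
        /* Source coordinate of the destination row center, 16.16 fixed point. */
        int64_t sy = ((int64_t)(2 * y + 1) * src_h << 16) / (2 * (int64_t)dst_h) - 32768;
        if (sy < 0)
            sy = 0;
        unsigned r0 = (unsigned)(sy >> 16);
        unsigned r1 = r0 + 1 < src_h ? r0 + 1 : src_h - 1;

        if (!used[r0]) { used[r0] = 1; count++; }
        if (!used[r1]) { used[r1] = 1; count++; }
    }
    free(used);
    return count;
}

/* Rows interpolated horizontally by a single-pass scaler: two per output row. */
static unsigned rows_single_pass(unsigned dst_h)
{
    return 2 * dst_h;
}
```

Under these assumptions, `rows_needed(1000, 400)` and `rows_needed(800, 400)` both return 800, the same as `rows_single_pass(400)`, so a 2.5x or 2x downscale gains nothing. A 4x upscale, `rows_needed(100, 400)`, needs only 100 rows instead of 800.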
Still, for example on ARMv6 without real SIMD, it seems to be difficult
to implement both vertical and horizontal bilinear interpolation in one
pass. There are just not enough general purpose registers to
sufficiently unroll the loop to get rid of pipeline stalls and do this
without spilling temporary data to memory or reloading constants. So
support for separable scaling is a very welcome feature for [...]
There is no need to put all our eggs in one basket. Having multiple
scaling methods available, each tuned for different use cases and
different target architectures, should not be a problem.
> As far as I know Skia does this as well, as does pretty much everybody
> who cares about fast scaling.
> That is, the two closest source lines are scaled horizontally, the
> result is cached, and then the intermediate destination lines can be
> generated by just interpolating vertically. This cuts down the amount
> of arithmetic required quite substantially, especially for scale
> factors larger than 2.
> Here is an example.
> Performance results for fishtank with SSE2 and MMX disabled:
> [ 0] image firefox-fishtank 603.804 603.804 0.00%
> [ 0] image firefox-fishtank 535.321 535.321 0.00%
> And this is with fishtank, which is exclusively downscalings so each
> line is reused at most once.
Looks good to me.
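A minimal sketch of the scheme Søren describes (the two closest source lines scaled horizontally and cached, destination lines produced by vertical interpolation only), for an 8-bit single-channel image. It assumes 16.16 fixed-point source coordinates, pixel-center sampling, edge clamping and 7-bit weights (`WBITS`); all names are hypothetical, and this is neither Søren's patch nor the pixman API. When the next destination row moves down by one source row, the old bottom row becomes the new top row instead of being recomputed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define WBITS 7                  /* hypothetical weight precision in bits */
#define WONE  (1u << WBITS)      /* weight that selects one pixel fully */

/* Source coordinate (16.16 fixed point) of destination pixel center i along
 * an axis with src_n source and dst_n destination pixels, clamped at 0. */
static int64_t src_coord(int i, int src_n, int dst_n)
{
    int64_t s = ((int64_t)(2 * i + 1) * src_n << 16) / (2 * (int64_t)dst_n) - 32768;
    return s < 0 ? 0 : s;
}

/* Horizontal pass over one source row. Results keep WBITS extra bits of
 * precision; the maximum 255 * WONE = 32640 fits in uint16_t. */
static void hscale_row(const uint8_t *src, int src_w, uint16_t *out, int dst_w)
{
    for (int x = 0; x < dst_w; x++) {
        int64_t sx = src_coord(x, src_w, dst_w);
        int x0 = (int)(sx >> 16);
        int x1 = x0 + 1 < src_w ? x0 + 1 : src_w - 1;   /* clamp right edge */
        unsigned wx = (unsigned)((sx & 0xffff) >> (16 - WBITS));
        out[x] = (uint16_t)(src[x0] * (WONE - wx) + src[x1] * wx);
    }
}

/* Separable bilinear scaling. Returns the number of hscale_row() calls,
 * or -1 if the two row buffers cannot be allocated. */
static int scale_separable(const uint8_t *src, int src_w, int src_h, int src_stride,
                           uint8_t *dst, int dst_w, int dst_h, int dst_stride)
{
    assert(src_w > 0 && src_h > 0 && dst_w > 0 && dst_h > 0);
    uint16_t *buf[2] = { malloc(dst_w * sizeof(uint16_t)),
                         malloc(dst_w * sizeof(uint16_t)) };
    int cached[2] = { -1, -1 };  /* source row held in buf[i], -1 = none */
    int hcalls = 0;

    if (!buf[0] || !buf[1]) {
        free(buf[0]);
        free(buf[1]);
        return -1;
    }
    for (int y = 0; y < dst_h; y++) {
        int64_t sy = src_coord(y, src_h, dst_h);
        int y0 = (int)(sy >> 16);
        int y1 = y0 + 1 < src_h ? y0 + 1 : src_h - 1;  /* clamp bottom edge */
        unsigned wy = (unsigned)((sy & 0xffff) >> (16 - WBITS));

        if (cached[0] != y0) {
            if (cached[1] == y0) {
                /* Moved down one source row: old bottom becomes the new top. */
                uint16_t *t = buf[0];
                buf[0] = buf[1];
                buf[1] = t;
                cached[1] = cached[0];
                cached[0] = y0;
            } else {
                hscale_row(src + (size_t)y0 * src_stride, src_w, buf[0], dst_w);
                cached[0] = y0;
                hcalls++;
            }
        }
        if (y1 != y0 && cached[1] != y1) {
            hscale_row(src + (size_t)y1 * src_stride, src_w, buf[1], dst_w);
            cached[1] = y1;
            hcalls++;
        }
        /* Vertical pass: a single weighted sum per destination pixel. */
        const uint16_t *top = buf[0];
        const uint16_t *bot = (y1 == y0) ? buf[0] : buf[1];
        uint8_t *out = dst + (size_t)y * dst_stride;
        for (int x = 0; x < dst_w; x++) {
            uint32_t v = top[x] * (WONE - wy) + bot[x] * wy;
            out[x] = (uint8_t)((v + (1u << (2 * WBITS - 1))) >> (2 * WBITS));
        }
    }
    free(buf[0]);
    free(buf[1]);
    return hcalls;
}
```

Scaling a 3-row source to 6 rows this way calls `hscale_row()` three times, once per source row, where a single-pass scaler would interpolate 2 * 6 = 12 rows horizontally. The two cached rows are exactly the temporary data that has to be stored and reloaded between passes.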