[Pixman] [PATCH 0/2] 7-bit bilinear interpolation precision

Thu Jul 5 16:28:14 PDT 2012

Siarhei Siamashka <siarhei.siamashka at gmail.com> writes:

>> In the case of 2-pass scaling, typically all the pixels from the
>> source image are participating in calculations (unless M >= 2 * N,
>> which is 2x+ downscaling). So the number of horizontal interpolation
>> operations is M * N (each source scanline gets horizontally scaled
>> to size N and cached). The number of vertical interpolation
>> operations is N * N (we are doing it for every destination pixel
>> on the second pass). So the total number of interpolation operations
>> is "M * N + N * N" for 2-pass bilinear scaling.
>> 
>> Now we only need to compare "N * N + N * N" with "M * N + N * N".
>
> In fact the comparison turns into "3 * N * N" vs. "(M + N) * N"
> operations, which is clearly more favourable for 2-pass processing
> in plain C for a wide range of scaling factors.

There may be some confusion here regarding "1-pass" and "2-pass". The
algorithm I'm advocating makes only one pass, but keeps a cache of two
horizontally expanded scanlines. For big downscalings, this does at most
2 * N horizontal interpolations, since in the worst case, processing one
destination scanline will require expanding two source scanlines. So the
real comparison is something like

     3 * N * N    vs.   MIN (2 * N, M) * N + N * N

which is never worse than the 1-pass algorithm in terms of arithmetic,
though I agree it has more overhead in other places.

A true 2-pass algorithm that first expands _all_ scanlines horizontally
would indeed always generate M * N horizontal interpolations, but I
don't think such an approach is a good idea (at least not for a simple
bilinear filter).

>> > Single-pass bilinear is *not* a big loss for upscaling, because as
>> > you noticed there is overhead of the intermediate values. It is
>> > pretty much equivalent to stretching the image in one direction, and
>> > then in the other. You need to store this intermediate stretched
>> > image, which is bigger than the original.
>> 
>> Increasing the scale factor for upscaling is going to eventually let
>> the 2-pass algorithm outperform its single pass counterpart. There
>> must be a crossover point somewhere, and it can be used to make a
>> decision which algorithm to select for better performance in each
>> particular case.
>
> Still the underlying instruction set does matter a lot. We can't simply
> ignore the existence of wide SIMD instruction sets.

There is no reason the horizontal interpolations couldn't be done with
wide SIMD as well. You would keep two fixed point coordinates around in
a register and then compute the left/right weights based on them in a
format suitable for pmaddubsw.

(The issue with 128/0 weights is actually not as bad as I thought. When
the weights are 128 and 0, pmaddubsw will treat 128 as -128 and compute

    -128 * l + 0 * r

which produces a negative number. Since this is the *only* negative
number that can be produced, and what we really want is a multiplication
with 128, a simple pabsw (also available in SSSE3) will fix up the
result).

Søren