[Pixman] [PATCH 0/2] 7-bit bilinear interpolation precision

Thu Jul 5 15:44:30 PDT 2012

On Fri, 6 Jul 2012 00:08:46 +0300
Siarhei Siamashka <siarhei.siamashka at gmail.com> wrote:

> On Thu, 05 Jul 2012 12:25:55 -0700
> Bill Spitzak <spitzak at gmail.com> wrote:
> 
> > On 07/05/2012 12:22 AM, Siarhei Siamashka wrote:
> > 
> > > Separable scaling is good idea, but it is not a silver bullet.
> > > Downscaling is still a valid use case, and separable scaling would
> > > provide no reduction for the number of arithmetic operations for
> > > it.
> > 
> > This is not true. In fact 2-pass scaling is a much *bigger* win for 
> > downscaling if you use filters of size greater than 1. And filters 
> > larger than 1 are absolutely required to fix the artifacts that are
> > in current downscaling.
> 
> Looks like you got it backwards. Let's do some calculations. For the
> sake of simplicity, let's assume that MxM image is scaled to NxN size.
> 
> In the case of single pass bilinear scaling, we fetch 2x2 pixel block
> from the source image for each destination pixel (total N * N times)
> and perform 1 vertical interpolation followed by 1 horizontal
> interpolation for it. This is total "N * N + N * N" interpolation
> operations (let's ignore the fact that horizontal interpolation is a
> little bit more expensive).

Actually I stand corrected. If we are fetching 2x2 pixel block, then
we need to interpolate 2 pairs of pixels first in one direction.
And then do one more interpolation in another direction. So it is
"N * N + N * N + N * N" interpolation operations in plain C.
My mistake was getting too much reliant on SIMD. And at least
for 128-bit wide SIMD, it does not matter whether we are
interpolating one pair of pixels or two pairs at once.

> In the case of 2-pass scaling, typically all the pixels from the
> source image are participating in calculations (unless M >= 2 * N,
> which is 2x+ downscaling). So the number of horizontal interpolation
> operations is M * N (each source scanline gets horizontally scaled
> to size N and cached). The number of vertical interpolation
> operations is N * N (we are doing it for every destination pixel
> on the second pass). So the total number of interpolation operations
> is "M * N + N * N" for 2-pass bilinear scaling.
> 
> Now we only need to compare "N * N + N * N" with "M * N + N * N".

In fact the comparison turns into "3 * N * N" vs. "(M + N) * N"
operations, which is clearly more favourable for 2-pass processing
in plain C for a wide range of scaling factors.

> In the case of downscaling (N < M), 2-pass scaling needs more arithmetic
> operations. Now let's remember that horizontal interpolation is more
> expensive (horizontal weights need to be recalculated every time) and
> storing/reloading data between passes adds more overhead, which makes
> the outcome even worse for 2-pass scaling.
> 
> > Single-pass bilinear is *not* a big loss for upscaling, because as
> > you noticed there is overhead of the intermediate values. It is
> > pretty much equivalent to stretching the image in one direction, and
> > then in the other. You need to store this intermediate stretched
> > image, which is bigger than the original.
> 
> Increasing the scale factor for upscaling is going to eventually let
> the 2-pass algorithm outperform its single pass counterpart. There
> must be a crossover point somewhere, and it can be used to make a
> decision which algorithm to select for better performance in each
> particular case.

Still the underlying instruction set does matter a lot. We can't simply
ignore the existence of wide SIMD instruction sets.

-- 
Best regards,
Siarhei Siamashka