[Pixman] [PATCH 3/4] sse2: affine bilinear fetcher

Tue Jan 29 14:05:00 PST 2013

Siarhei Siamashka wrote:

> Going forward, we need to also add support for separable bilinear
> scaling (first horizontal interpolation for single scanlines to
> temporary buffers in L1 cache, then vertical interpolation of these
> buffers to get the final result). Unless I misunderstood something,
> Soeren thinks that it's going to be universally better. I think that
> both direct and separable scaling methods are going to be useful for
> the platforms with wide SIMD. Working with two source scanlines and
> providing results directly is good for extreme downscaling. Separable
> processing is good for extreme upscaling. There must be a backend
> dependent crossover point at a certain scaling factor.

If by "downscaling" you mean making the picture smaller, this is the
harder case, and the one that requires more than two source scanlines.
This should be apparent if you imagine a downscale to less than 1/2 size:
the result has fewer than half as many scanlines as the original, so if
each of them depends on only 2 source scanlines, some scanlines of the
original do not contribute to the resulting image at all.
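
To make the counting argument concrete, here is a small Python sketch. The
function name and the pixel-centre mapping are my own assumptions for
illustration, not pixman code. It lists which source scanlines a 2-tap
vertical blend actually reads for a given downscale:

```python
import math

def rows_touched_by_bilinear(src_height, dst_height):
    """Return the set of source rows read when every destination row is a
    2-tap (bilinear) blend of its two nearest source rows.
    Hypothetical helper; the centre-to-centre mapping is an assumption."""
    scale = src_height / dst_height      # source rows per destination row
    touched = set()
    for y in range(dst_height):
        # Map the destination pixel centre back into source coordinates.
        src_y = (y + 0.5) * scale - 0.5
        top = math.floor(src_y)
        for row in (top, top + 1):
            # Clamp to the image so edge rows never index out of range.
            touched.add(min(max(row, 0), src_height - 1))
    return touched

src_h, dst_h = 16, 4                     # a 1/4 downscale
used = rows_touched_by_bilinear(src_h, dst_h)
skipped = sorted(set(range(src_h)) - used)
print("rows read:   ", sorted(used))
print("rows skipped:", skipped)
```

At exactly 1/2 every source row is still read; below that, rows start to
drop out, and whatever detail they held never reaches the output.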

Downscaling from only two source scanlines like this is why current cairo
downscaling produces very noisy images.

Also, both upscaling and downscaling can be sped up by using a 2-pass
method. It is far more important for downscaling but helps both. A
monkey wrench in this, however, is that hardware does support 4-input
bilinear interpolation, so you often get the fastest results by using
this for upscaling even though it is doing some redundant work. That is
no help for downscaling, however, unless you use mipmaps.
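
As a rough illustration of the 2-pass idea for downscaling, here is a
Python sketch. It uses a plain box filter, which is my choice for brevity
and not something pixman or cairo is known to use, and all names are
hypothetical. The point is only that each pass is a 1-D filter whose
footprint covers every input sample:

```python
def resample_row(row, out_len):
    """Downscale a 1-D list of samples to out_len samples with a box
    filter, so that every input sample contributes to some output."""
    in_len = len(row)
    scale = in_len / out_len             # input samples per output sample
    out = []
    for i in range(out_len):
        lo, hi = i * scale, (i + 1) * scale   # footprint in source coords
        acc = 0.0
        j = int(lo)
        while j < hi and j < in_len:
            # Weight each source sample by its overlap with the footprint.
            overlap = min(hi, j + 1) - max(lo, j)
            acc += row[j] * overlap
            j += 1
        out.append(acc / scale)
    return out

def transpose(img):
    return [list(col) for col in zip(*img)]

def resize_two_pass(img, out_w, out_h):
    # Pass 1: horizontal, one scanline at a time.
    tmp = [resample_row(r, out_w) for r in img]
    # Pass 2: vertical, written as a horizontal pass over the transpose.
    return transpose([resample_row(c, out_h) for c in transpose(tmp)])

ramp = [[x + 4 * y for x in range(4)] for y in range(4)]
checker = [[(x + y) % 2 for x in range(4)] for y in range(4)]
small_ramp = resize_two_pass(ramp, 2, 2)
small_checker = resize_two_pass(checker, 2, 2)
print(small_ramp)
print(small_checker)
```

The checkerboard comes out as uniform grey, which is the right answer at
half size; a 2-tap sampler that skips pixels would be at the mercy of
which pixels it happened to land on.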

I don't think rectangle sources help affine transforms if you plan to do
2-pass. An affine transform can be split into 3 parts (these can be
worked out so that the resulting matrices multiply back to the original):

1. Either the identity or a swap of x and y axis, chosen to make the 
determinant of the matrix in step 2 as large as possible

2. A transform that only moves pixels vertically (a is 1 and c is 0)

3. A transform that only moves pixels horizontally (b is 0 and d is 1)
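
The three steps above can be sketched in Python as follows. This assumes
the PostScript-style convention x' = a*x + c*y, y' = b*x + d*y and ignores
translation; the function names are made up for illustration. Since the
determinant of the step 2 matrix is just its d entry, step 1 reduces to
picking whichever of |b| and |d| is larger:

```python
import math

def decompose_affine(a, b, c, d):
    """Split the linear part of an affine map into (swap, vertical,
    horizontal). Assumed convention: x' = a*x + c*y, y' = b*x + d*y.
    Each returned matrix is an (a, b, c, d) tuple."""
    # Step 1: swap x and y if that gives step 2 the larger determinant.
    swap = abs(b) > abs(d)
    if swap:
        a, c = c, a
        b, d = d, b
    if d == 0:
        raise ValueError("matrix is singular")
    # Step 2: moves pixels vertically only (a is 1 and c is 0).
    vertical = (1.0, b, 0.0, d)
    # Step 3: moves pixels horizontally only (b is 0 and d is 1).
    horizontal = ((a * d - b * c) / d, 0.0, c / d, 1.0)
    return swap, vertical, horizontal

def apply(m, p):
    a, b, c, d = m
    x, y = p
    return (a * x + c * y, b * x + d * y)

def apply_decomposed(parts, p):
    swap, vertical, horizontal = parts
    if swap:
        p = (p[1], p[0])
    return apply(horizontal, apply(vertical, p))

# Check against a direct transform for a 30 degree rotation.
t = math.radians(30)
m = (math.cos(t), math.sin(t), -math.sin(t), math.cos(t))
parts = decompose_affine(*m)
print(apply(m, (3.0, 5.0)))
print(apply_decomposed(parts, (3.0, 5.0)))
```

Both prints should agree up to rounding. For rotations near 90 degrees the
swap kicks in, which keeps the division by d well away from zero.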

Using step 1 to choose between two versions of step 2 (one of which
samples vertically from the source rather than horizontally) gives you a
two-pass algorithm, and each pass only needs a 1xn or nx1 sample of
input pixels to produce a 1xn or nx1 output section.

There is also a three-pass version (the three-shear method usually credited to Paeth) that
produces less blurring for a 45 degree rotation because the intermediate 
images are larger. This is done by a horizontal, vertical, and then 
another horizontal pass. However, I have found that the 2-pass version
works fine; it is what Nuke uses, and nobody has complained.

Note that horizontal/vertical can be swapped in all this discussion, 
which is where knowledge of cache lines/etc is going to be more important.