[cairo] Concerns about using filters for downscaling

Wed Mar 26 12:45:49 PDT 2014

Søren Sandmann wrote:
> Bill Spitzak <spitzak at gmail.com> writes:
> 
>> I'm also still mystified by the "two filters" approach. I have no idea
>> what you mean by the sampling and reconstruction filter. I have done a
>> lot of image processing, and transformations only use *one* filter,
>> called the "sampling filter". This filter changes depending on the
>> scale and on the fractional portion of the sample point.
> 
> Say that we want to transform a source image with a transformation that
> is a downscale in one dimension and an upscale in another.

Actually this is no more difficult than two unequal scales that are
both larger or both smaller than one; see below.

Each pixel in an affine transformation actually has a source position
and two vectors (which I usually call the derivative) describing how the
source position moves as the output pixel is changed horizontally or
vertically. It is easier to picture this as describing a rhombus
(strictly, a parallelogram) in the source space. All the rhombuses are
the same shape and tile to cover the input image. Note that the
filtering can (and should) sample outside this rhombus; it only
describes the "size" of the source filter. Virtually all
transformations convert this rhombus into an axis-aligned rectangle,
including the pixman one. These rectangles do not tile the source
image, so an approximation has already been made. This means that
approximations are allowed in describing the algorithm.
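
To make the source position and derivative concrete, here is a minimal
Python sketch. The function name, the 2x3 matrix layout, and the choice
of the bounding box as the axis-aligned rectangle are my own assumptions
for illustration; this is not what pixman or any other library
necessarily does.

```python
def source_footprint(matrix, x, y):
    """Map output pixel (x, y) to source space.

    matrix is ((a, b, tx), (c, d, ty)) and maps output coordinates to
    source coordinates (hypothetical layout, chosen for this sketch):
        sx = a*x + b*y + tx
        sy = c*x + d*y + ty
    """
    (a, b, tx), (c, d, ty) = matrix
    center = (a * x + b * y + tx, c * x + d * y + ty)
    # The derivative: how the source position moves when the output
    # pixel moves one step horizontally (du) or vertically (dv).
    du = (a, c)
    dv = (b, d)
    # One possible axis-aligned approximation (an assumption): the
    # bounding box of the parallelogram spanned by du and dv.
    rect = (abs(a) + abs(b), abs(c) + abs(d))
    return center, du, dv, rect


# Downscale by 1/2 in x and upscale by 2 in y, so the source moves
# 2 pixels per output pixel in x and 0.5 pixels per output pixel in y.
m = ((2.0, 0.0, 0.0), (0.0, 0.5, 0.0))
center, du, dv, rect = source_footprint(m, 3, 4)
```

For an affine transform du, dv and rect are the same for every output
pixel; only the center changes.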

(For non-affine transforms the shape is an arbitrary quad, with 8
degrees of freedom. All systems I have seen abandon this immediately,
instead using the rhombus described by the derivative at the center of
the output pixel. This adds even more inaccuracy.)

> (a) Get rid of the replicated copies of the baseband. This is
> done by lowpass filtering so that only one copy remains.
> 
> (b) Transform the image. This will distort the one remaining baseband
> copy so that it becomes bigger in one dimension and smaller in
> another. Parts of it will now stick out outside the baseband.
> 
> (c) Lowpass filter again to get rid of the bits of baseband copy that
> ended up outside the baseband square. Otherwise those bits will cause
> aliasing in the next step.
> 
> (d) Sample the image; this will once again replicate the baseband to
> infinity.

Yes, this is correct, but if you assume perfect lowpass filters, one of
(a) or (c) is redundant. Only the lowpass filter with the smaller cutoff
matters, since it also removes every frequency that the larger one
would remove.

A smaller lowpass filter corresponds to a larger sinc filter. The end
result is that at any scale less than one the lowpass filter in (c) is
the only one needed; if you back-map this through the transform, it is a
sinc filter widened by a factor of 1/scale. For scales greater than one
the lowpass filter in (a) is the only one needed; this is a sinc of
width one.

It is true that other filters which are not perfect lowpass filters will 
produce different results if you actually went and implemented this by 
multiplying one width-1 filter by another 1/scale size filter (as it 
appears pixman is doing). However since the purpose is to *simulate* a 
lowpass filter, it works much better to assume your filter is a real 
lowpass filter and thus just use the wider one, ignoring the smaller 
one. This seems to be where the confusion in pixman arises. We are *not* 
trying to simulate 2 imperfect filters, we are instead trying to pick a 
single imperfect filter that is as close as possible to a perfect one!

The end result is that sampling should be done by multiplying a single
filter by the source image. The width of this filter is max(1, 1/scale).
This works even if the scale is greater than 1 in one direction and less
than 1 in the other; it just means the filter is 1 wide in the upscaled
direction.
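
The width rule above is small enough to write down directly. This is
only a sketch; the function name is made up for illustration, and
"scale" means output size divided by source size along one axis.

```python
def filter_width(scale):
    """Width of the single sampling filter along one axis, in source pixels.

    scale < 1 is a downscale, scale > 1 is an upscale.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    return max(1.0, 1.0 / scale)


# Downscale by 1/2 horizontally, upscale by 2 vertically: each axis
# gets its own width, and the upscaled axis stays at width 1.
widths = (filter_width(0.5), filter_width(2.0))
```

Each axis is handled independently, which is why the mixed
downscale/upscale case needs no special treatment.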

For filters with frequencies <= 1/2 (which is true of all the above 
ones) sampling theory indicates that only one sample per pixel is 
needed, multiplied by the filter at that point. Although the box used by 
bilinear has infinite frequency, it can be represented as a triangle 
with width 2 if it is used this way, and thus integrated into algorithms 
using other filters.
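
As a rough 1D illustration of "one sample per pixel, multiplied by the
filter at that point", here is a sketch using the width-2 triangle. The
function names, the convention that pixel i sits at coordinate i, and
the edge handling (skip pixels outside the image and renormalize) are
all assumptions made for this example.

```python
import math


def triangle(t):
    """Triangle (tent) filter: total support 2, peak 1 at t = 0."""
    return max(0.0, 1.0 - abs(t))


def sample_1d(src, pos, scale):
    """Filtered value of src at source coordinate pos.

    Each source pixel contributes exactly one sample, weighted by the
    filter evaluated at that pixel's distance from pos. The filter is
    widened by max(1, 1/scale) when downscaling.
    """
    width = max(1.0, 1.0 / scale)
    lo = math.ceil(pos - width)
    hi = math.floor(pos + width)
    total = 0.0
    weight_sum = 0.0
    for i in range(lo, hi + 1):
        if 0 <= i < len(src):  # assumption: ignore pixels off the edge
            w = triangle((i - pos) / width)
            total += w * src[i]
            weight_sum += w
    return total / weight_sum if weight_sum else 0.0


src = [0.0, 10.0, 20.0, 30.0]
a = sample_1d(src, 1.5, 1.0)  # scale 1: plain linear interpolation
b = sample_1d(src, 1.0, 0.5)  # scale 1/2: triangle widened to 2x
```

At scale 1 this reduces to ordinary linear interpolation between the two
nearest pixels; at smaller scales the same code simply covers more
source pixels.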

Now I have certainly seen algorithms that *switch* filters at a given
scale. If the switch is done at scale = 1, then you could say step (a)
is one of these filters and step (c) is a different one. However, if you
assume they are simulating perfect lowpass filters, then you just choose
one of them; you don't multiply. Furthermore, I have seen the cutoff at
points other than 1; one example is the suggestion that bilinear be used
for scales >= 1/2. It is also common to switch to a pixel-aligned box
filter at very small scales.
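
Switching by scale might look something like the sketch below. The
function name, the returned labels, and especially the box_cutoff
default are hypothetical; only the 1/2 cutoff for bilinear comes from
the suggestion mentioned above.

```python
def choose_filter(scale, bilinear_cutoff=0.5, box_cutoff=1.0 / 16.0):
    """Pick ONE filter for a given scale; the filters are never combined.

    bilinear_cutoff: use bilinear at or above this scale.
    box_cutoff: hypothetical threshold for "very small" scales, below
        which a pixel-aligned box filter is used instead.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    if scale >= bilinear_cutoff:
        return "bilinear"
    if scale < box_cutoff:
        return "box"
    return "wide"  # a single filter of width 1/scale


choices = [choose_filter(s) for s in (2.0, 0.5, 0.25, 0.01)]
```

The point is that exactly one branch is taken for any scale, so exactly
one filter is applied.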

> Now, when the transform is affine, all the four steps can be combined
> into one by convolving the two filters, and this is what pixman does.

This can be done for non-affine transforms. All that means is that the 
filter varies depending on which pixel, but the single convolved result 
can be used for each pixel.

In actual implementations where the filter is a weight to multiply each 
pixel by, the filter will "vary" even for affine transforms. It depends 
on the fractional portion of the sample point. The only time the filter 
will not "vary" is if the scale is an integer and the rotation a 
multiple of 90 degrees.
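
A small sketch of how the weights depend on the fractional portion of
the sample point, using the width-2 triangle (bilinear) along one row.
The function names are made up for illustration, and "step" here is the
source distance moved per output pixel (the derivative), not the scale.

```python
import math


def tent_weights(pos):
    """Weights of a width-2 triangle filter centred on source coordinate pos.

    Returns (weight of pixel floor(pos), weight of pixel floor(pos) + 1).
    Only the fractional portion of pos matters.
    """
    frac = pos - math.floor(pos)
    return (1.0 - frac, frac)


def weights_along_row(offset, step, count):
    """Weights for `count` consecutive output pixels of an affine row."""
    return [tent_weights(offset + step * x) for x in range(count)]


same = weights_along_row(0.25, 2.0, 3)    # integer step: identical weights
varies = weights_along_row(0.0, 1.5, 4)   # fractional step: weights change
```

With an integer step the fractional portion is the same at every output
pixel, so one set of weights is reused; with a step of 1.5 the weights
alternate between (1.0, 0.0) and (0.5, 0.5).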