[cairo] [Pixman] Better quality downsampling in cairo/pixman

Soeren Sandmann sandmann at daimi.au.dk
Thu Jul 15 14:53:32 PDT 2010


Bill Spitzak <spitzak at gmail.com> writes:


[replying out of order]

> I still feel the best approach for Cairo is for the source
> images to keep track of a single "scaled" version. This is an
> integer down-rez of the original image, the scale selected so
> that it is the next larger 1/integer over the actual
> transform. This image is then linearly interpolated by an
> unchanged Pixman to produce the final image. The image is only
> recalculated if the integer scale changes. This will allow
> drawing the same image repeatedly and with changes such as
> rotation with the same speed as it has now. Note that this is
> similar to mip-mapping but produces better results and may be
> much more appropriate for the use of Cairo, especially if 3D
> transforms are not being supported anyway.

I agree that the 1/integer approach is probably a good one for
cairo. Note that the supersampling algorithm for pixman will allow
this, and allow it to happen at full speed.

For example, if the closest 1/integer factor is 5x3, then cairo would
set a supersampling grid of 5x3 with a NEAREST interpolation and the
result from pixman would be precisely the obvious box averaging. It
would run at a reasonable speed too, even with a quite naive
implementation. The total per-source-pixel cost would be a couple of
bit shifts and a couple of additions.
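
To make the cost concrete, here is a sketch (mine, not pixman code;
the a8r8g8b8 layout and all the names are assumptions) of what the
5x3 box average amounts to for one destination pixel:

    #include <stdint.h>

    /* Sketch of 5x3 box averaging of an a8r8g8b8 source for one
     * destination pixel.  src_x/src_y address the top-left source
     * pixel of the box; stride is in pixels, not bytes. */
    static uint32_t
    box_average_5x3 (const uint32_t *src, int stride, int src_x, int src_y)
    {
        uint32_t a = 0, r = 0, g = 0, b = 0;
        int i, j;

        for (j = 0; j < 3; j++)
        {
            for (i = 0; i < 5; i++)
            {
                uint32_t p = src[(src_y + j) * stride + (src_x + i)];

                /* Per source pixel: shifts, masks and additions only. */
                a += p >> 24;
                r += (p >> 16) & 0xff;
                g += (p >> 8) & 0xff;
                b += p & 0xff;
            }
        }

        /* Divide by the 15 samples; a real implementation would fold
         * this into a reciprocal multiply instead of four divisions. */
        return ((a / 15) << 24) | ((r / 15) << 16) | ((g / 15) << 8) | (b / 15);
    }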

With that in place, it would then make a lot of sense to look into
adding Jeff's code as fast paths to make integral downscaling really
fast.

At the same time, the supersampling algorithm would not break down
horribly in non-integer cases such as rotations, and it has the
advantage of producing well-defined results for gradients and a
hypothetical polygon image. See below for more justification for the
supersampling algorithm.

> There also needs to be a fix for how Cairo does scale-up. It needs
> to do non-fuzzy edges when the source is black-padded,

This has been discussed a couple of times already. I think the outcome
each time has been that (a) there is no clear way to precisely define
what it actually means as a filter, and (b) you can already get the
desired result by setting the repeat mode to PAD and clipping to a
suitably scaled rectangle. If you render that to a group, then you
have a black-padded, scaled image with sharp borders.
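
In cairo terms, that recipe might look like the sketch below (this is
the standard cairo API, but the function itself and the `image' and
`scale' parameters are placeholders of mine):

    #include <cairo.h>

    /* Sketch: upscale `image' by `scale' with sharp borders by
     * combining EXTEND_PAD with a clip to the scaled image rectangle,
     * rendered through a group. */
    static void
    paint_scaled_padded (cairo_t *cr, cairo_surface_t *image, double scale)
    {
        cairo_save (cr);
        cairo_push_group (cr);

        cairo_scale (cr, scale, scale);
        cairo_set_source_surface (cr, image, 0, 0);
        cairo_pattern_set_extend (cairo_get_source (cr), CAIRO_EXTEND_PAD);

        /* Clip to the image rectangle in (scaled) user space so the
         * edges stay sharp instead of filtering into the padding. */
        cairo_rectangle (cr, 0, 0,
                         cairo_image_surface_get_width (image),
                         cairo_image_surface_get_height (image));
        cairo_clip (cr);
        cairo_paint (cr);

        cairo_pop_group_to_source (cr);
        cairo_paint (cr);
        cairo_restore (cr);
    }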

> and I now think it should also render the pixels as rectangles. This
> will require alterations to how Pixman does its linear
> interpolation.

What does this mean? NEAREST upscaling will certainly give you
rectangles, but presumably you mean something else.

> > We need to modify this algorithm to work like this:
> > For each destination pixel, several transformed source/mask
> > locations are computed corresponding to a subpixel grid in the
> > destination pixel. The interpolated values for these locations are
> > then averaged together before being composited.
> 
> I think this is a poor explanation. The source pixels are not
> completely random and approaching it this way will produce a very
> slow algorithm.

It was not intended as a general explanation of rescaling
algorithms. It was intended to explain the supersampling approach I'm
advocating. I don't think it will result in a very slow algorithm,
particularly not for integer downscaling.
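
To be concrete, here is a sketch (my own, not an implementation plan;
`interpolate' and `transform' are stand-ins for whatever interpolation
filter and transformation matrix are in effect) of the
per-destination-pixel work for an n x m sampling grid:

    #include <stdint.h>

    typedef struct { double a, r, g, b; } color_t;

    /* Stand-ins: the interpolation filter (NEAREST, BILINEAR, ...)
     * and the destination-to-source coordinate transformation. */
    extern color_t interpolate (const uint32_t *src, int width, int height,
                                double x, double y);
    extern void    transform   (double dst_x, double dst_y,
                                double *src_x, double *src_y);

    static color_t
    supersample (const uint32_t *src, int width, int height,
                 int dst_x, int dst_y, int n, int m)
    {
        color_t sum = { 0, 0, 0, 0 };
        int i, j;

        for (j = 0; j < m; j++)
        {
            for (i = 0; i < n; i++)
            {
                /* Subpixel sample locations on a regular grid inside
                 * the destination pixel. */
                double sx, sy;
                color_t c;

                transform (dst_x + (i + 0.5) / n,
                           dst_y + (j + 0.5) / m,
                           &sx, &sy);
                c = interpolate (src, width, height, sx, sy);

                sum.a += c.a; sum.r += c.r; sum.g += c.g; sum.b += c.b;
            }
        }

        /* Equal-weight average of the samples: a box resampling
         * filter. */
        sum.a /= n * m; sum.r /= n * m; sum.g /= n * m; sum.b /= n * m;
        return sum;
    }

With NEAREST interpolation and an integer scale factor, this
degenerates into exactly the box averaging described above.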

> A much better explanation is that for each destination pixel a
> single source *AREA* is defined. For the 6-element matrix being used
> by Cairo, this source area has 6 degrees of freedom, and can be
> defined as a parallelogram mapped to somewhere in the source image
> (for an arbitrary 3D transform, this source area has 8 degrees of
> freedom and is an arbitrary convex quadrilateral).
>
> This source area is used to calculate the weighting for all the
> pixels from the source image; these weighted values are added to get
> the destination pixel. "Filtering" is the algorithm by which the
> weights are calculated. A possible one is that the weight is the
> percentage of the area that the source pixel intersects, but there
> are both much better ones and much faster ones. In particular the
> weights may be non-zero for pixels outside the area.

I think a full understanding of image transformations requires a
signal processing approach. In particular, when you say that the
destination is mapped to a parallelogram in the source image and you
talk about the source pixels that it intersects, you are implicitly
assuming that pixels are rectangular areas. That's a useful model in
many cases, but for image manipulation it's usually better to consider
them point samples. In that model, a destination pixel is mapped to a
*point* in the source image.

It then becomes clear that we need *two* filters: one for
interpolation and one for resampling. The interpolation filter
reconstructs points in between pixels and the resampling one removes
high frequencies introduced by the transformation.

One interpolation filter is NEAREST, which effectively treats the
source pixels as little squares. Currently we don't have any
resampling filter (or, I suppose, our resampling filter is a Dirac
delta), which means we get terrible aliasing from the high
frequencies that we fail to remove.

The resampling filter is an integral computed over the reconstructed
image (which is defined on all of the real plane). This is difficult
to do in a computer, so we have to do some sort of approximation. The
approximation I'm suggesting is to replace it with a sum.
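
Written out (my notation, not taken from the notes below): with r the
reconstructed source, T the transformation and h the resampling
filter, the ideal destination value and its supersampled approximation
are

    D(x) = \int_{\mathbb{R}^2} h(x - u) \, r(T^{-1} u) \, du
         \approx \frac{1}{N} \sum_{i=1}^{N} r(T^{-1} u_i)

where the u_i are the supersampling grid points for the destination
pixel at x; using equal weights 1/N amounts to a box resampling
filter.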

See also these notes that Owen wrote a long time ago:

    http://www.daimi.au.dk/~sandmann/pixbuf-transform-math.pdf

> It is very common that the source area is reduced to a simpler
> object by throwing away some of the degrees of freedom, before
> figuring out the weights. For instance the current implementation is
> equivalent to throwing away all the information except the xy center
> of the shape, and then calculating the weight as the ratios of the
> Manhattan distances to the nearest pixels. This is obviously not
> good enough.
>
> I think acceptable results are achieved by reducing the shape to the
> closest axis-aligned rectangle or ellipse (thus 4 degrees of
> freedom) before using it. Some algorithms go further and reduce it
> to a circle (3 degrees of freedom); some keep the angle of the
> ellipse (5 degrees of freedom). A more restricted shape makes the
> filtering algorithm much simpler and thus faster and very often
> worth it.

There are tons and tons of resampling algorithms. Doing it with
supersampling is perhaps a bit unconventional, but there are some
advantages to it:

- It works with arbitrary transformations

- It doesn't have pathological behavior in corner cases. 

- It can be applied to gradients and polygons too

- It is conceptually straightforward (it approximates the integral in
  the ideal resampling with a sum).

- With a high sampling rate and a good interpolation filter, it will
  produce high quality output.

- It is simple to implement in a pixel shader without pre-computing
  lookup tables.

- It has explicit tradeoffs between quality and performance (the
  sample rate and the resampling filter).

- You can predict its performance just from the parameters: it will do
  this many source pixel lookups per destination pixel; it will do
  this much arithmetic per destination pixel.
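
For example (numbers picked purely for illustration, not
measurements): a 4x4 grid with BILINEAR interpolation does 16 samples
x 4 source pixels = 64 pixel fetches per destination pixel, plus one
bilinear blend per sample and a final 16-sample average.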

I think other algorithms generally have shortcomings in one or more of
these areas.


Soren

