[cairo] Proposal for 2-pass filter in pixman

Thu Aug 21 19:27:42 PDT 2014

I do not know enough about pixman to figure out how to add this yet, but 
this is the idea and math:

Proposed 2-pass perspective-capable transform for pixman:

Overview:

The first pass moves the pixels vertically so they are all in the
correct vertical postion, producing an intermediate image. This image
therefore is the width of the source image and the height of the
destination. For a non-affine transform it will have curved top and
bottom edges.

The second pass then moves those pixels horizontally into their final
positions.

Because pixels are only moved parallel to an axis, each pass uses a
1-D filter. The result of one vertical filter is used by more than one
horizontal filter, resulting in the speed gain.

Some transforms would produce a very tiny intermediate image, for
instance a 90 degree rotation would produce an infinitely thin diagonal
line. This is solved by transposing the image first. This transpose is
handled by the first pass as well.

The actual code only needs to store one row of the intermediate
image. It fills it from pass one, then does pass 2 into the
destination, then repeats for the next row. Since only one row is
needed the intermediate buffer is small and may even be on the stack.
This can also be multithreaded by having each thread do a different row.

Comparisons to some alternatives:

This is much faster than the current version. It may even be faster
for bilinear sampling it only does 3 pixel accesses per output pixel.

There are artifacts. This is because the intermediate image can be
smaller than both the source and destination, thus it must lose
information. The worst case is a 45 degree rotation, where the
intermediate image has an area .717 of the source image.

For scaling down this can be slower than a version that swaps the
filter order and keeps the entire intermediate image. This is
because that uses fewer vertical filters. Most example code of
2-pass filtering works this way. But:

* This is only a linear speedup, and I suspect on modern processers
   all speed advantage is negated by less cache coherence.

* It's slower for scaling up so you have to special case that.

* Requres a big buffer to store intermediate image.

* Much harder to avoid calculating unused parts of the intermediate
   image.

* Transpose must be the same for entire image, making parts of
   perspective transforms very poor quality because the transpose is
   wrong there.

Comparisons to Nuke:

This is similar to the 2-pass transform I developed for Nuke in 1996
which is still used today. Probably 50% of the movie frames you see
today have passed through this transform at least once, so I guess it
is acceptable. However this is a different algorithm in that the
passes are swapped and the intermediate buffer is one row, and I have
done all-new math derivations from scratch here, so there are no IP
issues.

The advantages and disadvantages of Nuke's version are listed above.
You can easily see the artifacts for perspective transforms by doing
an extreme one in Nuke.

Nuke linearly interpolates the filter size across the destination
image. This is mathematically wrong and I compute the correct
derivatives below.

Math:

The transform is based on the back-projection matrix. I am using the
following letters for the entries in the matrix:

     |u*w|   |A C X|   |x|
     |v*w| = |B D Y| * |y|
     | w |   |E F W|   |1|

x,y is a location in the output, typically the center of a pixel. u,v
is a location in the input image. ACXBDYEFW is a 3x3 matrix. The odd
lettering is to put ABCDXY in the same order that the Cairo matrix
uses and to match PostScript/PDF/SVG documentation.

A significant subclass of transforms are affine transforms. For
instance these are the only transforms Cairo produces. In these the
last row of the matrix (E F W) is 0,0,1. This significantly simplifies
the math and I will give the affine reduction of the expressions at
each step.

It is easist to figure out the steps backwards:

SECOND PASS:

The last pass moves from source u to dest x. Thus as you calculate the
pixel at x you must find the source u:

     u = (Ax+Cy+X)/(Ex+Fy+W)
     affine: u = Ax+Cy+X

This is the division of 2 linear equations. For affine it is linear.

Filter width is du/dx:

     du/dx = (A(Ex+Fy+W)-E(Ax+Cy+X))/(Ex+Fy+W)^2
           = (A(Fy+W)-E(Cy+X))/(Ex+Fy+W)^2
     affine: du/dx = A

The numerator is constant for a given y and the denominator is the
square the u denominator. For affine this is a constant over the
entire image.

FIRST PASS:

First pass moves from source v to dest y:

     v = (Bx+Dy+Y)/(Ex+Fy+W)
     affine: v = Bx+Dy+Y

But our horizontal position is u, not x, so we want to figure this out
in terms of u. Inverting the second pass equation gives:

     x = (u(Fy+W)-Cy-X)/(A-Eu)
     affine: x = (u-Cy-X)/A

Substituting this x into the equation for v and with some help from
WolframAlpha I get:

     v = (B(u(Fy+W)-Cy-X)/(A-Eu)+Dy+Y)/(E(u(Fy+W)-Cy-X)/(A-Eu)+Fy+W)
       = (B(u(Fy+W)-Cy-X)+(A-Eu)(Dy+Y))/(A(Fy+W)-E(Cy+X))
     affine: v = B(u-Cy-X)/A+Dy+Y

For a given y this is a linear equation of u.

The filter size is dv/dy, I again resorted to WolframAlpha:

     dv/dy = (A-Eu)(ADW-AFY-BCW+BFX+CEY-DEX)/(E(Cy+X)-A(Fy+W))^2
     affine: dv/dy = (AD-BC)/A

For a given y this is a linear equation of u. That big constant part
is the determinant of the matrix. The affine value is constant for the
entire image, and multiplying the two affine derivatives gives you the
determinant, which makes sense, so I'm pretty certain I got it right.

TRANSPOSE:

A transpose will swap A,C,X and B,D,Y in all the above functions, and
the first pass will retrieve the pixel at (v,u) rather than (u,v).

The larger the second pass derivative du/dx is, the larger the
intermediate image is. Therefore whether the transpose should be done
can be decided by which one has a larger du/dx. Because the
denominator in that expression is squared it is always positive and
drops out of the comparison:

     transpose = abs(B(Fy+W)-E(Dy+Y)) > abs(A(Fy+W)-E(Cy+X))
     affine: transpose = abs(B) > abs(A)

Since this does not depend on x it is the same for an entire row. This
is great as it means you can get the best possible image while still
doing an entire row at once. For affine the answer is the same for the
entire image.

FILTERS:

The absolute value of the deriviative chooses the size of the filter.

When enlarging, if "blurry pixels" are wanted, then values less than
1.0 should be replaced by 1.0. The sizes should then be clamped to a
useful range, 1/128..128 seems pretty good.

Nuke uses a single very large filter and then samples inside it for
smaller filters. This requires the "reconstruction" filter to be
impulse, which limits what filter options work. I think Nuke had to
special-case the box and impulse filters, and it does not support
non-blurry pixels. It also has to calculate the normalization for all
filters of size > 1. For all these reasons I don't feel this is the
best approach.

Instead an array of pre-calculated filters covering the full range of
sizes and fractional offsets should be used. But the sizes and offsets
are not evently spaced, they should be much closer for smaller filter
sizes. I am not sure what the best method for quickly looking up the
filter would be.