[Pixman] Image scaling with bilinear interpolation performance

Wed Feb 16 21:20:37 PST 2011

Hi everyone,

I'm using cairo and pixman to composite images.
I tested image scaling performance with bilinear filtering, and it is
somewhat slower than I expected.

The test was run with cairo 1.8.8 and pixman 0.21.2 on Ubuntu 10.04 with an
Intel quad-core CPU at 2.66 GHz.
I used cairo image surfaces for both the source and the destination.

The test code is roughly as follows:

#define SCALE_X 2.0
#define SCALE_Y 2.0

cairo_scale(cr, SCALE_X, SCALE_Y);
cairo_set_source_surface(cr, srcSurface, 0, 0);
cairo_pattern_set_filter(cairo_get_source(cr), CAIRO_FILTER_BILINEAR);
cairo_set_operator(cr, CAIRO_OPERATOR_SOURCE);

startTime = rdtsc();
cairo_paint(cr);
endTime = rdtsc();

As I understand it, the image scaling procedure is as follows:

1. Allocate (or acquire a statically allocated) source scanline buffer to
hold the interpolated pixels.
2. Fetch a scanline from the source image into the source scanline buffer
with bilinear interpolation.
3. Composite the source scanline onto the destination scanline with
operator SRC.
4. Go to the next scanline.
5. Free (or release) the source scanline buffer.
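
To make the steps above concrete, here is a minimal sketch of that two-pass
loop. It is not pixman's actual code: image_t, clampi, blend_channel,
fetch_scanline_bilinear and scale_image_two_pass are hypothetical names, the
pixel format is assumed to be a8r8g8b8, the step is 16.16 fixed point, and
sampling starts at the pixel origin rather than the pixel center.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal a8r8g8b8 image; not pixman's real image type. */
typedef struct {
    int width, height;
    uint32_t *pixels;           /* width * height, row-major */
} image_t;

/* Clamp a coordinate to the image edge. */
static int clampi (int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Bilinear blend of the 8-bit channel at bit offset `shift`. */
static uint32_t blend_channel (uint32_t tl, uint32_t tr, uint32_t bl,
                               uint32_t br, int distx, int disty, int shift)
{
    uint32_t top = ((tl >> shift) & 0xff) * (256 - distx)
                 + ((tr >> shift) & 0xff) * distx;
    uint32_t bot = ((bl >> shift) & 0xff) * (256 - distx)
                 + ((br >> shift) & 0xff) * distx;
    return ((top * (256 - disty) + bot * disty) >> 16) << shift;
}

/* Step 2: fill `out` with one interpolated destination scanline. */
static void fetch_scanline_bilinear (const image_t *src, int dst_y,
                                     int dst_width, int32_t step,
                                     uint32_t *out)
{
    int32_t fy = dst_y * step;                  /* 16.16 source y */
    int y0 = clampi (fy >> 16, 0, src->height - 1);
    int y1 = clampi (y0 + 1, 0, src->height - 1);
    int disty = (fy >> 8) & 0xff;               /* 8-bit fraction */

    for (int x = 0; x < dst_width; x++) {
        int32_t fx = x * step;                  /* 16.16 source x */
        int x0 = clampi (fx >> 16, 0, src->width - 1);
        int x1 = clampi (x0 + 1, 0, src->width - 1);
        int distx = (fx >> 8) & 0xff;
        uint32_t tl = src->pixels[y0 * src->width + x0];
        uint32_t tr = src->pixels[y0 * src->width + x1];
        uint32_t bl = src->pixels[y1 * src->width + x0];
        uint32_t br = src->pixels[y1 * src->width + x1];
        uint32_t p = 0;

        for (int shift = 0; shift < 32; shift += 8)
            p |= blend_channel (tl, tr, bl, br, distx, disty, shift);
        out[x] = p;
    }
}

/* Steps 1 to 5; returns 0 on success, -1 if the buffer cannot be allocated. */
static int scale_image_two_pass (const image_t *src, image_t *dst,
                                 int32_t step)
{
    uint32_t *scanline = malloc (dst->width * sizeof *scanline); /* step 1 */
    if (!scanline)
        return -1;

    for (int y = 0; y < dst->height; y++) {                      /* step 4 */
        fetch_scanline_bilinear (src, y, dst->width, step, scanline);
        /* step 3: with operator SRC, compositing is a plain copy */
        memcpy (dst->pixels + y * dst->width, scanline,
                dst->width * sizeof *scanline);
    }

    free (scanline);                                             /* step 5 */
    return 0;
}
```

For a 2x upscale the step would be 1 << 15, i.e. half a source pixel per
destination pixel.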

I see two possible optimizations:

1. It seems possible to skip the intermediate fetch by writing the
bilinear-interpolated result directly to the destination.
2. Bilinear interpolation can be sped up with the double-blend trick
(processing two channels per multiply), at the cost of a little precision.

Optimization 1 saves one memory write and one memory read per pixel.
I do not think it will affect performance significantly.
However, it may help on machines with a small cache.
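
As a rough illustration of optimization 1, the sketch below writes each
interpolated pixel straight into the destination row, with no intermediate
scanline buffer. This only works because the operator is SRC, which replaces
the destination; any other operator would still need a combine step. The
names image_t, sample_bilinear and scale_image_fused are hypothetical, and
the a8r8g8b8 format and 16.16 fixed-point step are assumptions.

```c
#include <stdint.h>

/* Hypothetical minimal a8r8g8b8 image; not pixman's real image type. */
typedef struct {
    int width, height;
    uint32_t *pixels;           /* width * height, row-major */
} image_t;

/* Sample the source at 16.16 fixed-point position (fx, fy), fx, fy >= 0. */
static uint32_t sample_bilinear (const image_t *src, int32_t fx, int32_t fy)
{
    int x0 = (fx >> 16) < src->width  ? (fx >> 16) : src->width - 1;
    int y0 = (fy >> 16) < src->height ? (fy >> 16) : src->height - 1;
    int x1 = x0 + 1 < src->width  ? x0 + 1 : x0;    /* clamp at the edge */
    int y1 = y0 + 1 < src->height ? y0 + 1 : y0;
    int distx = (fx >> 8) & 0xff;                   /* 8-bit fractions */
    int disty = (fy >> 8) & 0xff;
    const uint32_t *r0 = src->pixels + y0 * src->width;
    const uint32_t *r1 = src->pixels + y1 * src->width;
    uint32_t out = 0;

    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t top = ((r0[x0] >> shift) & 0xff) * (256 - distx)
                     + ((r0[x1] >> shift) & 0xff) * distx;
        uint32_t bot = ((r1[x0] >> shift) & 0xff) * (256 - distx)
                     + ((r1[x1] >> shift) & 0xff) * distx;
        out |= ((top * (256 - disty) + bot * disty) >> 16) << shift;
    }
    return out;
}

/* Interpolate and store in one pass: no temporary scanline is written
 * and read back. */
static void scale_image_fused (const image_t *src, image_t *dst,
                               int32_t step)
{
    for (int y = 0; y < dst->height; y++) {
        uint32_t *row = dst->pixels + y * dst->width;

        for (int x = 0; x < dst->width; x++)
            row[x] = sample_bilinear (src, x * step, y * step);
    }
}
```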

Optimization 2 matters more for image scaling with bilinear filtering.
I wrote a replacement for bilinear_interpolation in pixman-bits-image.c:

// Optimization 2
static force_inline uint32_t
bilinear_interpolation (uint32_t tl, uint32_t tr,
                        uint32_t bl, uint32_t br,
                        int distx, int disty)
{
    int distxy, distxiy, distixy, distixiy;
    uint32_t rb, ga;

    /* 16-bit weights; the four of them sum to 256*256 */
    distxy = distx * disty;             /* weight of br */
    distxiy = (disty << 8) - distxy;    /* weight of bl */
    distixy = (distx << 8) - distxy;    /* weight of tr */
    distixiy = 256*256 - (disty << 8) - (distx << 8) + distxy; /* tl */

    /* Truncate to 8-bit weights so two channels fit in one multiply */
    distxy = distxy >> 8;
    distxiy = distxiy >> 8;
    distixy = distixy >> 8;
    distixiy = distixiy >> 8;

    /* Red and Blue */
    rb = (0x00FF00FF & tl)*distixiy + (0x00FF00FF & tr)*distixy +
         (0x00FF00FF & bl)*distxiy + (0x00FF00FF & br)*distxy;
    rb = (rb >> 8) & 0x00FF00FF;

    /* Green and Alpha */
    ga = (0x00FF00FF & (tl >> 8))*distixiy +
         (0x00FF00FF & (tr >> 8))*distixy +
         (0x00FF00FF & (bl >> 8))*distxiy +
         (0x00FF00FF & (br >> 8))*distxy;
    ga = ga & 0xFF00FF00;

    return rb | ga;
}

Optimization 2 more than doubled the performance.
But it does not always produce the same result as the original code, which
is itself an approximation of bilinear interpolation.
So it can cause pixel correctness failures in some test cases.
I've done some numerical analysis, and it appears that each RGBA value
differs from the original code's result by at most 1.
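
One way to check that bound empirically is to compare the packed routine
against a reference that keeps the full 16-bit weights, over every
(distx, disty) pair. The sketch below does that for one set of four pixels.
bilinear_packed follows the arithmetic of the code above; bilinear_exact is
my assumption of what the original per-channel computation amounts to, not a
copy of it, and max_channel_diff is a hypothetical helper name.

```c
#include <stdint.h>
#include <stdlib.h>

/* Packed version: 8-bit truncated weights, two channels per multiply. */
static uint32_t bilinear_packed (uint32_t tl, uint32_t tr, uint32_t bl,
                                 uint32_t br, int distx, int disty)
{
    uint32_t wbr = (uint32_t) (distx * disty) >> 8;
    uint32_t wbl = (uint32_t) ((disty << 8) - distx * disty) >> 8;
    uint32_t wtr = (uint32_t) ((distx << 8) - distx * disty) >> 8;
    uint32_t wtl = (uint32_t) (256 * 256 - (disty << 8) - (distx << 8)
                               + distx * disty) >> 8;
    uint32_t rb = (tl & 0x00FF00FF) * wtl + (tr & 0x00FF00FF) * wtr
                + (bl & 0x00FF00FF) * wbl + (br & 0x00FF00FF) * wbr;
    uint32_t ga = ((tl >> 8) & 0x00FF00FF) * wtl
                + ((tr >> 8) & 0x00FF00FF) * wtr
                + ((bl >> 8) & 0x00FF00FF) * wbl
                + ((br >> 8) & 0x00FF00FF) * wbr;

    return ((rb >> 8) & 0x00FF00FF) | (ga & 0xFF00FF00);
}

/* Reference: full 16-bit weights, one channel at a time (assumed form). */
static uint32_t bilinear_exact (uint32_t tl, uint32_t tr, uint32_t bl,
                                uint32_t br, int distx, int disty)
{
    uint32_t wtl = (uint32_t) ((256 - distx) * (256 - disty));
    uint32_t wtr = (uint32_t) (distx * (256 - disty));
    uint32_t wbl = (uint32_t) ((256 - distx) * disty);
    uint32_t wbr = (uint32_t) (distx * disty);
    uint32_t out = 0;

    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t sum = ((tl >> shift) & 0xff) * wtl
                     + ((tr >> shift) & 0xff) * wtr
                     + ((bl >> shift) & 0xff) * wbl
                     + ((br >> shift) & 0xff) * wbr;
        out |= (sum >> 16) << shift;
    }
    return out;
}

/* Largest per-channel difference over all 256 x 256 weight pairs. */
static int max_channel_diff (uint32_t tl, uint32_t tr, uint32_t bl,
                             uint32_t br)
{
    int worst = 0;

    for (int distx = 0; distx < 256; distx++) {
        for (int disty = 0; disty < 256; disty++) {
            uint32_t a = bilinear_packed (tl, tr, bl, br, distx, disty);
            uint32_t e = bilinear_exact (tl, tr, bl, br, distx, disty);

            for (int shift = 0; shift < 32; shift += 8) {
                int d = abs ((int) ((a >> shift) & 0xff)
                             - (int) ((e >> shift) & 0xff));
                if (d > worst)
                    worst = d;
            }
        }
    }
    return worst;
}
```

Calling max_channel_diff with extreme inputs, such as four fully white
pixels, is a quick way to probe the worst case.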

Let me know if you need more information or explanation.

I would be very glad if this is useful for your project.
Thank you.