[Pixman] Image scaling with bilinear interpolation performance

Siarhei Siamashka siarhei.siamashka at gmail.com
Thu Feb 17 09:28:03 PST 2011

On Thursday 17 February 2011 07:20:37 김태균 wrote:
> Hi everyone,
> I'm using cairo and pixman to composite images.
> I tested image scaling performance with bilinear filtering.
> But it's a little bit slower than I expected.

Well, you are clearly sugarcoating here, it's not just a little bit :)

Don't know if you have seen it already, but bilinear scaling got
a bit of attention recently with the thread in the cairo mailing list:

Thanks for joining the discussion and proposing further improvements.

> What I understand of the image scaling procedure is as follows:
> 1. Allocate (or acquire a statically allocated) horizontal source
> scanline buffer to store interpolated pixels.
> 2. Fetch a scanline from the source image into the source scanline
> buffer, applying bilinear interpolation.
> 3. Composite the source scanline with the destination scanline using
> operator SRC.
> 4. Go to the next scanline.
> 5. Free (or release) the source scanline buffer.
> My optimization points are:
> 1. It seems possible to avoid fetching by directly combining the bilinear
> interpolated result with the destination.
> 2. Bilinear interpolation can be optimized using the double-blend trick,
> with a little precision loss.
> Optimization 1 removes one memory write and one memory read operation.
> I don't think it will affect performance significantly, but it can be
> helpful on some machines with limited cache.

Yes, this single-pass processing optimization was tried some time ago and
provided a measurable performance improvement:

Unfortunately it did not go anywhere. Admittedly, to make a single-pass
processing contribution much more visible and desirable to have, the bilinear
interpolation part also needs to be improved, which is apparently what we are
all trying to do now :)
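To make the two structures concrete, here is a minimal 1D grayscale sketch of
the two-pass approach (fetch into a temporary scanline, then combine) versus
the single-pass approach. The function names and the 24.8 fixed-point layout
are illustrative only, not pixman's actual internals:

```c
#include <stdint.h>
#include <string.h>

/* Linear interpolation between two 8-bit values; frac is in [0, 256). */
static uint8_t lerp8 (uint8_t a, uint8_t b, int frac)
{
    return (uint8_t) ((a * (256 - frac) + b * frac) >> 8);
}

/* Two-pass: fetch interpolated pixels into a temporary scanline,
 * then composite (operator SRC is just a copy) into the destination. */
static void scale_two_pass (const uint8_t *src, uint8_t *dst, int dst_w,
                            uint32_t x0, uint32_t dx) /* 24.8 fixed point */
{
    uint8_t tmp[1024];
    uint32_t x = x0;
    int i;

    for (i = 0; i < dst_w; i++, x += dx)
        tmp[i] = lerp8 (src[x >> 8], src[(x >> 8) + 1], x & 0xFF);

    memcpy (dst, tmp, dst_w);   /* the separate "combine" step */
}

/* Single-pass: interpolate and write straight to the destination,
 * skipping the intermediate buffer (one read and one write saved). */
static void scale_one_pass (const uint8_t *src, uint8_t *dst, int dst_w,
                            uint32_t x0, uint32_t dx)
{
    uint32_t x = x0;
    int i;

    for (i = 0; i < dst_w; i++, x += dx)
        dst[i] = lerp8 (src[x >> 8], src[(x >> 8) + 1], x & 0xFF);
}

/* Both variants must produce identical output for operator SRC. */
static int passes_agree (void)
{
    uint8_t src[8] = { 0, 32, 64, 96, 128, 160, 192, 224 };
    uint8_t d1[4], d2[4];

    scale_two_pass (src, d1, 4, 0, 0x180); /* step of 1.5 source pixels */
    scale_one_pass (src, d2, 4, 0, 0x180);

    return memcmp (d1, d2, 4) == 0;
}
```

For operator SRC the two variants are interchangeable; the single-pass form
simply drops the intermediate buffer traffic.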

> Optimization 2 is more critical for image scaling with bilinear filtering.
> I wrote some code for bilinear_interpolation in pixman-bits-image.c:
>
> // Optimization 2
> static force_inline uint32_t
> bilinear_interpolation (uint32_t tl, uint32_t tr,
>                         uint32_t bl, uint32_t br,
>                         int distx, int disty)
> {
>     int distxy, distxiy, distixy, distixiy;
>     uint32_t rb, ga;
>
>     distxy = distx * disty;
>     distxiy = (disty << 8) - distxy;
>     distixy = (distx << 8) - distxy;
>     distixiy = 256 * 256 - (disty << 8) - (distx << 8) + distxy;

Here we have that (distxy + distxiy + distixy + distixiy) == 256*256, which is
exactly what we want at this point.

>     distxy = distxy >> 8;
>     distxiy = distxiy >> 8;
>     distixy = distixy >> 8;
>     distixiy = distixiy >> 8;

Wouldn't it be preferable to have (distxy + distxiy + distixy + distixiy) == 256
here? My guess is that this may not always be the case, judging from your
code. That would violate rule "5. Resampling a solid color should
give a solid color" from http://www.virtualdub.org/blog/pivot/entry.php?id=86
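The suspicion is easy to check with a standalone sketch (not pixman code) that
recomputes the four weights from the quoted code and sums them after the
shifts. For example, distx == disty == 1 yields 0 + 0 + 0 + 254, so a solid
color would come out slightly darkened:

```c
/* Recompute the four bilinear weights exactly as in the quoted code and
 * return their sum after the >> 8 shifts. For a solid color to stay
 * solid, this sum must be 256 for every (distx, disty) pair. */
static int shifted_weight_sum (int distx, int disty)
{
    int distxy   = distx * disty;
    int distxiy  = (disty << 8) - distxy;
    int distixy  = (distx << 8) - distxy;
    int distixiy = 256 * 256 - (disty << 8) - (distx << 8) + distxy;

    return (distxy >> 8) + (distxiy >> 8) +
           (distixy >> 8) + (distixiy >> 8);
}
```

With distx == disty == 0 or distx == disty == 128 the sum is 256 as expected,
but distx == disty == 1 gives 254, demonstrating the violation.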

>     /* Red and Blue */
>     rb = (0x00FF00FF & tl) * distixiy + (0x00FF00FF & tr) * distixy +
>          (0x00FF00FF & bl) * distxiy  + (0x00FF00FF & br) * distxy;
>     rb = (rb >> 8) & 0x00FF00FF;
>
>     /* Green and Alpha */
>     ga = (0x00FF00FF & (tl >> 8)) * distixiy + (0x00FF00FF & (tr >> 8)) * distixy +
>          (0x00FF00FF & (bl >> 8)) * distxiy  + (0x00FF00FF & (br >> 8)) * distxy;
>     ga = ga & 0xFF00FF00;
>
>     return rb | ga;
> }
>
> Optimization 2 more than doubled the performance.

There are two things to consider: performance and accuracy.

Regarding performance: even a 2x improvement is still somewhat slow on
hardware that has SIMD. On x86, SSE2 support is pretty much universal, so it
is quite natural to use it where it proves to be beneficial. But for low-end
embedded machines with simple processors lacking SIMD, any kind of performance
improvement is indeed very welcome.

Now regarding accuracy. I have added some comments above about the potential
solid color issue, but that should be relatively easy to address. I'm also a
bit worried about one more thing (present in the original pixman code too, but
let's cover it while we are discussing accuracy in general). Wouldn't it be a
good idea to shift with rounding for the final value instead of dropping the
fractional part? The 'distx'/'disty' variables are also obtained by
right-shifting 'ux' by 8 and dropping the fractional part; rounding might be
more appropriate there too. Not rounding may cause a slight image drift to the
left (and top) on repeated rescaling, and also a slight reduction of average
brightness.
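As a small illustration (not pixman code), the difference between a truncating
and a rounding shift in 8-bit fixed point looks like this:

```c
#include <stdint.h>

/* Truncating shift: always rounds toward zero, so every intermediate
 * result is biased downward by up to 255/256 of a unit. */
static uint32_t shift_trunc (uint32_t v)
{
    return v >> 8;
}

/* Rounding shift: adding half of the divisor (128 == 0x80) before
 * shifting rounds to the nearest integer instead. */
static uint32_t shift_round (uint32_t v)
{
    return (v + 128) >> 8;
}
```

Per value this costs one extra addition, and it removes the systematic
downward bias that would otherwise accumulate over repeated rescaling.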

> But it does not always produce the same result as the original code, which
> is itself an approximation of bilinear interpolation.
> So it can cause pixel correctness issues in some test cases.
> I've done some numerical analysis, and it appears that at worst each RGBA
> channel differs by 1 from the original code.
> Let me know if you need further information / explanation.

My guess is that trading a little accuracy for substantial performance wins
would be quite acceptable and desirable, especially considering that bilinear
scaling is not thought to provide sufficient quality anyway (at least for
downscaling). This has also been discussed multiple times, for example here:
http://comments.gmane.org/gmane.comp.lib.cairo/19902

So favoring performance over quality for bilinear scaling may be perfectly
acceptable.

I have only one concern, about testing. Once we have both C and SSE2
implementations, testing would be much easier if they produced identical
results. Otherwise the tests need to be improved to somehow take slight
differences into account.
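If identical results turn out to be impractical, the tests could fall back to
a tolerant per-channel comparison. A sketch of what such a helper might look
like (hypothetical, not part of pixman's test suite):

```c
#include <stdint.h>

/* Compare two a8r8g8b8 pixels, allowing each 8-bit channel to differ
 * by at most 'tol'. Returns 1 if all four channels are within the
 * tolerance, 0 otherwise. */
static int pixels_close (uint32_t a, uint32_t b, int tol)
{
    int shift;

    for (shift = 0; shift < 32; shift += 8)
    {
        int ca = (a >> shift) & 0xFF;
        int cb = (b >> shift) & 0xFF;
        int d  = ca - cb;

        if (d < -tol || d > tol)
            return 0;
    }
    return 1;
}
```

With tol == 1 this would accept exactly the +/-1 per-channel deviation
described above while still catching real bugs.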

Best regards,
Siarhei Siamashka
