[Pixman] [PATCH 0/7] Faster pipelined ARM NEON bilinear scalers: 'src_8888_8888' and 'src_8888_0565'

Fri Apr 29 06:20:25 PDT 2011

On Sun, Apr 10, 2011 at 11:30 PM, Soeren Sandmann <sandmann at cs.au.dk> wrote:
> Siarhei Siamashka <siarhei.siamashka at gmail.com> writes:
>
>> One more possible optimization is to reduce interpolation precision
>> from 8 bit to 4 bit as suggested earlier by Taekyun Kim:
>> http://lists.freedesktop.org/archives/pixman/2011-February/001044.html
>> This can make final shifting and color components packing faster,
>> saving 1 instruction per pixel. Considering that the bilinear code
>> currently needs ~10 cycles per pixel, this can provide ~10% speedup.
>> That's the last desperate measure, but given that src_8888_8888
>> bilinear scaling utilizes ~58% of memory bandwidth (~93 MPix/s vs.
>> ~160 MPix/s for nearest scaling and vs. ~650 MB/s for memcpy) on my
>> ARM device, anything might be useful. I just wonder about the current
>> filter types used in pixman. They are:
>>     PIXMAN_FILTER_FAST,
>>     PIXMAN_FILTER_GOOD,
>>     PIXMAN_FILTER_BEST,
>>     PIXMAN_FILTER_NEAREST,
>>     PIXMAN_FILTER_BILINEAR
>> I wonder what would happen if we introduce some 'low precision'
>> bilinear filter and map it to PIXMAN_FILTER_GOOD? PIXMAN_FILTER_BEST
>> and PIXMAN_FILTER_BILINEAR would still remain the same as the existing
>> bilinear. PIXMAN_FILTER_FAST and PIXMAN_FILTER_NEAREST would still
>> remain as the existing nearest. Of course this may be only useful if
>> such lower precision bilinear filter proves to provide much better
>> image quality than nearest filter and noticeably better performance
>> than real bilinear filter.
>
> As far as I know, GdkPixbuf uses the equivalent of a bilinear filter
> with four bits of precision. Once you scale more than 16x, banding
> artefacts are clearly visible, but people don't normally complain about
> its quality, and 16x scaling with bilinear interpolation won't look
> great in any case, so if dropping precision to four bits really makes a
> big performance difference, maybe we should just change
> PIXMAN_FILTER_BILINEAR to do that.

Reducing bilinear interpolation precision provides even more
performance benefit on x86 (going down to at least 7 bits would already
be better), and 4 bits should really be the best. Anyway, who is going
to decide whether the reduced precision is still OK?
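
To make the precision trade-off concrete, here is a minimal sketch of
bilinear interpolation for a single 8-bit channel with a configurable
number of weight bits. It is not pixman code: the function name
bilinear_channel and its parameters are hypothetical, and it assumes the
fractional positions arrive as 8-bit values whose low-order bits are
simply dropped.

```c
#include <stdint.h>

/* Hypothetical helper, not taken from pixman: interpolate one 8-bit
 * channel from its four neighbours (top-left, top-right, bottom-left,
 * bottom-right) using 'bits'-bit weights, where 1 <= bits <= 8.
 * wx and wy are 8-bit fractional positions in the range 0..255. */
static uint8_t
bilinear_channel (uint8_t tl, uint8_t tr, uint8_t bl, uint8_t br,
                  unsigned wx, unsigned wy, unsigned bits)
{
    unsigned one = 1u << bits;          /* weight that represents 1.0 */
    unsigned fx  = wx >> (8 - bits);    /* drop low-order weight bits */
    unsigned fy  = wy >> (8 - bits);

    /* Horizontal pass on both rows, then vertical pass. */
    unsigned top = tl * (one - fx) + tr * fx;
    unsigned bot = bl * (one - fx) + br * fx;

    /* Two passes scaled by 'one' each, so shift by 2 * bits. */
    return (uint8_t) ((top * (one - fy) + bot * fy) >> (2 * bits));
}
```

With bits = 4 there are only 16 distinct weights per axis, so when
scaling up by more than 16x several neighbouring destination pixels get
the same weight, which is consistent with the banding mentioned above.
For example, with tl = bl = 10, tr = br = 200 and wx = 15, the 8-bit
version returns 21 while the 4-bit version still returns 10.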

One can see heavy use of bilinear scaling (the over_8888_8_8888 fast
path, also with optional horizontal flipping) on web pages like this
one, especially if configured to animate 1000 fish:
    http://ie.microsoft.com/testdrive/performance/fishietank/
I wonder how well pixman could actually perform there if it got some
good SSSE3 bilinear optimizations and started utilizing all CPU cores
(maybe trying OpenMP would be easy)?

-- 
Best regards,
Siarhei Siamashka