[Pixman] [PATCH 0/7] Faster pipelined ARM NEON bilinear scalers: 'src_8888_8888' and 'src_8888_0565'
siarhei.siamashka at gmail.com
Fri Apr 29 06:20:25 PDT 2011
On Sun, Apr 10, 2011 at 11:30 PM, Soeren Sandmann <sandmann at cs.au.dk> wrote:
> Siarhei Siamashka <siarhei.siamashka at gmail.com> writes:
>> One more possible optimization is to reduce interpolation precision
>> from 8 bit to 4 bit as suggested earlier by Taekyun Kim:
>> This can make final shifting and color components packing faster,
>> saving 1 instruction per pixel. Considering that the bilinear code
>> currently needs ~10 cycles per pixel, this can provide ~10% speedup.
>> That's the last desperate measure, but given that src_8888_8888
>> bilinear scaling utilizes ~58% of memory bandwidth (~93 MPix/s vs.
>> ~160 MPix/s for nearest scaling and vs. ~650 MB/s for memcpy) on my
>> ARM device, anything might be useful. I just wonder about the current
>> filter types used in pixman. They are:
>> I wonder what would happen if we introduce some 'low precision'
>> bilinear filter and map it to PIXMAN_FILTER_GOOD? PIXMAN_FILTER_BEST
>> and PIXMAN_FILTER_BILINEAR would still remain the same as the existing
>> bilinear. PIXMAN_FILTER_FAST and PIXMAN_FILTER_NEAREST would still
>> remain as the existing nearest. Of course this may be only useful if
>> such lower precision bilinear filter proves to provide much better
>> image quality than nearest filter and noticeably better performance
>> than real bilinear filter.
> As far as I know, GdkPixbuf uses the equivalent of a bilinear filter
> with four bits of precision. Once you scale more than 16x, banding
> artefacts are clearly visible, but people don't normally complain about
> its quality, and 16x scaling with bilinear interpolation won't look
> great in any case, so if dropping precision to four bits really makes a
> big performance difference, maybe we should just change
> PIXMAN_FILTER_BILINEAR to do that.
Reducing bilinear interpolation precision provides a lot more
performance benefit on x86 (even just going down to 7 bits would be
better), and 4 bits should really be the best. Anyway, who is going to
decide whether the reduced precision is still ok?
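
To get a feeling for what is lost, here is a minimal sketch of bilinear
interpolation for a single 8-bit channel with a configurable number of
weight bits. It is not pixman code: bilinear_channel() and
distinct_levels() are hypothetical helper names, and the weights are
quantized by simply dropping the low bits of an 8-bit fractional
position, which is an assumption about how a reduced-precision variant
might work.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not a pixman function: interpolate one 8-bit
 * channel between four neighbours (top-left, top-right, bottom-left,
 * bottom-right). distx/disty are 8-bit fractional positions (0..255),
 * and only the top `bits` bits of them are used as weights. */
static uint8_t bilinear_channel(uint8_t tl, uint8_t tr,
                                uint8_t bl, uint8_t br,
                                unsigned distx, unsigned disty,
                                unsigned bits)
{
    assert(bits >= 1 && bits <= 8);
    unsigned one = 1u << bits;
    unsigned wx  = (distx & 0xff) >> (8 - bits);  /* quantized x weight */
    unsigned wy  = (disty & 0xff) >> (8 - bits);  /* quantized y weight */
    unsigned top = tl * (one - wx) + tr * wx;     /* horizontal pass */
    unsigned bot = bl * (one - wx) + br * wx;
    /* vertical pass, then drop the 2 * bits fractional bits */
    return (uint8_t)((top * (one - wy) + bot * wy) >> (2 * bits));
}

/* Count the distinct output values seen while sweeping all 256
 * fractional x positions between a black and a white source pixel. */
static unsigned distinct_levels(unsigned bits)
{
    unsigned count = 0;
    int prev = -1;
    for (unsigned distx = 0; distx < 256; distx++) {
        int v = bilinear_channel(0, 255, 0, 255, distx, 0, bits);
        if (v != prev) {
            count++;
            prev = v;
        }
    }
    return count;
}
```

With 4 bits there are only 16 possible weights between two neighbouring
source pixels, so once the scale factor goes above 16x, adjacent
destination pixels start sharing the same weight and come out
identical. That is where the banding mentioned above comes from.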
One can see heavy use of bilinear scaling (over_8888_8_8888 fast
path, also with optional horizontal flipping) on web pages like
this one, especially if configured to animate 1000 fish:
I wonder how well pixman can actually perform there if it gets some
good SSSE3 bilinear optimizations and starts utilizing all CPU cores
(maybe trying OpenMP should be easy)?
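
A minimal sketch of the OpenMP idea, assuming destination scanlines can
be computed independently of each other. Nothing here is pixman API:
scale_scanline() and scale_image() are hypothetical names, and a plain
nearest-neighbour fetch in 16.16 fixed point stands in for the real
bilinear scanline function to keep the example short. Image dimensions
are assumed to be below 32768.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an optimized per-scanline scaler:
 * nearest-neighbour fetch, stepping through the source row in
 * 16.16 fixed point. */
static void scale_scanline(uint32_t *dst, const uint32_t *src,
                           int dst_w, uint32_t vx, uint32_t unit_x)
{
    for (int x = 0; x < dst_w; x++) {
        dst[x] = src[vx >> 16];
        vx += unit_x;
    }
}

/* Scale a src_w x src_h a8r8g8b8 image to dst_w x dst_h. Each
 * iteration writes only its own destination row, so the loop can be
 * split across cores. Without -fopenmp the pragma is ignored and the
 * loop simply runs serially. */
static void scale_image(uint32_t *dst, int dst_w, int dst_h,
                        const uint32_t *src, int src_w, int src_h)
{
    uint32_t unit_x = ((uint32_t)src_w << 16) / (uint32_t)dst_w;
    uint32_t unit_y = ((uint32_t)src_h << 16) / (uint32_t)dst_h;

    #pragma omp parallel for schedule(static)
    for (int y = 0; y < dst_h; y++) {
        int sy = (int)(((uint32_t)y * unit_y) >> 16);  /* source row */
        scale_scanline(dst + (size_t)y * dst_w,
                       src + (size_t)sy * src_w,
                       dst_w, 0, unit_x);
    }
}
```

Whether this actually helps depends on how close a single core already
is to saturating memory bandwidth, so it would need to be measured
rather than assumed.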