[Pixman] Image scaling with bilinear interpolation performance
siarhei.siamashka at gmail.com
Wed Mar 2 01:51:30 PST 2011
On Mon, Feb 28, 2011 at 11:22 AM, Taekyun Kim <podain77 at gmail.com> wrote:
> I'm sorry that I did not give you enough information about my test.
> I also tuned some of the code of the function
> I made a fast path for the case where mask == NULL and ux_bottom ==
> (but still using a temporary source fetch buffer).
> Anyway, the majority of our target machines support NEON instructions.
> Thus, your recent NEON patches are exactly what we are looking for.
> I really appreciate what you've done for this :-)
Well, in order to stay focused on practical improvements, we need to
know if somebody is really interested and can invest some time in
improving bilinear scaling performance for the target machines which
don't support NEON instructions. The current bilinear interpolation
method seems to be perfectly fine for use with NEON, and no changes
to it are really necessary.
If better performance is also wanted on armv5te, armv6, mmx, sse2 or
ssse3 capable machines, then reducing 'disty' precision from 8-bit to
7-bit looks really promising to me.
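To illustrate why 7-bit 'disty' could help, here is a minimal C sketch. It is my own illustration, not pixman's code: the names max_vertical_sum and bilinear_interpolate and the 'bits' parameter are hypothetical. With weights of 'bits' bits, the vertical pass produces intermediate values up to 255 * 2^bits. That is 65280 for 8 bits, which overflows a signed 16-bit lane, but 32640 for 7 bits, which fits. Presumably this is what allows fast signed 16-bit multiplications to be used on the targets listed above.

```c
#include <stdint.h>

/* Largest intermediate value of the vertical pass when the weights have
 * 'bits' bits of precision: 255 * (1 << bits). */
int
max_vertical_sum (int bits)
{
    return 255 << bits;
}

/* Hypothetical helper: bilinear interpolation of one a8r8g8b8 pixel from
 * its four neighbours (tl, tr, bl, br). distx and disty are weights in
 * [0, 1 << bits]. The vertical pass runs first, then the horizontal pass;
 * the result is truncated, not rounded. */
uint32_t
bilinear_interpolate (uint32_t tl, uint32_t tr, uint32_t bl, uint32_t br,
                      int distx, int disty, int bits)
{
    int one = 1 << bits;
    uint32_t result = 0;

    for (int shift = 0; shift < 32; shift += 8)
    {
        int t_l = (tl >> shift) & 0xff, t_r = (tr >> shift) & 0xff;
        int b_l = (bl >> shift) & 0xff, b_r = (br >> shift) & 0xff;

        /* Vertical pass: each sum is at most 255 * one, so with bits == 7
         * it fits in int16_t (32640 <= 32767), while with bits == 8 it
         * does not (65280). */
        int left  = t_l * (one - disty) + b_l * disty;
        int right = t_r * (one - disty) + b_r * disty;

        /* Horizontal pass, then scale back down by one * one. */
        int v = (left * (one - distx) + right * distx) >> (2 * bits);

        result |= (uint32_t) v << shift;
    }
    return result;
}
```

For example, bilinear_interpolate(0xff000000, 0xffffffff, 0xff000000, 0xffffffff, 64, 64, 7) samples halfway between black and white and returns 0xff7f7f7f, since the result is truncated rather than rounded.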
If the target machines also include armv4, mips32r2 or similar
architectures without the support of fast 16-bit signed multiplication
instructions, then something similar to your changes or the variant of
code from Soeren might also be interesting to investigate.
Moving away from doing bilinear interpolation exactly the same way on
all targets is a bit more complex, because we have to take that into
account in the automated pixman tests, and somebody (most likely you)
has to provide this update for the test suite. Alternatively, I think
it might be possible to just introduce something like a new
'--enable-unsupported-optimizations-and-hacks' configure option and
ignore all bug reports resulting from its use.
> At this point, my concern is about which cases have been (or will be)
> optimized using NEON and which are not.
> There can be so many fast paths for combinations of mask, rotation (including
> special angles), scale factors, pixel formats, operators and so on.
> As you mentioned, it is a boring job to make fast paths for all these cases.
> So we should focus on the most commonly used cases.
> I think most of the common operations are already optimized for both NEON
> and SSE2.
> What do you think the remaining parts are?
Fortunately the number of needed fast paths seems to be reasonably
finite and most of these can be optimized:
On the other hand, we might not see the complete picture, because
people generally try different methods to achieve something and
avoid whatever is slow:
So the current plan is to provide more bilinear fast paths for the
operations which are frequently used, and also to add SIMD-optimized,
iterator-based bilinear fetchers to handle the rest of the cases
(with the intermediate temporary buffer, but still faster than what we
have now). If you want to participate in this activity and
contribute some optimizations which are important for you, I would
suggest setting up a public git repository with your changes. There are
some free git hosting options available, for example gitorious and
github. But at the very least, providing patches which apply cleanly
to the current pixman sources from git is going to make it much easier
for other people to review and test your code.
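The plan above (specialized fast paths where they exist, and a generic iterator-based path with a temporary buffer for everything else) can be sketched in C roughly as follows. This is a simplified, hypothetical model rather than pixman's real design: fast_path_t, lookup_fast_path, over_pixel, fetch_solid, composite_row_over and the CHUNK buffer size are illustrative names, not pixman API. Only the OVER operator with premultiplied a8r8g8b8 data is modelled, and fetch_solid stands in for a real bilinear fetcher.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model only; none of these names are pixman API. */

#define CHUNK 64  /* size of the intermediate temporary buffer, in pixels */

typedef enum { OP_SRC, OP_OVER } op_t;
typedef enum { FMT_A8R8G8B8, FMT_X8R8G8B8, FMT_R5G6B5 } format_t;

/* Produces 'width' premultiplied a8r8g8b8 source pixels starting at (x, y),
 * e.g. by bilinear interpolation of a scaled source image. */
typedef void (*fetch_fn) (const void *src, int x, int y,
                          uint32_t *buffer, int width);

/* Specialized routine that composites one whole destination row at once. */
typedef void (*row_fn) (const void *src, int y, uint32_t *dst_row, int width);

typedef struct {
    op_t     op;
    format_t src_format;
    format_t dst_format;
    row_fn   func;
} fast_path_t;

/* Linear search; the first matching entry wins. */
row_fn
lookup_fast_path (const fast_path_t *table, size_t n, op_t op,
                  format_t src_format, format_t dst_format)
{
    for (size_t i = 0; i < n; i++)
    {
        if (table[i].op == op && table[i].src_format == src_format &&
            table[i].dst_format == dst_format)
            return table[i].func;
    }
    return NULL;
}

/* OVER for one premultiplied pixel: d = s + d * (255 - sa) / 255. */
uint32_t
over_pixel (uint32_t s, uint32_t d)
{
    uint32_t ia = 255 - (s >> 24);
    uint32_t result = 0;

    for (int shift = 0; shift < 32; shift += 8)
    {
        uint32_t t = ((d >> shift) & 0xff) * ia + 0x80;
        t = (t + (t >> 8)) >> 8;              /* rounded division by 255 */
        result |= (((s >> shift) & 0xff) + t) << shift;
    }
    return result;
}

/* Trivial example fetcher: every pixel is the colour pointed to by 'src'.
 * A real bilinear fetcher would interpolate scaled source pixels here. */
void
fetch_solid (const void *src, int x, int y, uint32_t *buffer, int width)
{
    (void) x;
    (void) y;
    for (int i = 0; i < width; i++)
        buffer[i] = *(const uint32_t *) src;
}

/* Composite one row with OVER: use a fast path if the table has one,
 * otherwise fetch source pixels chunk by chunk into a temporary buffer
 * and combine that buffer into the destination row. */
void
composite_row_over (const fast_path_t *table, size_t n,
                    format_t src_format, format_t dst_format,
                    const void *src, fetch_fn fetch, int y,
                    uint32_t *dst_row, int width)
{
    row_fn func = lookup_fast_path (table, n, OP_OVER, src_format, dst_format);
    uint32_t buffer[CHUNK];

    if (func)
    {
        func (src, y, dst_row, width);
        return;
    }
    for (int x = 0; x < width; x += CHUNK)
    {
        int len = width - x < CHUNK ? width - x : CHUNK;

        fetch (src, x, y, buffer, len);
        for (int i = 0; i < len; i++)
            dst_row[x + i] = over_pixel (buffer[i], dst_row[x + i]);
    }
}
```

The generic path costs an extra pass through the temporary buffer compared with a dedicated fast path, but one SIMD-optimized fetcher can then serve many operator and format combinations that have no fast path of their own.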