[Pixman] [PATCH] ARM: NEON: optimization for bilinear scaled 'over 8888 8888'

Mon Apr 11 01:03:09 PDT 2011

On Mon, Apr 11, 2011 at 6:51 AM, Taekyun Kim <podain77 at gmail.com> wrote:
> 2011/4/11 Siarhei Siamashka <siarhei.siamashka at gmail.com>
>>
>> On Mon, Apr 4, 2011 at 7:12 PM, Taekyun Kim <podain77 at gmail.com> wrote:
>> > I've done some additional work on overlapped blit functions and bilinear
>> > filter with A8 mask for operator OVER and ADD.
>> > (tight scheduled work here...)
>>
>> About these things. I would really appreciate it if you could send the
>> latest versions of your patches, if you don't mind contributing them
>> to pixman.
>>
>> I understand that you need this functionality right now and any
>> improvement over the slow general C code is a great help for many
>> pixman users. I'm just approaching the problem in a slightly different way
>> - first try to tweak the code to make it as fast as possible, and only
>> then extend and generalize it to support more operations. For example,
>> it would really suck to implement a few dozen optimized NEON fast
>> path functions for various operations, and only then realize that
>> doing it in a completely different way would have provided maybe
>> something like 10% more performance. Anyway, here we have some
>> conflict of interests between us, which would be really nice to
>> resolve somehow.
>>
>> I wonder if the best solution for everyone would be to just add the
>> first generation of NEON fast paths for these OVER and ADD operations
>> developed by you if they don't cause any regressions. Maybe in such a
>> way that they use their own set of helper macros. And at the same time
>> continue tweaking one more experimental implementation, trying to get
>> a 'perfect' bilinear scaling code (with another set of helper macros
>> in order to avoid any clash). That would help us to get a reasonable
>> performance for many scaled compositing operations right now, just in
>> time for pixman 0.22.0 release. A drawback is that there will be some
>> extra code duplication, but eventually we would be able to clean it up
>> and ensure that only the fastest code remains in the future.
>>
>> What do you think? And surely also the opinion of Soeren is important
>> here.
>
> I like this approach.
> Users can benefit from reasonably optimized functions for various operators,
> while we are struggling with extreme optimization work.
> But currently there are some minor bugs with my patches.
> I can send the patches with these bugs fixed.

There is not much time left, if any. I guess the last patches would have to
be submitted within just one or two days from now in order to still have
time to be reviewed for 0.21.8, according to the recent release
schedule announcement.

I had hoped that at least some part of your code was ready for
production use. After all, your bilinear scaled over_8888_8888 patch
seemed to work even though there was some room for performance
improvement.

-- 
Best regards,
Siarhei Siamashka