[Pixman] [PATCH] NEON optimizations for bilinear scaled scanline functions with A8 mask for operator src, over, add

Tue Apr 12 02:24:20 PDT 2011

On Tue, Apr 12, 2011 at 5:54 AM, Taekyun Kim <podain77 at gmail.com> wrote:
> Hi,
> I send pixman patch for NEON optimizations for several bilinear scaled
> scanline functions.

Thanks.

> Following functions are optimized.
> over_8888_n_8888
> add_8888_n_8888
> src_8888_8_8888
> over_8888_8_8888
> add_8888_8_8888
> This patch is based on pixman master branch with latest commit id =
> a2153222677327be43251012f462d19a7e98ce14. (Soeren's commit on April 3)
> Because there can be some conflicts with latest commit of siarhei's bilinear
> optimizations.

I see that this patch was a bit rushed and has some problems (even if
applied to a2153222677327be43251012f462d19a7e98ce14). The most
apparent ones are:

1. the reversal of the changes from commit
a2153222677327be43251012f462d19a7e98ce14:

-    vmvn.8      q12, q12
-    vmvn.8      d26, d26
+    vmvn.8      d24, d24
+    vmvn.8      d25, d25
     vmull.u8    q8,  d24, d4
     vmull.u8    q9,  d25, d5
+    vmvn.8      d26, d26
     vmvn.8      d27, d3

2. the removal of the src_8888_8888 fast paths:

-    SIMPLE_BILINEAR_FAST_PATH (SRC, a8r8g8b8, a8r8g8b8, neon_8888_8888),
-    SIMPLE_BILINEAR_FAST_PATH (SRC, a8r8g8b8, x8r8g8b8, neon_8888_8888),
-    SIMPLE_BILINEAR_FAST_PATH (SRC, x8r8g8b8, x8r8g8b8, neon_8888_8888),

3. this looks wrong (the over_8888_n_8888 fast path takes a solid mask,
yet the boolean flags 'have_mask' and 'mask_is_solid' are both set to
FALSE):

+PIXMAN_ARM_BIND_SCALED_BILINEAR (SKIP_ZERO_SRC, neon, 8888_n_8888, OVER,
+                                 uint32_t, uint32_t, uint32_t, FALSE, FALSE)

Also, the pixman tests fail, which is not very surprising given the
problem with over_8888_n_8888.
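
To illustrate why the flags in item 3 matter, here is a minimal sketch
of how two such booleans could select the mask value for a pixel. The
function name, its parameters and the exact structure are hypothetical
and only meant for illustration; the real main loop in pixman may be
organized differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a pixman function): return the mask value
 * that applies to pixel 'x' of a scanline, depending on the two flags.
 */
static uint8_t
mask_value_sketch (int            have_mask,
                   int            mask_is_solid,
                   uint8_t        solid_mask,
                   const uint8_t *mask_line,
                   int            x)
{
    if (!have_mask)
        return 0xff;        /* no mask: the source is used unmodified */

    if (mask_is_solid)
        return solid_mask;  /* 'n' mask: one value for every pixel */

    assert (mask_line != NULL);
    return mask_line[x];    /* '8' mask: per-pixel a8 value */
}
```

With both flags FALSE the solid mask value is never looked at, so in
this sketch an over_8888_n_8888 operation would behave like
over_8888_8888.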

To move forward, I would first suggest splitting the bilinear NEON
optimizations into a separate assembly file, starting with your code (my
current bilinear code can stay in the old file, which helps to avoid
conflicts). This way the old assembly file will handle the unscaled and
nearest scaled fast paths, and the new one will handle bilinear
scaling. This makes sense because in the former case the code is
built around the compositing operation, which is the performance
critical part, with unscaled or nearest fetching being less important.
For bilinear fast paths, the bilinear filter itself is the most
performance critical part, so it becomes the core of the code, with the
rest built around it.
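
For reference, the per-pixel work of a bilinear filter can be sketched
in scalar C as below. This is only an illustration and not the code
from pixman or from the patch: the function name and the 0..256 weight
range are assumptions chosen to keep the arithmetic simple.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical scalar reference: interpolate one a8r8g8b8 pixel from
 * its four neighbours (top-left, top-right, bottom-left, bottom-right).
 * 'wx' and 'wy' are the horizontal and vertical weights in the range
 * 0..256, where 0 selects the left/top pixel and 256 the right/bottom.
 */
static uint32_t
bilinear_pixel_sketch (uint32_t tl, uint32_t tr,
                       uint32_t bl, uint32_t br,
                       unsigned wx, unsigned wy)
{
    uint32_t result = 0;
    int      shift;

    assert (wx <= 256 && wy <= 256);

    /* Process the four 8-bit channels one by one. */
    for (shift = 0; shift < 32; shift += 8)
    {
        uint32_t c_tl = (tl >> shift) & 0xff;
        uint32_t c_tr = (tr >> shift) & 0xff;
        uint32_t c_bl = (bl >> shift) & 0xff;
        uint32_t c_br = (br >> shift) & 0xff;

        /* Vertical pass: results are at most 255 * 256. */
        uint32_t left  = c_tl * (256 - wy) + c_bl * wy;
        uint32_t right = c_tr * (256 - wy) + c_br * wy;

        /* Horizontal pass, then drop the 16 fractional bits. */
        uint32_t c = (left * (256 - wx) + right * wx) >> 16;

        result |= c << shift;
    }

    return result;
}
```

Every output pixel needs four source pixels and several multiplications
per channel before any compositing happens, which is why the filter
dominates the cost of these fast paths.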

Your patch can also be split into parts: first add the changes to
'pixman-arm-common.h', and then use them in follow-up patches.
Enabling the different fast path functions is also better done in
separate patches, for bisecting purposes.

Patches also need to apply cleanly to the current pixman git master.
I landed my patches there in order to have less stuff in the air, so
that we can get all the merge conflicts resolved sooner.

> It sill have lots of places to optimize, for example, preloading mask and
> destination pixels, better handling of two, one pixel case.
> However it gives us reasonable performance than before.

Yes, that's the point. It would be nice to get at least some speedup
right now, and then keep improving performance.

-- 
Best regards,
Siarhei Siamashka