[Pixman] [PATCH] Add support for aarch64 neon optimization

Tue Apr 5 14:12:13 UTC 2016

On Tue, 5 Apr 2016 08:26:38 -0400
"Lennart Sorensen" <lsorense at csclub.uwaterloo.ca> wrote:

> On Tue, Apr 05, 2016 at 08:20:54PM +0900, Mizuki Asakura wrote:
> > > This code is not just there for prefetching. It is an example of
> > > using software pipelining:  
> > 
> > OK. I understand.
> > > But the code is very hard to maintain... I've run into too many register
> > > conflicts.
> > > # q2 and d2 were used in the same sequence. That cannot exist in aarch64-neon.
> > 
> > Anyway, I'll try to remove unnecessary register copies as you've suggested.
> > > After that, I'll also try to make benchmarks that compare
> > * advance vs none
> > * L1 / L2 / L3 (Cortex-A53 doesn't have), keep / strm
> > to find the better configuration.
> > 
> > > But that is only a result for Cortex-A53 (which you and I have). Can anyone
> > > test other (expensive :) aarch64 environments?
> > (Cortex-Axx, Apple Ax, NVidia Denver, etc, etc...)  
> 
> If someone can list what to run for a test I can probably run it on an A57.

Hi Lennart,

This is great, thanks. Could you please clone the following branch?

    https://cgit.freedesktop.org/~siamashka/pixman/log/?h=20160405-separable-neon-bilinear-test

Then please try to compile static 32-bit pixman test programs using an
ARM cross-compiler:

   ./autogen.sh
   ./configure --host=arm-linux-gnueabihf --enable-static-testprogs \
               --disable-libpng --disable-gtk
   make

Then run the "scaling-bench" program from the "test" directory on your
A57 device:

   PIXMAN_DISABLE="" ./scaling-bench > cortex-a57-neon-single-pass.txt
   PIXMAN_DISABLE="wholeops" ./scaling-bench > cortex-a57-neon-separable.txt
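
The two runs above can also be wrapped in a small helper, which may be
convenient when collecting results from several boards. This is only a
sketch: the function name "run_bilinear_bench" and the output prefix
argument are made up for illustration, and only the PIXMAN_DISABLE
values and the file name suffixes come from the commands above.

```shell
#!/bin/sh
# Hypothetical helper (not part of pixman): run a benchmark once per
# PIXMAN_DISABLE setting and store each result in its own file.
#   $1 = benchmark command, e.g. ./scaling-bench
#   $2 = output file prefix, e.g. cortex-a57
run_bilinear_bench() {
    bench=$1
    prefix=$2

    # Empty PIXMAN_DISABLE: nothing is disabled (the "single-pass" run).
    PIXMAN_DISABLE="" "$bench" > "${prefix}-neon-single-pass.txt" || return 1

    # PIXMAN_DISABLE="wholeops": the "separable" run.
    PIXMAN_DISABLE="wholeops" "$bench" > "${prefix}-neon-separable.txt" || return 1
}
```

After sourcing the file, it would be called from the "test" directory as:

   run_bilinear_bench ./scaling-bench cortex-a57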

This information can be used to see whether the Cortex-A57 fits a
common pattern observed with other ARM processors:

   https://people.freedesktop.org/~siamashka/files/20160405-arm-bilinear/

I suspect that it will show results similar to Cortex-A15, but we will
never know until we try.

This can help to identify an optimal bilinear scaling strategy, and
also to decide which parts of the existing 32-bit ARM assembly code are
worth converting to AArch64.

-- 
Best regards,
Siarhei Siamashka