[Pixman] [PATCH] Add support for aarch64 neon optimization

Tue Apr 5 11:20:54 UTC 2016

> This code is not just there for prefetching. It is an example of
> using software pipelining:

OK. I understand.
But the code is very hard to maintain... I've met too many register
conflictions.
# q2 and d2 were used in a same sequence. It cannot be exist in aarch64-neon.

Anyway, I'll try to remove unnecessary register copies as you've suggested.
After that, I'll also tryh to make benchmarks that
* advance vs none
* L1 / L2 / L3 (Cortex-A53 doesn't have), keep / strm
to find the better configuration.

But it is only a result of Cortex-A53 (that you ane me have). Does anyone can
test other (expensive :) aarch64 environment ?
(Cortex-Axx, Apple Ax, NVidia Denver, etc, etc...)

On 5 April 2016 at 16:53, Siarhei Siamashka <siarhei.siamashka at gmail.com> wrote:
> On Sun, 3 Apr 2016 20:17:45 +0900
> Mizuki Asakura <ed6e117f at gmail.com> wrote:
>
>> > The 'advanced' prefetch type is implemented by having some branchless ARM code
>>
>> If the prefetch code assumes that "branch-less", it cannot be done in aarch64
>> since aarch64 doesn't support conditional alythmetics such as subge, subges.
>>
>> If so, we could / should remove all prefetch-related codes because it
>> might cause
>> performance regression (by branching) rather than benefit of prefetching.
>
> Yes, I'm fine and actually in favour of removing the prefetch related
> AArch64 code (assuming that it does not do anything good for us).
> Something similar happened to the pixman x86 prefetch code in the past:
>     https://lists.freedesktop.org/archives/pixman/2010-June/000231.html
>
> But I'm going to run some additional higher level benchmarks to be
> sure.
>
>> And also, we could remove all "tail-head" optimizatoins that is only
>> for highly utilizing prefetching.
>
> This code is not just there for prefetching. It is an example of
> using software pipelining:
>     https://cgit.freedesktop.org/pixman/tree/pixman/pixman-arm-neon-asm.S?id=pixman-0.34.0#n191
>     https://en.wikipedia.org/wiki/Software_pipelining
>
>> "tail-head" codes are very complicated, hard to understand and hard to maintain.
>> If we could remove these codes, asm code could be more slimmer and
>> easy-to-maintain.
>
> If we were favouring ease of maintenance over performance, then we
> would have used intrinsics instead of assembly in the first place.
>
>> Ofcource, the modification shouldn't be applied for original
>> aarch32-neon codes. It may cause
>> performance regression on some architecture.
>> But for aarch64, it would be a considerable changes ?
>
> Well, just to make sure that there is no misunderstanding between
> us. I would like to keep and do AArch64 conversion for all the
> parts of code, which are well optimized and not planned to be
> replaced in the near future. And I suggested not to bother with
> the 'pixman-arm-neon-asm-bilinear.S' file, because this code
> is not the best way to do the job and it had to be eventually
> replaced with iterators:
>     https://lists.freedesktop.org/archives/pixman/2013-September/002889.html
>     https://lists.freedesktop.org/archives/pixman/2013-September/002892.html
>
> Now it looks like Ben Avison has NEON patches for doing
> separable bilinear scaling and, so this makes the
> 'pixman-arm-neon-asm-bilinear.S' file really obsolete.
>
> The nearest scaling and rgb565 format support code is still useful.
> The bilinear scaling code from pixman-arm-neon-asm.S is useful too, at
> least the 'pixman_scaled_bilinear_scanline_8888_8888_SRC_asm_neon' and
> 'pixman_scaled_bilinear_scanline_8888_0565_SRC_asm_neon' functions.
> But 'pixman_scaled_bilinear_scanline_0565_0565_SRC_asm_neon' and
> 'pixman_scaled_bilinear_scanline_0565_x888_SRC_asm_neon' are bad.
>
> Anyway, your first patch was already usable. I only see that just a
> few minor tweaks are needed and it will be good enough for pushing
> to git. But if I'm mistaken and something is actually difficult,
> then you don't need to spend too much time on it. Thanks.
>
> --
> Best regards,
> Siarhei Siamashka