[Pixman] [PATCH 2/2] ARM: Add 'neon_composite_over_n_8888_0565' fast path

Wed Apr 6 12:16:57 PDT 2011

On Wed, Apr 6, 2011 at 9:34 PM, Siarhei Siamashka
<siarhei.siamashka at gmail.com> wrote:
> On Tue, Apr 5, 2011 at 7:46 AM, Taekyun Kim <podain77 at gmail.com> wrote:
>> And pixman_composite_over_n_8888_0565_ca_process_pixblock_head in the
>> tail_head block increases code size, causing i-cache misses.
>> We could think of jumping to head and then returning to the next part
>> of the tail_head block.
>> But it seems difficult to do that without breaking the
>> generate_composite_function macro template.
>
> The whole point of using software pipelining here is to overlap the
> last part of the previous iteration with the beginning of the current
> iteration. Jumping to "head" does not make much sense because the
> instructions from "head" and "tail" can be reordered quite wildly,
> diffusing into each other, so there is no clear border anymore.
>
> And yes, unfortunately pipelining doubles the size of the code, just
> as unrolling does. It may be possible to add a way to disable
> pipelining for some of the fast paths via a new special flag, so that
> the code size can be reduced for the fast paths where pipelining
> provides too little or no performance gain.

Hmm, I just realized that you were probably suggesting simple function
calls to the shared parts of the code, right? That could be beneficial
in cases like this, but it requires always keeping the lr register free
for that purpose, and we are short on registers in some fast paths.
Hopefully i-cache misses are not a big problem in pixman's
NEON-optimized fast paths. Or do you have some statistics showing that
they are?
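
To make the head/tail overlap quoted above a bit more concrete, here is
a rough C sketch of the loop structure. It is not pixman code: the real
fast paths are NEON assembly, and the names stage_head, stage_tail,
scale_simple and scale_pipelined are made up for illustration, as is
the per-byte "multiply by alpha and divide by 255" operation. The only
point is that the "head" code shows up both in the prologue and in the
loop body, and the "tail" code both in the loop body and in the
epilogue, which is where the extra code size comes from.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "head" stage: start the work for one element
 * (first half of a rounded src * alpha / 255). */
static inline uint32_t stage_head(uint8_t src, uint8_t alpha)
{
    return (uint32_t)src * alpha + 128;
}

/* Hypothetical "tail" stage: finish the work for one element
 * (second half of the rounded division by 255). */
static inline uint8_t stage_tail(uint32_t t)
{
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* Non-pipelined loop: head and tail of the same element run back to
 * back, and each stage appears in the code exactly once. */
void scale_simple(uint8_t *dst, const uint8_t *src, uint8_t alpha, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = stage_tail(stage_head(src[i], alpha));
}

/* Software-pipelined loop: prologue = head, loop body = tail_head,
 * epilogue = tail. Both stages now appear twice in the code. */
void scale_pipelined(uint8_t *dst, const uint8_t *src, uint8_t alpha, size_t n)
{
    if (n == 0)
        return;

    uint32_t in_flight = stage_head(src[0], alpha);   /* head */

    for (size_t i = 1; i < n; i++) {
        /* tail_head: start element i while finishing element i - 1.
         * In assembly these two stages would be interleaved. */
        uint32_t next = stage_head(src[i], alpha);
        dst[i - 1] = stage_tail(in_flight);
        in_flight = next;
    }

    dst[n - 1] = stage_tail(in_flight);               /* tail */
}
```

In the sketch the two stages inside the loop body are still separate
calls, so sharing them looks easy. In hand-scheduled assembly their
instructions are mixed together to hide latencies, which is why there is
no clean "head" entry point left to jump to or call.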

-- 
Best regards,
Siarhei Siamashka