[Pixman] [PATCH 2/2] ARM: Add 'neon_composite_over_n_8888_0565' fast path
podain77 at gmail.com
Wed Apr 6 23:39:16 PDT 2011
2011/4/7 Siarhei Siamashka <siarhei.siamashka at gmail.com>
> Hmm, I just realized that you probably suggested using simple function
> calls to the shared parts of code, right? It could be beneficial in
> the cases like this, but requires to always have lr register free for
> such purpose, and we are short on registers in some fast paths.
> Hopefully i-cache misses are not a big problem in pixman NEON
> optimized fast paths. Or do you have some statistics proving that they
I just wanted to point out that macro expansion of basic building blocks can
be a potential problem, because it leads to larger code size.
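
A rough sketch of the trade-off in plain C rather than NEON assembly. This is
not pixman code: MUL_UN8_MACRO, mul_un8_shared and the row functions are
hypothetical names made up for illustration. The macro body is pasted into
every fast path that uses it, while the function body exists once and is
reached with a call (unless the compiler decides to inline it).

```c
#include <stdint.h>

/* Hypothetical building block: (a * b) / 255 with rounding.
 * As a macro, the whole body is expanded at every use site. */
#define MUL_UN8_MACRO(a, b, out)                              \
    do {                                                      \
        uint32_t t_ = (uint32_t)(a) * (uint32_t)(b) + 0x80;   \
        (out) = (uint8_t)((t_ + (t_ >> 8)) >> 8);             \
    } while (0)

/* Same building block as a function: one copy in the binary,
 * at the cost of a call and return per use when not inlined. */
static uint8_t mul_un8_shared(uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t)a * (uint32_t)b + 0x80;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* Two "fast paths" built from the macro: two expanded copies. */
static void scale_row_macro(uint8_t *dst, const uint8_t *src,
                            uint8_t alpha, int n)
{
    for (int i = 0; i < n; i++)
        MUL_UN8_MACRO(src[i], alpha, dst[i]);
}

static void mask_row_macro(uint8_t *dst, const uint8_t *mask, int n)
{
    for (int i = 0; i < n; i++)
        MUL_UN8_MACRO(dst[i], mask[i], dst[i]);
}

/* The same two "fast paths" sharing a single function body. */
static void scale_row_call(uint8_t *dst, const uint8_t *src,
                           uint8_t alpha, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = mul_un8_shared(src[i], alpha);
}

static void mask_row_call(uint8_t *dst, const uint8_t *mask, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = mul_un8_shared(dst[i], mask[i]);
}
```

Both variants compute the same pixels; the difference is only in how many
copies of the building block end up in the binary and whether a call is
needed to reach it.
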
As you mentioned, we can think of using lr register or some dynamic runtime
In practice, i-cache misses (including i-TLB misses) are not that big a
problem, especially for this kind of code block inside a loop.
I probably went a bit too far on that point.
What would be the best approach to maximizing code utilization at the binary
level (not the source level)?
It does not seem that we have a better choice than dynamic code fetching.
And is it really worth doing that?