[pulseaudio-discuss] [PATCH v2 1/6] core: add ARM NEON optimized mono-to-stereo/stereo-to-mono remapping code
Peter Meerwald
pmeerw at pmeerw.net
Fri Jul 6 05:08:34 PDT 2012
> > - performance degradation on Cortex-A9 / pandaboard for remap: NEON is
> > fast on Cortex-A8 but slow on A9; need to distinguish
> Does it really degrade? Compared to C code? That seems surprising.
the problem is just one particular, very simple workload: mono_to_stereo
remapping of floats; basically, you get wxyz and output wwxxyyzz (w..z are
audio samples stored as float)
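for reference, a minimal plain-C sketch of that workload (the function name and signature are made up for illustration, this is not the actual PulseAudio remap code):

```c
#include <stddef.h>

/* Hypothetical reference implementation of mono-to-stereo remapping.
 * Each mono sample is copied into both the left and the right slot of an
 * interleaved stereo buffer: w x y z -> w w x x y y z z.
 * dst must have room for 2 * n floats. */
static void mono_to_stereo_float_c(float *dst, const float *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[2 * i]     = src[i];  /* left channel */
        dst[2 * i + 1] = src[i];  /* right channel */
    }
}
```

there is no arithmetic on the samples at all, so the loop is one load and two stores per sample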
on A8 I suggest the following (for 4 samples):
vld1.32 {q0}, [%[src]]!
vmov q1, q0
vst2.32 {q0,q1}, [%[dst]]!
on A9 I suggest the following (for 2 samples):
ldm %[src]!, {r4,r6}
mov r5, r4
mov r7, r6
stm %[dst]!, {r4-r7}
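roughly the same idea in C, as a sketch only: samples are treated as opaque 32-bit words (no float arithmetic is needed to copy them), two are loaded and four are stored per iteration; the function name and the tail handling for an odd sample count are my own assumptions

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C counterpart of the ldm/mov/stm sequence: handle two
 * 32-bit samples per iteration using integer registers only.
 * dst must have room for 2 * n words. */
static void mono_to_stereo_words_x2(uint32_t *dst, const uint32_t *src, size_t n)
{
    size_t i = 0;

    /* main loop: load two words, store four */
    for (; i + 2 <= n; i += 2) {
        uint32_t a = src[i];
        uint32_t b = src[i + 1];
        dst[2 * i]     = a;
        dst[2 * i + 1] = a;
        dst[2 * i + 2] = b;
        dst[2 * i + 3] = b;
    }

    /* tail: one leftover sample when n is odd */
    if (i < n) {
        dst[2 * i]     = src[i];
        dst[2 * i + 1] = src[i];
    }
}
```

whether a compiler turns this into ldm/stm depends on the compiler and its flags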
the compiler generates something like the following (for 1 sample), which is
pretty close to the A9 code above performance-wise (but sucks on A8)
ldr r3, [%[src]], #4
str r3, [%[dst], #0]
str r3, [%[dst], #4]
all other NEON optimizations are better than plain C code (compiled with
gcc 4.6.3), even on A9
I will provide microbenchmarks on A8/A9 when submitting the patches
> I read (on android-ndk) that the speedup through NEON is a lot smaller on A9
> (60% vs 10% in one scenario), but it's still a speedup.
> This is a part of that conversation:
thank you for the pointer; those are general statements I tend to agree
with
regards, p.
--
Peter Meerwald
+43-664-2444418 (mobile)