[pulseaudio-discuss] [PATCH v2 1/6] core: add ARM NEON optimized mono-to-stereo/stereo-to-mono remapping code

Peter Meerwald pmeerw at pmeerw.net
Fri Jul 6 05:08:34 PDT 2012


> > - performance degradation on Cortex-A9 / pandaboard for remap: NEON is
> > fast on Cortex-A8 but slow on A9; need to distinguish

> Does it really degrade? Compared to C code? That seems surprising.

the problem is just one particular, very simple workload: mono_to_stereo 
remapping of floats; basically, you get wxyz and output wwxxyyzz (w..z are 
audio samples stored as float)
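
for reference, a minimal plain C sketch of that workload; the function name and signature are made up for illustration and are not the PulseAudio remap function:

```c
#include <stddef.h>

/* hypothetical helper (not the PulseAudio function): copy each mono
 * sample into the left and right slot of an interleaved stereo buffer;
 * n is the number of mono samples, dst must hold 2*n floats */
static void mono_to_stereo_ref(float *dst, const float *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[2 * i]     = src[i];   /* left  */
        dst[2 * i + 1] = src[i];   /* right */
    }
}
```

with src = {w, x, y, z} and n = 4 this fills dst with w w x x y y z z.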

on A8 I suggest the following (for 4 samples):
vld1.32    {q0}, [%[src]]!
vmov       q1, q0
vst2.32    {q0,q1}, [%[dst]]!
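
roughly the same idea written with NEON intrinsics, as a sketch only: the function name is hypothetical, the patch itself uses inline assembly, and the scalar loop is a fallback I added so the code also builds without NEON:

```c
#include <stddef.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* hypothetical helper: n is the number of mono samples,
 * dst must hold 2*n floats */
static void mono_to_stereo_neon_sketch(float *dst, const float *src, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    for (; i + 4 <= n; i += 4) {
        float32x4x2_t out;
        out.val[0] = vld1q_f32(src + i);  /* load w x y z */
        out.val[1] = out.val[0];          /* copy, like vmov q1, q0 */
        vst2q_f32(dst + 2 * i, out);      /* interleaved store: w w x x y y z z */
    }
#endif
    /* scalar tail (or the whole buffer when NEON is not available) */
    for (; i < n; i++) {
        dst[2 * i]     = src[i];
        dst[2 * i + 1] = src[i];
    }
}
```

the interleaving store does the duplication: it alternates elements from the two registers, and both registers hold the same four samples.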

on A9 I suggest the following (for 2 samples):
ldm        %[src]!, {r4,r6}
mov        r5, r4
mov        r7, r6
stm        %[dst]!, {r4-r7}
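
the point of this variant is that the samples are only moved, never computed on, so they can go through the integer registers as raw 32-bit words; a C sketch of the same idea (hypothetical function name, assumes sizeof(float) == 4):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* hypothetical helper: duplicates samples as opaque 32-bit words,
 * two mono samples per iteration; assumes sizeof(float) == 4 */
static void mono_to_stereo_words(float *dst, const float *src, size_t n)
{
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        uint32_t in[2], out[4];
        memcpy(in, src + i, sizeof in);        /* like ldm: load two words */
        out[0] = out[1] = in[0];               /* like mov r5, r4 */
        out[2] = out[3] = in[1];               /* like mov r7, r6 */
        memcpy(dst + 2 * i, out, sizeof out);  /* like stm: store four words */
    }
    if (i < n) {                               /* odd sample count: one left over */
        dst[2 * i]     = src[i];
        dst[2 * i + 1] = src[i];
    }
}
```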

the compiler generates something like the following (for 1 sample), which is pretty close 
to the A9 code above performance-wise (but sucks on A8)
ldr        r3, [%[src]]!
str        r3, [%[dst], #0]
str        r3, [%[dst], #4]

all other NEON optimizations are better than plain C code (compiled with 
gcc 4.6.3), even on A9

I will provide microbenchmarks on A8/A9 when submitting the patches

> I read (on android-ndk) that the speedup through NEON is a lot smaller on A9
> (60% vs 10% in one scenario), but it's still a speedup.
> This is a part of that conversation:

thank you for the pointer; those are general statements I tend to agree 
with

regards, p.

-- 

Peter Meerwald
+43-664-2444418 (mobile)

