[pulseaudio-discuss] [PATCH 1/6 v3] core: Initialize ARM NEON code if available
Peter Meerwald
p.meerwald at bct-electronic.com
Wed Oct 17 06:56:31 PDT 2012
Hello,
> Surprise! I'm reviewing this now. :p
indeed :)
> 1. v3 drops intrinsics in favour of inline asm -- is that for
> performance reasons?
I noticed performance issues with certain compiler versions; inline asm
offers more control/defined output; further, alignment annotations are not
available with intrinsics -- currently they are not used because I'm not
sure about the alignment guarantees of certain PA buffers; intrinsics could
probably be added later if there is enough interest
> 2. In the mono->stereo float case, the Cortex A9 code is actually
> slower. I recall that in a previous thread, we had this sort of
> situation on one of Panda/Beagleboard. Do we need some way to pick and
> choose implementations?
I only have beagleboard-xm and pandaboard available as test platforms
(Cortex-A8 and Cortex-A9, resp.)
PATCH 2/6 now tests for A8 vs A9/A15/Axxx and chooses code accordingly
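to illustrate the A8 vs. other distinction, here is a minimal sketch that classifies the core from the "CPU part" field of /proc/cpuinfo-style text (0xc08 is the Cortex-A8 part number); the enum and function names are hypothetical, and this is not necessarily how PATCH 2/6 does the test:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical classification, not taken from the patch. */
enum arm_core {
    ARM_CORE_UNKNOWN,   /* no usable "CPU part" field found */
    ARM_CORE_CORTEX_A8, /* part number 0xc08 */
    ARM_CORE_OTHER      /* anything else, e.g. Cortex-A9 (0xc09) or Cortex-A15 (0xc0f) */
};

/* Classify the core from the text of /proc/cpuinfo (passed in as a string). */
static enum arm_core classify_arm_core(const char *cpuinfo) {
    const char *line = strstr(cpuinfo, "CPU part");
    const char *colon;
    char *end;
    long part;

    if (!line)
        return ARM_CORE_UNKNOWN;

    colon = strchr(line, ':');
    if (!colon)
        return ARM_CORE_UNKNOWN;

    /* strtol skips leading whitespace and accepts the "0x" prefix in base 16 */
    part = strtol(colon + 1, &end, 16);
    if (end == colon + 1)
        return ARM_CORE_UNKNOWN;

    return part == 0xc08 ? ARM_CORE_CORTEX_A8 : ARM_CORE_OTHER;
}
```

a caller would read /proc/cpuinfo into a buffer once at startup and pick the A8-tuned or the generic NEON routine based on the result.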
another issue is benchmarking: relative performance differs depending on
the length of the buffers processed and on whether they are cached
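to make the buffer-length dependence visible, a rough timing sketch could look like the following; the scalar kernel is only a stand-in for whatever routine is measured, the function names are hypothetical, and clock() reports CPU time at coarse resolution, so the numbers are indicative at best:

```c
#include <stdlib.h>
#include <time.h>

/* Scalar stand-in for the routine under test (hypothetical name). */
static void bench_kernel_mono_to_stereo(float *dst, const float *src, size_t n) {
    size_t i;
    for (i = 0; i < n; i++) {
        dst[2 * i] = src[i];
        dst[2 * i + 1] = src[i];
    }
}

/* Average CPU time in nanoseconds per input sample for buffers of n samples
 * (n > 0, iterations > 0). Returns -1.0 if allocation fails. */
static double bench_ns_per_sample(size_t n, unsigned iterations) {
    float *src = malloc(n * sizeof *src);
    float *dst = malloc(2 * n * sizeof *dst);
    volatile float sink = 0.0f; /* keeps the compiler from dropping the work */
    clock_t start, stop;
    unsigned it;
    size_t i;

    if (!src || !dst) {
        free(src);
        free(dst);
        return -1.0;
    }

    for (i = 0; i < n; i++)
        src[i] = (float) i;

    start = clock();
    for (it = 0; it < iterations; it++) {
        bench_kernel_mono_to_stereo(dst, src, n);
        sink += dst[2 * n - 1];
    }
    stop = clock();
    (void) sink;

    free(src);
    free(dst);

    return (double) (stop - start) / CLOCKS_PER_SEC * 1e9
           / ((double) n * (double) iterations);
}
```

running this for a small n (buffer fits in cache) and a large n (it does not), for both the C and the optimized routine, shows how the relative speedup shifts with buffer length.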
my target task involves stereo recording, resampling, int/float
conversion, stereo-to-mono and mono-to-stereo mapping and I am seeing good
speedups on both beagle- and pandaboard
I need to check the downmix to mono behaviour after
ff4af902cf4ac07c5f1da3b6dacbb3195c7c222d
resampler: Fix volume on downmix to mono
> 3. How shall we go about enabling this code? Have a configure time check
> for some instructions that are needed, build it in if available, and
> then run-time detection should pick the right code path?
I'd suggest modelling this after bluetooth/sbc: always compile the
*_neon.c files but only activate the NEON code if defined(__ARM_NEON__)
disadvantage is that we cannot have a common executable for NEON/non-NEON
ARM CPUs -- I don't think this is a big constraint
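a minimal sketch of this compile-time scheme, for the mono-to-stereo float case: the file always compiles, and the NEON routine only exists if the compiler defines __ARM_NEON__; note that the sketch uses intrinsics from arm_neon.h for brevity, whereas the v3 patch uses inline asm, and all function names here are hypothetical:

```c
#include <stddef.h>

#if defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Plain C reference: duplicate each mono sample into left and right. */
static void mono_to_stereo_float_c(float *dst, const float *src, size_t n) {
    size_t i;
    for (i = 0; i < n; i++) {
        dst[2 * i] = src[i];
        dst[2 * i + 1] = src[i];
    }
}

#if defined(__ARM_NEON__)
/* NEON variant: 4 samples per iteration, interleaved store of (v, v). */
static void mono_to_stereo_float_neon(float *dst, const float *src, size_t n) {
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        float32x4x2_t lr;
        lr.val[0] = vld1q_f32(src + i);
        lr.val[1] = lr.val[0];
        vst2q_f32(dst + 2 * i, lr);
    }
    /* remaining 0..3 samples go through the C code */
    mono_to_stereo_float_c(dst + 2 * i, src + i, n - i);
}
#endif

typedef void (*mono_to_stereo_func_t)(float *dst, const float *src, size_t n);

/* Selection happens at compile time; the binary contains only one choice. */
static mono_to_stereo_func_t pick_mono_to_stereo(void) {
#if defined(__ARM_NEON__)
    return mono_to_stereo_float_neon;
#else
    return mono_to_stereo_float_c;
#endif
}
```

built without NEON enabled, pick_mono_to_stereo() always returns the C routine, which is exactly the "no common executable" limitation mentioned above.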
Remi Denis-Courmont suggests using .s assembler files to overcome this
issue; this would necessitate some configure options as well
interestingly, on x86/AMD64 gcc can emit MMX/SSE code in inline asm even
when the compiler itself is not enabled to generate such instructions --
hence no .s files in PA so far
at runtime there already is an env. var PULSE_NO_SIMD to disable the
optimized code paths; further, the output of /proc/cpuinfo is parsed to see
if NEON is available (kind of pointless since it is a compile-time decision)
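the two runtime checks can be sketched roughly as follows; this is not PA's actual code, the function names are hypothetical, and treating any value of PULSE_NO_SIMD (even an empty one) as "disable" is an assumption:

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* True if the "Features" line of /proc/cpuinfo-style text lists the
 * whole word "neon". */
static bool cpuinfo_has_neon(const char *cpuinfo) {
    const char *line = strstr(cpuinfo, "Features");
    const char *end, *p;

    if (!line)
        return false;

    end = strchr(line, '\n');
    if (!end)
        end = line + strlen(line);

    /* p[-1] is safe: a match can never start at the "Features" prefix itself */
    for (p = strstr(line, "neon"); p && p + 4 <= end; p = strstr(p + 4, "neon")) {
        bool starts_word = p[-1] == ' ' || p[-1] == '\t';
        bool ends_word = p + 4 == end || p[4] == ' ' || p[4] == '\t';
        if (starts_word && ends_word)
            return true;
    }
    return false;
}

/* Use the optimized path only if the user has not set PULSE_NO_SIMD and
 * the CPU reports NEON. */
static bool use_neon_code(const char *cpuinfo) {
    if (getenv("PULSE_NO_SIMD"))
        return false;
    return cpuinfo_has_neon(cpuinfo);
}
```

with the compile-time scheme the cpuinfo check only confirms what the build already assumed; it would start to matter once NEON and non-NEON code can live in the same executable.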
> I'll take a closer look at things, run some tests, and start pushing
> this work. I'll also be moving all the test code to src/tests/cpu-test.c
> where the x86 tests have been consolidated, so running tests on
> different boards should become a lot less painful.
thank you for the effort; let me know if there are questions!
tests are not straightforward in some cases as the actual implementation
is not exported
orc is broken on NEON: the loadpq opcode is not supported
thanks, p.
--
Peter Meerwald
+43-664-2444418 (mobile)