[pulseaudio-discuss] [PATCH v2 1/6] core: add ARM NEON optimized mono-to-stereo/stereo-to-mono remapping code

Peter Meerwald pmeerw at pmeerw.net
Sun Mar 11 11:12:36 PDT 2012


Hello Arun,

thank you for testing; you are covering areas I have not thought of :)

> > v2: 
> > * add ARM NEON stereo-to-mono remapping code
> > * static __attribute__ ((noinline)) is necessary to prevent inlining and 
> >   work around gcc 4.6 ICE, see https://bugs.launchpad.net/bugs/936863
> > * call test code, the reference implementation is obtained using 
> >   pa_get_init_remap_func()
> > * remove check for NEON flags
> > v1: 
> > * ARM NEON mono-to-stereo remapping code

> A couple of issues here. Firstly, if I turn up the test loop count to
> 100000, I can fairly reliably see a bunch of failures like the one
> below.

I have not been able to reproduce the issue so far; my setup is beagle-xm 
with softfp, the particular compiler might also be different

you get different results for remap_stereo_to_mono, s16

are you testing the patches one-by-one? i.e. you have not yet applied 
'[PATCH 5/6] core: add stereo to mono special case remapping'?


I think there is an issue in run_test_s16_stereo_to_mono() when setting up 
the remap structure:
remap.format = &sf;
iss.format = PA_SAMPLE_S16NE;
iss.channels = 2;
oss.format = PA_SAMPLE_S16NE;
oss.channels = 1;
remap.i_ss = &iss;
remap.o_ss = &oss;
remap.map_table_f[0][0] = 1.0;
remap.map_table_f[0][1] = 1.0;
remap_init_func(&remap);

what is missing is the map table for int values:
remap.map_table_i[0][0] = 0x10000;
remap.map_table_i[0][1] = 0x10000;

in run_test_s16_mono_to_stereo() there is a similar issue, but here
init_remap_c() in remap.c will use the remap_mono_to_stereo_c instead of 
the generic mapper; remap_mono_to_stereo_c does not need the map_table

in '[PATCH 5/6] core: add stereo to mono special case remapping' a special 
case is added to init_remap_c() to cover the stereo-to-mono case which 
might mask the issue

... but I am just speculating here :(

could you try above initialization?

> Next, I see the reference implementation doing better in the
> mono-to-stereo float remapping.

the C code seems a lot more efficient on the panda, it is well known that 
floats are handled better

NEON remap_mono_to_stereo(float) is ~ 4x slower on panda, and only 
slightly faster on beagle

remap_mono_to_stereo(s16) is ~ 1.5x slower on panda, and somewhat faster 
on beagle

I'll try to build a hardfp system and see if the runtime can be improved 
(or at least avoid regressions)

regards, p.

-- 

Peter Meerwald
+43-664-2444418 (mobile)


More information about the pulseaudio-discuss mailing list