[pulseaudio-discuss] [PATCH 2/2] core: Add ARM NEON optimized sample conversion code
Peter Meerwald
pmeerw at pmeerw.net
Thu Oct 25 02:19:17 PDT 2012
Hello Arun,
> I was poking around this a bit. An input of 0x3f4aaa95 after the
> multiplication with 32767.0 should result in 0x46caa8ff but turns out to
> be 0x46caa900. Still trying to figure out why.
I cannot reproduce your example; for me it always results in 0x46caa900
(with or without NEON),
but I think I have a good explanation:
static void pa_sconv_s16le_to_float32ne(unsigned n, const int16_t *src, float *dst) {
    pa_assert(src);
    pa_assert(dst);
    for (; n > 0; n--)
        *(dst++) = ((float) (*(src++))) / (float) 0x7FFF;
}
is the baseline implementation; notice that we have a division here.
The NEON code does the equivalent of
    const float invscale = 1.0f / 0x7FFF;
    for (; n > 0; n--)
        *(dst++) = ((float) (*(src++))) * invscale;
notice that the division is replaced by a multiplication with the inverse.
These two C implementations also give different results; the NEON
implementation gives exactly the results of the second C implementation.
Float division is prohibitively slow on NEON, hence the
multiplication with the inverse.
I think a C compiler is not allowed to make such an optimization (unless one
explicitly allows for precision loss).
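To make the difference between the two variants concrete, here is a minimal sketch that runs both over every int16_t value and counts the inputs where the results are not bit-identical. The names s16_to_float_div, s16_to_float_mul and count_mismatches are hypothetical, chosen for illustration only, and plain assert() stands in for pa_assert(). It assumes IEEE 754 single precision with round-to-nearest-even and a build without -ffast-math or similar options.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Baseline variant: divide by 0x7FFF, as in pa_sconv_s16le_to_float32ne(). */
static void s16_to_float_div(unsigned n, const int16_t *src, float *dst) {
    assert(src);
    assert(dst);
    for (; n > 0; n--)
        *(dst++) = ((float) (*(src++))) / (float) 0x7FFF;
}

/* Reciprocal variant: multiply by a precomputed inverse, which is what
 * the NEON code is described as doing. */
static void s16_to_float_mul(unsigned n, const int16_t *src, float *dst) {
    const float invscale = 1.0f / 0x7FFF;

    assert(src);
    assert(dst);
    for (; n > 0; n--)
        *(dst++) = ((float) (*(src++))) * invscale;
}

/* Hypothetical helper: count the int16_t inputs for which the two
 * variants produce different bit patterns. */
static unsigned count_mismatches(void) {
    unsigned mismatches = 0;
    int v;

    for (v = INT16_MIN; v <= INT16_MAX; v++) {
        int16_t s = (int16_t) v;
        float a, b;

        s16_to_float_div(1, &s, &a);
        s16_to_float_mul(1, &s, &b);
        if (memcmp(&a, &b, sizeof(a)) != 0)
            mismatches++;
    }
    return mismatches;
}
```

Calling count_mismatches() from a small main() and printing the result shows how many samples are affected. One input that should differ under the assumptions above is 16416: the product with the rounded inverse lands exactly on a rounding tie and is rounded down to even, while the exact quotient lies just above the tie and is rounded up, so the two results are one ulp apart.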
regards, p.
--
Peter Meerwald
+43-664-2444418 (mobile)