[pulseaudio-discuss] [PATCH 1/2] sconv: Change/fix conversion to/from float32
Peter Meerwald
pmeerw at pmeerw.net
Sun Feb 3 16:20:36 PST 2013
Hello Tanu,
> On Sun, 2013-01-13 at 20:59 +0200, Tanu Kaskinen wrote:
> > On Sun, 2013-01-13 at 14:53 +0100, Peter Meerwald wrote:
> > > > > diff --git a/src/pulsecore/sconv_neon.c b/src/pulsecore/sconv_neon.c
> > > > > index 6fd966d..111b56f 100644
> > > > > --- a/src/pulsecore/sconv_neon.c
> > > > > +++ b/src/pulsecore/sconv_neon.c
> > > > > @@ -36,16 +36,11 @@ static void pa_sconv_s16le_from_f32ne_neon(unsigned n, const float *src, int16_t
> > > > > "movs %[n], %[n], lsr #2 \n\t"
> > > > > "beq 2f \n\t"
> > > > >
> > > > > - "vdup.f32 q2, %[plusone] \n\t"
> > > > > - "vneg.f32 q3, q2 \n\t"
> > > > > - "vdup.f32 q4, %[scale] \n\t"
> > > > > - "vdup.u32 q5, %[mask] \n\t"
> > > > > + "vdup.f32 q1, %[scale] \n\t"
> > > > >
> > > > > "1: \n\t"
> > > > > "vld1.32 {q0}, [%[src]]! \n\t"
> > > > > - "vmin.f32 q0, q0, q2 \n\t" /* clamp */
> > > > > - "vmax.f32 q0, q0, q3 \n\t"
> > > > > - "vmul.f32 q0, q0, q4 \n\t" /* scale */
> > > > > + "vmul.f32 q0, q0, q1 \n\t" /* scale */
> > > > > "vcvt.s32.f32 q0, q0, #16 \n\t" /* narrow */
> > >
> > > > You removed clamping - what happens if there's need for clamping? (I'm
> > > > not very good at reading assembly.)
> > >
> > > vrshrn does the narrowing int32->int16 (with saturation); the comment
> > > should be moved one line down
> >
> > The vcvt instruction converts floating-point numbers to fixed-point
> > numbers, with 16 bits in the integer part and 16 bits in the fractional
> > part, so most of the interesting stuff happens already in vcvt. How does
> > vcvt handle the situation where the float doesn't fit in the 16 bits
> > that are reserved for the integer part? Saturation or SIGFPE, or
> > something else? How is NaN handled? The reference[1] that I'm using
> > doesn't say anything about this...
> >
> > You say that vrshrn does its thing with saturation. Since the integer
> > part of the fixed-point input is already 16-bits, there's not much need
> > for saturation. Only the rounding of the fractional part can cause
> > overflow, so do you mean that if the rounding would cause overflow,
> > vrshrn uses truncation instead of rounding? (This is not specified in
> > the reference either.)
> >
> > [1] http://infocenter.arm.com/help/topic/com.arm.doc.dui0204j/CIHFFGJG.html
> You never answered these questions, and the new patch version contains
> the same code. "vcvt.s32.f32 q0, q0, #16" converts four floats into four
> 16.16 fixed-point numbers. What happens if the input is greater than
> INT16_MAX?
Here is some more detail:

  vcvt.s32.f32 q0, q0, #16

does saturate (the reference you linked indeed doesn't document this), so
we get a 16.16 fixed-point value: 16 integer bits and 16 fractional bits.

The following

  vrshrn.s32 d0, q0, #16

shifts 16 bits to the right and rounds according to the shifted-out
fractional part, but does NOT saturate. This is an error; the correct
instruction is

  vqrshrn.s32 d0, q0, #16

which does both saturation and rounding.

I'll post a v3.
The test code below converts several values:

#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#ifdef __arm__
#include <arm_neon.h>
#else
#include <xmmintrin.h>
#endif

int main(void) {
    float values[] = {0.5, -0.5, 0.3, 0.6, 2.5, 3.5, 32000.5, 33000.5, -33000.5, 32767.5};
    size_t i;

    for (i = 0; i < sizeof(values) / sizeof(values[0]); i++) {
        float f = values[i];

        /* reference: lrintf() rounds to nearest, ties to even (default mode) */
        printf("%.3f %ld -- ", f, lrintf(f));
#ifdef __arm__
        float32x4_t x = vdupq_n_f32(f);
        /* float -> 16.16 fixed point, saturating (vcvt.s32.f32 #16) */
        int32x4_t y = vcvtq_n_s32_f32(x, 16);
        /* rounding, saturating narrow to int16 (vqrshrn.s32 #16) */
        int16x4_t z = vqrshrn_n_s32(y, 16);
        printf("%08x %d\n",
               (unsigned) vgetq_lane_s32(y, 0),
               vget_lane_s16(z, 0));
#else
        /* SSE: uses the current MXCSR rounding mode (nearest even by default) */
        __m128 x = _mm_set_ss(f);
        printf("%d\n", _mm_cvt_ss2si(x));
#endif
    }
    return EXIT_SUCCESS;
}

Output on ARM NEON (input, lrintf() result, 16.16 value after vcvt in
hex, int16 result after vqrshrn):

0.500 0 -- 00008000 1
-0.500 0 -- ffff8000 0
0.300 0 -- 00004ccc 0
0.600 1 -- 00009999 1
2.500 2 -- 00028000 3
3.500 4 -- 00038000 4
32000.500 32000 -- 7d008000 32001
33000.500 33000 -- 7fffffff 32767
-33000.500 -33000 -- 80000000 -32768
32767.500 32768 -- 7fff8000 32767

All values look reasonable. Note that the results differ slightly from
lrintf() or SSE because the rounding differs: NEON always rounds a
fractional part of 0.5 up, while lrintf() (in the default rounding mode)
rounds to the nearest even integer, so in some rare cases the results
deviate by at most 1.
thanks, regards, p.
--
Peter Meerwald
+43-664-2444418 (mobile)