[pulseaudio-discuss] [PATCH] core: Fix a little-endian bug in ARM svolume code
Peter Meerwald
pmeerw at pmeerw.net
Tue Oct 23 06:48:23 PDT 2012
Hello myself,
> comparing ARM vs. NEON code, the svolume s16 NEON code uses two MULs,
> while ARM can do with one -- the ARM instructions (smulwb, ssat) look
> ideal for the svolume_s16 code
for the record, NEON can also do it with one MUL:
static inline void vol_s16_neon(const uint32x4_t *vol4, int16_t *samples, unsigned length) {
    asm volatile (
        "mov %[length], %[length], lsr #2\n\t"   /* loop counter: 4 samples per iteration */
        "vld1.s32 {q1}, [%[vol]]\n\t"            /* q1 = four 32-bit volume factors */
        "1:\n\t"
        "vld1.16 {d0}, [%[samples]]\n\t"         /* d0 = four s16 samples */
        "vshll.s16 q0, d0, #15\n\t"              /* widen to s32 and shift left by 15 */
        "vqdmulhq.s32 q0, q0, q1\n\t"            /* the single multiply: high half of 2 * q0 * q1 */
        "vmovn.s32 d0, q0\n\t"                   /* narrow back to 16 bits (no saturation) */
        "subs %[length], %[length], #1\n\t"
        "vst1.16 {d0}, [%[samples]]!\n\t"        /* store and advance the sample pointer */
        "bgt 1b\n\t"
        /* output operands (or input operands that get modified) */
        : [samples] "+r" (samples), [length] "+r" (length)
        : [vol] "r" (vol4) /* input operands */
        : "memory", "cc", "q0", "q1" /* clobber list */
    );
}
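
For readers without a NEON reference at hand, here is a rough scalar sketch in plain C of what the loop above computes per sample. The names qdmulh_s32 and vol_s16_scalar are made up for illustration and are not part of the patch. Reading the volume as 16.16 fixed point (0x10000 = unity gain) is inferred from the arithmetic rather than stated in the mail: the shift by 15 plus the doubling in vqdmulh gives (sample * vol) >> 16. The sketch also assumes an arithmetic right shift for negative values and modulo narrowing, as on GCC and Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of one lane of vqdmulh.s32:
 * saturating doubling multiply, returning the high 32 bits. */
static int32_t qdmulh_s32(int32_t a, int32_t b)
{
    /* The only input pair whose doubled product overflows is MIN * MIN. */
    if (a == INT32_MIN && b == INT32_MIN)
        return INT32_MAX;

    int64_t prod = (int64_t)a * (int64_t)b;
    /* (2 * prod) >> 32 equals prod >> 31; assumes an arithmetic right
     * shift for negative values (true for GCC and Clang). */
    return (int32_t)(prod >> 31);
}

/* Hypothetical scalar counterpart of the NEON loop. Unlike the NEON
 * version it handles any length, not just whole groups of four. */
static void vol_s16_scalar(const int32_t vol4[4], int16_t *samples, size_t length)
{
    assert(samples != NULL || length == 0);

    for (size_t i = 0; i < length; i++) {
        /* vshll.s16 #15: sign-extend to 32 bits, then multiply by 2^15
         * (multiplication avoids left-shifting a negative value). */
        int32_t widened = (int32_t)samples[i] * 32768;

        /* vqdmulh.s32: net effect is (sample * vol) >> 16. */
        int32_t scaled = qdmulh_s32(widened, vol4[i % 4]);

        /* vmovn.s32: keep the low 16 bits, no saturation (the int16_t
         * conversion is modulo on GCC and Clang). */
        samples[i] = (int16_t)(uint16_t)scaled;
    }
}
```

Under these assumptions a factor of 0x8000 halves a sample (rounding towards minus infinity), while a factor above 0x10000 can wrap around, because the narrowing step truncates instead of saturating.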
Checking ARM NEON svolume
func: 1291289 usec (min = 12817, max = 13184, stddev = 65.9113).
orig: 2438875 usec (min = 24322, max = 25605, stddev = 130.359).
Orc not supported. Skipping
100%: Checks: 3, Failures: 0, Errors: 0
this is a bit better than the previous NEON code (~1300000 usec vs.
~1510000 usec), but still slower than the ARM code (~920000 usec)
regards, p.
--
Peter Meerwald
+43-664-2444418 (mobile)