[pulseaudio-discuss] [PATCH] core: Fix a little-endian bug in ARM svolume code
Arun Raghavan
arun.raghavan at collabora.co.uk
Tue Oct 23 07:00:37 PDT 2012
On Tue, 2012-10-23 at 15:48 +0200, Peter Meerwald wrote:
> Hello myself,
>
> > comparing ARM vs. NEON code, the svolume s16 NEON code uses two MULs,
> > while ARM can do with one -- the ARM instructions (smulwb, ssat) look
> > ideal for the svolume_s16 code
>
> for the record, NEON can also do it with one MUL:
>
> static inline void vol_s16_neon(const uint32x4_t *vol4, int16_t *samples, unsigned length) {
> asm volatile (
> "mov %[length], %[length], lsr #2\n\t"
> "vld1.s32 {q1}, [%[vol]]\n\t"
> "1:\n\t"
> "vld1.16 {d0}, [%[samples]]\n\t"
> "vshll.s16 q0, d0, #15\n\t"
> "vqdmulhq.s32 q0, q0, q1\n\t"
> "vmovn.s32 d0, q0\n\t"
> "subs %[length], %[length], #1\n\t"
> "vst1.16 {d0}, [%[samples]]!\n\t"
> "bgt 1b\n\t"
> /* output operands (or input operands that get modified) */
> : [samples] "+r" (samples), [length] "+r" (length)
> : [vol] "r" (vol4) /* input operands */
> : "memory", "cc", "q0", "q1" /* clobber list */
> );
> }
>
> Checking ARM NEON svolume
> func: 1291289 usec (min = 12817, max = 13184, stddev = 65.9113).
> orig: 2438875 usec (min = 24322, max = 25605, stddev = 130.359).
> Orc not supported. Skipping
> 100%: Checks: 3, Failures: 0, Errors: 0
>
> this is a bit better than the previous NEON code (~1300000 vs. ~1510000),
> but still slower than ARM (~920000)
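For anyone following the arithmetic in the quoted single-MUL sequence, here is a rough scalar model of what one lane does. It is only a sketch: the function name vol_s16_lane is made up for illustration, and it assumes the volume is a 16.16 fixed-point value (0x10000 = unity), which is what the shift amounts imply but is not stated in the patch.

```c
#include <stdint.h>

/* Hypothetical scalar model of one NEON lane; not code from the patch.
 * Assumes vol is 16.16 fixed point (0x10000 == 1.0). */
static int16_t vol_s16_lane(int16_t sample, int32_t vol) {
    /* vshll.s16 q0, d0, #15: widen to 32 bits and shift left by 15.
     * Written as a multiply to avoid left-shifting a negative value. */
    int64_t widened = (int64_t)sample * 32768;

    /* vqdmulh.s32: doubling multiply, keep the high 32 bits.
     * (2 * (sample << 15) * vol) >> 32 == (sample * vol) >> 16.
     * The saturating case (both operands INT32_MIN) cannot occur here
     * because |widened| <= 2^30. Right shift of a negative value is
     * implementation-defined in C; GCC and Clang shift arithmetically. */
    int64_t high = (2 * widened * (int64_t)vol) >> 32;

    /* vmovn.s32: narrow by keeping the low 16 bits, with no saturation.
     * The conversion is implementation-defined for out-of-range values;
     * GCC and Clang wrap modulo 2^16. */
    return (int16_t)high;
}
```

With these assumptions a volume of 0x10000 leaves a sample unchanged and 0x8000 halves it. Since vmovn does not saturate, a gain above unity can wrap in this model.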
Nice catch on the alignment. I'm trying to extend our tests to catch
these cases. A couple of notes: Rémi Denis-Courmont mentions that you
will likely see performance benefits in the NEON code by sprinkling in
some preloads (PLD). I've also factored out the sconv code and that does
provide a win on all the boards I tried.
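To illustrate the preload idea in C rather than asm, something like the following could be a starting point. This is a sketch only: vol_s16_prefetch and PREFETCH_AHEAD are hypothetical names, the prefetch distance is an untuned guess, and the 16.16 volume format is an assumption. It relies on the GCC/Clang builtin __builtin_prefetch, which typically becomes a PLD on ARM.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical prefetch distance in samples; would need tuning per board. */
#define PREFETCH_AHEAD 64

/* Sketch of a scalar s16 volume loop with software preloads.
 * Assumes vol is 16.16 fixed point (0x10000 == 1.0). */
static void vol_s16_prefetch(int16_t *samples, size_t n, int32_t vol) {
    for (size_t i = 0; i < n; i++) {
        /* Issue one preload per 32 samples (64 bytes) for data we will
         * touch soon; rw = 1 because the samples are written back. */
        if ((i % 32) == 0 && i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&samples[i + PREFETCH_AHEAD], 1, 0);

        /* Scale, then clamp to the int16_t range. */
        int64_t t = ((int64_t)samples[i] * vol) >> 16;
        if (t > INT16_MAX) t = INT16_MAX;
        if (t < INT16_MIN) t = INT16_MIN;
        samples[i] = (int16_t)t;
    }
}
```

Whether this helps at all depends on the core and the buffer sizes, so it would need measuring with the same test as above.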
To get this moving for 3.0, could you respin just the sconv patches on
top of master (I'll push out my testing code soon) so that we can push
that bit out first while we work on the others?
Cheers,
Arun