[pulseaudio-discuss] reproducing PA performance results
Peter Meerwald
pmeerw at pmeerw.net
Thu Apr 5 01:51:41 PDT 2012
Hello,
Looking at Arun's PA vs. AudioFlinger comparison [1], I'm wondering how to
test PA performance in a reproducible and reliable way.
My main interest is assessing the ARM NEON patches I submitted [3]. So
far, there are some micro-benchmarks comparing the different
implementations (plain C vs. C with NEON intrinsics), but they don't show
the patches' impact on overall system performance.
Arun is testing on an OMAP4460; my platform is an OMAP3730.
I'll start with some questions on the test procedure:
The clock was reduced to 350 MHz in Arun's tests (presumably to make the
differences easier to measure). How was the clock reduced?
Arun measures with top; I observe that the top output fluctuates widely. How do
you read the output: do you average the results? How is top started (e.g. top -d 1)?
Was the audio data stereo or mono? What does the hardware support?
What tool was used for playback? (Arun mentions the async API but gives no
further details.) What is wrong with using pacat and stating the particular options used?
How is the Speex resampler used: float or fixed? Which quality setting?
How is the PA daemon configured?
Realtime? Priorities? SHM yes/no?
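Regarding how to read top: one less guess-prone approach (my own suggestion, not what Arun did) is to read the daemon's accumulated user+system CPU time from /proc/<pid>/stat at fixed intervals and compute a percentage from the deltas. A minimal Python sketch, assuming Linux procfs; parse_cpu_ticks, read_cpu_ticks, cpu_percent and sample_cpu_percent are hypothetical helper names:

```python
import os
import time


def parse_cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # Field 2 (comm) is parenthesised and may contain spaces, so split only
    # after the last ')'. The remaining fields start at field 3 (state).
    fields = stat_line[stat_line.rindex(")") + 2:].split()
    utime = int(fields[11])  # field 14 of /proc/<pid>/stat
    stime = int(fields[12])  # field 15 of /proc/<pid>/stat
    return utime + stime


def read_cpu_ticks(pid):
    with open(f"/proc/{pid}/stat") as f:
        return parse_cpu_ticks(f.read())


def cpu_percent(ticks_before, ticks_after, elapsed_s, hz):
    """CPU usage over an interval, in percent of one core."""
    return 100.0 * (ticks_after - ticks_before) / (hz * elapsed_s)


def sample_cpu_percent(pid, interval_s=1.0, samples=30):
    """Take `samples` readings, one every `interval_s` seconds."""
    hz = os.sysconf("SC_CLK_TCK")  # clock ticks per second
    readings = []
    prev_ticks, prev_t = read_cpu_ticks(pid), time.monotonic()
    for _ in range(samples):
        time.sleep(interval_s)
        ticks, now = read_cpu_ticks(pid), time.monotonic()
        readings.append(cpu_percent(prev_ticks, ticks, now - prev_t, hz))
        prev_ticks, prev_t = ticks, now
    return readings
```

Calling sample_cpu_percent with the PID of the pulseaudio daemon while pacat is playing yields a list of per-second readings that can be averaged, instead of eyeballing a fluctuating top display.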
I am not happy with this testing procedure; anyway, here are some
results comparing 44.1 kHz and 48 kHz stereo playback on an OMAP3730
(BeagleBoard-xM at 900 MHz, set via the mpurate kernel parameter). PulseAudio
(1.99.1) is started with --system and compiled with gcc 4.6.3 using
-O2 -march=armv7-a -ffast-math -fPIC -mfloat-abi=softfp -mfpu=neon.
I am forcing PA to default-sample-rate = 48000 and alternate-sample-rate =
48000 (PA fails after idle with alternate-sample-rate = 41000).
Speex is patched with [2].
48 kHz stereo playback takes < 1% CPU;
this is just PA/ALSA overhead.
44.1 kHz stereo playback takes ~3% CPU (Speex float-3 resampler with NEON, PA
with NEON);
here we have 44.1 kHz -> 48 kHz resampling and sint16->float32 / float32->sint16 conversion.
44.1 kHz stereo playback takes ~5% CPU (Speex float-3 resampler with NEON, PA
without NEON);
again with 44.1 kHz -> 48 kHz resampling and sint16->float32 / float32->sint16 conversion.
So the NEON optimization of the sample format conversion pays off (~3% vs. ~5% CPU).
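For reference, the sample format conversion in question is a simple per-sample scaling. A scalar Python sketch, assuming a scale factor of 32768 with rounding and clipping on the way back (PulseAudio's own conversion code may use a slightly different scale factor or rounding mode); this is the kind of loop that NEON code processes several samples at a time. Python floats stand in for float32 here, and the function names are hypothetical:

```python
def s16_to_float(samples):
    """sint16 -> float: map [-32768, 32767] onto [-1.0, 1.0)."""
    return [s / 32768.0 for s in samples]


def float_to_s16(samples):
    """float -> sint16: scale, round to nearest, clip to the int16 range."""
    out = []
    for x in samples:
        v = round(x * 32768.0)
        out.append(max(-32768, min(32767, v)))
    return out
```

A round trip through both functions is lossless for 16-bit input; values outside [-1.0, 1.0) (e.g. produced by resampler overshoot) are clipped on the way back.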
The Speex fixed-3 resampler makes more sense and is probably a bit more
efficient, since it avoids the sint16->float32 / float32->sint16 conversion.
I am comparing
pacat 48KHz.wav vs. pacat 44KHz.wav.
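To put numbers on the conversion that a fixed-point resampler would avoid: 44.1 kHz -> 48 kHz is a rational ratio of 147:160, and for stereo playback into a 48 kHz sint16 sink (an assumption about the sink format) the float path converts every input sample and every output sample once. A small worked calculation in Python; resample_ratio and conversions_per_second are hypothetical helpers:

```python
from math import gcd


def resample_ratio(in_rate, out_rate):
    """Reduce in_rate:out_rate to lowest terms."""
    g = gcd(in_rate, out_rate)
    return in_rate // g, out_rate // g


def conversions_per_second(in_rate, out_rate, channels):
    """sint16->float32 on the input side plus float32->sint16 on the output side."""
    return in_rate * channels + out_rate * channels
```

resample_ratio(44100, 48000) gives (147, 160), and conversions_per_second(44100, 48000, 2) gives 88200 + 96000 = 184200 sample conversions per second on the float-3 path.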
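For reproducible input, the two test streams can be generated rather than taken from arbitrary recordings. A minimal sketch using Python's standard wave module, assuming 16-bit stereo sine tones; write_sine_wav is a hypothetical helper, and the tone frequency, amplitude and duration are arbitrary choices of this sketch:

```python
import math
import struct
import wave


def write_sine_wav(path, rate, seconds=10.0, freq=440.0, channels=2,
                   amplitude=0.5):
    """Write a 16-bit PCM sine tone with the same signal on every channel."""
    n_frames = int(rate * seconds)
    frames = bytearray()
    for i in range(n_frames):
        v = int(amplitude * 32767 * math.sin(2 * math.pi * freq * i / rate))
        # One little-endian int16 sample, repeated once per channel.
        frames += struct.pack("<h", v) * channels
    with wave.open(path, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(2)  # 16-bit samples
        w.setframerate(rate)
        w.writeframes(bytes(frames))
```

For example, write_sine_wav("48KHz.wav", 48000) and write_sine_wav("44KHz.wav", 44100) produce matching inputs that differ only in sample rate, so any CPU difference between the two pacat runs comes from the rate handling.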
Observations:
I see memory and CPU consumption increase slightly (in top) when
playing a stream; I need to investigate further.
Does SHM make a difference?
Do --realtime or priorities make a difference?
Is the fixed or the float Speex NEON resampler faster? Hard to tell...
How does latency trade off against CPU usage?
How can we get more reliable performance readings?
Profiling is not easy to set up and operate.
Arun reported that some of the NEON code might actually be slower on
OMAP4 and/or with hardfp; let's see.
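On getting better readings and telling the fixed and float resamplers apart: rather than reading top by eye, repeated per-interval CPU readings can be summarized, and a difference between two configurations treated as meaningful only when it clearly exceeds the noise. A rough Python sketch; summarize and clearly_different are hypothetical helpers, and the two-standard-error threshold is a heuristic of this sketch, not a formal significance test:

```python
import statistics


def summarize(readings, warmup=0):
    """Mean, sample standard deviation and standard error of the mean,
    ignoring the first `warmup` readings (e.g. stream start-up)."""
    data = readings[warmup:]
    mean = statistics.mean(data)
    sd = statistics.stdev(data) if len(data) > 1 else 0.0
    sem = sd / len(data) ** 0.5
    return mean, sd, sem


def clearly_different(a, b, k=2.0):
    """True if the means of a and b differ by more than k combined
    standard errors (rough heuristic)."""
    mean_a, _, sem_a = summarize(a)
    mean_b, _, sem_b = summarize(b)
    return abs(mean_a - mean_b) > k * (sem_a ** 2 + sem_b ** 2) ** 0.5
```

For example, a and b could be CPU readings taken once per second during 44.1 kHz pacat runs with the fixed-3 and float-3 resamplers, respectively.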
Conclusions:
Resampling audio is not free (consistent with Arun's results).
Reading top output involves too much guesswork.
regards, p.
[1] http://arunraghavan.net/2012/01/pulseaudio-vs-audioflinger-fight/
[2] https://blueprints.launchpad.net/linaro-multimedia-speex/+spec/linaro-mmwg-speex-neon-update
[3] http://permalink.gmane.org/gmane.comp.audio.pulseaudio.general/12574
--
Peter Meerwald
+43-664-2444418 (mobile)