[pulseaudio-discuss] A pulseaudio appliance

Fri Oct 12 22:15:28 PDT 2007

Hi Lennart,

On Sat, Oct 13, 2007 at 01:05:16AM +0200, Lennart Poettering wrote:
> > 1) Resampling.  PA uses libsamplerate, which uses a lot of
> >    floating-point math.  There was some discussion about a fixed-point
> >    resampling library a while back (I think it came from the Speex
> >    project), but I haven't seen anything about it in a while.
> 
> The code has been available in the "lennart" branch in SVN for a
> while. Just pass --resample-method=speex-float-0 or similar.

Cool, that's good to hear.

> > 2) Volume scaling.  PA converts its internal, dB-based volume level into
> >    a floating-point linear scale that is applied to each sample of a
> >    stream.  The code that does this for a single stream is reasonably
> >    sane (although I would argue that the internal volume representation
> >    should be the linear version instead of the dB version so that it
> >    would only need to be converted when the volume is set), but the code
> >    for scaling multiple streams is horrendously inefficient.  I did a
> >    quick rewrite of this code that took CPU utilization on a P4 class PC
> >    from over 30% to under 10% when playing multiple streams (I don't
> >    remember exactly how many, probably 2-3).  I never cleaned up the
> >    code enough to bother with submitting a patch though.
> 
> Hmm, the current code translates the volume spec first, and only
> after that runs the inner loop. Should be OK. It's not liboil
> accelerated, but should be good enough.
> 
> http://pulseaudio.org/browser/branches/lennart/src/pulsecore/sample-util.c#L344

Yes, this code looks fairly good at first glance.  But I was actually
referring to another section of code.  It's the volume scaling in
pa_mix() for per-stream volumes that I found to be extremely slow.

I checked out the lennart branch from svn, and it doesn't look much
different from what I remember.  The outer loop iterates over samples,
and the inner loop iterates over streams.  Inside the inner loop is a
call to pa_sw_volume_to_linear(), which ends up being called for every
sample of every stream whose volume is neither PA_VOLUME_MUTED nor
PA_VOLUME_NORM (admittedly, PA_VOLUME_NORM seems like the most likely case).
The outer loop also makes a call to this function for each sample of
the output (again, if the volume is not muted or 0 dB).  This amounts
to a lot of repeated calculations, inefficient on a typical desktop PC
but downright crippling when floating-point calculations are being
done in software.  On top of that, the double returned by the conversion
feeds a floating-point multiplication in both loops.
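For illustration, here is a minimal C sketch of the loop shape described
above.  It is not PulseAudio's actual pa_mix() code: VOL_NORM,
volume_to_linear(), mix_naive() and the cubic curve are hypothetical
stand-ins (the curve is chosen only so the example needs no libm), and
the MUTED/NORM shortcuts and the output-volume pass are left out.  The
counter just makes the per-sample conversion cost visible.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical volume scale: VOL_NORM means 0 dB (unity gain). */
#define VOL_NORM 0x10000u

/* Counts how often the conversion runs, to make the cost visible. */
static unsigned long n_conversions = 0;

/* Hypothetical stand-in for a software-volume-to-linear conversion.
 * The cubic curve is a placeholder, not PulseAudio's mapping. */
static double volume_to_linear(uint32_t v) {
    double f = (double) v / VOL_NORM;
    n_conversions++;
    return f * f * f;
}

/* Saturate a mixed value to the signed 16-bit range. */
static int16_t clip_s16(double x) {
    if (x > 32767.0) return 32767;
    if (x < -32768.0) return -32768;
    return (int16_t) x;
}

/* Samples in the outer loop, streams in the inner loop, and a volume
 * conversion plus a floating-point multiply for every sample of every
 * stream, even though the volumes never change during the call. */
static void mix_naive(const int16_t *const *in, const uint32_t *vol,
                      size_t n_streams, size_t n_samples, int16_t *out) {
    for (size_t s = 0; s < n_samples; s++) {
        double sum = 0.0;
        for (size_t k = 0; k < n_streams; k++)
            sum += in[k][s] * volume_to_linear(vol[k]);
        out[s] = clip_s16(sum);
    }
}
```

Mixing two streams of four samples each runs the conversion 8 times;
the count grows as streams x samples for every buffer that is mixed.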

Changing the code to do the sw-to-linear volume conversions before the
loops helps a lot (this is how I achieved significant performance
gains on my desktop PC, but I only implemented it for S16NE, and that
wasn't even a good implementation).  But to me it would make much more
sense to simply have the internal volume type store the volume in linear
format; then the conversion would only have to be done once each time a
given volume changes.
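A minimal sketch of that alternative, under stated assumptions: all
names (struct mix_stream, stream_set_volume(), mix_hoisted(), VOL_NORM,
VOL_MAX, GAIN_ONE) are hypothetical, and the cubic curve is a
placeholder rather than PulseAudio's dB-based mapping.  The linear
factor is cached as a Q16 integer whenever the volume is set, so the
mixing loop does only integer multiplies and adds, which is what
matters on hardware without an FPU.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical Q16 fixed-point format: GAIN_ONE is unity gain. */
#define GAIN_ONE (1 << 16)
/* Hypothetical volume scale: VOL_NORM is 0 dB; VOL_MAX caps the input
 * so the integer cube below cannot overflow. */
#define VOL_NORM 0x10000u
#define VOL_MAX  (4u * VOL_NORM)

/* Per-stream state: the user-facing volume plus a cached linear
 * factor that is only recomputed when the volume changes. */
struct mix_stream {
    const int16_t *samples;
    uint32_t volume;   /* VOL_NORM-style scale */
    int32_t gain_q16;  /* cached linear factor, Q16 */
};

/* Runs once per volume change, not per sample, and uses integers
 * only: (v / VOL_NORM)^3 expressed in Q16 (placeholder curve). */
static void stream_set_volume(struct mix_stream *st, uint32_t v) {
    if (v > VOL_MAX)
        v = VOL_MAX;
    uint64_t g = ((uint64_t) v * v) >> 16; /* (v/NORM)^2 in Q16 */
    g = (g * v) >> 16;                     /* (v/NORM)^3 in Q16 */
    st->volume = v;
    st->gain_q16 = (int32_t) g;
}

/* Integer-only inner loop: multiply-accumulate with the cached factors,
 * scale back from Q16 once per output sample, then saturate. */
static void mix_hoisted(const struct mix_stream *streams, size_t n_streams,
                        size_t n_samples, int16_t *out) {
    for (size_t s = 0; s < n_samples; s++) {
        int64_t sum = 0;
        for (size_t k = 0; k < n_streams; k++)
            sum += (int64_t) streams[k].samples[s] * streams[k].gain_q16;
        sum /= GAIN_ONE; /* truncates toward zero */
        if (sum > INT16_MAX) sum = INT16_MAX;
        if (sum < INT16_MIN) sum = INT16_MIN;
        out[s] = (int16_t) sum;
    }
}
```

The trade-off is a little precision (the factor is truncated to Q16) in
exchange for keeping floating point out of the per-sample path entirely.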

> > It's been at least a year since I looked at this code though, so maybe
> > it has been improved since then.  But as long as it's all still based
> > on floating point you're going to have a tough time using it on a
> > system without an FPU. 
> 
> I think most issues for non FPU hw are fixed now. But I wouldn't want
> to guarantee that all inner loops are now integer-only in all cases.

Nope, not all inner loops, but it looks like there have been big steps
in this direction in the past year or so.  If I get some free time, I
might take a look at making some improvements to pa_mix(), though I
don't expect to have time for something like that for a little while.

Cheers,
Seth