[pulseaudio-discuss] Re : Effects of Clock Resolution on Pulseaudio

Wed Jul 30 13:27:13 PDT 2008

On Wed, 30.07.08 14:54, keith preston (keithpre at gmail.com) wrote:

> >>>> So I am not sure what part of Pulseaudio is causing high CPU Utilization
> >>>> ..
> 
> I can tell you that the fail points are mixing, software volume and resampling.
> 
> 
> > Hmm, that function is not optimized in any way, but when I look at its
> > source it doesn't appear that slow to me either. For each sample we do
> > one multiplication, one shift, we apply saturation and then we
> > increase/decrease pointers with wrap around. That shouldn't be that
> > bad. Also, this code goes once linearly through all samples, which should
> > minimize influence of the cache.
> 
> There is also an array lookup of the channel volume (on every for loop
> cycle), and two increment variables. With an ARM processor this is
> probably enough extra variables to go past the number of registers and
> cause stack manipulation. The easiest thing would be to process
> one channel at a time, incrementing your pointer properly and using
> the end of the array pointer as a stop point instead of keeping two
> count variables. I also hope you have optimizations turned on in your
> compiler or you will get a divide instead of a shift.

Hmm, going through the memory block more than once, wouldn't that be
bad for the cache?

The volume array is just two entries in most cases
(i.e. stereo). Shouldn't be that bad.
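
To make the two loop shapes under discussion concrete, here is a minimal
sketch for interleaved S16 samples. It is not the PulseAudio code: the
function names, the 16.16 fixed-point volume format (0x10000 == 1.0) and
the 64-bit intermediate are assumptions chosen for illustration. The
first function walks the buffer once and looks up the channel volume on
every sample; the second makes one pass per channel with the volume held
in a local variable. Both produce the same output, so which one is
faster on a given ARM core is something to measure rather than assume.

```c
#include <stddef.h>
#include <stdint.h>

/* Saturate an intermediate result to the signed 16-bit range. */
static int16_t clamp_s16(int64_t v) {
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t) v;
}

/* Hypothetical helper: single linear pass, per-sample volume lookup.
 * Volumes are assumed to be 16.16 fixed point (0x10000 == 1.0). */
static void scale_s16_interleaved(int16_t *samples, size_t n_samples,
                                  const int32_t *volumes, unsigned n_channels) {
    int16_t *end = samples + n_samples;   /* end pointer as stop condition */
    unsigned channel = 0;

    for (; samples < end; samples++) {
        /* Division by a power of two keeps the C portable for negative
         * values; an optimizing compiler typically emits shift-based code. */
        int64_t t = ((int64_t) *samples * volumes[channel]) / 65536;
        *samples = clamp_s16(t);
        if (++channel >= n_channels)      /* wrap around to channel 0 */
            channel = 0;
    }
}

/* Hypothetical helper: one strided pass per channel, volume kept local. */
static void scale_s16_per_channel(int16_t *samples, size_t n_samples,
                                  const int32_t *volumes, unsigned n_channels) {
    for (unsigned c = 0; c < n_channels; c++) {
        const int32_t vol = volumes[c];
        for (size_t i = c; i < n_samples; i += n_channels)
            samples[i] = clamp_s16(((int64_t) samples[i] * vol) / 65536);
    }
}
```

The per-channel variant touches the block n_channels times with a
strided access pattern, which is exactly the cache question raised
above; for a stereo block that fits in the data cache the difference
may be small.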

> It definitely is possible to run pulseaudio efficiently on an ARM
> processor. Take a look at this for example:
> http://developer.garmin.com/linux/nuvi-8xx-series/
> 
> I've been working on a modified version of pa_mix for my particular
> ARM that should be faster. It basically only works for S16
> samples and doesn't do 2 channel volume, but here is a little of it.
> You need to modify pa_render to ignore the streams = 1 case and always
> use pa_mix, then this is your pa_mix function

I'd be very happy to take patches like this if they are clean and not
too invasive. Having special code for certain channel setups is
absolutely fine for me.

Please make sure to submit patches like this one upstream!

What surprises me however is that your mixing is still fast even if
you iterate through your buffers all at the same time. I have not
spent much time doing optimization work; however, I'd assume that
improving locality of the data (i.e. by not looking at all mix buffers
at the same time, but just two) would bring the best speed benefits.
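
For illustration, the two access patterns contrasted above can be
sketched as follows. This is not pa_mix and not the patch being
discussed; the function names and the 32-bit scratch accumulator are
assumptions. The first variant reads every stream for each output
sample; the second adds one stream at a time into the accumulator, so
each pass touches only two buffers. Because saturation is applied once
at the end in both cases, the results are identical, and only the
memory access order differs.

```c
#include <stddef.h>
#include <stdint.h>

/* Saturate a 32-bit sum to the signed 16-bit range. */
static int16_t clamp_s16(int32_t v) {
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t) v;
}

/* Hypothetical variant A: for each output sample, read all streams. */
static void mix_all_at_once(const int16_t *const *streams, unsigned n_streams,
                            int16_t *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        int32_t sum = 0;
        for (unsigned s = 0; s < n_streams; s++)
            sum += streams[s][i];
        out[i] = clamp_s16(sum);
    }
}

/* Hypothetical variant B: accumulate one stream per pass into a 32-bit
 * scratch buffer `acc` (caller-provided, n entries), then saturate. */
static void mix_two_at_a_time(const int16_t *const *streams, unsigned n_streams,
                              int32_t *acc, int16_t *out, size_t n) {
    for (size_t i = 0; i < n; i++)
        acc[i] = 0;
    for (unsigned s = 0; s < n_streams; s++)
        for (size_t i = 0; i < n; i++)
            acc[i] += streams[s][i];      /* only acc and one stream in use */
    for (size_t i = 0; i < n; i++)
        out[i] = clamp_s16(acc[i]);
}
```

Variant B needs the extra scratch buffer and makes more passes over it,
so whether the better locality pays off depends on block size and cache
size; it would have to be benchmarked on the target hardware.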

Lennart

-- 
Lennart Poettering                        Red Hat, Inc.
lennart [at] poettering [dot] net         ICQ# 11060553
http://0pointer.net/lennart/           GnuPG 0x1A015CC4