[gst-devel] pulsesink optimizations

Thu Oct 15 02:18:13 CEST 2009

pl bossart wrote:
> Howdy Rene',
> 
>> Wim just committed my patch that changes pulsesink back to set the minreq to
>> the value of the latency-time property, which lets applications tune the
>> gst<->pa overhead again.
> 
> Humm, my experiments show that the core activity increases when minreq
> is > 64k. I sort of remember Lennart mentioning that this was the size
> of the block allocated in PA, and beyond this you would use malloc().

Note that minreq is just the threshold at which pulse asks for more data. You
are free to send whatever amount is writable once you have data ready; it can
be smaller or larger than minreq (pulsesink does exactly that).

I don't know how malloc comes into play here. I just know that technically it
makes no sense to write buffers larger than 64K to pulse: the client library
chops them into 64K chunks because that is the internal size limit. That is,
the IPC overhead of sending two 64K buffers vs. one 128K buffer is exactly the
same.
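
To illustrate the chunking argument, here is a minimal sketch. It is a toy
model, not the actual client library code: the function name
split_into_chunks and the constant CHUNK_LIMIT are made up for this example,
and the 64K limit is simply the figure mentioned above.

```python
# Toy model of the chunking described above. This is NOT libpulse code;
# the names and the 64K constant are assumptions for illustration only.
CHUNK_LIMIT = 64 * 1024  # internal size limit, as stated in the text


def split_into_chunks(nbytes, limit=CHUNK_LIMIT):
    """Return the list of chunk sizes that a write of nbytes is cut into."""
    if nbytes < 0:
        raise ValueError("nbytes must be non-negative")
    full, rest = divmod(nbytes, limit)
    # 'full' chunks of exactly 'limit' bytes, plus one partial chunk if needed
    return [limit] * full + ([rest] if rest else [])


# One 128K write and two 64K writes end up as the same two 64K chunks.
one_large = split_into_chunks(128 * 1024)
two_small = split_into_chunks(64 * 1024) + split_into_chunks(64 * 1024)
print(one_large == two_small)  # True
```

In this model a write larger than 64K buys nothing: the number of chunks that
cross the IPC boundary is the same either way.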

> Besides, it seems to me that the total latency is really defined by
> tlength, if you increase minreq the size of the server buffer will be
> adjusted. See Lennart's page at
> http://pulseaudio.org/wiki/LatencyControl, latency is defined with
> tlength, minreq has no direct impact on latency.
> And as I mentioned it, the patch doesn't change the overhead since we
> keep writing the same size no matter what minreq was set to.

Yes indeed, in fact the patch gives next to no CPU load improvement. However,
it groups the writes from gst to pa together, with longer intervals of
inactivity in between (tunable with the latency-time property). This grouping
improves power management. On the N900 I measured a 10% penalty in energy
consumption without the patch applied (for MP3 on wired headset, display off,
i.e. the typical long-term playback use case).
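
As a rough illustration of what tuning latency-time means for the grouping,
the sketch below computes how many write bursts per second result, and how
many bytes each one carries, under the idealised assumption of exactly one
burst per latency-time period. The function name write_schedule is made up,
the stream format (44.1 kHz, stereo, 16 bit) is just an example, and real
pulsesink behaviour will deviate from this.

```python
# Idealised model: one write burst per latency-time period.
# latency_time_us is in microseconds, like the latency-time property.
# The default format (44.1 kHz, 2 channels, 16-bit samples) is an assumption.
def write_schedule(latency_time_us, rate=44100, channels=2, bytes_per_sample=2):
    """Return (bursts per second, bytes per burst) for the given period."""
    if latency_time_us <= 0:
        raise ValueError("latency_time_us must be positive")
    bytes_per_second = rate * channels * bytes_per_sample
    bursts_per_second = 1_000_000 / latency_time_us
    bytes_per_burst = bytes_per_second * latency_time_us // 1_000_000
    return bursts_per_second, bytes_per_burst


print(write_schedule(10_000))   # (100.0, 1764)
print(write_schedule(100_000))  # (10.0, 17640)
```

The amount of data moved per second is the same in both cases; only the number
of times the system has to wake up to move it changes.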

>> During the investigation of that regression, I found that there are some
>> further things to optimize in pulsesink. I will be filing more bugs and
>> sending more patches as I come up with better solutions.
> 
> Will send you my code.
> 
>> For the time being, I think you can get almost the same performance/battery
>> life gain by increasing the output buffer size of your audio decoders. Felipe
>> Contreras has been trying this with the vorbis decoder, with good results.
> 
> That's not necessarily an option. There are 3rd party decoders out
> there whose code is not necessarily public. And fixing the decoders is
> somewhat odd when the real problem is the sink...
> Cheers
> -Pierre

The sink is not perfect, but the decoder situation also needs work. Current
decoders choose their output buffer sizes themselves, and this is wrong. Yes,
you could change the sink and stitch these buffers together using pad_alloc,
but the fact remains that the decoder picks the size and therefore decides on
the overhead up to the sink (and in all processing elements between decoder
and sink).
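
A back-of-the-envelope sketch of why the decoder's buffer size matters: the
number of buffer pushes per second scales inversely with the buffer size and
linearly with the number of elements the buffer passes through. The function
names and the "hops" parameter are made up for this example, and the stream
format (44.1 kHz, stereo, 16 bit) is an assumption.

```python
# Rough arithmetic only; the names and default format are assumptions.
def buffers_per_second(buffer_bytes, rate=44100, channels=2, bytes_per_sample=2):
    """Buffers the decoder must emit per second to sustain playback."""
    if buffer_bytes <= 0:
        raise ValueError("buffer_bytes must be positive")
    return rate * channels * bytes_per_sample / buffer_bytes


def pushes_per_second(buffer_bytes, hops, **fmt):
    """Total pad pushes per second when each buffer crosses 'hops' links."""
    return hops * buffers_per_second(buffer_bytes, **fmt)


# Small decoder buffers versus buffers ten times as large, with three
# links between the decoder and the sink:
print(pushes_per_second(1764, hops=3))   # 300.0
print(pushes_per_second(17640, hops=3))  # 30.0
```

Whatever the fixed cost of pushing one buffer is, it is paid that many times
per second, regardless of how cheap the decoding itself has become.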

This became apparent to me when Felipe profiled OggVorbis playback with a
highly optimized decoder (ffmpeg). Basically, the CPU spends an insane amount
of time pushing GStreamer buffers around compared to the actual audio decoding.
And this is on the N900, which shows exactly that the current situation is
complete nonsense for a battery-powered device.

-- 
Regards,
   René Stadler