[gst-devel] pulsesink optimizations
René Stadler
mail at renestadler.de
Thu Oct 15 02:18:13 CEST 2009
pl bossart wrote:
> Howdy Rene',
>
>> Wim just committed my patch that changes pulsesink back to set the minreq to
>> the value of the latency-time property, which lets applications tune the
>> gst<->pa overhead again.
>
> Humm, my experiments show that the core activity increases when minreq
> is > 64k. I sort of remember Lennart mentioning that this was the size
> of the block allocated in PA, and beyond this you would use malloc().
Note that minreq is just the threshold when pulse will ask for more data. You
are free to send whatever amount is writable when you have data ready, it can
be smaller or larger than minreq (pulsesink does exactly that).
I don't know how malloc comes into play here. I just know that it makes no
technical sense to write buffers larger than 64K to pulse: the client
library chops them down to 64K chunks because that is the internal size limit.
That is, the IPC overhead of sending two 64K vs one 128K buffer is exactly the
same.
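To make the arithmetic concrete, here is a minimal sketch that counts how many IPC chunks a sequence of writes turns into if the client library splits every write at a fixed size limit. The 64 KiB value is taken from this thread, and the helpers `chunks_for_write()` and `chunks_for_writes()` are hypothetical illustrations, not PulseAudio API.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 64 KiB per-chunk limit as described in this thread. The real
 * limit lives inside the PulseAudio client library, not in this sketch. */
#define CHUNK_LIMIT ((size_t)64 * 1024)

/* Hypothetical helper: number of chunks a single write of `bytes` is split
 * into when the client library chops it at CHUNK_LIMIT. */
size_t chunks_for_write(size_t bytes)
{
    return (bytes + CHUNK_LIMIT - 1) / CHUNK_LIMIT;
}

/* Hypothetical helper: total chunks sent for a sequence of writes. */
size_t chunks_for_writes(const size_t *writes, size_t n)
{
    assert(writes != NULL || n == 0);
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += chunks_for_write(writes[i]);
    return total;
}
```

Under this model, two 64K writes and one 128K write both come out as two chunks, which is why writing larger buffers buys nothing on the IPC side.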
> Besides, it seems to me that the total latency is really defined by
> tlength, if you increase minreq the size of the server buffer will be
> adjusted. See Lennart's page at
> http://pulseaudio.org/wiki/LatencyControl, latency is defined with
> tlength, minreq has no direct impact on latency.
> And as I mentioned, the patch doesn't change the overhead since we
> keep writing the same size no matter what minreq was set to.
Yes indeed, in fact the patch gives next to no CPU load improvement. However,
it leads to the writes from gst to pa being grouped together with larger
intervals of inactivity in between (tunable with the latency-time property).
This grouping together results in improved power management. On the N900 I
measured a penalty of 10% in energy consumption without the patch applied (for
MP3 on wired headset, display off, i.e. typical long term playback use-case).
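A rough sketch of why latency-time controls the grouping: if minreq is the byte equivalent of latency-time, pulse asks for more data roughly once per latency-time of consumed audio. This assumes a plain duration-to-bytes conversion (latency-time in microseconds, rounded down to whole frames); `struct audio_format`, `latency_time_to_bytes()` and `wakeups_per_second()` are hypothetical names, not pulsesink internals.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of the stream format. */
struct audio_format {
    uint32_t rate;        /* frames per second */
    uint32_t channels;
    uint32_t sample_size; /* bytes per sample, e.g. 2 for S16 */
};

/* Bytes covering latency_time_us microseconds of audio, rounded down to
 * whole frames. Assumed to match the minreq derived from latency-time. */
uint64_t latency_time_to_bytes(const struct audio_format *f,
                               uint64_t latency_time_us)
{
    uint64_t frame_bytes = (uint64_t)f->channels * f->sample_size;
    uint64_t frames = (uint64_t)f->rate * latency_time_us / 1000000;
    return frames * frame_bytes;
}

/* Approximate refill requests per second if the server asks for data each
 * time minreq_bytes of audio have been consumed. */
double wakeups_per_second(const struct audio_format *f, uint64_t minreq_bytes)
{
    assert(minreq_bytes > 0);
    double bytes_per_sec = (double)f->rate * f->channels * f->sample_size;
    return bytes_per_sec / (double)minreq_bytes;
}
```

For 44.1 kHz stereo S16, a latency-time of 20000 us gives 3528 bytes and about 50 refills per second, while 200000 us gives 35280 bytes and about 5 per second, leaving longer idle stretches for the power management to exploit.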
>> During the investigation of that regression, I found that there are some further
>> things to optimize in pulsesink. I will be filing more bugs and sending more
>> patches as I come up with better solutions.
>
> Will send you my code.
>
>> For the time being, I think you can get almost the same performance/battery
>> life gain by increasing the output buffer size of your audio decoders. Felipe
>> Contreras has been trying this with the vorbis decoder, with good results.
>
> That's not necessarily an option. There are 3rd party decoders out
> there whose code is not necessarily public. And fixing the decoders is
> somewhat odd when the real problem is the sink...
> Cheers
> -Pierre
The sink is not perfect, but the decoder situation also needs work. Current
decoders choose the output buffer sizes themselves, and this is wrong. Yes, you
could change the sink and stitch these buffers together using pad_alloc, but
the fact remains that the decoder picks the size and therefore decides on the
overhead up to the sink (and all processing elements between decoder and sink).
This became apparent to me when Felipe profiled OggVorbis playback with a
highly optimized decoder (ffmpeg). Basically the CPU spends an insane amount of
time pushing GStreamer buffers around compared to the actual audio decoding. And
this is on the N900, which shows that the current situation is complete
nonsense for a battery-powered device.
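A simplified model of why the decoder's choice matters: the number of buffer handoffs per second depends only on the sample rate and the frames per buffer the decoder emits, multiplied by every element the buffer passes through on its way to the sink. This is an illustrative sketch, not a measurement; `buffers_per_second()`, `pushes_per_second()` and the `hops` parameter are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Buffers per second produced by a decoder that emits frames_per_buffer
 * frames in each GstBuffer at the given sample rate. */
double buffers_per_second(uint32_t rate, uint32_t frames_per_buffer)
{
    assert(frames_per_buffer > 0);
    return (double)rate / (double)frames_per_buffer;
}

/* Buffer handoffs per second through the pipeline, assuming each buffer is
 * pushed once per hop between decoder and sink (simplified model). */
double pushes_per_second(uint32_t rate, uint32_t frames_per_buffer,
                         uint32_t hops)
{
    return buffers_per_second(rate, frames_per_buffer) * hops;
}
```

At 48 kHz, 1024-frame buffers mean 46.875 buffers per second (140.625 pushes over three hops), while 4096-frame buffers cut that to 11.71875 per second. The per-buffer cost is paid regardless of how cheap the decoding itself is, which is why the buffer size picked by the decoder dominates with a fast decoder.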
--
Regards,
René Stadler