[gst-devel] pulsesink optimizations

Wed Oct 14 21:44:33 CEST 2009

Hi folks,
I have noticed performance issues caused by the rewrite of pulsesink,
starting with the 0.10.15 release. The degradation is in the 30% range
on my Atom board
when playing MP3/AAC. There have been a couple of modifications in git
related to buffer attributes and latency settings, but overall the
overhead remains, and the pulsesink code could be further optimized
for low-power playback apps that don't care about latency.

I finally took the time to look at the code and check what was going
on. It seems that the overhead is mainly due to the granularity of
transfers between pulsesink and PulseAudio. What happens is that the
sink waits for space to become available in the PulseAudio buffer. When
PA requests data in a callback, the mainloop unblocks and the sink
writes its PCM to PulseAudio. The problem is that the sink does not try
to fill the whole buffer before handing off the data to PulseAudio. For
example, if PulseAudio requests 100k (as defined by minreq) and you
are decoding MP3, you end up sending one frame (4608 bytes) at a time
to PulseAudio until the 100k have been filled, i.e. about 22 separate
writes. That's a lot of overhead. It would be a lot more efficient
power-wise to decode and
store as many frames as possible into the PA buffer before calling
pa_stream_write().
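
To make the batching idea concrete, here is a minimal sketch in plain C.
It is not the pulsesink code and does not link against PulseAudio:
fake_stream_write() is a hypothetical stand-in for the real write call
that only counts invocations, and the batcher type, its 128 KiB staging
size and the flush threshold are all assumptions made for illustration.
It also assumes "100k" means 100 KiB.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STAGING_CAPACITY (128 * 1024)

/* Hypothetical stand-in for the audio stream: only counts writes. */
typedef struct {
    size_t calls;  /* number of write calls issued */
    size_t bytes;  /* total bytes handed over */
} fake_stream;

void fake_stream_write(fake_stream *s, const void *data, size_t len)
{
    (void)data;
    s->calls++;
    s->bytes += len;
}

/* Staging buffer that accumulates decoded frames before one big write. */
typedef struct {
    unsigned char buf[STAGING_CAPACITY];
    size_t fill;       /* bytes currently staged */
    size_t threshold;  /* flush once at least this many bytes are staged */
} batcher;

void batcher_init(batcher *b, size_t threshold)
{
    assert(threshold > 0 && threshold <= STAGING_CAPACITY);
    b->fill = 0;
    b->threshold = threshold;
}

/* Hand everything staged so far to the stream in a single call. */
void batcher_flush(batcher *b, fake_stream *s)
{
    if (b->fill > 0) {
        fake_stream_write(s, b->buf, b->fill);
        b->fill = 0;
    }
}

/* Stage one decoded frame; write only when the threshold is reached. */
void batcher_push(batcher *b, fake_stream *s, const void *frame, size_t len)
{
    assert(len <= STAGING_CAPACITY);
    if (b->fill + len > STAGING_CAPACITY)
        batcher_flush(b, s);  /* make room rather than overflow */
    memcpy(b->buf + b->fill, frame, len);
    b->fill += len;
    if (b->fill >= b->threshold)
        batcher_flush(b, s);
}
```

With a threshold of 100 KiB and 4608-byte frames, pushing 23 frames
through batcher_push() results in one write, where the per-frame
approach issues 23. The sketch still pays for one memcpy per frame, and
a real sink would also have to flush on EOS, pause and format changes.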

I have snippets of code as a proof of concept. I don't mind releasing
the code, but I must admit this is a hack and does not cover all the
cases pulsesink addresses. An additional optimization could consist in
passing the PulseAudio buffer upstream to avoid memory copies. The new
PA release provides support for this with pa_stream_begin_write(). In
short, I would badly need a review from more experienced developers...
If anyone is interested let me know.
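
For the zero-copy idea, the pattern is: borrow a writable region from
the stream, let the decoder write into it directly, then commit what
was written. The sketch below only models that pattern in plain C so it
can be read and run on its own. Everything named mock_* and
decode_frame_into() is hypothetical; the real entry point is
pa_stream_begin_write(), and the real buffer management is PulseAudio's,
not this fixed array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MOCK_CAPACITY (128 * 1024)

/* Hypothetical stand-in for a stream buffer; not a PulseAudio type. */
typedef struct {
    unsigned char mem[MOCK_CAPACITY];
    size_t committed;  /* bytes handed over so far */
} mock_stream;

/* Borrow a writable region. On input *nbytes is the wanted size; on
 * output it is the size actually granted. Returns 0 on success. */
int mock_begin_write(mock_stream *s, void **data, size_t *nbytes)
{
    size_t space = MOCK_CAPACITY - s->committed;
    if (space == 0)
        return -1;
    if (*nbytes > space)
        *nbytes = space;
    *data = s->mem + s->committed;
    return 0;
}

/* Commit bytes that were written in place into the borrowed region. */
void mock_commit(mock_stream *s, size_t nbytes)
{
    assert(nbytes <= MOCK_CAPACITY - s->committed);
    s->committed += nbytes;
}

/* Hypothetical decoder: emits one 4608-byte frame straight into dst,
 * or returns 0 if there is not enough room for a whole frame. */
size_t decode_frame_into(unsigned char *dst, size_t room, unsigned char fill)
{
    const size_t frame_size = 4608;
    if (room < frame_size)
        return 0;
    memset(dst, fill, frame_size);
    return frame_size;
}

/* Decode as many whole frames as fit into the borrowed region, then
 * commit once. No intermediate copy of the PCM data is made. */
size_t fill_and_commit(mock_stream *s, size_t want)
{
    void *region = NULL;
    size_t room = want;
    size_t used = 0;
    size_t n;

    if (mock_begin_write(s, &region, &room) != 0)
        return 0;
    while ((n = decode_frame_into((unsigned char *)region + used,
                                  room - used, 0x5a)) > 0)
        used += n;
    mock_commit(s, used);
    return used;
}
```

Asking for 100 KiB here yields 22 whole frames (101376 bytes) committed
in one step. In pulsesink the hard part would be getting the borrowed
pointer to the upstream decoder, which this sketch does not address.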

Cheers,
- Pierre