[pulseaudio-discuss] GSoC: Call for project ideas

Arun Raghavan arun.raghavan at collabora.co.uk
Mon Mar 25 06:23:19 PDT 2013


On Sat, 2013-03-23 at 11:11 +0000, Toby Smithe wrote:
> Hi,
> 
> I have a fairly free summer coming up, and thought it would be nice to
> participate in GSoC. For a while I've been interested in PulseAudio, and
> I have an idea for a project. I wonder if you might say whether you
> think it plausible.
> 
> I use PulseAudio's native protocol streaming quite a lot, and I've
> noticed that it seems quite rudimentary. I read the code a couple of
> releases back, and it seems just to stream uncompressed PCM over
> TCP. With a wireless connection and multi-channel audio, this quickly
> becomes impractical, with drops and latency problems. A while ago, I
> looked into implementing Opus compression for the network streams, but
> never had a chance. I think Opus would make the ideal codec because it
> is very flexible, recently ratified as an Internet standard, and can be
> remarkably lightweight (according to the official benchmarks).
> 
> In doing this network audio work, I might also be able to move on to
> auxiliary tasks like improving the GUI tools for this use case.
> 
> Do you think this might work?

Having a means of doing non-PCM streaming would definitely be
desirable. That said, I'm a bit wary of introducing codec and RTP
dependencies in PulseAudio (currently we have these at only one point,
the Bluetooth modules, and there I don't see a way around them).

Now there are two main concerns:

1. Codecs: choosing a codec is not simple. There are always potential
reasons to prefer one over another: CPU utilisation, bandwidth, codec
latency, quality, the specific implementation (libav vs. the reference
implementation vs. hardware acceleration), and so on.

2. RTP: as our usage gets more complicated, we're going to end up
implementing and maintaining a non-trivial RTP stack, which is actually
quite hard.

Deciding where to draw the line with regard to what does and does not
belong in PulseAudio is a bit tricky, but in my mind, encoding/decoding
very much should not be in PulseAudio: that beast inevitably gets more
complicated as you try to do more, and there are others solving the
problem system-wide.

As for RTP, I can see a case for having it in PulseAudio, but it is
also complicated, and as with codecs, there are other places in the
system where it gets more attention and maintenance.

The simplest idea I can think of to deal with this meaningfully is to
wrap a sink/source around a GStreamer pipeline, offloading all the work
that we don't want to duplicate in PulseAudio.

On the sink side, we'd hook up an appsrc to feed PCM data into a
pipeline. The pipeline would take care of encoding, RTP packetisation,
and possibly a corresponding RTCP stream. This would allow codec
selection to be flexible and, in the distant future, could even support
taking encoded data directly.
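
To make this concrete, here's a rough, untested sketch of what the sink
side could look like with GStreamer 1.0. The host/port, the caps and
the module glue around it are made up for illustration; only the
element names (appsrc, audioconvert, opusenc, rtpopuspay, udpsink) are
real:

/* Hypothetical sink-side glue: PulseAudio pushes PCM into an appsrc,
 * and the GStreamer pipeline does the encoding and RTP packetisation. */
#include <gst/gst.h>
#include <gst/app/gstappsrc.h>

static GstElement *sink_pipeline;
static GstAppSrc *sink_appsrc;

static void sink_pipeline_start(void) {
    gst_init(NULL, NULL);
    sink_pipeline = gst_parse_launch(
        "appsrc name=src is-live=true format=time "
        "caps=audio/x-raw,format=S16LE,rate=48000,channels=2,layout=interleaved "
        "! audioconvert ! opusenc ! rtpopuspay "
        "! udpsink host=192.168.0.2 port=5004", NULL);
    sink_appsrc = GST_APP_SRC(gst_bin_get_by_name(GST_BIN(sink_pipeline), "src"));
    gst_element_set_state(sink_pipeline, GST_STATE_PLAYING);
}

/* Called from the sink's render path with a chunk of PCM. */
static void sink_push_pcm(const void *data, gsize length) {
    GstBuffer *buf = gst_buffer_new_allocate(NULL, length, NULL);
    gst_buffer_fill(buf, 0, data, length);
    gst_app_src_push_buffer(sink_appsrc, buf); /* appsrc takes ownership */
}

Swapping opusenc/rtpopuspay for a different encoder/payloader pair is
then just a pipeline-string change, which is exactly the flexibility we
want.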

On the source side, we'd hook up an appsink and receive PCM data from
the pipeline. The pipeline would take care of decoding whatever the
format is, handle RTCP, and maybe offer more advanced features such as
a jitter buffer and packet-loss concealment (all of this can be plugged
in or not, depending on configuration).
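
The corresponding source side might look something like this (again
untested, with made-up port and caps; rtpjitterbuffer and opusdec's plc
property are where the jitter buffer and concealment would plug in):

/* Hypothetical source-side glue: the GStreamer pipeline receives,
 * depacketises and decodes the RTP stream, and PulseAudio pulls PCM
 * out of an appsink. */
#include <gst/gst.h>
#include <gst/app/gstappsink.h>

static GstElement *source_pipeline;
static GstAppSink *source_appsink;

static void source_pipeline_start(void) {
    gst_init(NULL, NULL);
    source_pipeline = gst_parse_launch(
        "udpsrc port=5004 "
        "caps=application/x-rtp,media=audio,encoding-name=OPUS,payload=96,clock-rate=48000 "
        "! rtpjitterbuffer ! rtpopusdepay ! opusdec plc=true "
        "! audioconvert ! appsink name=snk", NULL);
    source_appsink = GST_APP_SINK(gst_bin_get_by_name(GST_BIN(source_pipeline), "snk"));
    gst_element_set_state(source_pipeline, GST_STATE_PLAYING);
}

/* Called from the source's thread to fetch a chunk of decoded PCM. */
static void source_pull_pcm(void) {
    GstSample *sample = gst_app_sink_pull_sample(source_appsink); /* blocks */
    GstBuffer *buf;
    GstMapInfo map;

    if (!sample)
        return; /* EOS or pipeline shut down */
    buf = gst_sample_get_buffer(sample);
    if (gst_buffer_map(buf, &map, GST_MAP_READ)) {
        /* ... post map.data / map.size to the source here ... */
        gst_buffer_unmap(buf, &map);
    }
    gst_sample_unref(sample);
}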

Doing it this way means we get a well-maintained RTP stack that sees
attention from a number of other use cases (plus assorted related
goodies), as well as support for multiple codecs.

Thoughts?

-- Arun


