[pulseaudio-discuss] GSoC: Call for project ideas

Mon Mar 25 12:24:47 PDT 2013

On Mon, 2013-03-25 at 16:19 +0000, Toby St Clere Smithe wrote:
> Hi Arun,
> 
> Thanks for your message. I'm going to reply to you first, and then reply
> to David.
> 
> Arun Raghavan <arun.raghavan at collabora.co.uk> writes:
> > Having means of doing non-PCM streaming would definitely be desirable.
> > That said, though, I'm a bit wary of introducing codec and RTP
> > dependencies in PulseAudio (currently we have this at only one point,
> > which is the Bluetooth modules - something I don't see a way around).
> >
> > Now there are two main concerns:
> >
> > 1. Codecs: choosing a codec is not simple. There are always potential
> > reasons to look at one vs. the other - CPU utilisation, bandwidth, codec
> > latency, quality, specific implementation (libav vs. reference
> > implementation vs. hardware acceleration) and so on.
> 
> Indeed. I think a codec-agnostic implementation is important, as I
> mentioned in an earlier message. I chose Opus for the same reasons as it
> was designed for, and I appreciate the desire for some lossless codec
> too. But I hadn't spent much time thinking about implementation
> differences, mainly because at the time I did my research, there was
> only one implementation!
> 
> Your ideas about using GStreamer are interesting, but I don't know much
> about how a GStreamer pipeline would fit into PulseAudio's pipeline, and
> how this would affect things like latency. Nonetheless, you're certainly
> right about the maintenance burden, and I am a fan of GStreamer in any
> case. To have its power in PulseAudio would be very interesting, and
> certainly worth the research I would need to do to understand (say) the
> effect on latency.

GStreamer itself shouldn't add much in the way of latency. There are two
main points where latency would be added - the encoder on the sender
side, and the RTP jitter buffer on the receiver side. There might be
other smaller sources, but those would be the big ones.

I don't know what the FLAC encoder latency is. Opus should be able to go
really low (it supports frame sizes down to 2.5 ms).

For the RTP jitter buffer, that latency is adjustable (from 0 to
whatever you choose). There's a tradeoff between the ability to handle
network jitter and latency. Should be easy enough to pick a low-ish
default and let users adjust based on network quality.
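
To make the tradeoff concrete, here is a minimal Python sketch. It is not
how any real jitter buffer is implemented: the transit times are made up,
and late_packets() is a hypothetical helper that simply counts packets
arriving after a fixed playout deadline.

```python
def late_packets(transit_ms, buffer_ms):
    """Count packets that miss their playout deadline.

    transit_ms: per-packet network transit times in milliseconds.
    buffer_ms:  jitter buffer depth; playout is scheduled this long
                after the fastest packet's transit time.
    """
    deadline = min(transit_ms) + buffer_ms
    return sum(1 for t in transit_ms if t > deadline)


# Made-up transit times for ten packets on a jittery link.
transit = [20, 22, 21, 35, 20, 60, 23, 21, 40, 22]

for depth in (0, 5, 20, 40):
    late = late_packets(transit, depth)
    print(f"buffer={depth:2d} ms -> {late} late of {len(transit)}")
```

A deeper buffer drops fewer packets but adds that many milliseconds of
delay to everything, which is why a low-ish default plus a user knob
seems reasonable.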

> > 2. RTP: as our usage gets more complicated, we're going to end up
> > implementing and maintaining a non-trivial RTP stack, which is actually
> > quite hard.
> >
> > Deciding where to draw the line with regards to what does and does not
> > belong in PulseAudio is a bit tricky, but in my mind, encoding/decoding
> > should very much not be in PulseAudio because that beast inevitably gets
> > more complicated as you try to do more, and there are others solving the
> > problem system-wide.
> >
> > RTP, I can see a case for it being in PulseAudio, but it is also
> > complicated, and as with codecs, there are other places in the system
> > where it gets more attention and maintenance.
> 
> David asked in his other messages about whether my plans were for RTP or
> for the native protocol. I'll describe my thinking here, too.
> 
> Originally, I had looked at writing code for implementing these codecs
> in the native protocol, because I felt that I wanted to build the
> support as closely to the core of PulseAudio as possible; doing so
> seemed to be the easiest way to keep on top of the logic for adjusting
> bitrate etc according to latency and frame drops, and I also felt that
> doing so in the proprietary protocol would give us more coding liberty.
> 
> Now, however, I am largely in agreement that it is probably foolish to
> maintain this whole stack in PulseAudio, and furthermore, it's probably
> good to use the streaming protocols that are widely known and
> understood, rather than duplicating the effort in a way that is likely
> to be suboptimal.
> 
> Do you agree that using RTP rather than building the GStreamer pipeline
> closer to the core of PulseAudio is probably the best plan?

I'm not sure I understand what you mean here. Could you clarify what you
mean by "building the GStreamer pipeline closer to the core"?

> > The simplest idea I can think of to deal with this meaningfully is to
> > wrap a sink/source around a GStreamer pipeline to offload all that work
> > that we don't want to duplicate in PulseAudio.
> 
> Quite.
> 
> > On the sink side, we'd hook up to an appsrc to feed PCM data to a
> > pipeline. The pipeline would take care of encoding, RTP packetisation
> > and possibly a corresponding RTCP stream. This would allow codec
> > selection to be flexible, and in the distant future, could even support
> > taking encoded data directly.
> >
> > On the source side, we'd hook up to an appsink, receiving PCM data from
> > the pipeline. The pipeline would take care of decoding whatever the format
> > is, take care of RTCP and maybe more advanced features such as a jitter
> > buffer and packet-loss concealment (all of this can be plugged in or
> > not, depending on configuration).
> 
> This does sound like a good plan to solve the given problem, save for my
> concerns about monitoring and integration above. My other thought, which

For monitoring and adapting to network conditions, you could use RTCP to
track bandwidth and packet loss statistics and use that for bit rate
adaptation (at least that's the theory ;)).
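
As a rough illustration of that theory, the Python sketch below turns the
"fraction lost" field of an RTCP receiver report (an 8-bit value scaled by
256, per RFC 3550) into a new target bit rate. The thresholds, back-off
factor and step size are assumptions picked for the example, and
next_bitrate() is a hypothetical helper, not something PulseAudio or
GStreamer provides. The clamp uses the 6 to 510 kbit/s range that Opus
supports.

```python
OPUS_MIN_BPS = 6_000     # lowest bit rate Opus supports
OPUS_MAX_BPS = 510_000   # highest bit rate Opus supports


def fraction_lost(rr_byte):
    """Convert the 8-bit RTCP 'fraction lost' field to a float in [0, 1)."""
    return rr_byte / 256.0


def next_bitrate(current_bps, loss, high=0.10, low=0.02,
                 backoff=0.75, step_bps=8_000):
    """Illustrative policy: back off multiplicatively under heavy loss,
    probe upward additively when loss is low, otherwise hold."""
    if loss > high:
        target = int(current_bps * backoff)
    elif loss < low:
        target = current_bps + step_bps
    else:
        target = current_bps
    # Keep the result inside the range the encoder can actually use.
    return max(OPUS_MIN_BPS, min(OPUS_MAX_BPS, target))


# Feed a made-up sequence of receiver reports through the policy.
rate = 64_000
for rr_byte in (0, 0, 64, 10, 0):
    rate = next_bitrate(rate, fraction_lost(rr_byte))
    print(rr_byte, rate)
```

The module would then apply the returned value to the encoder. How often
to do that depends on how frequently RTCP reports arrive.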

> would not work in the context of using an appsrc/appsink, is that having
> GStreamer built more closely into PulseAudio would also mean we could
> offload things like resampling (which I noticed was discussed in another
> GSoC-related thread) to the GStreamer pipeline, rather than have to
> maintain those code-paths, too.

As I was mentioning in the other mail to Tanu, and implying in my
previous post, where to draw the line between PulseAudio and GStreamer
(and other parts of the stack, really) is quite tricky. I don't think
using GStreamer for resampling and other processing fits with what we
want to do. There is some duplication of functionality, for certain, but
these functions are core to PulseAudio, and it makes sense to rely on
individual libraries rather than a whole multimedia framework for them.

> > Doing it this way means you're using a better RTP stack that gets
> > attention from a number of other use cases (plus assorted related
> > goodies) and support for multiple codecs.
> 
> Right, indeed. But I'm still not entirely sure about RTP vs native, and
> where the GStreamer code should go. My temptation is to couple
> PulseAudio as closely as possible to GStreamer. But I thought about this
> a little when doing my initial research, and came to the conclusion that
> if this was a good idea, it would have been done already. What am I
> missing?
> 
> Indeed, if we build GStreamer into pulsecore, then we could use RTP or
> the native protocol, as we saw fit. If we went for the appsrc/sink
> solution, then we'd be less flexible.

I wouldn't look at using GStreamer in pulsecore. It wouldn't act as a
native protocol replacement. Instead, I visualise things working
something like this:

        -----
C      |     |
l      |     |      --------------
i ---> |  P  |     |              |
e ---> |  A  |---> | Gst RTP sink |  ... network ...
n ---> |     |     |              |
t      |     |      --------------
s      |     |
        -----
                                 -----
                                |     |
         ----------------       |     |       -----------------
        |                |      |  P  |      |                 |
   ...  | Gst RTP source | ---> |  A  | ---> | Client/loopback |
        |                |      |     |      |                 |
         ----------------       |     |       -----------------
                                |     |
                                 -----

The RTP sink would very roughly be a GStreamer pipeline like:

  appsrc ! opusenc ! rtpopuspay ! rtpbin

And the RTP source would very roughly be:

  rtpbin ! rtpopusdepay ! opusdec ! appsink
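
To keep codec selection flexible, the module could assemble these
descriptions from a per-codec table. The Python sketch below only builds
the rough strings shown above. The CODECS table and both helper functions
are hypothetical, and a real rtpbin setup would also need explicit pad
links and network elements, which are left out here just as they are in
the rough pipelines.

```python
# Per-codec element names. Only the Opus row comes from the pipelines
# above; further rows would need elements that exist in the local install.
CODECS = {
    "opus": {"enc": "opusenc", "pay": "rtpopuspay",
             "depay": "rtpopusdepay", "dec": "opusdec"},
}


def sender_description(codec):
    """Rough sender pipeline: PCM in via appsrc, RTP out via rtpbin."""
    c = CODECS[codec]
    return " ! ".join(["appsrc", c["enc"], c["pay"], "rtpbin"])


def receiver_description(codec):
    """Rough receiver pipeline: RTP in via rtpbin, PCM out via appsink."""
    c = CODECS[codec]
    return " ! ".join(["rtpbin", c["depay"], c["dec"], "appsink"])


print(sender_description("opus"))
print(receiver_description("opus"))
```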

The app* elements would provide/consume PCM data to/from the PulseAudio
source/sink modules. The module could hook into the rtpbin to get RTCP
data/stats and poke at the encoder appropriately. The frequency and
precision of RTCP 

Cheers,
Arun