[gst-devel] RFC: gstreamer sdpbin for audio/video telephony

Kai Vehmanen kvehmanen at eca.cx
Fri Oct 14 03:47:49 CEST 2005


Hello all,

this is a bit of a long request for comments, but please bear with me. I've 
been working with the Farsight project for a while to add SIP to the list 
of supported protocols (using the Sofia-SIP stack I'm also working with). 
The idea would be to make gstreamer the preferred media subsystem for 
VoIP, video-conferencing, and other similar applications. This has the 
nice side-effect that gstreamer would also become the preferred framework 
to implement/port new audio/video codecs, packetizers and network 
transports. Now I guess we'd all be happy to see this happen! :)

We are currently facing two big challenges: 1) adding missing RTP features 
like RTCP, etc; and 2) mapping SIP signaling/state-changes, expressed with 
SDP, to gst elements. As many of you know, Philippe Khalaf has been 
working on (1) for a while already, and some of that work has already 
found its way into gstreamer. I'm now working on the second issue.

I'm mainly interested in getting architectural comments on this proposal - 
IOW, is what I'm proposing here a sane way to use gstreamer, and are there 
any clear mistakes/misunderstandings? Comments about the design details 
are of course also welcome, but perhaps off-topic for this list.

And a small disclaimer/warning: I'm still a newbie in the world of 
gstreamer (although the Farsight people have tried hard to educate
me on the topic ;)), so beware of possible stupid mistakes...

The idea: gstsdpbin
-------------------

To create a new custom gst bin, in the spirit of 
gst-plugins-base-cvs/gst/playback/gstdecodebin.c, which would create and 
maintain the required gst elements, based on the SDP inputs received from 
the application (more precisely, from SIP signaling, or perhaps also from 
RTSP, Jabber, etc).
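
To give a feel for what I have in mind, here is a rough sketch of how an 
application might drive such a bin. The element name "sdpbin" and the 
"local-sdp"/"remote-sdp" property names are just placeholders I made up 
for this mail; none of this exists yet:

  #include <gst/gst.h>

  int
  main (int argc, char **argv)
  {
    GstElement *pipeline, *sdpbin;
    const gchar *local_sdp = "...";   /* offer built by the application    */
    const gchar *remote_sdp = "...";  /* answer received via SIP signaling */

    gst_init (&argc, &argv);

    pipeline = gst_pipeline_new ("call");

    /* "sdpbin" is the proposed element; it does not exist yet */
    sdpbin = gst_element_factory_make ("sdpbin", "session");
    if (sdpbin == NULL)
      return 1;

    gst_bin_add (GST_BIN (pipeline), sdpbin);

    /* hand the negotiated SDP descriptions to the bin, which would
     * then create and link the needed rtp/codec elements internally */
    g_object_set (sdpbin, "local-sdp", local_sdp,
        "remote-sdp", remote_sdp, NULL);

    gst_element_set_state (pipeline, GST_STATE_PLAYING);

    /* ... run a main loop, feed in updated SDP on re-negotiation, etc */
    return 0;
  }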

Key questions
-------------

1. Is having multiple independent pipelines (a send/receive pair for
    each media) in the same bin an ok design?

Philippe's rtpbin already does this (RTP receiving and sending are linked
to be able to support RTCP), so I guess there are no fundamental problems.

Having multiple media (let's say audio and video) managed by the same bin 
is also something I'm not completely sure about. But there are arguments 
speaking for it: to realize lip-sync, the audio and video streams must be 
managed by a common entity. The gst bin would be the natural place to do 
this as it has the necessary timing information, and access to the jitter 
buffers which are used to fine-tune the sync.
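
As a tiny illustration of what I mean by independent pipelines in one 
bin, something like the sketch below, where fakesrc/fakesink just stand 
in for the real RTP receive and playback chains:

  #include <gst/gst.h>

  /* Sketch only: one bin holding two unlinked chains, with
   * fakesrc/fakesink standing in for the real RTP/codec elements. */
  static GstElement *
  build_media_session (void)
  {
    GstElement *bin, *audio_src, *audio_sink, *video_src, *video_sink;

    bin = gst_bin_new ("media-session");

    audio_src  = gst_element_factory_make ("fakesrc",  "audio-in");
    audio_sink = gst_element_factory_make ("fakesink", "audio-out");
    video_src  = gst_element_factory_make ("fakesrc",  "video-in");
    video_sink = gst_element_factory_make ("fakesink", "video-out");

    gst_bin_add_many (GST_BIN (bin), audio_src, audio_sink,
        video_src, video_sink, NULL);

    /* audio and video are never linked to each other, yet both are
     * managed (state changes, clocking) by the same bin */
    gst_element_link (audio_src, audio_sink);
    gst_element_link (video_src, video_sink);

    return bin;
  }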

2. Is gstsdpbin a useful concept?

... IOW, should we keep this in Farsight, or could this be of 
interest to a wider audience (possibly integrated into gstreamer)?

The main selling point is to concentrate the following functionality 
into one place:
    - ability to map between SDP and gst elements (using
      gst-plugins-good-cvs/gst/rtp/README ) => describe available
      gst elements as SDP, and build a set of gst elements based on
      SDP description (see the sketch after this list)
    - ability to handle on-the-fly updates (see below)
    - handling "intra-pipeline" dependencies: RTCP, lip-sync, etc

The architecture via an example
-------------------------------

If you are unfamiliar with how SIP works, please check out

  "SIP Basic Call Flow Examples"
  http://www.faqs.org/rfcs/rfc3665.html

.... first, and especially section 3 on session establishment.

I'll use the audio+video call case as an example here. The application 
will provide two properties, the local and the remote SDP, to the gstsdpbin.
The local SDP describes what ports you are listening on, which codecs
you support, codec parameters, and how rtp payload-types are mapped to 
specific codecs. The remote SDP describes the same for the remote 
participant(s).
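
For reference, the local SDP of such an audio+video endpoint could look 
roughly like this (addresses, ports and session ids are made up; payload 
types 0 and 31 are the static PCMU and H261 mappings from RFC 3551):

  v=0
  o=alice 2890844526 2890844526 IN IP4 host.example.com
  s=-
  c=IN IP4 192.0.2.10
  t=0 0
  m=audio 49170 RTP/AVP 0
  a=rtpmap:0 PCMU/8000
  m=video 51372 RTP/AVP 31
  a=rtpmap:31 H261/90000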


