[gst-devel] rtp - mux or not to mux?

burgerman at users.sourceforge.net
Wed Nov 2 10:13:19 CET 2005


> Hello all,
> 
> continuing the rtp-invasion of gstreamer-devel. :) Earlier I posted a
> high-level proposal about my gstsdpbin idea:
> 
> http://article.gmane.org/gmane.comp.video.gstreamer.devel/13940
> 
> Now I'm working on the details of the demuxing/switching part, in other 
> words, how to support multiple codecs inside one stream of RTP packets 
> (RTP session). This should be useful for various uses of RTP in 
> gstreamer apps...
> 
> The problem is somewhat similar to mpeg/ogg demuxing, but with a few key
> differences:
> 
> - an RTP stream will only contain one logical stream;
>    the codec might change, but the packets will have different
>    timestamps and sequence numbers (different codecs do
>    not overlap timestamp-wise, as happens with ogg/mpeg and
>    other container streams)
>      => the reception pipeline is really 1:1 
> - in most cases, you will know the possible subcomponents
>    beforehand; and with many codecs it's actually
>    impossible to identify the codec type by looking at the RTP packets
>      => the application has to specify which codecs, and which
>         PT<->codec mappings, to use in parsing the incoming stream
> 
> I've been studying the current demuxers (and other matching elements in 
> gstreamer), and it seems that this could be implemented
> either as:
> 
> 1) rtpdemux -> 1:N demux element (like oggdemux for example)
>      - connected to rtpbin/rtprecv's src pad
>      - as auto-detection is not possible, the client will have to
>        set up pads beforehand (use PT=x for G711, PT=y for speex,
>        and so on; static PTs are an exception here)
>      - the demux will look at incoming packets, and push
>        them forward to the matching src pad
>      - the pipeline:
>        udpsrc -> rtprecv -> rtpdemux -> depayloader(s) -> decoder(s) -> sink
>      - possible problems:
>          - in the audio case, I guess "adder" wouldn't be enough to mix the
>            multiple decoder paths together, as it requires input from
>            all sources before mixing
>          - a modified version of "adder"? ... and the same for video
Yes, this approach is the one that comes to mind first. It is simple and
robust, though with big SDP caps lists the resulting pipeline can get
pretty big too. I don't dislike this method and I think it would work fine.
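
To make it concrete, here is a minimal sketch of the application side,
assuming the proposed "rtpdemux" element exists and creates one src pad
per configured payload type. The depayloader/decoder/sink names are just
illustrative for a PCMU (G711) branch, and the rtprecv stage is omitted
for brevity:

#include <gst/gst.h>

/* Link the per-PT src pad that the (hypothetical) rtpdemux creates to
 * the depayloader registered for that payload type.  A real app would
 * parse the PT out of the pad name and pick the matching depayloader;
 * this sketch only wires up a single PCMU branch. */
static void
on_pad_added (GstElement *demux, GstPad *new_pad, gpointer user_data)
{
  GstPad *sinkpad = gst_element_get_static_pad (GST_ELEMENT (user_data), "sink");

  if (!gst_pad_is_linked (sinkpad))
    gst_pad_link (new_pad, sinkpad);
  gst_object_unref (sinkpad);
}

int
main (int argc, char *argv[])
{
  gst_init (&argc, &argv);

  GstElement *pipeline = gst_pipeline_new ("rtp-receive");
  GstElement *src   = gst_element_factory_make ("udpsrc", NULL);
  GstElement *demux = gst_element_factory_make ("rtpdemux", NULL);  /* hypothetical */
  GstElement *depay = gst_element_factory_make ("rtppcmudepay", NULL);
  GstElement *dec   = gst_element_factory_make ("mulawdec", NULL);
  GstElement *sink  = gst_element_factory_make ("autoaudiosink", NULL);

  g_object_set (src, "port", 5004, NULL);

  gst_bin_add_many (GST_BIN (pipeline), src, demux, depay, dec, sink, NULL);
  gst_element_link (src, demux);
  gst_element_link_many (depay, dec, sink, NULL);

  /* src pads on the demux appear per payload type as packets arrive */
  g_signal_connect (demux, "pad-added", G_CALLBACK (on_pad_added), depay);

  gst_element_set_state (pipeline, GST_STATE_PLAYING);
  g_main_loop_run (g_main_loop_new (NULL, FALSE));
  return 0;
}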

> 
> 2) rtp-switcher -> 1:1 element that can depacketize and decode multiple codecs
>     - as in (1), connected to rtpbin/rtprecv's src pad
>     - creates and manages multiple depayloader+decoder elements
>     - each received packet is pushed through exactly one
>       depayloader+decoder combo
>     - sink pad takes application/x-rtp, and pushes out
>       decoded media (e.g. audio/x-raw-int) on its src pad
>     - possible problems:
>          - how to implement in gstreamer? any good example elements
>            that are doing something like this?
This can also be accomplished in many ways, but it is more complicated,
and I don't know if it is necessarily more elegant. One way of doing it
is to have an element that contains the SDP info needed to map dynamic
payload types to their respective elements, i.e. that knows which
depayloader/decoder it needs for each payload. It can then read the
payload type on each RTP packet and figure out whether it has changed;
if it has, it can modify the pipeline accordingly on the fly. I don't
know of any good examples of this, but I don't see why it wouldn't be
possible.
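
Here is a minimal sketch of just the detection step, written as a
buffer probe on the pad feeding the decoder chain. The probe and
buffer-mapping calls are illustrative of the idea rather than tied to a
particular core version; the PT read itself is straight RFC 3550 (low 7
bits of the second header octet):

#include <gst/gst.h>

static GstPadProbeReturn
pt_change_probe (GstPad *pad, GstPadProbeInfo *info, gpointer user_data)
{
  static guint last_pt = G_MAXUINT;   /* one stream per probe assumed */
  GstBuffer *buf = GST_PAD_PROBE_INFO_BUFFER (info);
  GstMapInfo map;

  if (gst_buffer_map (buf, &map, GST_MAP_READ)) {
    if (map.size >= 12) {               /* fixed RTP header is 12 bytes */
      guint pt = map.data[1] & 0x7f;    /* payload type, RFC 3550 5.1 */
      if (pt != last_pt) {
        g_print ("payload type changed to %u\n", pt);
        last_pt = pt;
        /* a real switcher would block the pad here, unlink the old
         * depayloader+decoder pair, link the one the SDP info maps
         * to this PT, then let data flow resume */
      }
    }
    gst_buffer_unmap (buf, &map);
  }
  return GST_PAD_PROBE_OK;
}

Attach it with gst_pad_add_probe (pad, GST_PAD_PROBE_TYPE_BUFFER,
pt_change_probe, NULL, NULL) on the switcher's sink pad.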

> 
> Now, any comments on which approach is best from a gstreamer
> architecture point of view?
In my opinion I would say the demuxer approach, but that is probably
because I don't have a very clear way in my mind of implementing the
second approach. Which method do you guys think would work better in
gstreamer?

> 
> One problem common to both approaches is the jitter buffer code, now 
> in gst-plugins-base-cvs/gst-libs/gst/rtp/gstbasertpdepayload.c. Ideally 
> different codecs should share the same jitter buffer (so that a change of
> codec mid-stream wouldn't result in throwing out the whole jitter buffer),
> but this is pretty hard to realize with the current elements... or?
Indeed, currently the jitter buffer is in the depayloader, meaning each
codec would have a different one. I'm sure workarounds can be found for
this, such as having the depayloader push out all of its buffers before
dying. I think we can worry about this after something is actually done;
I don't think it's enough to warrant changes in the design.
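
For what it's worth, the "push out all the buffers before dying"
workaround could be as simple as draining the old depayloader's queue
out of its src pad before removing it. A sketch, with the caveat that
the queue argument is a stand-in (the real queue is private to
GstBaseRTPDepayload):

#include <gst/gst.h>

/* Push every buffer still sitting in the old depayloader's jitter
 * queue downstream, in order, before the element is shut down and
 * removed from the bin.  Ownership of each buffer passes downstream
 * with gst_pad_push(). */
static void
drain_before_removal (GQueue *jitter_queue, GstPad *srcpad)
{
  GstBuffer *buf;

  while ((buf = g_queue_pop_head (jitter_queue)) != NULL)
    gst_pad_push (srcpad, buf);
}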



