[gst-devel] Ogg and GStreamer

Wed Jul 30 06:24:08 CEST 2003

Hi guys,

I have an issue with ogg (no, not vorbis, but ogg) and how to demux/mux ogg 
streams. This came up because I'm writing a comment plugin for vorbis (which 
would probably work for theora and speex too, with minor changes) and I don't 
want to have all the ogg stuff in that plugin.

For those who don't know exactly how ogg works, 
http://www.xiph.org/ogg/vorbis/doc/framing.html is the bitstream description, 
and I don't know much more about it either.
Let me try to summarize what ogg does (in quotation marks the official name in 
the ogg spec): Ogg takes "packets", which are data chunks of any length, and 
separates them into "segments", which are data chunks of at most 255 bytes. 
Then it packs segments into "pages", which are a maximum of about 64k including 
some seek and CRC stuff. These pages are then concatenated and that makes up an 
ogg stream.
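
To make the packet-to-segment step concrete, here is a minimal sketch of how 
the segment sizes ("lacing values") for one packet can be computed. It assumes 
the rule from the framing spec that a packet is cut into 255-byte segments and 
is terminated by a segment shorter than 255 bytes (possibly 0 bytes long). The 
function name and signature are made up for illustration and are not part of 
libogg or GStreamer; the sketch also ignores packets that span several pages.

```c
#include <assert.h>
#include <stddef.h>

#define OGG_MAX_SEGMENT 255     /* a segment holds at most 255 bytes */

/* Hypothetical helper: write the segment sizes for one packet into out.
 * Returns the number of segments, or 0 if out is too small. */
size_t
sketch_lacing_values (size_t packet_len, unsigned char *out, size_t out_cap)
{
  size_t full = packet_len / OGG_MAX_SEGMENT;   /* full 255-byte segments */
  size_t count = full + 1;                      /* plus the terminating one */
  size_t i;

  assert (out != NULL);
  if (count > out_cap)
    return 0;

  for (i = 0; i < full; i++)
    out[i] = OGG_MAX_SEGMENT;
  /* The last segment is always shorter than 255 bytes; it is 0 bytes long
   * when the packet length is an exact multiple of 255. */
  out[full] = (unsigned char) (packet_len % OGG_MAX_SEGMENT);
  return count;
}
```

A 600-byte packet would come out as three segments of 255, 255 and 90 bytes.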

So, what is the problem? The problem is the type of additional information that 
is put into the page headers. Ogg requires an "absolute granule position", 
which corresponds to a format that GStreamer calls "frame". It is media 
specific and up to the encoded packets to define. Ogg stores in each page 
header the number of frames up to and _including_ the last packet that ends on 
that page, and uses this instead of timestamps. Now I am trying to put/get this 
stuff into/from GstBuffers.
The obvious idea is to require that all streams that get muxed into ogg need to 
be framed so that you have 1 packet per buffer. There is one problem though: 
When you get a GstBuffer, you have no idea what the frame offset is _including_ 
the frame. buffer->offset is the frame offset _excluding_ the buffer.
So there are two options now:
- include a frames field in GstBuffer, so that the frame offset can be 
computed. (That's what I vote for)
- wait for the next buffer, and use the offset field of that buffer as 
the "absolute granule position" in the ogg stream.
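
The two options can be sketched roughly as follows. This is only an 
illustration: SketchBuffer is a stand-in struct, not the real GstBuffer, and 
its frames member is the hypothetical new field from the first option. Both 
function names are made up as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a buffer; NOT the real GstBuffer layout. */
typedef struct {
  int64_t offset;   /* frames before this buffer (excluding it) */
  int64_t frames;   /* hypothetical field: frames inside this buffer */
} SketchBuffer;

/* Option 1: the buffer carries its own frame count, so the granule
 * position is known as soon as the buffer arrives. */
int64_t
sketch_granulepos_from_frames (const SketchBuffer *buf)
{
  assert (buf != NULL);
  return buf->offset + buf->frames;
}

/* Option 2: no frames field; the muxer has to hold the buffer back until
 * the next one arrives and use that buffer's offset instead. */
int64_t
sketch_granulepos_from_next (const SketchBuffer *next)
{
  assert (next != NULL);
  return next->offset;
}
```

For contiguous buffers both give the same value; the difference is that the 
second option adds one buffer of latency and needs special handling for the 
last buffer of a stream, which has no successor.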

This brings up another question:
If, for some format (be it time, be it frames, be it whatever), you know only 
the end offset of a buffer, you can specify neither length nor offset, because 
we use (start offset, length) tuples and compute the end. If we were to use 
(start offset, end offset) tuples, we could store the end offset but not the 
length. Which way do you think would be better to use?
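
As a small illustration of the (start offset, end offset) variant, here is a 
sketch with made-up names, using -1 as an assumed "unknown" marker. It shows 
that the end offset can be stored on its own, while the length only becomes 
available once the start is known too.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_UNKNOWN ((int64_t) -1)   /* assumed marker for "not known" */

/* Hypothetical (start offset, end offset) tuple for one format. */
typedef struct {
  int64_t start;
  int64_t end;
} SketchRange;

/* The length can only be derived when both ends are known. */
int64_t
sketch_range_length (const SketchRange *r)
{
  assert (r != NULL);
  if (r->start == SKETCH_UNKNOWN || r->end == SKETCH_UNKNOWN)
    return SKETCH_UNKNOWN;
  return r->end - r->start;
}
```

With (start offset, length) tuples the same buffer could not carry any 
information at all, since the end is always computed from the other two.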

Some other things wrt ogg:
1) I decided to do typefinding in oggdemux because the ogg spec requires 
that the first buffer identifies the stream inside the ogg unambiguously. So 
it's easy to do and allows adding more formats easily later. As an interim 
solution, oggdemux will do the typefinding itself; later on, when the 
autoplugger is able to do this, it'll just use NULL caps and let the app or 
the autoplugger do their job.
2) Ogg allows streams to be multiplexed "grouped" or "chained" (see 
http://www.xiph.org/ogg/vorbis/doc/oggstream.html at the bottom). Oggdemux will 
handle grouped streams by using multiple pads and chained streams by removing 
all pads and giving out new ones. This will not work with spider either.

Oggmux will apparently work just the other way around. It'll take any number of 
input streams (not caring about caps) and pack the data into an ogg file. 
NEW_MEDIA events will make it use chained streams.

Benjamin