[gst-devel] addition of a buffer flag

Wed May 12 11:16:03 CEST 2004

Hi,

On Wed, 12 May 2004, Thomas Vander Stichele wrote:
> > That is an ideal situation that doesn't work for any of the modern
> > formats (unfortunately). Matroska fails horribly here, as does Ogg. Nut,
> > interestingly, is fairly codec agnostic, as are AVI and MPEG.
> Ogg doesn't fail here at all.  Where does it fail ? Our implementation
> of it fails.

Separating actual data and headers.

Now that we're at it, note that you seem to take Ogg as a starting point,
whereas I take AVI as a starting point. In AVI, there is *no way* that
headers will end up in the stream, because it screws up seeking. In Ogg,
it's all fine because of the finer model.

Both are ways to solve the problem. Neither is invalid w.r.t. the other.
It's about preference. So let's continue with that as a starting point.

Ogg fails to nicely prepare data for AVI. AVI fails to nicely prepare data
for Ogg. So either way, one of those two will need to special case the
data type in the muxer/demuxer. MPEG sort of takes the Ogg approach, but
is a special case (works for both) because the codecs used have no initial
header data. Rather, it is embedded in the stream in front of each single
GOP (video) or frame (audio). ASF, Matroska, Quicktime, WAVE et al take
the AVI approach.

Good, so the question now is: should codec initialization data be part of
the header or part of the data stream?

> > should put codec initialization data (or "headerdata") in the caps, which
> > is what we currently do for all of the before-mentioned formats.
> I didn't realize that, and it sounds like a big hack to do it that way
> :)

Maybe; it's the best we have had so far. I don't consider it a hack. I
consider it a very nice solution because it abstracts our data protocol in
a way that closely matches how most common media formats handle it
nowadays.

Again, depends on your perspective. If you like AVI, you love our current
approach. If you like Ogg, you think our current approach is a big hack.

> Which is why, now that we've run into this problem, we discussed it
> here, and we seem to think having a flag on the buffers for that is a
> lot nicer.  Especially given that the header data still is data that
> needs to end up in the stream, so there's no reason to special-case it
> for elements that don't need it special-cased.

Is it data? You think it is, I don't think it is. Given that it is
required before a decoding operation can take place, it is not stream data
but initialization data and thus part of the header. That belongs in caps.

That's *my* perspective.

> Really, putting binary blobs of data in the caps is just evil and
> abusing caps.

Why?

> Also consider the fact that an ogg muxer needs to write
> two separate pages of data with header info for *each* stream in the ogg
> container.  So you would also have to invent a way of making clear where
> you separate the header pages in your codec_data caps property.

codec_data is simply what WMA, SVQ3, MP4V and MP4A call it. You can call it
vorbis_comment_data, vorbis_init_data and vorbis_page_data (because, from
what I remember from the Matroska specs, it's actually 3, not 2 - not
sure though, doesn't really matter).

> Seriously, this is not what caps are supposed to be used for.

Again, I think it is.

> > And it prevents
> > problems in seeking in a stream through protocol. Imagine the situation
> > where I seek halfway into an Ogg file and only then start reading it.
> you do the seek on the ogg demuxer, which will handle it properly since
> the ogg demuxer knows it should get the header first.  It Just Works.

So you do format-agnostic operations in the demuxer. Hurray, that's what
we want to prevent!

> > Rather, I'd like to propose to put all this header-data, that is *always*
> > needed to start decoding and is *always* the first packet to come in, to
> > be first by *definition*.
>
> For ogg, it's the two first packets.  they need to be two separate
> buffers since they need to be sent out as two different ogg pages.
>
> A flag to mark a buffer as a header is the cleanest solution.

You claim that, but you haven't proven it yet. This isn't Ogg or libogg or
LibOggstreamer. This is GStreamer! We are a cross-format library. Think
beyond Ogg.

> >  An easy way of doing it (and, indeed, rather
> > hacky in a way) is by putting it in the caps rather than in the stream
> > data. We define it that way because we put the data in a GstCaps, and
> > therefore we do not depend on a specific data order in the stream. I see
> > no reason to abandon this method specifically for Ogg.
>
> I propose using the flag method for all codecs then.  It's a lot nicer
> to implement, it doesn't add crap to the caps, and it works for elements
> that don't have to know about the difference between
> headers-in-the-stream and raw-data-in-the-stream.

It isn't nicer to implement. Rather, it will be a pain to implement
because all our decoders will need to be special-cased to recognize header
data packets. Currently, that's all part of a GstCaps which means that we
can initialize a codec right during capsnego (and therefore, most codecs
are able to forward capsnego on to next/previous elements *precisely
because of this* - this makes capsnego a very efficient process).
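
To make the difference concrete, here is a rough sketch in plain C of what
a decoder's data path might look like under each approach. Every name in it
(Buffer, Decoder, BUF_FLAG_HEADER, decoder_set_caps, decoder_chain_caps,
decoder_chain_flag) is hypothetical and invented for illustration; none of
it is GStreamer API, and the "decoding" is reduced to counters.

```c
#include <stddef.h>

/* All names below are hypothetical; this is not GStreamer API. */
enum { BUF_FLAG_HEADER = 1 << 0 };

typedef struct {
    const unsigned char *data;
    size_t size;
    unsigned flags;
} Buffer;

typedef struct {
    int initialized;        /* set once the init data has been seen */
    size_t header_bytes;    /* amount of init data consumed */
    size_t frames_decoded;  /* stand-in for real decoding work */
} Decoder;

/* Caps approach: init data arrives once, at negotiation time. */
static void decoder_set_caps(Decoder *dec, const unsigned char *codec_data,
                             size_t size)
{
    (void) codec_data;      /* a real codec would parse this */
    dec->header_bytes = size;
    dec->initialized = 1;
}

/* Caps approach: the data path only ever sees stream data. */
static int decoder_chain_caps(Decoder *dec, const Buffer *buf)
{
    (void) buf;
    if (!dec->initialized)
        return -1;          /* negotiation has not happened yet */
    dec->frames_decoded++;
    return 0;
}

/* Flag approach: every buffer is inspected for the header flag. */
static int decoder_chain_flag(Decoder *dec, const Buffer *buf)
{
    if (buf->flags & BUF_FLAG_HEADER) {
        dec->header_bytes += buf->size;
        dec->initialized = 1;
        return 0;           /* header consumed, nothing decoded */
    }
    if (!dec->initialized)
        return -1;          /* data arrived before any header */
    dec->frames_decoded++;
    return 0;
}
```

In this sketch the caps variant can be fully initialized before the first
buffer arrives, while the flag variant only becomes usable once a flagged
buffer has passed through the data path.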

> > Now the data order thing that I mention here is not a *practical* problem.
> > Rather, it is an *architectural* problem and we currently don't have a
> > good solution to that. Therefore, our current Vorbis data protocol is
> > using the one solution and all other codecs use the other.
>
> What do we do with vorbis currently ?

The same as we always did. The Ogg data protocol. There's a bug in bugzilla
that mentions that Vorbis in Matroska doesn't work because of this...

> >  I'd like to
> > simply pick one or the other and be done with it. I don't like a mix of
> > several solutions (sometimes the one, sometimes the other) for one
> > problem.
>
> I agree. But let's move away from the ugly binary-data-in-caps.  Caps
> aren't for binary data.

Don't claim that - give me a good reason!

> > Serialization is simple, look at the output of qtdemux when decoding a
> > Sorenson-3 file or an MPEG-4 file. We already use that. It's simply
> > hexcodes, and the length of the data is strlen(hexstring)/2.
> >
> > (e.g. caps: video/x-theora,width=(int)384,height=(int)288,
> > codec_data=(buffer)fe56ba3c810c).
>
> Ok, so I can read and interpret those caps, *except* for the
> codec_data.  What does it tell me ?

You cannot read a palette either, which is definitely supposed to be part
of the caps.
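
For reference, the hex serialization quoted above (data length equals
strlen(hexstring)/2) could be decoded roughly as follows. This is a minimal
sketch in plain C; hex_digit_value and codec_data_from_hex are hypothetical
helper names made up for this example, not GStreamer functions.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: value of one hex digit, or -1 if invalid. */
static int hex_digit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: turn a hex string such as "fe56ba3c810c" into a
 * newly allocated byte array. Stores the byte count in *out_len.
 * Returns NULL on odd length, invalid digits or allocation failure.
 * The caller frees the result with free(). */
static unsigned char *codec_data_from_hex(const char *hex, size_t *out_len)
{
    size_t hex_len = strlen(hex);
    size_t len = hex_len / 2;   /* two hex digits per byte */
    unsigned char *bytes;
    size_t i;

    if (hex_len % 2 != 0)
        return NULL;

    bytes = malloc(len > 0 ? len : 1);
    if (bytes == NULL)
        return NULL;

    for (i = 0; i < len; i++) {
        int hi = hex_digit_value(hex[2 * i]);
        int lo = hex_digit_value(hex[2 * i + 1]);

        if (hi < 0 || lo < 0) {
            free(bytes);
            return NULL;
        }
        bytes[i] = (unsigned char) ((hi << 4) | lo);
    }

    *out_len = len;
    return bytes;
}
```

Called on the 12-character string from the example caps, this yields 6
bytes. What those bytes mean is still up to the codec.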

> > Of course, I'm open to suggestions from others if they see different
> > (better) solutions to the problem drawn above.
>
> Tell me how the buffer flag approach fails for you.

It only works for Ogg.

Ronald