[Bug 334082] matroskademux: support for multi-segment Matroska files

GStreamer (GNOME Bugzilla) bugzilla at gnome.org
Fri Feb 9 11:57:19 UTC 2018


https://bugzilla.gnome.org/show_bug.cgi?id=334082

--- Comment #15 from Alicia Boya García <aboya at igalia.com> ---
(The explanations about Matroska can be a bit confusing, so I'm adding a
diagram.)

It seems that the discussion of this issue has drifted towards MSE use cases
that actually bear little resemblance to the original issue.

The original issue was about support of multiple Matroska segments in a single
file. This is something that Matroska used to allow and some players (e.g. VLC,
mpv) still support.

A Matroska Segment is the top-level object in Matroska containing both
metadata (tracks definition) and video data. Usually a .mkv file contains
exactly one segment, but this is not the case for multi-segment files.
"Matroska File Format" (2009) claims:

> There can be several SEGMENTs in one MATROSKA file, but this is not encouraged to be done, as not many tools are able to handle multisegment MATROSKA files correctly.

Nowadays the Matroska spec claims:

> A Matroska file is composed of 1 Segment.

Multisegment Matroska files are created by concatenating (e.g. with `cat`) two
or more Matroska files. Although technically you can concatenate any two
Matroska files, there are only two use cases where currently available players
do something more useful with the result than just playing the first segment
when that file is opened:

* Matroska files using ordered chapters with external references pointing to
another segment. (these are usually used to save space by extracting the
opening and credits of a series to separate files and linking them in each
episode file instead of copying them again and again). Support for this is far
from universal, but not unheard of in desktop players (VLC, mpv, MPC).

* Linked segments. Segments can form a linked list (see PrevUID and NextUID). A
movie (or SegmentFamily in Matroska parlance) is intended to be played as the
concatenation of the timelines of each of its segments, e.g. if the first
segment is 1 hour long, 0:00:00 from the second segment becomes 1:00:00 in the
timeline of the movie shown in the UI. Support for this feature is pretty rare,
as ordered chapters with external references already provide the same
user-visible behavior and more. Nevertheless, MPC and VLC support it, though
VLC crashes very easily with this feature (in my testing once in every ~four
seeks).
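
To make the linked segments case concrete, here is a minimal sketch of how a
player could order a SegmentFamily by following PrevUID/NextUID and compute
where each segment starts in the movie timeline. The `SegmentInfo` class and
`movie_layout` function are hypothetical names of mine, UIDs are shown as short
strings for readability, and durations are assumed to be already known in
seconds.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class SegmentInfo:
    uid: str                         # SegmentUID (short string here for readability)
    duration_s: float                # length of this segment's own timeline
    prev_uid: Optional[str] = None   # PrevUID, if present
    next_uid: Optional[str] = None   # NextUID, if present


def movie_layout(segments: List[SegmentInfo]) -> List[Tuple[str, float]]:
    """Return (uid, start time in the movie) pairs in playback order."""
    by_uid: Dict[str, SegmentInfo] = {s.uid: s for s in segments}
    # The head of the chain is the segment whose predecessor is not known.
    heads = [s for s in segments if s.prev_uid not in by_uid]
    if len(heads) != 1:
        raise ValueError("expected exactly one chain head")
    layout: List[Tuple[str, float]] = []
    start = 0.0
    seen = set()
    current: Optional[SegmentInfo] = heads[0]
    while current is not None:
        if current.uid in seen:
            raise ValueError("cycle in segment links")
        seen.add(current.uid)
        layout.append((current.uid, start))
        start += current.duration_s
        # Follow NextUID; stop when it is absent or cannot be resolved.
        current = by_uid.get(current.next_uid) if current.next_uid else None
    return layout
```

With a first segment of 3600 seconds, the second segment is placed at 3600.0,
which matches the 1:00:00 example above.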

Such segments can be created with readily available free multiplatform software
like MKVToolNix GUI.

Both of these features require a mechanism for finding a Segment given a UID.
Players usually scan the directory where the file is stored and parse every
.mkv file to construct a table of Segment UID -> (file, offset) [offset is only
needed if multi-segment files are supported]. This process is relatively fast
if the number of files in the folder is reasonable, as usually only the first
few bytes per file need to be read.
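
A rough sketch of such a table builder follows. It is only an illustration:
`build_segment_table` and the `scan_file` callback are hypothetical names, and
the callback stands in for whatever code actually parses a file and reports
the (Segment UID, offset) pairs it finds.

```python
import os
from typing import Callable, Dict, Iterable, Tuple

# A scanner yields (segment_uid, byte_offset) for each Segment found in a file.
Scanner = Callable[[str], Iterable[Tuple[bytes, int]]]


def build_segment_table(directory: str,
                        scan_file: Scanner) -> Dict[bytes, Tuple[str, int]]:
    """Map Segment UID -> (file path, offset) for every .mkv in a directory."""
    table: Dict[bytes, Tuple[str, int]] = {}
    for name in sorted(os.listdir(directory)):
        if not name.lower().endswith(".mkv"):
            continue
        path = os.path.join(directory, name)
        for uid, offset in scan_file(path):
            # Keep the first location seen for a given UID.
            table.setdefault(uid, (path, offset))
    return table
```

If multi-segment files are not supported, the scanner would report at most one
pair per file and the offset could be dropped from the table.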

`file` need not be a different file. Indeed, that seems to be the case OP
originally had. We will refer to such files as Multisegment Matroska Movie
Files (a single file containing a single movie containing several Matroska
segments).

From a practical perspective, the usefulness of Multisegment Matroska Movie
Files is quite limited: if you wanted to save space by using ordered chapters
with external references, it makes sense that those references are stored
separately; if you wanted to split a very long Matroska movie into several
segments (e.g. to accommodate file system maximum file size restrictions) it
makes sense that they are in separate files. Multisegment Matroska Movie files
are little more than a simple way to "undo" that separation and put them
together in the same file without remuxing.

It's also possible to concatenate completely unrelated, unlinked Matroska
segments in the same file. Currently available players will ignore all but the
first segment, using the rest just as support for finding references.

Neither ordered chapters nor linked segments are supported by matroskademux
currently.

In fact, they cannot be implemented with typical source elements such as
filesrc. For a source element to work with a hypothetical matroskademux
implementation supporting any of these features, such a source would need to
support a request for locating and loading a given Segment UID, and in practice
this would mean that it would also need a scan mechanism like the one described
above.

A new source element would be required, let's call it `matroskafinder`, that,
similarly to filesrc and others, would support a location property, but would
also accept "Segment UID" requests, scan the folder containing the original
file and feed the correct file to matroskademux. Ideally, this would be a
configurable bin: a factory function for instantiating the inner source
elements could be provided, so that you could play ordered chapters over the
network, and the scan mechanism would be replaceable, e.g. to query a database
instead of downloading each file in certain applications.

Another implementation challenge is that both of these features require
tracking new timeline mappings. As far as I know, this would be all the
timeline levels, from raw to user facing (none of the names of the timelines
are official):

1. Block Timecode (coded timecode in the frames, relative to the starting time
of its container Cluster).
2. *Cluster time (Block Timecode + parent Cluster start Timecode).
3. CodecDelay adjusted track time (Cluster time - track's CodecDelay, if any).
4. Offset track time (CodecDelay adjusted track time + TrackOffset,
deprecated).
5. *Segment time (offset track time after applying an edition entry, i.e. an
ordered set of chapters, see the definition here:
https://www.matroska.org/technical/specs/index.html). In the case of external
ordered chapters, edits are recursive, as their time codes are in segment time
of the linked segment.
6. *Movie time (Segment time + end time of the previous linked Segment).

The timelines marked with a star are parent timelines, i.e. each is formed by
assembling several instances of the timeline class immediately below it.
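
As a worked example of the arithmetic in levels 1 to 4 and 6, here is a small
sketch. The function names are mine, like the timeline names, and I'm assuming
every value has already been converted to nanoseconds. Level 5 is left out:
with no edition entry applied, segment time is taken to be equal to offset
track time.

```python
def cluster_time(block_timecode_ns: int, cluster_timecode_ns: int) -> int:
    """Level 2: Block Timecode + parent Cluster start Timecode."""
    return cluster_timecode_ns + block_timecode_ns


def codec_delay_adjusted(cluster_time_ns: int, codec_delay_ns: int = 0) -> int:
    """Level 3: Cluster time - the track's CodecDelay, if any."""
    return cluster_time_ns - codec_delay_ns


def offset_track_time(adjusted_ns: int, track_offset_ns: int = 0) -> int:
    """Level 4: CodecDelay adjusted track time + TrackOffset (deprecated)."""
    return adjusted_ns + track_offset_ns


def movie_time(segment_time_ns: int, previous_segment_end_ns: int = 0) -> int:
    """Level 6: Segment time + end time of the previous linked Segment."""
    return segment_time_ns + previous_segment_end_ns


MS = 1_000_000  # nanoseconds per millisecond

# A block 40 ms into a cluster that starts at 10 s, on a track with a
# 6.5 ms CodecDelay, in a segment that follows a 1 hour long segment.
t = cluster_time(40 * MS, 10_000 * MS)
t = codec_delay_adjusted(t, codec_delay_ns=6_500_000)
t = offset_track_time(t)  # no TrackOffset
t = movie_time(t, previous_segment_end_ns=3_600_000 * MS)
```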

Making things even harder, segments are not guaranteed to have the same kind
and number of tracks, codecs, defaults, etc. (though they often do), so it may
be necessary to reset most of matroskademux, including its pads, when switching
to a different segment (including when reading an external ordered chapter).
Constructing the movie timeline map may be tricky.

How could Multisegment Matroska Movie File support like OP wanted be
implemented?

As a prerequisite, support for the other features explained here (at least
ordered chapters with external references) would make a lot of sense. Ordered
chapters with external references to other files are much more common than
concatenated Matroska files of any kind.

With that done, the only addition would be ensuring that the Segment scanner
actually tries to find several Segments per file (by reading past the last
element specified in the Meta Seek, looking for a Segment element) and that
file sources can be requested from the factory in matroskafinder with a given
offset and size (the ones covering the found segment).
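
To illustrate what finding several Segments in one file involves, here is a
minimal sketch that walks the top-level EBML elements of a buffer and reports
the offset and total length of each Segment. Note that it relies on each
element's declared size rather than on the Meta Seek approach mentioned above,
it treats an unknown-size element as running to the end of the data, and it
does no error handling for truncated input. `find_segments` and the helpers are
hypothetical names.

```python
from typing import List, Tuple

SEGMENT_ID = 0x18538067  # Matroska Segment element ID


def _vint_length(first_byte: int) -> int:
    """Length in bytes of an EBML variable-size integer, from its first byte."""
    if first_byte == 0:
        raise ValueError("invalid EBML vint")
    return 9 - first_byte.bit_length()


def _read_element_header(data: bytes, pos: int) -> Tuple[int, int, int]:
    """Return (element_id, payload_size, header_length); size -1 is unknown."""
    id_len = _vint_length(data[pos])
    element_id = int.from_bytes(data[pos:pos + id_len], "big")  # marker kept
    size_pos = pos + id_len
    size_len = _vint_length(data[size_pos])
    raw = int.from_bytes(data[size_pos:size_pos + size_len], "big")
    mask = (1 << (7 * size_len)) - 1
    size = raw & mask  # marker bit stripped
    if size == mask:
        size = -1  # all value bits set: unknown size
    return element_id, size, id_len + size_len


def find_segments(data: bytes) -> List[Tuple[int, int]]:
    """Return (offset, total_length) of each top-level Segment in data."""
    found: List[Tuple[int, int]] = []
    pos = 0
    while pos < len(data):
        element_id, size, header_len = _read_element_header(data, pos)
        if size < 0:
            # Unknown size: assume the element runs to the end of the data.
            size = len(data) - pos - header_len
        if element_id == SEGMENT_ID:
            found.append((pos, header_len + size))
        pos += header_len + size
    return found
```

The (offset, total_length) pairs are the kind of values that could be handed to
the source factory so that the demuxer only sees the bytes of one segment.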

The scanner would also search in the same file that is being currently played,
finding the segments. This is the way this works in mpv. It's actually the same
as if the segment was in another file.

Of course, all of this is only possible when matroskademux is the driving force
of the pipeline, able to switch between different file sources.

All of that is a daunting task, so it should be no surprise that support for
movies with multiple Matroska segments is so spotty.


Next topic: What about "MSE segments"?

MSE has the concepts of "initialization segment" and "media segment", whose
definitions are specific to MSE and depend on the container format. In the case
of WebM, an "MSE media segment" is a Cluster element and an "MSE initialization
segment" is the portion of a WebM file that defines the header before clusters
can appear.

MSE segments may appear in the stream in almost any order, with some
limitations: the first ever MSE segment must be an initialization segment.

Further MSE initialization segments in the stream must have the same number
and type of tracks, allowing very little variation. A notable
exception is codec data (e.g. PPS and SPS for MP4/h264). Usually applications
insert a second initialization segment after a quality change.

Support of multiple MSE WebM segments in matroskademux revolves around the
following question:

  When working in push mode, what should the demuxer do if it finds a new
Matroska header?

Note the difference with the Multisegment Matroska Movie playback problem
described above. In the above case, the demuxer searched everywhere (including
in the same file, past the current Matroska Segment boundary) for a specific
Matroska segment with a given SegmentUID. Here, it's the other way around. The
demuxer was happy parsing the current Matroska Segment when suddenly it ended
and a new one took its place. What should happen in this case?

Well, that's unrelated to the original problem in this bug so... To be
continued in https://bugzilla.gnome.org/show_bug.cgi?id=793333

Let's keep this bug restricted to the original issue, even if it's not met with
the same interest.
