[Bug 735666] design doc : trick mode handling in demuxers (SKIP)

Wed Jan 7 06:21:37 PST 2015

https://bugzilla.gnome.org/show_bug.cgi?id=735666
  GStreamer | gstreamer (core) | unspecified

--- Comment #11 from Tim-Philipp Müller <t.i.m at zen.co.uk> 2015-01-07 14:21:31 UTC ---
And a couple of random comments about the proposal doc:

 - I think this should just be an additional section(s) in the existing
part-trickmodes.txt. One section per trick mode as identified above, or
somesuch. Can label it/them as draft if needed.

> This concentrates mostly on container formats that have
> full visibility of the stream (i.e. contain an index).

 - N.B. an index does not in practice entail "full visibility"
   of the stream; it often does not even index all keyframes.

> The difference is that this proposal tries to:
> * Reduce processing complexity downstream (data outputted is always
>   sent with a forward rate of 1.0).
> * Avoid any changes in downstream elements
> * Reduce amount of changes required in demuxers and other elements.

While I see how this can be useful, it is not clear to me that this should be
our target (I realise you are just listing differences here of course).
Requiring additional code in various elements for new features seems fine to
me. We should make sure our system allows people to implement the best possible
trick mode support, not aim for the easiest fix to get something working with
the existing code base with the smallest amount of changes. This is just a
general comment, not meant to be an argument against the proposal, which just
describes implementation details of a trick mode type we'll want to support in
any case imho.

I don't know if the proposal will actually be able to minimise the amount of
changes required in demuxers; it's going to be non-trivial to implement this in
practice whilst keeping the restrictions you want. The demuxer will need to be
able to adapt to real-time performance of the pipeline, esp. in terms of being
able to read+process input data fast enough (there will be extra latencies and
overhead caused by incomplete/missing indices etc.).

The rationale for always outputting data with a forward rate of 1.0 here is so
that the short playback sections get played at 1.0 speed, right? I think for
the other non play-skip trick modes I listed in comment #10 we would want to
pass through the original segment with the requested rate though.

> The global idea is to:
> 1) Output at most the same average amount of data downstream to be
>   decoded in realtime (There are still <fps> frames decoded for each
>   walltime seconds, regardless of the rate). 
> 2) Fetch from upstream the same average bitrate. This ensures that if
>   the file was readable at normal rate from a local or remote storage,
>   it can still be read at high rate.

These seem prescriptive in a certain way now, and as I mentioned above I think
a case can be made for a variant of this skip-play mode without such
restrictions. It makes sense in some cases of course, and certain specs have
strict compliance requirements along those lines, but making this more
restrictive mode the default might arguably conflict with the hitherto
defined meaning of SKIP (in part-trickmodes.txt), and one might have to skip
much more than one would when reading the file sequentially if the goal is
never to exceed the original input bitrate.
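
To make the two quoted constraints concrete, here is a rough sketch of the
arithmetic as I read it. The function name and parameters are made up for
illustration, and it ignores keyframe alignment entirely: chunks of `chunk`
seconds are played at 1.0, and chunk starts are spaced `rate * chunk` seconds
apart in stream time, so only 1/rate of the stream is read and decoded.

```python
def skip_play_schedule(start, rate, chunk, count):
    """Return (stream_start, stream_end) pairs for `count` forward chunks.

    Hypothetical helper, not GStreamer API. Each chunk lasts `chunk`
    seconds and is played at 1.0, so it costs `chunk` seconds of wall
    time. To average `rate`x overall, consecutive chunk starts are
    `rate * chunk` seconds of stream time apart.
    """
    if rate < 1.0:
        raise ValueError("sketch only covers forward rates >= 1.0")
    stride = rate * chunk
    return [(start + i * stride, start + i * stride + chunk)
            for i in range(count)]


schedule = skip_play_schedule(start=0.0, rate=8.0, chunk=0.5, count=3)
print(schedule)  # [(0.0, 0.5), (4.0, 4.5), (8.0, 8.5)]
```

At 8x with 0.5 second chunks, 3.5 seconds of stream are skipped after every
chunk. A real demuxer would have to move each start to a keyframe, which is
where the variable chunk lengths come from.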

> Option 0 : Achievable currently without any modification

This seems like a slightly more complicated variant of what is used in practice
by many vendors: emulated trick mode by doing timed KEY_UNIT seeks while the
pipeline stays in PAUSED state. This has the advantage that it is
self-regulating, in the sense that it will work as fast as the source and
demuxer/parser/decoders can work in practice on a given file/stream.
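
A minimal simulation of what "self-regulating" means here, under the
assumption that each seek target is derived from elapsed wall-clock time. The
names are hypothetical and no GStreamer calls are made; in a real application
each step would be a flushing KEY_UNIT seek followed by waiting for preroll.

```python
def emulate_trick_play(start, rate, seek_costs):
    """Simulate timed KEY_UNIT seeks issued back to back while PAUSED.

    `seek_costs` lists how long (in wall-clock seconds) each seek took to
    complete. Every new target is computed from the elapsed wall time, so
    slow seeks lead to bigger jumps instead of a growing backlog.
    """
    targets = []
    elapsed = 0.0
    for cost in seek_costs:
        targets.append(start + rate * elapsed)
        elapsed += cost
    return targets


# Same 4x request, fast source vs. slow source.
print(emulate_trick_play(0.0, 4.0, [0.25, 0.25, 0.25]))  # [0.0, 1.0, 2.0]
print(emulate_trick_play(0.0, 4.0, [0.5, 0.5, 0.5]))     # [0.0, 2.0, 4.0]
```

The slow source simply shows fewer frames per second of stream time, while
the overall playback speed stays at the requested rate.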

> [Option 0] Cons:
>   * Logic needs to be (re)implemented in every application

We could provide helper API for this in some way or another, be it in
GstPipeline, playbin or something separate like a "pipeline manager" object
(which would be useful for other things too), or just a bunch of utility
functions like _{start,stop,change}_simulated_trick_play().

>   * Delay caused by back/forth between element and application for
>     every segment, could potentially cause delays.
>   * Application doesn't know optimal location of keyframes, so can't
>     push out only the requested amount of data (chunks played out will
>     be of variable length).

We could make the position of keyframes available to the application of course,
but the problem is that often the demuxer doesn't know itself for formats where
there's no index (and one needs to support those, otherwise the whole thing is
not very interesting). In practice one would often have to make a blind seek to
some position and see what falls out.

>   * There's a potential issue where the "next" position the
>     application requests is not beyond the next keyframe, resulting in
>     the demuxer pushing a very big segment again. This would require
>     the application to detect and handle such cases gracefully.

 * application needs to do 'cycle detection', it's possible a seek to N+pos
   yields N again as nearest keyframe location. SNAP seek flags would help
   with this, but are not implemented everywhere yet.
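
For illustration, the 'cycle detection' could look roughly like the sketch
below. Everything in it is an assumption: the keyframe list stands in for
whatever the demuxer actually does on a KEY_UNIT seek, and doubling the step
is just one possible retry policy.

```python
import bisect


def snap_before(keyframes, pos):
    """Emulate a KEY_UNIT seek: land on the last keyframe at or before pos."""
    i = bisect.bisect_right(keyframes, pos) - 1
    return keyframes[max(i, 0)]


def next_distinct_keyframe(keyframes, current, step, max_tries=8):
    """Seek forward from `current`, doubling `step` while the seek keeps
    landing on `current` again. Returns None if no progress is made."""
    for _ in range(max_tries):
        landed = snap_before(keyframes, current + step)
        if landed > current:
            return landed
        step *= 2
    return None


keyframes = [0.0, 10.0, 20.0]
print(next_distinct_keyframe(keyframes, 0.0, 2.0))   # 10.0
print(next_distinct_keyframe(keyframes, 20.0, 2.0))  # None
```

In the first call the seeks to 2, 4 and 8 seconds all land on 0.0 again, and
only the seek to 16 seconds makes progress. A seek that snaps to the keyframe
after the target would avoid the retries where it is implemented.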

The delay thing is a problem in any case btw, also for plain old reverse
playback at 1.0 rate (when there's no wait for the application to issue another
seek/instruction), because we don't necessarily have enough buffering for a
whole GOP to be cached downstream of the demuxer, so the demuxer is often
blocked for some time before being able to process more data.

> Proposal : Move logic in demuxers
>
> In order to avoid the various "Cons" from option 0, the proposed way
> forward is to move the logic of figuring out which segments to be
> played back into the demuxers themselves and activate that mode if the
> application sends a seek with the SKIP flag enabled.

Ok, so the proposal is to interpret the SKIP flag to activate this particular
approach to trick modes (which involves playing back short chunks). I think we
should default to an easier lowest-common-denominator fallback like I-frame
only, or let the demuxer/source decide, but have a way to select this
skip-play kind of mode. (Also, reportedly this kind of trick mode is not
generally perceived as very pleasant, which might be another reason not to
default to it.)

> 2) Demuxer figures out the initial optimal initial previous and next
>    keyframe for Pos.

N.B. The demuxer might not know where the (actual) next keyframe is. It is
entirely possible and often the case that an index does not index every
keyframe. The logic described will still work fine of course.
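
A small sketch of the lookup with a sparse index, to show why the "next
keyframe" is only a bound. The index is modelled as a sorted list of
timestamps in seconds; the function name is hypothetical and this is not how
any particular demuxer stores its index.

```python
import bisect


def bracket_in_index(index, pos):
    """Return (prev, next) indexed keyframe times around `pos`.

    `index` is a sorted list of the keyframe timestamps the demuxer knows
    about. It may omit real keyframes, so `next` is only an upper bound on
    where the actual next keyframe is. Missing entries come back as None.
    """
    i = bisect.bisect_right(index, pos)
    prev_kf = index[i - 1] if i > 0 else None
    next_kf = index[i] if i < len(index) else None
    return prev_kf, next_kf


sparse_index = [0.0, 30.0, 60.0]
print(bracket_in_index(sparse_index, 42.0))  # (30.0, 60.0)
print(bracket_in_index(sparse_index, 75.0))  # (60.0, None)
```

With an index like this there may well be keyframes at 45 or 50 seconds that
the demuxer only discovers once it starts parsing from 30 seconds.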

> [Possible variants and improvements]
> 2) Global speedup calculation
> If the keyframe interval is non-constant...

Out of curiosity, do we have empirical data about whether this is a rare case
or the normal case?

> 3) Disable audio-track

IMHO this should be the default.

Cool stuff btw, have you implemented it anywhere yet? :)
