[Bug 608148] [tsdemux] Better handle PCR<=>PTS conversion (big difference, latency, ...)

Tue Jul 16 23:22:10 PDT 2013

https://bugzilla.gnome.org/show_bug.cgi?id=608148
  GStreamer | gst-plugins-bad | git

--- Comment #7 from Edward Hervey <bilboed at gmail.com> 2013-07-17 06:22:01 UTC ---
Some more explanation of this proposed change, specifically the latency part
for live pipelines.

NOTE: While this explanation is for streams that are *NOT* broken like the ones
above (very big difference between PCR and PTS/DTS received at the same time),
the same technique can be trivially used to support those.

NOTE: For the sake of simplicity, we assume the rate is perfect (remote clock
goes at exactly the same speed as the local clock) and no clock estimation is
needed. We also assume PCR/PTS/DTS wrapover is taken care of.
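
For reference, a minimal sketch of what "taking care of wrapover" can look
like for the 33-bit, 90 kHz PTS/DTS/PCR-base counters. The helper name
ts_diff is hypothetical, and this is not necessarily how tsdemux does it. It
assumes the two timestamps are really less than half the counter range apart.

```python
# PTS, DTS and the PCR base are 33-bit counters in 90 kHz units.
TS_BITS = 33
TS_MOD = 1 << TS_BITS


def ts_diff(later: int, earlier: int) -> int:
    """Signed difference later - earlier across a counter wrap.

    Assumption: the true distance is below half the counter range
    (roughly 13 hours at 90 kHz), otherwise the sign is ambiguous.
    """
    d = (later - earlier) % TS_MOD
    if d >= TS_MOD // 2:
        d -= TS_MOD
    return d
```

With this, a DTS of 10 seen just after the counter wrapped is still 15 units
later than a PCR of TS_MOD - 5.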

NOTE: We use DTS for the various calculations. If no DTS is present in the PES
header, it is the same as the PTS. If a DTS is present and different from PTS,
that reordering/latency will be handled and reported by decoders (if decoding
is needed).

NOTE: We use the term running time for both input and output. Buffer timestamps
can/might be different from those, but proper use of segment.base will take
care of that.

In mpeg-ts, the timing system is based around a full end-to-end "System Target
Decoder" (STD), which is a whole system including capture, demuxing, buffering,
decoding and presentation.
The reason why in mpeg-ts streams the PTS of the various audio and video
streams can be so much higher than the co-located PCR is because it takes into
account the maximum latency/buffering needed in
demuxing/buffering/decoding/reordering in order for the target buffer to be
properly displayed.

In GStreamer, the latency an element reports is the min/max difference in
running-time between input and output of that element (and only that element)
for the specific stream it is receiving.

Due to that difference, tsdemux needs to report not the end-to-end latency (PTS
- PCR received at the same time) but only the min/max latency it introduces
itself. This boils down to the interleaving/packetization latency. The other
downstream elements (parsers, queue, decoders, ...) will add min/max latency
specific to those streams and end up with the ideal capture-to-display latency.

Furthermore, if we do not need to do decoding/display, but instead want to use
a different live pipeline (such as capture ! demuxing ! rtp payloading !
transmission), we do not need to add unneeded latency. Once again, with this
breakdown of latency per element we can achieve that.

An example input is (somewhat simplified) as follows:

(VBUF(A).(B) => Video buffer number A, sub-part B. If B==0 it's the start of a
PES and it has a PTS and maybe a DTS (if different from PTS))
(ABUF(A).(B) => Same deal except for audio)

Buffer IN0 (Runningtime INRT0)
Contents
  XXX PCR0 XXX VBUF0.2 ABUF5.0 VBUF0.3 ABUF5.1 XXX ...

...

Buffer IN3 (Runningtime INRT3)
Contents
  VBUF0.42 ABUF5.8 PCR2 XXX VBUF1.0 ABUF6.0 VBUF1.1 ABUF6.1 XXX ...

Notes:
  In IN0:
   * We use PCR0 for rate estimation (against INRT0)
   * We see the start of an audio PES (ABUF5.0).
      * It has a certain DTS (higher than PCR0, say PCR0 + 500)

  In IN3:
   * We use PCR2 for rate estimation (against INRT3)
      * PCR2 is higher than PCR0 (say PCR2 = PCR0 + 50)
   * We see the start of a video PES (VBUF1.0)
      * It has a certain DTS (higher than PCR2, say PCR2 + 300)
   * We get the beginning of a new ABUF, we can output the previous one
(ABUF5).

What happens with the above example?
  At running time INRT0, we received the beginning of ABUF5, but we can't
output it yet (=> introducing latency).
  At running time INRT3, we received the beginning of VBUF1, but we can't
output it yet (=> introducing latency).
  At running time INRT3, we can output the previously accumulated ABUF5.
    ==> What latency do we report and what running-time do we use for that
outputted buffer?

Several options are available:
  1) Correlate exactly input runningtime to PCR, figure out the PCR/runningtime
offset and set the output running time as DTS + offset.
    This can't be used for several reasons:
    * The DTS - colocated PCR includes the end-to-end latency, we would end up
with a big delay
    * The DTS - colocated PCR can be huge on non-standard streams (initial
topic of this bug) resulting in huuuuuuuuuuuuge delays in playback
    * The latency reported would have to be negative to have accurate playback
(we can't do that).
    ==>> NOT A SOLUTION

  2) Ignore PCR/DTS and use the running-time at which the beginning of a buffer
was received.
    * This doesn't work since the various stream packets can be delayed (the
beginning of an audio buffer is not received at the same time as the beginning
of the video buffer that should be displayed at the same time).
    ==>> NOT A SOLUTION

  3) An intermediate solution, where we use the lowest DTS seen at any given
time as a base DTS/PTS offset against PCR located where the beginning of that
PES was seen (proposal explained in previous comment).
    * This keeps the relationship between DTS of various streams (ABUF5 is to
be presented 150 time units after VBUF1)
    * By figuring out the minimum/maximum delay between the moment the
beginning of a packet was received and the moment it was outputted from the
demuxer, we effectively calculate the latency introduced by the demuxer.

  With the example values we calculate:

  The minimum offset between DTS and PCR:

    DTS diff for ABUF5 : DTS(ABUF5) - PCR0 = PCR0 + 500 - PCR0 = 500
    DTS diff for VBUF1 : DTS(VBUF1) - PCR2 = PCR2 + 300 - PCR2 = 300
    DTS_PCR_offset = MIN(DTS diff) = 300

  (Note: in the initial bug report, this is the 4h value)
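
  The same calculation as a small sketch, using only the example values from
this comment. PCR0 is set to 0 as an arbitrary origin (an assumption; only
differences matter), and the variable names are hypothetical.

```python
PCR0 = 0            # arbitrary origin, only differences matter
PCR2 = PCR0 + 50

# stream -> (DTS of the PES, PCR co-located with the start of that PES)
pes_starts = {
    "ABUF5": (PCR0 + 500, PCR0),
    "VBUF1": (PCR2 + 300, PCR2),
}

# Per-stream difference between the DTS and the co-located PCR.
dts_diff = {name: dts - pcr for name, (dts, pcr) in pes_starts.items()}

# The smallest difference seen so far becomes the base offset.
dts_pcr_offset = min(dts_diff.values())
```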

  Running times:
    ABUF5 : DTS(ABUF5) - DTS_PCR_offset - PCR(ABUF5 was received)
          : PCR0 + 500 - 300            - PCR0
          : 200 (relative to PCR0, i.e. 150 relative to PCR2 = PCR0 + 50)

    VBUF1 : DTS(VBUF1) - DTS_PCR_offset - PCR(VBUF1 was received)
          : PCR2 + 300 - 300            - PCR2
          : 0
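
  A sketch of the running-time formula with the example values. Note that
each result is relative to the PCR at which that PES start was seen, so to
compare streams both values have to be placed on one timeline. Evaluated
literally, ABUF5 comes out as 200 relative to PCR0, which is 150 relative to
PCR2 and matches the 150 time units between VBUF1 and ABUF5. Names are
hypothetical and PCR0 = 0 is an arbitrary origin.

```python
PCR0 = 0
PCR2 = PCR0 + 50
DTS_PCR_OFFSET = 300   # MIN(DTS diff) from the example


def output_delta(dts, pcr_at_start):
    """Time after the co-located PCR at which the buffer is due."""
    return dts - DTS_PCR_OFFSET - pcr_at_start


abuf5_rel = output_delta(PCR0 + 500, PCR0)   # relative to PCR0
vbuf1_rel = output_delta(PCR2 + 300, PCR2)   # relative to PCR2

# Put both on a common timeline whose origin is PCR0.
abuf5_abs = (PCR0 - PCR0) + abuf5_rel
vbuf1_abs = (PCR2 - PCR0) + vbuf1_rel

spacing = abuf5_abs - vbuf1_abs   # how long after VBUF1 ABUF5 is due
```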

  Latency introduced:
    Delay between moment beginning of PES was received and it was outputted
    ABUF5 : Received at PCR0, outputted at PCR2:
          : PCR2 - PCR0
          : PCR0 + 50 - PCR0
          : 50
    VBUF1 : Not outputted yet, but if it was outputted at PCRX = PCR0 + 200 it
would be
          : PCRX - PCR2
          : PCR0 + 200 - (PCR0 + 50)
          : 150
    We store those various min/max values so that we can answer the latency
query if/when it arrives. We should have enough observations by the time it
arrives for it to be coherent and stable.

    (For those really paying attention, you will notice we have effectively
given enough room (with the maximum latency) to downstream so that the video
buffer has enough time to be decoded in time).
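
    A minimal sketch of storing those min/max observations, assuming delays
are measured in PCR units between the PES start and the moment the buffer is
pushed. The class and method names are hypothetical, not tsdemux API. With
PCR2 = PCR0 + 50 and PCRX = PCR0 + 200, the VBUF1 delay PCRX - PCR2 works out
to 150.

```python
class LatencyTracker:
    """Keep the min/max demuxer delay seen so far."""

    def __init__(self):
        self.min = None
        self.max = None

    def observe(self, pcr_at_start, pcr_at_output):
        # Delay between receiving the PES start and pushing the buffer.
        delay = pcr_at_output - pcr_at_start
        self.min = delay if self.min is None else min(self.min, delay)
        self.max = delay if self.max is None else max(self.max, delay)
        return delay


PCR0 = 0               # arbitrary origin
PCR2 = PCR0 + 50
PCRX = PCR0 + 200      # hypothetical output time of VBUF1 from the example

tracker = LatencyTracker()
abuf5_delay = tracker.observe(PCR0, PCR2)   # ABUF5: started at PCR0, out at PCR2
vbuf1_delay = tracker.observe(PCR2, PCRX)   # VBUF1: started at PCR2, out at PCRX
```

    The stored tracker.min / tracker.max pair is what would be used to answer
a latency query when it arrives.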

Limitations:
  The example provided is somewhat idealized, there might be some corner cases
where we need to output the buffer of one stream without having received the
beginning of a PES on all streams and therefore can't figure out the MIN(DTS
diff) across all streams.
  To mitigate this we could:
    * Set some default minimum DTS diff depending on the nature of the stream
(video, audio, subtitle....).
    * Or have a global default MIN(DTS diff)
    * Or delay pushing the buffers until we have seen at least one PES start on
each stream (maybe not on the subtitle streams though). Those delayed buffers
might end up arriving too late for display in the sinks, but at least would
ensure downstream decoders/elements have all the data they need.

  I prefer the last solution fwiw. It could also fit well with the "Fast start
dvb" proposal in bug #703884 .
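
  To make the last option concrete, here is a rough sketch of holding buffers
back until a PES start has been seen on every required stream. It is only an
illustration under simple assumptions (one pending queue, subtitle streams
left out of the required set); the StartGate name and its methods are
hypothetical.

```python
class StartGate:
    """Hold output until every required stream has had a PES start."""

    def __init__(self, required_streams):
        self.required = set(required_streams)  # e.g. audio and video only
        self.seen = set()
        self.pending = []

    def pes_start(self, stream):
        # Called when the beginning of a PES is seen on a stream.
        self.seen.add(stream)

    def push(self, buf):
        """Queue buf and return the buffers that may go downstream now."""
        self.pending.append(buf)
        if not self.required <= self.seen:
            return []                          # still waiting on some stream
        ready, self.pending = self.pending, []
        return ready


gate = StartGate(["audio", "video"])
gate.pes_start("audio")
held = gate.push("audio-buffer")       # no video PES start yet, nothing released
gate.pes_start("video")
released = gate.push("video-buffer")   # both seen, queued buffers go out in order
```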
