Accuracy and operation of seek commands (using tsdemux) and segment information

Mon Jul 4 08:50:38 UTC 2016

Dear colleagues,

We are working on a project in which we need to adjust the playout
timing of related media streams, including MPEG2-TS streams. The basic
idea is to first perform a seek on a pipeline running on a companion
device, based on the current media playout position of another
pipeline running on a different (master) machine.

We know that media synchronization across devices can be achieved by  
using NetTimeProvider/NetClientClock, NTP and PTP functionalities in  
GStreamer. Indeed, we have tested these functionalities with quite  
satisfactory results. However, our goal/requirement is to synchronize  
the media playout at the stream level, based on reception/presentation  
times of media packets/frames.

For that purpose, we have slightly modified the tsdemux element to be  
able to retrieve the PTS field and a global NTP-based timestamp  
(inserted at the server-side) from the MPEG2-TS, as well as the slice  
type (the goal is to retrieve this information only for key frames).

This information is exchanged between the master and slave(s) via
sockets. Upon joining the session, the slaves receive the timing from
the master and initially perform a seek to an extrapolated target
media playout position. We know from previous posts and from Sebastian
Droge’s talk at the GStreamer conference that seeks do not perform
very well for tsdemux elements. At the moment, we are testing with
stored content, using the filesrc element, and the seek performs quite
well (with some tricks) on both Ubuntu and Android.
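
To make the extrapolation concrete, this is roughly what each slave
computes before seeking. It is only a minimal Python sketch: the
MasterReport structure and the extrapolate_seek_target function are
hypothetical names used for illustration (they are not GStreamer API),
and it assumes that both machines share a common wall clock, such as
the NTP-based one.

```python
from dataclasses import dataclass

NSEC_PER_SEC = 1_000_000_000


@dataclass
class MasterReport:
    """Timing sample sent by the master (hypothetical message layout)."""
    position_ns: int   # media playout position on the master
    wallclock_ns: int  # shared wall-clock time when it was sampled


def extrapolate_seek_target(report: MasterReport, now_ns: int,
                            rate: float = 1.0) -> int:
    """Estimate where the master is playing at now_ns.

    Assumes the master has kept playing at a constant rate since the
    report was generated.
    """
    elapsed_ns = now_ns - report.wallclock_ns
    return report.position_ns + int(elapsed_ns * rate)


if __name__ == "__main__":
    report = MasterReport(position_ns=10 * NSEC_PER_SEC,
                          wallclock_ns=100 * NSEC_PER_SEC)
    # The slave handles the report 250 ms after it was generated.
    now_ns = 100 * NSEC_PER_SEC + 250_000_000
    print(extrapolate_seek_target(report, now_ns))
```

The resulting value is the position we pass to the seek command on the
slave pipeline.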

We have checked that during playout, the buffers (video frames) are  
regularly pushed (one-by-one) from the tsdemux. However, just after  
issuing play and seek commands, the tsdemux pushes several consecutive  
buffers. We know that the decoder needs several frames to start
the decoding process, and that the buffer PTS will later be converted
into a buffer running_time. We also know that a segment event is sent
after a seek command to inform about the valid range of timestamps. We
have captured the segment event with gst_event_parse_segment() and
retrieved its fields, but we do not completely understand how the
segment is generated and what values its different fields should
contain.

So, we would appreciate it very much if you could help us clarify
the following issues:

1. Do the segments indicate the range of buffers that are
consecutively pushed by the tsdemux after a seek command? Or do they
instead describe the portion of content (and its playback rate) from
the seek position to the end of the file?

In our tests, the start field of the segment does not match the
media playout position we included in the seek command, even when
forcing a seek to a position in the media file that corresponds to a
key frame.

2. In our case, when performing a seek, it seems that two, three or
four GoPs (containing only I and P frames; we have tested with
different GoP lengths) are immediately pushed to the decoders. It
seems reasonable that the decoder needs to start decoding from a key
frame, but we do not understand why more frames are pushed. Is there
any mechanism to know or control the number of buffers that are pushed
almost simultaneously to the decoder after a seek command?

3. We would like to know the delay between the demuxing and
presentation times of video frames. We know that buffer PTSs are
converted into buffer running_times using the segment information,
and that each buffer is then played out once the pipeline
running_time reaches the buffer running_time. How can we track the
buffers from the tsdemux output until they are assigned a
running_time?
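
For reference, our current understanding of the PTS to running_time
conversion for forward playback is sketched below in plain Python.
This is an assumption based on our reading of the GstSegment
documentation, not actual tsdemux or GStreamer code: the Segment class
and the to_running_time function are hypothetical names, and the
sketch ignores the segment offset field and reverse playback. Please
correct us if this is wrong.

```python
from dataclasses import dataclass
from typing import Optional

NSEC_PER_SEC = 1_000_000_000


@dataclass
class Segment:
    """Subset of the segment fields we retrieved (values in ns)."""
    start_ns: int
    stop_ns: Optional[int] = None  # None means "no stop position"
    base_ns: int = 0
    rate: float = 1.0


def to_running_time(seg: Segment, pts_ns: int) -> Optional[int]:
    """Map a buffer PTS to a running_time, forward playback only.

    Returns None when the PTS lies outside the segment, in which case
    we assume the buffer would not be presented.
    """
    if seg.rate <= 0:
        raise ValueError("this sketch only covers forward playback")
    if pts_ns < seg.start_ns:
        return None
    if seg.stop_ns is not None and pts_ns > seg.stop_ns:
        return None
    return int((pts_ns - seg.start_ns) / abs(seg.rate)) + seg.base_ns


if __name__ == "__main__":
    seg = Segment(start_ns=5 * NSEC_PER_SEC)
    print(to_running_time(seg, 7 * NSEC_PER_SEC))
```

If this is right, the delay we are after would be the difference
between this value and the pipeline running_time at the moment the
buffer leaves the tsdemux.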

The final goal of our project is to develop an HbbTV 2.0 compatible
GStreamer-based testbed, which we want to release in the near future.

Thank you very much in advance!

Cheers,

Mario Montagud