Misbehaving timestamps

Thu Dec 10 04:39:14 PST 2015

I'm sorry this is another long post. There are some questions at the bottom
that may be answerable without reading the whole post so just answering
them would help me a bit.

Pipeline:

udpsrc -> queue -> rtpMP2TDepay -> TSDemux -> split into video and audio

video -> queue -> decodebin (H264Parse -> avdec_h264 -> deinterlace -> ->
x264Enc -> rtph264pay -> rtpbin -> multiudpsink

audio -> queue -> decodebin (AacParse -> avdec_aac_latm) -> audioconvert ->
audioresample -> VoAacEnc -> rtpMP4GPay -> rtpbin -> multiudpsink

The problem is that after many hours the audio rtp packets are ahead of the
video rtp packets. (i.e. when a client connects the video and audio are
immediately out of sink).

I added pad probes between the following elements:

VPad1: tsdemux video src pad -> video queue
VPad2: video queue -> decodebin
VPad3: rtpbin -> video multiudpsink

APad1: tsdemux audio src pad -> audio queue
APad2: audio queue -> decodebin
APad3: rtpbin -> audio multiudpsink

Each pad callback records the timestamp of the current buffer (relative to
the segment start time) and every 30 seconds I dump the last recorded
timestamp of each pad to a file.

The input is a TS stream so the TSDemux is being fed with synchronised
audio and video.

When video and audio are in sink the timestamps look like this (all times
in ms and the latency at the two sinks is 900ms):
Clock time 90321
VPad1: 91157     APad1: 90333
VPad2: 89677          APad2: 89437
VPad3: 89437     APad3: 89416

I don't know why VPad1 and APad1 are ~800ms apart as I would have thought
the timestamps coming out of TSDemux would be the same, however it works so
we'll ignore that.

The timestamp of the video buffer coming out of the queue (VPad2) is later
than the timestamp of the audio buffer coming out of the audio queue
(APad2) which is expected as the video transcoder has more latency than the
audio transcoder.

The time between VPad2 and VPad3 is around 240ms and between APad2 and
APad3 is around 21ms suggesting it takes 240ms to transcode a video buffer
and 21ms to transcode an audio buffer (to be expected as video takes more
work and audio).

Adding the latency to VPad3 and APad3 gives the current clock time (near
enough) and the buffers are due for dispatch to the network.

Several hours later the audio and video are out of sync and the times are
as follows:
Clock time 81390143
VPad1: 81390803     APad1: 81390154
VPad2: 81389483         APad2: 81390026
VPad3: 81389243     APad3: 81389237

Now things have switched around and the buffer coming from the audio queue
(APad2) is much later than the buffer coming out of the video queue
(VPad2). This is weird as we know the latency of the video transcoder is
about 10x larger than the audio transcoder.

Now between pads APad2 and APad3 there appears to be a latency of 789ms
compared to 21ms before suggesting it is now taking 789ms to transcode an
audio buffer.

However, I don't believe this is the case. My theory is that the audio
transcode is still taking just 21ms to transcode the packet but for some
reason is modifying the timestamp i.e. it takes a buffer has a timestamp
789ms in the future, transcodes it in 21ms and then deducts the 768ms from
buffer causing the sink to send the packet immediately. The audio is now
768ms *ahead* of the video at the client which is about what I see at VLC.

The input TS stream is an on-air broadcast and so is subject to errors. I
have GST_DEBUG=2 and do see some errors in both the video and audio. The
audio errors are in gstavauddec so I guess it fails to decode some packets
and then one of the downstream elements fails to detect a missing audio
packet and rather than checking the buffer timestamp it starts sucking
future packets from the audio queue.

***News just in***

I've left the whole pipeline running since I noticed it was out of sync
(about 3 hours ago) and while writing this that video and audio
synchronisation has restored itself. Looking in the statistics I noticed
that 24 minutes ago the audio queue (between tsdemux and the transcoder)
emptied completely (0 buffers in the queue). After that the buffer
timestamps are back to the expected values (e.g. like my first example
above).

This may suggest a way to solve it:
When I detect the timestamps are out of sync, could I put a blocking pad
probe on the audio then send an EOS through the transcoder elements, then
unblock the pad again? But when would I unblock the pad again? And would
this put the downstream elements in the wrong state?

Does anyone have a better way to solve it?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/gstreamer-devel/attachments/20151210/ba950af5/attachment.html>