Questions about timestamps and audio/video sync in gst-rtsp-server

Mihael Brenko Mihael.Brenko at zenitel.com
Fri Sep 4 12:21:08 UTC 2020


Hi all,

While testing our RTSP implementation, which is built on gst-rtsp-server, we noticed that not all RTSP clients keep audio and video synchronized, and that a change which fixes the sync in one client can break it in another.

The pipeline used in gst-rtsp-server looks like this, although this can be modified based on the requested configuration:
appsrc is-live=true do-timestamp=true min-latency=0 ! h264parse ! rtph264pay name=pay0 pt=96
appsrc is-live=true do-timestamp=true min-latency=0  ! audioconvert ! audioresample ! mulawenc ! rtppcmupay name=pay1 pt=0
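For context, each factory is created from such a launch string more or less like this (a simplified sketch; the appsrc names "videosrc"/"audiosrc" and the "/stream" mount path are placeholders for this mail, and media_configure is shown further below):

static void
setup_factory (GstRTSPServer *server)
{
  GstRTSPMountPoints *mounts = gst_rtsp_server_get_mount_points (server);
  GstRTSPMediaFactory *factory = gst_rtsp_media_factory_new ();

  /* Named appsrc elements so the "media-configure" callback can find
   * them in the media pipeline later. */
  gst_rtsp_media_factory_set_launch (factory,
      "( appsrc name=videosrc is-live=true do-timestamp=true min-latency=0 "
      "! h264parse ! rtph264pay name=pay0 pt=96 "
      "appsrc name=audiosrc is-live=true do-timestamp=true min-latency=0 "
      "! audioconvert ! audioresample ! mulawenc ! rtppcmupay name=pay1 pt=0 )");

  gst_rtsp_media_factory_set_shared (factory, TRUE);
  g_signal_connect (factory, "media-configure",
      G_CALLBACK (media_configure), NULL);

  gst_rtsp_mount_points_add_factory (mounts, "/stream", factory);
  g_object_unref (mounts);
}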

The video appsrc gets its buffers from another pipeline, to which we dynamically add/remove an RTSP branch as needed. A simplified version of the pipeline providing video looks like this:
v4l2src do-timestamp=true ! tee ! queue ! videorate drop-only=true ! imxvideoconvert_g2d  ! vpuenc_h264 ! appsink

The audio appsrc likewise gets its buffers from another pipeline, to which we dynamically add/remove an RTSP branch as needed:
appsrc is-live=true min-latency=0  ! tee ! queue  ! appsink

The appsrc in this pipeline gets buffers directly from our application. The reason for this audio-only pipeline is that we want to avoid accessing the audio source multiple times for RTSP; its only purpose is to duplicate buffers via the tee element as many times as RTSP needs them (we have more than one media factory/mount point). In this case we timestamp the audio buffers manually, as in the example from https://gstreamer.freedesktop.org/documentation/tutorials/basic/short-cutting-the-pipeline.html?gi-language=c , i.e.
GST_BUFFER_DTS (buffer) = GST_BUFFER_PTS (buffer) = gst_util_uint64_scale (audioGenerator->num_samples, GST_SECOND, audioGenerator->samplerate);
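Expanded a little, the push side in our application looks roughly like this (simplified; the AudioGenerator struct stands in for our actual bookkeeping, and the buffer duration is set the same way as in the tutorial):

#include <gst/gst.h>
#include <gst/app/gstappsrc.h>

typedef struct {
  guint64 num_samples;   /* total samples pushed so far */
  gint    samplerate;    /* e.g. 8000 for mu-law */
} AudioGenerator;

static void
push_audio_buffer (AudioGenerator *gen, GstElement *appsrc,
    const guint8 *data, gsize size, guint samples_in_buffer)
{
  GstBuffer *buffer = gst_buffer_new_allocate (NULL, size, NULL);

  gst_buffer_fill (buffer, 0, data, size);

  /* PTS/DTS = running time of the first sample in this buffer. */
  GST_BUFFER_PTS (buffer) = GST_BUFFER_DTS (buffer) =
      gst_util_uint64_scale (gen->num_samples, GST_SECOND, gen->samplerate);
  /* Duration covered by the samples in this buffer. */
  GST_BUFFER_DURATION (buffer) =
      gst_util_uint64_scale (samples_in_buffer, GST_SECOND, gen->samplerate);
  gen->num_samples += samples_in_buffer;

  /* gst_app_src_push_buffer() takes ownership of the buffer. */
  gst_app_src_push_buffer (GST_APP_SRC (appsrc), buffer);
}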

The gst-rtsp-server pipeline grabs buffers from the two pipelines more or less as in the example https://github.com/GStreamer/gst-rtsp-server/blob/master/examples/test-appsrc2.c , but without manually setting the timestamps, since we let the appsrc do that via do-timestamp=true as described in https://gstreamer.freedesktop.org/documentation/application-development/advanced/pipeline-manipulation.html?gi-language=c#inserting-data-with-appsrc .
Audio and video branches are added dynamically in the "media-configure" callback, according to the configuration linked to the mount point.
All media factories we use have the "shared" property set to true.
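Simplified, the bridge set up in "media-configure" looks roughly like this (modelled on test-appsrc2.c but without touching timestamps, since we rely on do-timestamp=true; only the video branch is shown, audio is hooked up the same way via "audiosrc"; video_appsink is the appsink of the originating video pipeline, configured with emit-signals=true):

#include <gst/gst.h>
#include <gst/app/gstappsrc.h>
#include <gst/app/gstappsink.h>
#include <gst/rtsp-server/rtsp-server.h>

static GstElement *video_appsink;  /* appsink of the originating video pipeline */

/* Forward each sample arriving on the originating appsink into the
 * RTSP appsrc; the timestamps are deliberately left untouched. */
static GstFlowReturn
on_new_sample (GstElement *appsink, gpointer user_data)
{
  GstElement *appsrc = GST_ELEMENT (user_data);
  GstSample *sample = gst_app_sink_pull_sample (GST_APP_SINK (appsink));
  GstBuffer *buffer;

  if (sample == NULL)
    return GST_FLOW_EOS;

  buffer = gst_sample_get_buffer (sample);
  /* push_buffer takes ownership, hence the extra ref. */
  gst_app_src_push_buffer (GST_APP_SRC (appsrc), gst_buffer_ref (buffer));
  gst_sample_unref (sample);

  return GST_FLOW_OK;
}

static void
media_configure (GstRTSPMediaFactory *factory, GstRTSPMedia *media,
    gpointer user_data)
{
  GstElement *pipeline = gst_rtsp_media_get_element (media);
  GstElement *appsrc = gst_bin_get_by_name (GST_BIN (pipeline), "videosrc");

  g_signal_connect (video_appsink, "new-sample",
      G_CALLBACK (on_new_sample), appsrc);

  /* The media keeps its own ref on the appsrc as long as it lives;
   * real code also disconnects the handler when the media goes away. */
  gst_object_unref (appsrc);
  gst_object_unref (pipeline);
}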

  1.  This implementation works perfectly fine in VLC as far as audio/video sync is concerned; however, in one VMS the audio was consistently around 1 second late.

One thing worth mentioning is that this specific VMS pushes its own configuration via ONVIF, which results in 2 different ONVIF media profiles (see https://www.onvif.org/specs/srv/media/ONVIF-Media-Service-Spec.pdf?26d877&26d877 for details), one used for video and the other for audio.
For gst-rtsp-server this means that there will be 2 separate RTSP requests with 2 different URLs; for the first one the pipeline will be
appsrc is-live=true do-timestamp=true min-latency=0 ! h264parse ! rtph264pay name=pay0 pt=96

and for the second one the pipeline will be
appsrc is-live=true do-timestamp=true min-latency=0 ! audioconvert ! audioresample ! mulawenc ! rtppcmupay name=pay1 pt=0
So the RTP streams are negotiated and started independently of one another, over separate RTSP sessions.
After some experimentation, audio/video sync was fixed by setting the "max-size-time" property of the queue element in the audio pipeline (not in the gst-rtsp-server pipeline, but in the pipeline providing the audio buffers) to 100 ms, as in the snippet below.
Setting the "leaky" property of that queue seems to have no effect at all in this situation.
My understanding is that, for some reason, the gst-rtsp-server pipeline was not grabbing and sending the buffers fast enough, at least compared to the video pipeline?
Since the RTP streams and the gst-rtsp-server pipelines are independent of one another, I don't really see what could be done to synchronize them at the gst-rtsp-server level?
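Concretely, the fix was nothing more than this on the originating audio pipeline ("audio_queue" is just what I call the element here):

/* 100 ms, expressed in nanoseconds. */
g_object_set (audio_queue, "max-size-time", (guint64) (100 * GST_MSECOND), NULL);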


  2.  VMS number 2 is initially in sync, but after 5-10 minutes the audio starts to lag; this happens with or without the changes described under 1.
After some more experimentation this was solved by setting the "max-size-time" property of the queue element in the audio pipeline to 20 ms, and by setting "max-buffers" to 20 and "drop" to true on the appsink element of the audio pipeline (see the snippet below).
VMS number 1 still works fine, but now in VLC the video starts to lag.
Why and how can the queue and appsink elements of the originating pipeline affect audio/video synchronization in the gst-rtsp-server pipeline?
Does this have anything to do with the latency of all the pipelines involved? I also tried querying the max latency of the audio/video pipelines and setting those values as "max-latency" on the corresponding appsrc elements, but that did not seem to have any effect.
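For reference, the settings that fixed VMS number 2 (again on the originating audio pipeline; the variable names are only for this mail):

/* Keep at most 20 ms of audio queued... */
g_object_set (audio_queue, "max-size-time", (guint64) (20 * GST_MSECOND), NULL);
/* ...and let the appsink drop the oldest samples instead of blocking
 * once more than 20 buffers are waiting to be pulled. */
g_object_set (audio_appsink, "max-buffers", 20, "drop", TRUE, NULL);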

  3.  With VMS number 3 the audio is always between 5 and 10 seconds late; absolutely nothing I have tried has had any impact on this.
In this case the VMS establishes 2 RTSP sessions but uses the same URL.
Since the media factory is shared, could that have any effect on the audio/video sync or on the timestamping of the buffers?

  4.  Regarding "do-timestamp" on appsrc: does it do the same thing as manually setting the timestamps like this in the "need-data" callback?

// PTS/DTS = absolute (current) clock time - base (start) time of the element

GstClockTime pts, dts;
GstClock *appsrcClock = gst_element_get_clock (appsrc);  /* non-NULL only once the element has a clock, i.e. is PLAYING */

pts = dts = gst_clock_get_time (appsrcClock) - gst_element_get_base_time (appsrc);
gst_object_unref (appsrcClock);  /* get_clock returns a new ref */

GST_BUFFER_PTS (buffer) = pts;
GST_BUFFER_DTS (buffer) = dts;
In the case of the appsrc of the originating audio pipeline, neither "do-timestamp" nor setting the timestamps as in the code above worked (no audio could be heard). Are there specific situations in which "do-timestamp" should not be used?



  5.  My understanding is that the buffers always need to be re-timestamped in the gst-rtsp-server pipeline because, for example, the pipeline from which we grab the buffers might already have been running for some time, so its buffers carry timestamps which reflect that. Can the values of the original timestamps, or the way they were set (for example by "do-timestamp" or manually), in any way affect the "new" timestamps generated in the gst-rtsp-server pipeline?
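What I mean is something like this before handing a buffer to the gst-rtsp-server appsrc (just a sketch of the idea, not something we currently do):

/* Make the buffer writable and drop the originating pipeline's
 * timestamps, so whatever stamping happens in the gst-rtsp-server
 * pipeline starts from a clean slate. */
buffer = gst_buffer_make_writable (buffer);
GST_BUFFER_PTS (buffer) = GST_CLOCK_TIME_NONE;
GST_BUFFER_DTS (buffer) = GST_CLOCK_TIME_NONE;
GST_BUFFER_DURATION (buffer) = GST_CLOCK_TIME_NONE;
gst_app_src_push_buffer (GST_APP_SRC (appsrc), buffer);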

  6.  Besides the buffer timestamps, is there anything else we need to take care of so that audio and video stay in sync?
What confuses me is that the audio/video sync works or breaks depending on the client; if it did not work at all, I would know there is definitely a problem in our implementation.
I've found a similar problem discussed on the mailing list, but without a concrete solution: http://gstreamer-devel.966125.n4.nabble.com/Issues-trying-to-synchronise-video-appsrc-with-audio-td4667004.html

Regards,
Mihael.
