problem with synchronization between multiple audio clients

dv dv at pseudoterminal.org
Fri Aug 31 05:59:48 PDT 2012


On 2012-08-31 11:13, Elis POPESCU wrote:
> Hi,
>
> I'm trying to get multiple audio clients synchronized. I'm using 
> GstNetClientClock (on the audio clients, as the pipeline clock) 
> connected to a GstNetTimeProvider (on the server, based on the system 
> clock), using GStreamer 0.10.36. I'm using RTP/RTCP over UDP multicast 
> (from the server to the clients), and I'm also setting "ntp-sync" and 
> "use-pipeline-clock" to true on gstrtpbin on the server and the clients. 
> However, if the network is loaded, the clients go out of sync.
>
> Am I missing something? Any ideas? Could it be a bug?
>
> Are there fixes or big changes in the way RTP/RTCP synchronization 
> works in the upcoming new version of GStreamer (1.0)?
>
> Thanks !
>
>
What you describe is a use case that is unfortunately not covered by 
the docs. The docs talk about inter-stream synchronization, for example 
A/V sync and lip sync.
What they do NOT describe is what you are doing: one sender, N 
receivers, where the receivers shall play synchronously, that is, the 
phase difference between the receivers shall stay below a certain 
number of milliseconds.

I have been developing something similar, and ran into this issue. Once 
you keep in mind that "stream sync" means "inter-stream sync" in the 
docs, things become much clearer.
Here is what I did to get synchronization working:


- use the net client clock, just like you do (both sides of the clock 
setup are sketched in the code right after this list).

- buffer-mode set to none: this is perhaps the most important one. The 
default setting performs clock slaving, which is not actually what you 
want here. You already synchronize the pipeline clock; all you want is 
for the audio buffer timestamps on the receiver side to be corrected 
using the RTP timestamps and the RTCP SR information. Setting 0 (= none) 
gives you just that, as the receiver sketch below shows. Some additional 
notes on this follow later.

- the jitter buffer size (the "latency" property on gstrtpbin, in 
milliseconds) must be at least as large as the maximum expected network 
delay. If you expect your packets to be up to 500 ms late, and you use 
a 200 ms jitter buffer, you will run into trouble.

- Do not forget about the audio sink. It is what actually schedules 
audio buffers to be played, based on their timestamps. This is one big 
difference between sinks in GStreamer and sinks in many other libraries 
and programs: most sinks are not just simple output devices that emit 
data as soon as you push it into them; they pay attention to buffer 
timestamps. With this in mind, I recommend using 
GST_BASE_AUDIO_SINK_SLAVE_SKEW as the slave-method for the sink (which 
is the default value anyway), and giving the drift-tolerance property a 
close look. Its default value is 40 ms. Once the rtpbin corrects buffer 
timestamps based on the RTP timestamps and the RTCP SR packets, and the 
sink then detects a resulting drift between the internal audio clock 
and these timestamps that is greater than the drift-tolerance value, it 
skews the playout pointer, compensating for the drift. The smaller the 
drift-tolerance value, the more likely the skewing. If, for example, 
you try to use 0.5 ms for the drift-tolerance, skewing will happen 
often, and playback will sound very bad. Pick a sensible middle ground 
between synchronized playback and stable playback. (These sink settings 
also appear in the receiver sketch below.)
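
To make the clock part concrete, here is a minimal sketch of the sender 
side (GStreamer 0.10 API, linked against gstnet-0.10; CLOCK_PORT is an 
arbitrary value I picked for illustration, and error handling is 
omitted):

#include <gst/gst.h>
#include <gst/net/gstnet.h>

#define CLOCK_PORT 8554  /* arbitrary example port */

/* Expose the sender's pipeline clock on the network so that the
 * receivers can slave to it with a GstNetClientClock. */
static GstNetTimeProvider *
expose_pipeline_clock (GstPipeline *pipeline)
{
  GstClock *clock;
  GstNetTimeProvider *provider;

  /* Use the system clock as the pipeline clock, as in your setup */
  clock = gst_system_clock_obtain ();
  gst_pipeline_use_clock (pipeline, clock);

  /* Serve this clock over UDP on all interfaces */
  provider = gst_net_time_provider_new (clock, NULL, CLOCK_PORT);

  gst_object_unref (clock);
  return provider;
}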
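And the matching receiver side, pulling together the points above 
(again 0.10 API; SERVER_IP, the port and alsasink are placeholders for 
whatever your setup uses, and the udpsrc/depayload/decode plumbing is 
elided):

#include <gst/gst.h>
#include <gst/net/gstnet.h>

#define SERVER_IP  "192.168.0.1"  /* placeholder */
#define CLOCK_PORT 8554           /* must match the time provider's port */

int
main (int argc, char *argv[])
{
  GstElement *pipeline, *rtpbin, *sink;
  GstClock *net_clock;

  gst_init (&argc, &argv);

  pipeline = gst_pipeline_new ("receiver");

  /* Slave the pipeline clock to the sender's GstNetTimeProvider */
  net_clock = gst_net_client_clock_new ("net_clock",
      SERVER_IP, CLOCK_PORT, 0);
  gst_pipeline_use_clock (GST_PIPELINE (pipeline), net_clock);

  /* rtpbin: no clock slaving, generous jitter buffer */
  rtpbin = gst_element_factory_make ("gstrtpbin", NULL);
  g_object_set (rtpbin,
      "buffer-mode", 0,    /* none: rely on RTP timestamps + RTCP SRs */
      "latency", 500,      /* ms; >= worst expected network delay */
      "ntp-sync", TRUE,
      "use-pipeline-clock", TRUE,
      NULL);

  /* The sink schedules playback; skew slaving with the default
   * 40 ms drift tolerance */
  sink = gst_element_factory_make ("alsasink", NULL);
  g_object_set (sink,
      "slave-method", 1,                  /* GST_BASE_AUDIO_SINK_SLAVE_SKEW */
      "drift-tolerance", (gint64) 40000,  /* in microseconds */
      NULL);

  gst_bin_add_many (GST_BIN (pipeline), rtpbin, sink, NULL);

  /* ... add udpsrc elements for RTP/RTCP, request pads on the rtpbin,
   * and link depayloader -> decoder -> sink as usual ... */

  gst_element_set_state (pipeline, GST_STATE_PLAYING);
  /* run a GMainLoop here */

  return 0;
}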


Some more notes about all of this:

- buffer-mode: initially, I used mode 1 (clock slaving). And in the 
beginning, synchronization was good. But then I noticed that after 
network disturbances (sudden delays, packet losses, etc.), the rtpbin 
did not compensate (even though the phase difference between the 
receivers was over 100 ms). Once synchronization was lost, it never 
recovered. I now believe that the clock-slaving mode is not supposed to 
do that in the first place. It concerns itself primarily with 
replicating the sender clock to allow for smooth playback. Transmission 
delays are only a concern when they affect the smoothness of the 
playback. In other words, as long as playback is smooth, everything 
else is unimportant in this mode.

- The audio sink slave method "skew" is the recommended one. The 
alternatives are to ignore drift or to use a resampler. The resample 
method produces REALLY bad results and is not recommended, for good 
reasons. However, skewing itself is not optimal either, because it is 
equivalent to throwing away samples or inserting null samples. I have 
plotted the output of the receivers on an oscilloscope, and this sudden 
skewing is clearly visible. Such behavior is quite bad, because hard 
cuts like that introduce very high-frequency artifacts. What other 
systems do to solve this is either mute the audio for a few ms and then 
unmute, masking the cut, or use more advanced (and complicated) 
techniques based on pattern matching, trying to recognize common areas 
and periods, etc. I think the mute idea could be added to the sink 
easily, and it would help greatly.

- A signal in the rtpbin that informs me that it is now in sync, 
because it has received the RTCP SR information, would be a good 
addition. It would allow me to mute the sound at the beginning, until 
synchronization is established (until the rtpbin receives the SR 
packet, playback happens on a best-effort basis, without any 
synchronization). A rough userland approximation of this is sketched 
below.
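
For what it's worth, here is that rough userland approximation of the 
mute-until-synced idea. It assumes a volume element placed right before 
the audio sink, and it (ab)uses the rtpbin's "on-ssrc-active" signal, 
which fires on RTCP activity in general rather than specifically on the 
first SR, so treat it as a heuristic, not the real fix:

/* Unmute the output once the sender's SSRC shows RTCP activity,
 * which roughly coincides with the first SR being processed. */
static void
on_ssrc_active_cb (GstElement *rtpbin, guint session, guint ssrc,
    gpointer user_data)
{
  GstElement *vol = GST_ELEMENT (user_data);

  g_object_set (vol, "mute", FALSE, NULL);
}

/* Call this before setting the pipeline to PLAYING; "vol" is the
 * volume element sitting just before the audio sink. */
static void
mute_until_rtcp (GstElement *rtpbin, GstElement *vol)
{
  g_object_set (vol, "mute", TRUE, NULL);
  g_signal_connect (rtpbin, "on-ssrc-active",
      G_CALLBACK (on_ssrc_active_cb), vol);
}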


Now, I could be wrong about many of these things, but unfortunately, 
the docs are very sparse here, and I obtained this information by 
digging through the rtpbin/jitterbuffer/session code and the base sink 
code.
It would be great if one of the core developers could comment on my 
points and say what's right and what is inaccurate or wrong. Please? 
Pretty please? :)