audible glitches with resampled/skewed audio sink

Sat Feb 17 18:36:16 UTC 2018

System Description: 

I am streaming audio from one computer (the source of audio) to another (the
playback computer) over a LAN using RTP/UDP streaming. Both of these
computers have their clocks synchronized with NTP to a local stratum 1 (GPS
based) time server that I built and that resides on the LAN. NTP jitter is
30-50 microseconds. As a result of the tight synchrony between machines I
can set my rtpjitterbuffer mode to 4 (synced: assume synchronized sender
and receiver clocks). On the receiving system, I set the audio sink
of the pipeline to provide-clock=false to force the pipeline to use the
system clock. On the sending computer, I also set provide-clock=false on
the source element. In this way, both pipelines use GstSystemClock, getting
their clock from the local NTP disciplined clock. My pipelines are built
using gst-launch-1.0 and both systems are running GStreamer 1.10.4.
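
To make the receiving side concrete, here is a minimal sketch that assembles
a gst-launch-1.0 command line of the kind described above. Only the
rtpjitterbuffer mode and the sink properties come from my setup as described;
the element choices (udpsrc, rtpL16depay, audioconvert, alsasink), the port,
and the RTP caps are assumptions for illustration only.

```python
def receiver_pipeline(port=5004, slave_method="skew", drift_tolerance_us=40000):
    """Return a gst-launch-1.0 command line for the playback computer.

    Assumed for illustration (not stated in the post): udpsrc on `port`,
    a 16-bit stereo L16 payload at 48 kHz, and alsasink as the audio sink.
    """
    caps = ("application/x-rtp,media=audio,clock-rate=48000,"
            "encoding-name=L16,channels=2,payload=96")
    elements = [
        f"udpsrc port={port} caps={caps}",
        "rtpjitterbuffer mode=4",  # 4 = synced, as described above
        "rtpL16depay",
        "audioconvert",
        # provide-clock=false makes the pipeline fall back to the system clock
        f"alsasink provide-clock=false slave-method={slave_method} "
        f"drift-tolerance={drift_tolerance_us}",
    ]
    return "gst-launch-1.0 " + " ! ".join(elements)


cmd = receiver_pipeline(slave_method="none")
print(cmd)
```

Changing slave_method to "resample" or "skew" gives the other two
configurations compared in the problem description.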

Problem Description: 

On the clients, when the sinks are set to provide-clock=false they will be
synced to the pipeline clock using a method that the user can specify with
the slave-method property. I have experimented with values of resample,
skew, and none. I am getting occasional audible glitches with resample
and skew. With skew, there is a small tick or pop sound that occurs
relatively frequently, perhaps every 10 seconds. When audio is muted
(sending zeros) the noise is gone, so I assume the artifacts are caused by
the playback pointer being moved while the samples are non-zero. With the
slave-method set to resample, the audio is fine for longer, but then there
will be several seconds of somewhat garbled and glitchy audio before normal
playback resumes. With slave-method set to none, I hear the fewest audible
noises (if any). I have done listening tests for hours to try to
characterize this behavior, so I am
pretty confident in these assertions.

To get an idea of what else is controlling the resampling and skewing, I
looked at GstAudioBaseSink. This has several related properties, e.g.:

alignment-threshold - Timestamp alignment threshold in nanoseconds.
Default value: 40000000

discont-wait - A window of time in nanoseconds to wait before creating a
discontinuity as a result of breaching the drift-tolerance. Default value:
1000000000

drift-tolerance - Controls the amount of time in microseconds that clocks
are allowed to drift before resynchronisation happens. Default value: 40000
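
Note that these three properties do not share a unit: two are documented in
nanoseconds and one in microseconds. A small sketch that converts the
documented defaults to milliseconds makes them easier to compare (the helper
name to_ms is my own):

```python
NS_PER_MS = 1_000_000
US_PER_MS = 1_000

# Default values and units exactly as quoted from the documentation above.
defaults = {
    "alignment-threshold": (40_000_000, "ns"),
    "discont-wait": (1_000_000_000, "ns"),
    "drift-tolerance": (40_000, "us"),
}


def to_ms(value, unit):
    """Convert a property value in 'ns' or 'us' to milliseconds."""
    if unit == "ns":
        return value / NS_PER_MS
    if unit == "us":
        return value / US_PER_MS
    raise ValueError(f"unknown unit: {unit}")


defaults_ms = {name: to_ms(value, unit) for name, (value, unit) in defaults.items()}
for name, ms in defaults_ms.items():
    print(f"{name}: {ms:g} ms")
```

So the defaults for alignment-threshold and drift-tolerance are both 40 ms,
and discont-wait defaults to 1 second.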

The definitions of some of these properties, e.g. alignment-threshold and
drift-tolerance, are not clear to me. I would like to learn more about them
in detail, but I cannot find any documentation beyond what I have copied and
pasted above from the online docs. More about this below.

My intent for the system I am building is to have multiple endpoints on the
LAN, each with their own sink. The sinks are actually part of the SAME
loudspeaker system, e.g. one sink in the left speaker and one in the right
speaker. In this case I need playback from all sinks to be well
synchronized. This is audio, and left-right timing differences of even 1
millisecond create audible effects, so I would like to keep the
synchronization threshold at about 1/10th of that, or 100 microseconds. This
is much more severe a restriction than, for example, multiroom playback. In
that case as long as the synchronization is below about 20-40 milliseconds
the system will seem to be in sync. But that is not the case with my
setup.

To try to achieve my synchronization goals, I set:

drift-tolerance=100

and left the other parameters alone. This results in the glitchy audio I
described above with slave-method set to resample or skew. When I relax the
drift-tolerance parameter to 500 or 1000, the glitches still occur, just
less frequently. I am concerned that setting the drift tolerance to e.g. 1000
will not result in sufficient synchronization of multiple sinks in my
system.
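
As a rough back-of-the-envelope check, the sketch below shows how small these
tolerances are in samples and how quickly a constant rate mismatch would use
them up. It is a simplified model that assumes a correction is triggered as
soon as the accumulated drift reaches the tolerance; the 48 kHz sample rate
and the 50 ppm offset between the sound card and the pipeline clock are
hypothetical values, not measurements.

```python
def samples_in(tolerance_us, sample_rate_hz):
    """Number of audio samples spanned by a tolerance given in microseconds."""
    return tolerance_us * sample_rate_hz / 1e6


def seconds_until_breach(tolerance_us, clock_offset_ppm):
    """Time for a constant rate offset to accumulate `tolerance_us` of drift.

    An offset of 1 ppm accumulates 1 microsecond of drift per second.
    """
    return tolerance_us / clock_offset_ppm


SAMPLE_RATE_HZ = 48000  # assumed
OFFSET_PPM = 50.0       # hypothetical sound-card clock offset

for tolerance_us in (100, 500, 1000):
    n = samples_in(tolerance_us, SAMPLE_RATE_HZ)
    t = seconds_until_breach(tolerance_us, OFFSET_PPM)
    print(f"drift-tolerance={tolerance_us}: {n:.1f} samples, "
          f"breached after {t:.0f} s at {OFFSET_PPM:g} ppm")
```

Under these assumptions a larger tolerance only lengthens the interval
between corrections, which would match the glitches becoming less frequent
rather than disappearing.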

QUESTIONS:

What does the property alignment-threshold actually do? It is not clear
from the documentation.

I assume the glitches are happening when resampling or skewing is taking
place, and otherwise there is no resampling/skewing taking place. Is that
correct? Can I improve this behavior by changing some other property of
GstAudioBaseSink?

It seems that resample and skew are needed to account for differences
between the pipeline clock and the playback rate of the sink. It seems
plausible that estimating the difference in these rates over time and then
using a resampling method to account for the LONG-TERM rate differences
would be a superior approach. I assume that what I am experiencing is the
effect of corrections that are too drastic that happen only now and then,
resulting in the glitchy audio I am hearing. Is there any way to implement
some kind of long-term averaged resampling, either under gst-launch or via
code (e.g. if my application was coded in C++) based on sink buffer
depletion rate? 
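
To illustrate what I mean by long-term averaging, here is a minimal sketch
that estimates the rate ratio between the device and the pipeline clock with
a least-squares fit over many observations. It is a standalone simulation and
does not use any GStreamer API; the 50 ppm offset, the measurement jitter,
and the one-observation-per-second schedule are hypothetical.

```python
import random


def estimate_rate(observations):
    """Least-squares slope of device time against pipeline-clock time."""
    n = len(observations)
    mean_x = sum(x for x, _ in observations) / n
    mean_y = sum(y for _, y in observations) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in observations)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in observations)
    return sxy / sxx


random.seed(0)
TRUE_RATE = 1.0 + 50e-6  # hypothetical: device runs 50 ppm fast
JITTER_S = 200e-6        # hypothetical jitter on each measurement

# One observation per second for five minutes: (pipeline clock time,
# elapsed time implied by the number of samples the device has consumed).
observations = [
    (float(t), t * TRUE_RATE + random.gauss(0.0, JITTER_S))
    for t in range(300)
]

rate = estimate_rate(observations)
ppm = (rate - 1.0) * 1e6
print(f"estimated offset: {ppm:.1f} ppm")
```

The fitted ratio could then drive a resampler at a slowly varying rate
instead of applying occasional large corrections, which is the behavior I am
asking about.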

I recently viewed a presentation from the 2015 GStreamer Conference in
Dublin by Sebastian Dröge (Synchronised multi-room media playback and
distributed live media processing and mixing). Sebastian mentioned some new
NTP-based pipeline clock slaving methods based around netclock that are, or
will be, implemented in GStreamer and can be used by elements like RTP
pay/depay. Since I am using gst-launch these are probably unavailable to me,
and in any case may not be available yet under version 1.10. I would very
much like to learn more about these, especially if they can be applied to my
problem. I would also appreciate any and all feedback on how to achieve my
goal of inter-client synchrony of 100 microseconds or better using
GStreamer. Is that possible?

-Charlie
