rtpbin: ts adjustment needs filtering/smoothing

Sat Oct 17 14:49:40 UTC 2020

On Friday, 16 October 2020 at 20:55 -0400, charleslaub at sbcglobal.net
wrote:
> I have been developing and measuring the performance of a streaming
> audio pipeline. This streams audio using rtpbin (RTP+RTCP) over a
> WiFi connection, and uses NTP to synchronize clocks on all machines
> to better than 100 microseconds. There is one sender and two clients.
> The clients are connected to DACs and amplifiers and are built into a
> pair of loudspeakers, one into the left speaker and one into the
> right. The goal was to get synchronized streaming of PCM audio, with
> synchronization always better than 1 millisecond between left and
> right speakers. This helps to keep the stereo image centered.
>  
> I have been able to achieve and better the 1msec synchrony goal, but
> careful listening reveals “ticks” or “pops” in the audio that happen
> relatively frequently and only when there is audio (never in the
> absence of music). I assume what is happening is that zeros are being
> inserted into the audio, and when they happen to coincide with a peak
> in the music signal the result is audible.
>  
> In rtpbin, I have set the following parameters:
> latency=60 (milliseconds)
> max-ts-offset-adjustment=2 (nanoseconds per frame???)
> ntp-sync=true
> ntp-time-source=ntp
> rtcp-sync-interval=60000 (milliseconds, or 1 minute)
> all other rtpbin parameters are left at the default value.
>  
> The pipeline ends at an alsasink. I set the properties:
> drift-tolerance=500 (microseconds)
> provide-clock=false
>  
> I have been making measurements of the synchrony of the left and
> right speakers using ARTA and a microphone placed in front of each
> speaker. Measurements are done over long periods of time (e.g. hours)
> at regular intervals (a few minutes each). This data looks very good,
> and shows the level of synchrony I have been able to achieve.
>  
> Unfortunately the measurements did not reveal what listening tests
> did: pops and ticks in the audio. My guess is that this is due to the
> way that the pipeline audio data is being modified in response to ntp
> time data.
>  
> Initially I had set the property max-ts-offset-adjustment to values
> in the thousands or hundreds (of nanoseconds). Measurements that were
> performed at intervals of only a few seconds would reveal the
> playback timing jumping around quite erratically and randomly, by as
> much as 10 milliseconds. This was surprising, because at the sampling
> rate I am using (48kHz) there are only about 50 frames per second.
> Only after I set the max-ts-offset-adjustment parameter to 1 or 2 nsec
> could I prevent the playback timing from jumping around. I cannot set
> this parameter any lower.
>  
> Visual inspection of the timing measurements showed the timing
> randomly advancing and retarding every few seconds. Over time the “by
> eye” average seemed to be about the expected average value. Perhaps
> there is too much jitter in the data and corrections are being wildly
> overdone? Even with the minimum amount of ts-adjustment per frame,
> the audio has regular pops.
>  
> I am wondering if some kind of low-pass filtering could be applied to
> the timing info. The effect will be a smoothing of the data, so that
> the necessary correction changes smoothly over time instead of wildly
> back and forth. A parameter that conveys the maximum allowed or
> expected ts-rate-of-change could be introduced. From there it is
> straightforward to implement low pass filtering with a corner
> frequency corresponding to this time period using e.g. an FIR filter
> or moving average with a timing data sample rate equal to the rtcp-
> sync-interval. Higher frequency changes will be suppressed, but their
> values will still contribute to the overall ts correction if they
> contain a longer period drift. This would be much better than the
> current max-ts-offset-adjustment type limiting.
>  
> Is it possible to implement filtering of timing info along these
> lines?

Perhaps you didn't notice the sink's default slave-method, which is
skew. Skew drops samples from the audio stream, or pads it with
silence, to fix the synchronization between the audio card clock and
the NTP clock. Because it does that naively, you will hear ticks and
pops. You'll also notice the resample method, which in theory should
work, but in practice the resampler there is buggy and the adjustment
too aggressive. Some work is needed there to make it work.
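
To make the audible effect concrete, here is a small standalone Python
sketch. It is not GStreamer code and does not reproduce the sink's
actual algorithm; it only splices a short run of zeros into a sine tone
and compares the largest sample-to-sample jump before and after. The
1 kHz tone, the splice position and the 0.5 ms gap are arbitrary
values chosen for illustration, and the helper names are made up.

```python
import math

RATE = 48_000   # sample rate in Hz, as in the original post
TONE = 1_000    # test tone in Hz (arbitrary choice for illustration)


def sine(n_samples):
    """Return n_samples of a full-scale sine tone."""
    return [math.sin(2 * math.pi * TONE * i / RATE) for i in range(n_samples)]


def insert_silence(samples, position, count):
    """Mimic a naive correction: splice `count` zeros into the stream."""
    return samples[:position] + [0.0] * count + samples[position:]


def max_step(samples):
    """Largest jump between two consecutive samples."""
    return max(abs(b - a) for a, b in zip(samples, samples[1:]))


clean = sine(480)                           # 10 ms of audio
peak = 12                                   # a positive peak of the 1 kHz tone
patched = insert_silence(clean, peak, 24)   # 24 samples = 0.5 ms of silence

print(f"clean   max step: {max_step(clean):.3f}")
print(f"patched max step: {max_step(patched):.3f}")
```

The patched stream jumps from near full scale to zero and back within
one sample, which is the kind of discontinuity that is heard as a
click when it lands on a signal peak.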

Well-known companies using GStreamer have introduced the "custom"
slave-method. Along with
gst_audio_base_sink_set_custom_slaving_callback(), you get notified of
the drift. These vendors control the hardware, so instead of
resampling, they adjust the audio clock PLL in order to lock the
audio to the system/NTP/PTP/etc. clock in use.
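
As a rough illustration of the smoothing you asked about, applied to
drift reports like these, here is a standalone Python sketch of a
first-order low-pass filter (an exponential moving average). It is
only a simulation under assumed numbers: the class name DriftSmoother,
the 200 us constant offset, the 500 us measurement noise and
alpha=0.02 are all invented for the example, and nothing here calls
into GStreamer.

```python
import random


class DriftSmoother:
    """Exponential moving average over drift reports (hypothetical helper)."""

    def __init__(self, alpha):
        self.alpha = alpha   # 0 < alpha <= 1; smaller means heavier smoothing
        self.value = None

    def update(self, drift_us):
        """Feed one drift report (microseconds), return the smoothed drift."""
        if self.value is None:
            self.value = drift_us
        else:
            self.value += self.alpha * (drift_us - self.value)
        return self.value


def spread(values):
    """Peak-to-peak range of a list of values."""
    return max(values) - min(values)


random.seed(1)                  # fixed seed so runs are repeatable
true_drift_us = 200.0           # assumed slow, constant offset
raw = [true_drift_us + random.gauss(0, 500) for _ in range(2000)]

smoother = DriftSmoother(alpha=0.02)
smoothed = [smoother.update(x) for x in raw]

# Skip the first reports while the filter settles.
settled = smoothed[500:]
print(f"raw spread:      {spread(raw[500:]):.0f} us")
print(f"smoothed spread: {spread(settled):.0f} us")
print(f"smoothed mean:   {sum(settled) / len(settled):.0f} us")
```

The smoothed value still converges on the underlying offset, but the
report-to-report jitter is strongly attenuated, so a clock or PLL
steered from it would be corrected gradually instead of back and forth.
The trade-off is a slower reaction to real changes in drift.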
>  
> I would be happy to provide my pipelines if that would be helpful,
> provide measurement data, or any other info that might lead to a
> solution.
>  
> -Charlie