what is the gstreamer audio synchronization resolution?
virtually_me at claub.net
Mon Jul 29 11:54:17 UTC 2019
I have an idea about how to improve the synchronization of streams by the audiointerleave element: introduce hidden “elements” within audiointerleave that add a tiny (microsecond-scale) amount of wall-clock latency to each stream that is to be interleaved. The hidden elements do nothing except consume wall-clock time by waiting. The wait period for each stream is chosen so that the stream's wall-clock latency plus its hidden delay falls exactly on a sample boundary. The streams are then aligned as usual on a best/nearest-sample basis. An “alignment-precision” property could let the user set the desired precision of this sample-boundary alignment, or disable the hidden elements.
I think this could work well. It could be added “under the hood” of audiointerleave (invisible to the user), and the user could disable or enable it as needed, whichever is chosen as the default behavior.
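The arithmetic behind the proposed hidden delay can be sketched in Python. This is a minimal illustration assuming each stream's latency is known in integer nanoseconds and the pipeline runs at a fixed sample rate; `hidden_delay_ns` is a hypothetical helper, not anything that exists in audiointerleave. It only computes how long a stream would have to wait for its total latency to reach the next sample boundary.

```python
import math
from fractions import Fraction

NS_PER_SECOND = 1_000_000_000


def hidden_delay_ns(latency_ns: int, rate_hz: int) -> Fraction:
    """Hypothetical helper: extra wait so latency + wait lands on a sample boundary.

    Uses exact fractions because, e.g., 1e9 / 48000 ns is not an integer.
    """
    period_ns = Fraction(NS_PER_SECOND, rate_hz)  # exact sample period in ns
    # Index of the first sample boundary at or after the stream's latency
    next_boundary = math.ceil(Fraction(latency_ns) / period_ns)
    return next_boundary * period_ns - latency_ns


# A hypothetical stream with 30 us of latency at 48 kHz needs about 11.67 us more.
wait = hidden_delay_ns(30_000, 48_000)
print(f"{float(wait):.2f} ns")
```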
From: gstreamer-devel <gstreamer-devel-bounces at lists.freedesktop.org> On Behalf Of Nicolas Dufresne
Sent: Sunday, July 28, 2019 4:35 PM
To: Discussion of the development of and with GStreamer <gstreamer-devel at lists.freedesktop.org>
Subject: Re: what is the gstreamer audio synchronization resolution?
On Sun, Jul 28, 2019, 16:34, Nicolas Dufresne <nicolas at ndufresne.ca> wrote:
On Sun, Jul 28, 2019, 14:55, <virtually_me at claub.net> wrote:
I have some questions about the time resolution of audiointerleave.
I have been working with gstreamer pipelines for a couple of years to
implement loudspeaker crossovers via LADSPA plugins. This in general entails
a number of steps from source to sink, including de-interleaving the
incoming audio, teeing into N mono channels that are processed with one or
more LADSPA plugins, and (re)interleaving the channels into an N-channel
"output stream" that is directed to a sink. Since the wall-clock processing
time may be longer or shorter on each channel, the audiointerleave element
is used to correct for the various latencies of each LADSPA-processed stream.
I am concerned that the resolution that audiointerleave can achieve is too
low. My assumption is that the code looks for an optimum time-alignment
point on a sample-by-sample basis. Is that correct? In that case the
resolution would be about one sample in time, e.g. for 48kHz there is one
sample every 0.0208 milliseconds.
Let me explain how this would negatively impact my particular application.
For a 3kHz tone, one period is 0.333 milliseconds, i.e. 360 degrees of phase.
If the time resolution is 0.0208 milliseconds, then the phase resolution is
360 deg * 0.0208 msec / 0.333 msec = 22.5 degrees.
A resolution of 22.5 degrees is not sufficient for my needs. This is because
delay is often used to align the wavefronts that are launched by each driver
in the loudspeaker, and the phase angle between one driver and the next
needs to be maintained to within several degrees regardless of any
processing latencies. In my example I chose 3kHz; however, the phase
resolution gets worse and worse as frequency increases. For example, at
6kHz one sample corresponds to 45 degrees. The
resulting phase angle would depend on the exact latency experienced by each
stream before interleaving, and modifying the number of LADSPA plugins (or
any other pipeline element) could have a very large and negative impact on
the phase angle and resulting audio performance from the loudspeaker.
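The per-sample phase step follows from the ratio of tone frequency to sample rate: one sample lasts 1/fs seconds and one cycle lasts 1/f seconds, so a one-sample timing error corresponds to 360·f/fs degrees. The short Python sketch below (function names are illustrative) evaluates this for the frequencies discussed; at 48 kHz it gives 22.5 degrees at 3 kHz and 45 degrees at 6 kHz.

```python
def sample_period_us(rate_hz: float) -> float:
    """Duration of one sample in microseconds."""
    return 1e6 / rate_hz


def phase_step_deg(freq_hz: float, rate_hz: float) -> float:
    """Phase (degrees) that a tone at freq_hz advances during one sample period."""
    return 360.0 * freq_hz / rate_hz


rate = 48_000
print(f"one sample at {rate} Hz = {sample_period_us(rate):.2f} us")
for freq in (3_000, 6_000):
    print(f"{freq} Hz: {phase_step_deg(freq, rate):.1f} degrees per sample")
```

Because the step scales linearly with frequency and inversely with sample rate, cutting it by ten times at a given frequency needs a ten times higher sample rate.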
Related to this issue, I would like to implement some type of delay for
time-alignment as part of the loudspeaker crossover. I can do this using
e.g. audioecho or by modifying timestamps; however, one-sample resolution
will be insufficient. I need much better resolution.
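A common way to reason about sub-sample delay (a general signal-processing decomposition, not something stated in the thread) is to split the required delay into a whole number of samples, which sample-accurate timestamp or buffer shifting can already provide, and a fractional remainder, which is the part that needs interpolation. The Python sketch below assumes a fixed sample rate; `split_delay` is an illustrative, hypothetical name.

```python
import math


def split_delay(delay_s: float, rate_hz: int) -> tuple[int, float]:
    """Hypothetical helper: split a delay into whole samples and a remainder in [0, 1)."""
    samples = delay_s * rate_hz
    whole = math.floor(samples)
    return whole, samples - whole


# A hypothetical 50 us alignment delay at 48 kHz is 2 whole samples plus about 0.4 of a sample.
whole, frac = split_delay(50e-6, 48_000)
print(whole, round(frac, 3))
```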
I would like to know what approaches might overcome this problem. If I
increase the sample rate by N times I could improve the resolution by N
times, however, I need an improvement by about an order of magnitude (10
times) and such high sample rates are unachievable. Are there any other
techniques that can be used within gstreamer to get a more fine-grained time
resolution for synchronization purposes when interleaving streams?
The only approach I can think of to get better time alignment prior to
interleaving the streams would be to resample each mono stream to the
pipeline sample rate plus a time offset, with a time resolution of ten
microseconds or better. This would work, but would be rather computationally
expensive. Is there a better or more efficient way that already exists?
That is an interesting project; indeed, audiointerleave only supports per-sample alignment. It also has a configurable tolerance to clock drift, which by default is likely multiple samples.
I'm not aware of anything like sub-sample interleaving in GStreamer. This discussion reminded me of some aspects of Arun's beamforming blog post, which may or may not be of interest here.
Of course, adding such precision to audiointerleave would require a very close look at how we perform the initial alignment, as any overclip could be disastrous for your use case. An extra per-stream offset would also need to be maintained. Whether this should be in nanoseconds, and what the best algorithm for it is, I don't know, and I'm not an expert, but I'm sure there is a slightly more efficient way than going through massive upsampling, which, on top of using more CPU, would also increase memory bandwidth.
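One well-known alternative to upsampling is a fractional-delay FIR filter: the whole-sample part of a delay is applied by shifting samples, and only the sub-sample remainder is realized by interpolating with a short windowed-sinc filter at the original rate. This technique is swapped in here for illustration; nobody in the thread says GStreamer provides it, and the function names, the 31-tap length and the Blackman window are assumptions. The cost is one multiply-add per tap per output sample, with no increase in sample rate or buffer size.

```python
import math


def fractional_delay_taps(frac: float, num_taps: int = 31) -> list[float]:
    """Windowed-sinc FIR with a delay of (num_taps - 1) / 2 + frac samples.

    frac is the sub-sample part of the desired delay, 0 <= frac < 1.
    """
    center = (num_taps - 1) / 2 + frac
    half_width = num_taps / 2  # the window falls to zero at |t| = half_width
    taps = []
    for n in range(num_taps):
        t = n - center  # distance from the shifted sinc peak, in samples
        sinc = 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)
        if abs(t) >= half_width:
            window = 0.0
        else:
            # Blackman window centred on the shifted peak, not on the middle tap
            window = (0.42 + 0.5 * math.cos(math.pi * t / half_width)
                      + 0.08 * math.cos(2 * math.pi * t / half_width))
        taps.append(sinc * window)
    gain = sum(taps)
    return [h / gain for h in taps]  # normalise to unity gain at DC


def apply_fir(signal: list[float], taps: list[float]) -> list[float]:
    """Direct-form FIR: out[i] = sum over k of taps[k] * signal[i - k]."""
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if i - k < 0:
                break
            acc += h * signal[i - k]
        out.append(acc)
    return out


# Delay a 1 kHz tone at 48 kHz by half a sample (plus the filter's 15-sample bulk delay).
fs, freq = 48_000, 1_000.0
tone = [math.sin(2 * math.pi * freq * i / fs) for i in range(400)]
delayed = apply_fir(tone, fractional_delay_taps(0.5))
```

The filter adds a bulk latency of (num_taps - 1) / 2 samples, so every stream would need the same tap count, or a compensating whole-sample offset, for the relative alignment between streams to be preserved.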
gstreamer-devel mailing list
gstreamer-devel at lists.freedesktop.org