what is the gstreamer audio synchronization resolution?
virtually_me at claub.net
Sun Jul 28 21:06:05 UTC 2019
Thank you for your thoughts and your reply.
Regarding the necessary resolution, the audio band is 20-20k Hz. At 20kHz one period is 0.05 milliseconds (50 microseconds). If we want to get to a resolution of 10 degrees at 20kHz, that is 1/36th of 50 microseconds, so approaching 1 microsecond. That level of time resolution would be sufficient for any audio application. Anything better than that would be great, but is just “extra” beyond what is necessary. For my application, where loudspeaker crossovers are not typically done above 10kHz, the requirement is a factor of 2 less severe. In the end, we are still talking about something around 1-2 microseconds. I do not see any possibility of achieving this level of resolution except by resampling and shifting in time each audio stream that is to be interleaved. Maybe this could be implemented as audiointerleave high_resolution=true, with high_resolution=false as the default.
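To make the arithmetic above easy to re-run for other frequencies, here is a minimal Python sketch. The function names are made up for illustration and are not part of GStreamer. It converts a phase tolerance at a given frequency into a time step, and then into the sample-rate multiple that per-sample alignment would need in order to reach that step.

```python
def required_time_resolution_us(freq_hz, phase_deg):
    """Time step in microseconds that equals phase_deg of one period at freq_hz."""
    period_us = 1e6 / freq_hz
    return period_us * phase_deg / 360.0


def rate_multiple_needed(sample_rate_hz, target_us):
    """How many times finer than one sample period the target time step is."""
    sample_period_us = 1e6 / sample_rate_hz
    return sample_period_us / target_us


step_20k = required_time_resolution_us(20000, 10)   # about 1.39 microseconds
step_10k = required_time_resolution_us(10000, 10)   # about 2.78 microseconds
multiple = rate_multiple_needed(48000, step_20k)    # about 15x at 48kHz
print(step_20k, step_10k, multiple)
```

Under these assumptions, per-sample alignment at 48kHz would need roughly a 15x higher sample rate to give 10 degrees at 20kHz.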
In the meantime I will try upsampling, then audiointerleave, then downsampling again before sinking the audio stream. This would also give a hint about how CPU intensive a resampling based audiointerleave might be…
From: gstreamer-devel <gstreamer-devel-bounces at lists.freedesktop.org> On Behalf Of Nicolas Dufresne
Sent: Sunday, July 28, 2019 4:35 PM
To: Discussion of the development of and with GStreamer <gstreamer-devel at lists.freedesktop.org>
Subject: Re: what is the gstreamer audio synchronization resolution?
On Sun, Jul 28, 2019 at 4:34 PM, Nicolas Dufresne <nicolas at ndufresne.ca> wrote:
On Sun, Jul 28, 2019 at 2:55 PM, <virtually_me at claub.net> wrote:
I have some questions about the time resolution of audiointerleave.
I have been working with gstreamer pipelines for a couple of years to
implement loudspeaker crossovers via LADSPA plugins. This in general entails
a number of steps from source to sink, including de-interleaving the
incoming audio, teeing into N mono channels that are processed with one or
more LADSPA plugins, and (re) interleaving the channels into a N channel
"output stream" that is directed to a sink. Since the wall-clock processing
time may be longer or shorter on each channel, the element audiointerleave
is used to correct for the various latencies of each LADSPA-processed stream.
I am concerned that the resolution that audiointerleave can achieve is too
low. My assumption is that the code looks for an optimum time-alignment
point on a sample-by-sample basis. Is that correct? In that case the
resolution would be about one sample in time, e.g. for 48kHz there is one
sample every 0.0208 milliseconds.
Let me explain how this would negatively impact my particular application. A
3kHz tone has a period of 0.33 milliseconds. Considering the phase within each
period, there are 360 degrees. If the time resolution is 0.021 milliseconds
then the phase resolution is 360deg * 0.021 msec / 0.333 msec = about 23 degrees.
A resolution of 23 degrees is not sufficient for my needs. This is because
delay is often used to align the wavefronts that are launched by each driver
in the loudspeaker, and the phase angle between one driver and the next
needs to be maintained regardless of any processing latencies to a
resolution of several degrees. In my example I chose 3kHz, however, the
resolution in terms of phase will get worse and worse as frequency
increases. For example, at 6kHz the resolution coarsens to about 45 degrees. The
resulting phase angle would depend on the exact latency experienced by each
stream before interleaving, and modifying the number of LADSPA plugins (or
any other pipeline element) could have a very large and negative impact on
the phase angle and resulting audio performance from the loudspeaker.
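The phase step that a one-sample shift produces can be checked with a short Python sketch. The function name is illustrative only and is not a GStreamer API; it simply evaluates 360 degrees times the tone frequency divided by the sample rate.

```python
def phase_step_deg(sample_rate_hz, tone_hz):
    """Phase shift, in degrees, that a one-sample delay causes at tone_hz."""
    return 360.0 * tone_hz / sample_rate_hz


for tone_hz in (3000, 6000, 10000, 20000):
    # At 48kHz: 3kHz -> 22.5 degrees, 6kHz -> 45 degrees
    print(tone_hz, phase_step_deg(48000, tone_hz))
```

The step grows linearly with frequency, which is why a crossover near the top of the audio band is the most demanding case.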
Related to this issue, I would like to implement some type of delay for
time-alignment as part of the loudspeaker crossover. I can do this using
e.g. audioecho or by modifying timestamps, however, one-sample resolution
will be insufficient. I need much better resolution.
I would like to know what approaches might overcome this problem. If I
increase the sample rate by N times I could improve the resolution by N
times, however, I need an improvement by about an order of magnitude (10
times) and such high sample rates are unachievable. Are there any other
techniques that can be used within gstreamer to get a more fine-grained time
resolution for synchronization purposes when interleaving streams?
The only approach to get better time alignment (that I can think of) prior to
interleaving the streams would be to resample each mono stream to the
pipeline sample rate plus a time offset that has a time resolution of ten
microseconds or better. This would work, but would be rather computationally
expensive. Is there a better or more efficient way that already exists?
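One common way to apply a time offset smaller than one sample without changing the sample rate is a fractional-delay FIR filter. The following is only a sketch in plain Python, assuming a Hamming-windowed sinc design; the function names and the 31-tap length are illustrative choices, and nothing here is an existing GStreamer element or property. Note that the filter also adds a fixed delay of (num_taps - 1) / 2 samples, which would have to be the same on every channel.

```python
import math


def fractional_delay_taps(delay_samples, num_taps=31):
    """Windowed-sinc taps delaying by (num_taps - 1) / 2 + delay_samples samples."""
    center = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        x = n - center - delay_samples
        sinc = 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)
        # Hamming window to reduce the ripple caused by truncating the sinc
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    gain = sum(taps)
    return [t / gain for t in taps]  # normalize to unity gain at DC


def fir_filter(signal, taps):
    """Direct-form FIR convolution; samples before the start are taken as zero."""
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, t in enumerate(taps):
            if i - k >= 0:
                acc += t * signal[i - k]
        out.append(acc)
    return out


# Example: delay a 3kHz tone sampled at 48kHz by a quarter of a sample
rate_hz, tone_hz, delay = 48000, 3000, 0.25
tone = [math.sin(2.0 * math.pi * tone_hz * i / rate_hz) for i in range(200)]
delayed = fir_filter(tone, fractional_delay_taps(delay))
```

A delay given in microseconds converts to samples as delay_us * rate_hz / 1e6, so 1.4 microseconds at 48kHz is about 0.067 samples. The cost is one short FIR per channel at the original sample rate rather than a large upsample and downsample.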
That is an interesting project. Indeed, audiointerleave only supports per-sample alignment. It also has a configurable tolerance to clock drift, which by default is likely multiple samples.
I'm not aware of such a thing as sub-sample interleaving in GStreamer. This discussion reminded me of some aspects of Arun's beamforming blog, which may or may not be of interest here.
Of course, adding such precision to audiointerleave would require a very close look at how we perform the initial alignment, as any overclip could be disastrous to your use case. An extra per-stream offset would then also need to be maintained. Should this be in nanoseconds, and what are the best algorithms for this? I don't know, and I'm not an expert, but I'm sure there is a slightly more efficient way than going through massive upsampling, which on top of adding more CPU load would also increase the memory bandwidth.