what is the gstreamer audio synchronization resolution?

Sun Jul 28 18:32:03 UTC 2019

I have some questions about the time resolution of audiointerleave.

I have been working with gstreamer pipelines for a couple of years to
implement loudspeaker crossovers via LADSPA plugins. This in general entails
a number of steps from source to sink, including de-interleaving the
incoming audio, teeing into N mono channels that are processed with one or
more LADSPA plugins, and re-interleaving the channels into an N-channel
"output stream" that is directed to a sink. Since the wall-clock processing
time may be longer or shorter on each channel, the element audiointerleave
is used to correct for the various latencies of each LADSPA-processed stream
automatically. 

I am concerned that the resolution that audiointerleave can achieve is too
low. My assumption is that the code looks for an optimum time-alignment
point on a sample-by-sample basis. Is that correct? In that case the
resolution would be about one sample in time, e.g. for 48kHz there is one
sample every 0.0208 milliseconds.

Let me explain how this would negatively impact my particular application.
One period of a 3kHz tone is 0.333 milliseconds and spans 360 degrees of
phase. If the time resolution is 0.021 milliseconds then the phase
resolution is 360deg * 0.021 msec / 0.333 msec = about 23 degrees.

A resolution of about 23 degrees is not sufficient for my needs. This is
because delay is often used to align the wavefronts that are launched by
each driver in the loudspeaker, and the phase angle between one driver and
the next needs to be maintained regardless of any processing latencies to a
resolution of several degrees. In my example I chose 3kHz, however, the
resolution in terms of phase will get worse and worse as frequency
increases. For example at 6kHz the resolution coarsens to about 45 degrees. The
resulting phase angle would depend on the exact latency experienced by each
stream before interleaving, and modifying the number of LADSPA plugins (or
any other pipeline element) could have a very large and negative impact on
the phase angle and resulting audio performance from the loudspeaker. 
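
The arithmetic behind this concern can be checked with a short script. This
is only a sketch: the function name is made up for illustration, and it
assumes that alignment really is limited to whole samples, which is the
assumption being asked about, not a confirmed property of audiointerleave.

```python
def phase_step_degrees(tone_hz: float, sample_rate_hz: float) -> float:
    """Phase shift, in degrees, that a one-sample delay causes at tone_hz."""
    # One sample lasts 1 / sample_rate_hz seconds; one period of the tone
    # lasts 1 / tone_hz seconds and spans 360 degrees.
    return 360.0 * tone_hz / sample_rate_hz


if __name__ == "__main__":
    # Assumed pipeline rate of 48 kHz, as in the example above.
    for tone_hz in (1000, 3000, 6000, 12000):
        step = phase_step_degrees(tone_hz, 48000)
        print(f"{tone_hz:>6} Hz: {step:5.1f} degrees per sample")
```

The phase step grows linearly with frequency, so a one-sample error that is
tolerable at a low crossover point becomes significant at a high one.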

Related to this issue, I would like to implement some type of delay for
time-alignment as part of the loudspeaker crossover. I can do this using
e.g. audioecho or by modifying timestamps, however, one-sample resolution
will be insufficient. I need much better resolution.
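
To make the limitation concrete, here is a small sketch that splits a
desired delay into the whole-sample part that a sample-granular delay could
provide and the fractional remainder that would be lost. The helper name and
the 100 microsecond figure are illustrative assumptions, not values from a
real crossover design.

```python
def split_delay(delay_us: float, sample_rate_hz: float = 48000.0):
    """Split a delay into whole samples plus a fractional-sample remainder."""
    if delay_us < 0:
        raise ValueError("delay must be non-negative")
    delay_samples = delay_us * 1e-6 * sample_rate_hz
    whole = int(delay_samples)       # part a one-sample-resolution delay covers
    frac = delay_samples - whole     # part that needs sub-sample interpolation
    return whole, frac


if __name__ == "__main__":
    whole, frac = split_delay(100.0)  # hypothetical 100 microsecond alignment
    residual_us = frac / 48000.0 * 1e6
    print(f"{whole} whole samples, {frac:.2f} of a sample "
          f"({residual_us:.1f} us) left over")
```

At 48kHz a 100 microsecond delay is 4.8 samples, so rounding to either 4 or
5 samples leaves an error of several microseconds or more.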

I would like to know what approaches might overcome this problem. If I
increase the sample rate by N times I could improve the resolution by N
times, however, I need an improvement by about an order of magnitude (10
times) and such high sample rates are unachievable. Are there any other
techniques that can be used within gstreamer to get a more fine-grained time
resolution for synchronization purposes when interleaving streams? 
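
As a rough check on why raising the sample rate alone is impractical, the
sketch below computes the rate at which one sample corresponds to a given
phase step. The 3 degree target is only an assumed stand-in for "several
degrees", and the function name is hypothetical.

```python
def required_sample_rate_hz(tone_hz: float, target_degrees: float) -> float:
    """Sample rate at which one sample equals target_degrees at tone_hz."""
    if target_degrees <= 0:
        raise ValueError("target_degrees must be positive")
    return 360.0 * tone_hz / target_degrees


if __name__ == "__main__":
    for tone_hz in (3000, 6000):
        rate = required_sample_rate_hz(tone_hz, 3.0)  # assumed 3 degree target
        print(f"{tone_hz} Hz needs about {rate / 1000:.0f} kHz")
```

Under that assumption a 3kHz tone needs 360kHz and a 6kHz tone needs 720kHz,
well beyond the rates that ordinary audio hardware supports.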

The only approach to get better time alignment (that I can think of) prior to
interleaving the streams would be to resample each mono stream to the
pipeline sample rate plus a time offset that has a time resolution of ten
microseconds or better. This would work, but would be rather computationally
expensive. Is there a better or more efficient way that already exists
within gstreamer?
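
For reference, the resampling idea can be sketched outside gstreamer as a
fractional-delay FIR filter. This is plain Python, not an existing gstreamer
element or LADSPA plugin. The function names, the 31-tap length and the Hann
window are all assumptions chosen for illustration.

```python
import math


def fractional_delay_taps(frac: float, num_taps: int = 31) -> list:
    """Hann-windowed sinc FIR delaying by (num_taps - 1) / 2 + frac samples."""
    if not 0.0 <= frac < 1.0:
        raise ValueError("frac must be in [0, 1)")
    center = (num_taps - 1) / 2 + frac
    taps = []
    for n in range(num_taps):
        x = n - center
        sinc = 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)
        # Hann window centred on the shifted peak, zero at |x| = (N + 1) / 2.
        window = 0.5 + 0.5 * math.cos(2.0 * math.pi * x / (num_taps + 1))
        taps.append(sinc * window)
    total = sum(taps)
    return [t / total for t in taps]  # normalise to unity gain at DC


def apply_fir(signal, taps):
    """Direct-form convolution; samples before the start are taken as zero."""
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, t in enumerate(taps):
            if i - k >= 0:
                acc += t * signal[i - k]
        out.append(acc)
    return out


if __name__ == "__main__":
    fs, tone_hz, frac = 48000.0, 3000.0, 0.25  # 0.25 sample is about 5.2 us
    taps = fractional_delay_taps(frac)
    x = [math.sin(2 * math.pi * tone_hz * n / fs) for n in range(200)]
    y = apply_fir(x, taps)
    delay = (len(taps) - 1) / 2 + frac
    err = max(abs(y[n] - math.sin(2 * math.pi * tone_hz * (n - delay) / fs))
              for n in range(40, 200))
    print(f"max error after start-up: {err:.2e}")
```

The cost is one multiply-add per tap per sample on each channel, plus a
fixed latency of (num_taps - 1) / 2 samples that must be the same on every
channel so that only the fractional part differs between drivers.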