what is the gstreamer audio synchronization resolution?

Sun Jul 28 20:34:55 UTC 2019

On Sun, 28 Jul 2019 at 16:34, Nicolas Dufresne <nicolas at ndufresne.ca>
wrote:

>
>
> On Sun, 28 Jul 2019 at 14:55, <virtually_me at claub.net> wrote:
>
>> I have some questions about the time resolution of audiointerleave.
>>
>> I have been working with gstreamer pipelines for a couple of years to
>> implement loudspeaker crossovers via LADSPA plugins. In general this
>> entails a number of steps from source to sink, including de-interleaving
>> the incoming audio, teeing into N mono channels that are processed with
>> one or more LADSPA plugins, and re-interleaving the channels into an
>> N-channel "output stream" that is directed to a sink. Since the
>> wall-clock processing time may be longer or shorter on each channel, the
>> audiointerleave element is used to correct automatically for the various
>> latencies of each LADSPA-processed stream.
>>
>> I am concerned that the resolution that audiointerleave can achieve is too
>> low. My assumption is that the code looks for an optimum time-alignment
>> point on a sample-by-sample basis. Is that correct? In that case the
>> resolution would be about one sample in time, e.g. for 48kHz there is one
>> sample every 0.0208 milliseconds.
>>
>> Let me explain how this would negatively impact my particular
>> application. One period of a 3 kHz tone is 0.333 milliseconds, and each
>> period spans 360 degrees of phase. If the time resolution is 0.0208
>> milliseconds then the phase resolution is 360 deg * 0.0208 msec / 0.333
>> msec = about 22.5 degrees.
>>
>> A resolution of 22.5 degrees is not sufficient for my needs. This is
>> because delay is often used to align the wavefronts that are launched by
>> each driver in the loudspeaker, and the phase angle between one driver
>> and the next needs to be maintained, regardless of any processing
>> latencies, to a resolution of several degrees. In my example I chose
>> 3 kHz, but the resolution in terms of phase gets worse and worse as
>> frequency increases. For example, at 6 kHz the phase step grows to 45
>> degrees. The resulting phase angle would depend on the exact latency
>> experienced by each stream before interleaving, and modifying the number
>> of LADSPA plugins (or any other pipeline element) could have a very large
>> and negative impact on the phase angle and the resulting audio
>> performance of the loudspeaker.
>>
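The arithmetic above can be checked with a few lines of Python. This is only a sketch of the calculation, not anything GStreamer does internally; the function names are made up for illustration. It assumes the alignment step is exactly one sample, in which case the phase step at a given tone is 360 degrees times the tone frequency divided by the sample rate.

```python
def sample_period_ms(sample_rate_hz):
    """Duration of one sample in milliseconds."""
    return 1000.0 / sample_rate_hz


def phase_step_deg(tone_hz, sample_rate_hz):
    """Phase shift in degrees that a one-sample delay produces at tone_hz."""
    return 360.0 * tone_hz / sample_rate_hz


def rate_for_phase_step(tone_hz, max_step_deg):
    """Sample rate at which a one-sample delay equals max_step_deg at tone_hz."""
    return 360.0 * tone_hz / max_step_deg


if __name__ == "__main__":
    rate = 48000.0
    print(f"sample period: {sample_period_ms(rate):.4f} ms")
    for tone in (3000.0, 6000.0):
        step = phase_step_deg(tone, rate)
        print(f"{tone:.0f} Hz: {step:.1f} degrees per sample")
```

At 48 kHz this gives a step of 22.5 degrees at 3 kHz and 45 degrees at 6 kHz. By the same formula, a 2.25 degree step at 3 kHz would need a 480 kHz sample rate, which is the tenfold increase mentioned later in the message.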
>> Related to this issue, I would like to implement some type of delay for
>> time-alignment as part of the loudspeaker crossover. I can do this using
>> e.g. audioecho or by modifying timestamps, however, one-sample resolution
>> will be insufficient. I need much better resolution.
>>
>> I would like to know what approaches might overcome this problem. If I
>> increase the sample rate by N times I could improve the resolution by N
>> times, however, I need an improvement of about an order of magnitude (10
>> times) and such high sample rates are unachievable. Are there any other
>> techniques that can be used within gstreamer to get a more fine-grained
>> time resolution for synchronization purposes when interleaving streams?
>>
>> The only approach to get better time alignment (that I can think of)
>> prior to interleaving the streams would be to resample each mono stream
>> to the pipeline sample rate plus a time offset that has a time resolution
>> of ten microseconds or better. This would work, but would be rather
>> computationally expensive. Is there a better or more efficient way that
>> already exists within gstreamer?
>>
>
> That is an interesting project. Indeed, audiointerleave only supports
> per-sample alignment. It also has a configurable tolerance to clock drift,
> which by default is likely multiple samples.
>
> I'm not aware of such a thing as sub-sample interleaving in GStreamer.
> This discussion reminded me of some aspects of Arun's beamforming blog
> post, which may or may not be of interest here.
>
> Of course, adding such precision to audiointerleave would require a very
> close look at how we perform the initial alignment, as any overclip could
> be disastrous to your use case. An extra per-stream offset would then
> need to be maintained. Should this be in nanoseconds, and what is the best
> algorithm for this? I don't know, and I'm not an expert, but I'm sure there
> is a slightly more efficient way than going through massive upsampling,
> which, on top of using more CPU, would also increase the memory
> bandwidth.
>

https://arunraghavan.net/2016/06/beamforming-in-pulseaudio/
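
For reference, one textbook DSP alternative to massive upsampling is a fractional-delay FIR filter: a windowed-sinc kernel whose peak is shifted by a fraction of a sample, run at the original sample rate. The sketch below is plain Python and is not taken from this thread or from any GStreamer element. The names `fractional_delay_taps` and `apply_fir`, the 31-tap length, and the Hann window are all illustrative assumptions. Note that the filter also adds a fixed integer delay of `(num_taps - 1) / 2` samples, which would have to be applied equally to every channel.

```python
import math


def fractional_delay_taps(delay_samples, num_taps=31):
    """Windowed-sinc FIR taps delaying by (num_taps - 1) / 2 + delay_samples.

    delay_samples is the sub-sample part and must lie in [-0.5, 0.5].
    num_taps is assumed odd so that the fixed delay is a whole number.
    """
    if not -0.5 <= delay_samples <= 0.5:
        raise ValueError("delay_samples must be within [-0.5, 0.5]")
    center = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        x = n - center - delay_samples  # distance from the shifted sinc peak
        sinc = 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)
        # Hann window centered on the shifted peak (assumed window choice).
        window = 0.5 * (1.0 + math.cos(math.pi * x / (center + 1.0)))
        taps.append(sinc * window)
    gain = sum(taps)
    return [t / gain for t in taps]  # normalize to unity gain at DC


def apply_fir(samples, taps):
    """Direct-form FIR convolution; samples before the start count as zero."""
    out = []
    for i in range(len(samples)):
        acc = 0.0
        for k, tap in enumerate(taps):
            if i - k >= 0:
                acc += tap * samples[i - k]
        out.append(acc)
    return out


if __name__ == "__main__":
    fs, tone = 48000.0, 3000.0
    taps = fractional_delay_taps(0.25)
    signal = [math.sin(2.0 * math.pi * tone * n / fs) for n in range(200)]
    delayed = apply_fir(signal, taps)
    print(len(delayed), "samples delayed by 15.25 samples")
```

With these assumed settings, a quarter-sample delay at 48 kHz is about 5.2 microseconds, or roughly 5.6 degrees of phase at 3 kHz, at a cost of 31 multiply-adds per sample per channel. The approximation degrades as the tone approaches the Nyquist frequency, so the tap count and window would need tuning for real use. GStreamer does ship a generic FIR element, audiofirfilter, that accepts a custom kernel, but whether it suits this pipeline has not been checked here.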


>
>> _______________________________________________
>> gstreamer-devel mailing list
>> gstreamer-devel at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/gstreamer-devel
>
>