[pulseaudio-discuss] [PATCH v4] Make module loopback honor requested latency

Sun Feb 8 10:34:19 PST 2015

01.02.2015 03:43, Georg Chini wrote:
> This is the final version of my patch for module-loopback. It is on top of the
> patch I sent about an hour ago and contains a lot more changes than the previous
> versions:
>
> - Honor specified latency if possible, if not adjust to the lowest possible value
> - Smooth switching from fixed latency to dynamic latency source or sink and vice versa
> - good rate and latency stability, no rate oscillation
> - adjusts latency as good as your setup allows
> - fast regulation of latency offsets, adjusts 100 ms offset within 22 seconds (adjust
>    time=1) to 60 seconds (adjust_time=10)
> - usable latency range 4 - 30000 ms
> - Avoid rewinds and "cannot peek into queue" messages during startup and switching
> - works with rates between 200 and 190000 Hz
> - maximum latency offset after source/sink switch or at startup around is 200 ms
>
> I also introduced a new parameter, buffer_latency_msec which can be used together
> with latency_msec. If buffer_latency_msec is specified, the resulting latency
> will be latency_msec + buffer_latency_msec. Latency_msec then refers only to
> the source/sink latency while buffer_latency_msec specifies the buffer part.
> This can be used to save a lot of CPU at low latencies, running 10 ms latency
> with latency_msec=6 buffer_latency_msec=4 gives 8% CPU on my system compared to
> 12% when I only specify latency_msec=10.
> Additionally you can go beyond the safe-guard limits that are built in, you can
> access the range 1 - 3 ms or lower the buffer latency for fixed latency devices.
> Some of my USB devices run fine at a buffer latency of fragment size + 4 ms
> instead of the dfault fragment size + 20 ms.
>
> I tested it all with Intel HDA, USB and bluetooth sound devices. I would like to
> see some test results from other people.

After attempting to split up this patch and to add comments, I got some 
remarks and questions.

> +    pa_log_debug("Loopback overall latency is %0.2f ms + %0.2f ms + %0.2f ms = %0.2f ms, latency difference: %0.2f ms, rate difference: %i Hz",
>                   (double) u->latency_snapshot.sink_latency / PA_USEC_PER_MSEC,
> -                (double) buffer_latency / PA_USEC_PER_MSEC,
> +                (double) current_buffer_latency / PA_USEC_PER_MSEC,
>                   (double) u->latency_snapshot.source_latency / PA_USEC_PER_MSEC,
> -                ((double) u->latency_snapshot.sink_latency + buffer_latency + u->latency_snapshot.source_latency) / PA_USEC_PER_MSEC);

I am not sure whether this split of latency accounting makes sense 
anymore, because it is not possible to attribute these latencies to any 
particular point in time. Especially current_buffer_latency, which (for 
me) is just a meaningless-by-itself intermediate quantity.

Also, here is my line of thought (an alternative derivation of 
current_buffer_latency, which does not, however, yield exactly the 
same), in some pseudocode.

At the moment source_timestamp, the source had already given us 
receive_counter bytes of data, and had source_output_buffer bytes of 
data buffered at the source output level and source_latency microseconds 
of data still sitting in the soundcard buffer. So, at that moment, we 
have been recording for this amount of time, according to the source clock:

recording_duration_at_source_timestamp = source_latency + 
bytes_to_usec(receive_counter + source_output_buffer, base_rate)

If we knew that base_rate is accurate (i.e. that the source clock and 
wall clock are exactly the same), we could add the timestamp difference 
to see for how long we have been recording at sink_timestamp:

recording_duration_at_sink_timestamp = 
recording_duration_at_source_timestamp + sink_timestamp - source_timestamp

We don't know that, because base_rate is in fact not accurate according 
to the wall clock. But we don't have an estimate of the actual source 
sample rate (according to the wall clock), and thus cannot translate the 
timestamp difference from the wallclock domain to the source clock 
domain any better. So we have to live with the above formula, and accept 
that it gives us the absolute error of this order:

recording_duration_error = (sink_timestamp - source_timestamp) * abs(1 - 
real_base_rate / base_rate)

i.e. less than 0.75% of error if we accept that the real sample rate 
never deviates from the nominal one by more than 0.75%.

Using the similar arguments, let's calculate how long the sink input has 
been playing at sink_timestamp. The sink input, according to the source 
clock, has received send_counter bytes of data, but has 
sink_input_buffer bytes buffered in the sink input, and sink_latency 
microseconds of data (according to the sink clock) buffered in the sink. So:

playback_duration = bytes_to_usec(send_counter, ???) - 
bytes_to_usec(sink_input_buffer, !!!) - sink_latency

...with an obvious source of error that we didn't convert the sink 
latency to the source clock domain. But this error is of the same order 
as the recording duration error (because both sink latency and the 
worst-case duration between the message being sent and processed in the 
pop callback are of the same order) that we already accepted, so it's 
pointless to correct.

Let's see what we should put instead of the "???". Obviously, the actual 
rate with which the sink consumed samples. But we have previously 
controlled the rate at which it consumes samples, with the aim of 
keeping the latency constant. So "???" is just base_rate.

Now let's think which rate should be put instead of the "!!!". 
Intuitively, it would appear that it is old_rate, because that's the 
rate associated with the sink input. But there is a counterargument: 
that rate is being constantly manipulated with, in order to cause the 
sink input to consume samples faster or slower than it would normally 
do, and thus does not represent the true sample rate of the sink input. 
Also, due to these manipulations, old_rate might contain jitter, and 
thus base_rate is a better quantity to put instead of the "!!!", with 
the same "we have already accepted a similar error" argument.

The total latency is, obviously,

latency = recording_duration - playback_duration

which, after expansion, is exactly your formula for current_latency, 
with some instances of old_rate replaced with base_rate. As I said, I 
think this replacement may be beneficial for reducing self-inflicted 
jitter while working outside of the deadband.

A wrong-and-hackish (not sure about thread safety) patch is attached 
that does this replacement in as many places as possible (including the 
message processing) in hope to reduce jitter, and also removes 
corrected_latency because it is no longer needed. For me, in webcam->HDA 
and bluetooth->HDA scenarios, it works just as well as your original 
patch - but you have USB playback devices, so your results may be 
different. Could you please apply it on top of my older patch (that 
moves capturing the timestamps) and test? A log similar to what you have 
already sent, but with this patch and with both 0.75% and the 2‰ 
restraints commented out would be useful.

> +      u->latency_error = (4 * u->latency_error + (double)abs((int32_t)(current_latency - u->next_latency)) / final_latency) / 5;

OK, so latency_error is a dimensionless quantity representing the 
relative (to final_latency) error. But then I can't make sense of this:

> +    /* Adjust as good as physics allows (with some safety margin) */
> +    if (abs(latency_difference) <= 2.5 * u->latency_error * final_latency + u->adjust_time / 2 / base_rate + 100)
> +       new_rate = base_rate;

abs(latency_difference) is obviously in microseconds.

2.5 * u->latency_error * final_latency is also in microseconds, good.

100 microseconds as a fudge factor are understandable, too.

But u->adjust_time / 2 / base_rate is something strange, not 
microseconds. Obviously, you meant something different. Besides, this, 
if evaluated, would also yield at most 100 (with adjust_time of 10 
seconds), and thus would be of the same order as the fudge factor. So - 
the whole deadband, according to your own testing, works fine almost 
without this term, maybe it is a good idea to delete it?

-- 
Alexander E. Patrakov
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Use-a-less-jittery-sample-rate-in-calculations.patch
Type: text/x-patch
Size: 7683 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/pulseaudio-discuss/attachments/20150208/91d3600e/attachment.bin>