[pulseaudio-discuss] [PATCH 07/13] loopback: Refactor latency initialization

Sun Nov 22 04:21:14 PST 2015

On 22.11.2015 00:27, Tanu Kaskinen wrote:
> On Sat, 2015-11-21 at 19:42 +0100, Georg Chini wrote:
>> On 20.11.2015 16:18, Tanu Kaskinen wrote:
>>> On Fri, 2015-11-20 at 08:03 +0100, Georg Chini wrote:
>>>> On 20.11.2015 01:03, Tanu Kaskinen wrote:
>>>>> On Wed, 2015-02-25 at 19:43 +0100, Georg Chini wrote:
>> The point is that the assumption that source_output and sink_input
>> rate are the same is not valid. As long as they are, you will not hit a
>> problem.
> I also covered the case where there is clock drift and the rate
> hasn't yet been stabilized. (It's the paragraph starting with "In the
> cases where...") I argued that the clock drift won't cause big enough
> data shortage to warrant a 75% safety margin relative to the sink
> latency.
>
> I now believe that my initial intuition about what the safety margin
> should be to cover for rate errors was wrong, though. I now think that
> if the sink input is consuming data too fast (i.e. the configured rate
> is too low), the error margin has to be big enough to cover for all
> excess samples consumed before the rate controller slows down the sink
> input data consumption rate to be at or below the rate at which the
> source produces data. For example, if it takes one adjustment cycle to
> change a too-fast-consuming sink input to not-too-fast-consuming, the
> error margin needs to be "rate_error_in_hz * adjust_time" samples. The
> sink and source latencies are irrelevant. If it takes more than one
> adjustment cycle, it's more complicated, but an upper bound for the
> minimum safety margin is "max_expected_rate_error_in_hz *
> number_of_cycles * adjust_time" samples.

This can't be true. To transform it to the time you have to divide
by the sample rate, so your formula (for the one step case) is
basically
safety_time = relative_rate_error * adjust_time
The module keeps the relative rate error for a single step below
0.002, so you end up with 0.002 * adjust_time, which means for
10 s adjust time you would need 20 msec safety margin regardless
of the sink latency. This is far more than the minimum possible
latency so it does not make any sense to me. If you have a
large initial latency error which would require multiple steps your
estimate gets even worse.
The other big problem is that you cannot determine the number
of cycles you will need to correct the initial latency error because
this error is unknown before the first adjustment cycle.
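
As an illustration of the arithmetic above, here is a minimal Python
sketch. It is not module code: the function names and the 48000 Hz
sample rate are assumptions made only for this example. It evaluates
the quoted bound once in samples and once in seconds, using the 0.002
per-step limit and a 10 s adjust time.

```python
def safety_margin_samples(rate_error_hz, adjust_time_s, cycles=1):
    """Quoted upper bound: rate error (Hz) * number of cycles * adjust time."""
    return rate_error_hz * cycles * adjust_time_s


def safety_margin_seconds(relative_rate_error, adjust_time_s, cycles=1):
    """Same bound as a time: dividing by the sample rate turns the
    rate error in Hz into a relative rate error."""
    return relative_rate_error * cycles * adjust_time_s


sample_rate = 48000      # Hz, assumed for illustration only
relative_error = 0.002   # per-step limit mentioned above
adjust_time = 10.0       # seconds

samples = safety_margin_samples(relative_error * sample_rate, adjust_time)
seconds = safety_margin_seconds(relative_error, adjust_time)
print(f"{samples:.0f} samples = {seconds * 1000:.0f} ms")
```

With these inputs the bound comes out at about 20 ms, independent of
the sink latency, and it grows linearly with the number of cycles.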

When you calculate that safety margin you also have to consider
that the controller might overshoot, so you temporarily could
get less latency than you requested.

It is true, however, that the sink latency in itself is not relevant,
but it controls when chunks of audio are transferred and how
big those chunks are. So the connection is indirect; maybe
max_request is a better indicator than the latency. I'll do some
experiments over the next few days to find out.

>
>> Once you are in a steady state you only have to care about jitter.
>> I cannot clearly remember how I derived that value, probably
>> experiments, but I still believe that 0.75 is a "good" estimate. If
>> you look at the 4 msec case, the buffer_latency is slightly lower
>> than 3/4 of the sink latency (1.667 versus 1.75 ms) but this is
>> also already slightly unstable.
>> In a more general case the resulting latency will be
>> 1.75 * minimum_sink_latency, which I would consider small enough.
> I don't understand where that "1.75 * minimum_sink_latency" comes from.
> I'd understand if you said "0.75 * maximum_sink_latency", because
> that's what the code seems to do.

The 0.75 * sink_latency is just the part that is stored within the
module (mostly in the memblockq), so you have to add the
sink_latency to it. That's 1.75 * sink_latency then. The source
latency does not seem to play any role: whatever you configure,
the reported value is near 0 most of the time.
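
The sum can be written out as a small Python sketch. The function name
and the 2 ms input are hypothetical and chosen only for illustration;
the source latency is taken as 0, following the observation above.

```python
def loopback_latency_estimate_ms(sink_latency_ms, buffer_fraction=0.75):
    """Estimate end-to-end latency as buffered part plus sink latency.

    buffer_fraction is the 0.75 factor discussed above; the source
    latency is assumed to contribute roughly nothing.
    """
    buffer_latency_ms = buffer_fraction * sink_latency_ms
    return buffer_latency_ms + sink_latency_ms


# Hypothetical 2 ms sink latency: 1.5 ms buffered + 2 ms in the sink.
print(loopback_latency_estimate_ms(2.0))
```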

All calculations assume that when I configure source and sink
latency to 1/3 of the requested latency each, I'll end up with
about 1/3 of the latency in source and sink together.
I know this is strange but so far I have not seen a case where
this assumption fails.

> Anyway, any formula involving the sink or source latency seems bogus to
> me. adjust_time and the rate error (or an estimate of the maximum rate
> error, if the real error isn't known) are what matter. Plus of course
> some margin for cpu overhead/jitter, which should be constant (I
> think). The jitter margin might contribute more than the margin for
> covering for rate errors.

In the end adjust_time and rate_error don't matter because they are
inversely proportional to each other, so that the product is roughly
constant.

>
>> Regarding your concern that we want to keep latency down: The old
>> module loopback starts to throw errors in the log when you go down
>> to 7 msec latency on Intel HDA. The new module will run happily at
>> 4 msec, so it is still an improvement.
>> With the controller I am currently developing together with Alexander
>> it might even be possible to go further down because it is better at
>> keeping the latency constant.
>>
>>>>> dynamic latency support. If the sink or source doesn't support dynamic
>>>>> latency, buffer_latency is raised to default_fragment_size_msec + 20
>>>>> ms. I don't think it makes sense to use default_fragment_size_msec.
>>>>> That variable is not guaranteed to have any relation to the sink/source
>>>>> behaviour. Something derived from max_request for sinks would probably
>>>>> be appropriate. I'm not sure about sources, maybe the fixed_latency
>>>>> variable could be used.
>>>> Mh, that is also a value derived from experiments and a safety margin to
>>>> avoid underruns. I can say that for USB cards there is a clear connection
>>>> between default_fragment_size_msec and the necessary buffer_latency
>>>> to avoid underruns. Don't know if max_request is somehow connected
>>>> to default_fragment_size.
>>> Yes, max_request is connected to default_fragment_size. For the alsa
>>> sink, max_request is the same as the configured latency, which
>>> default_fragment_size affects.
>> To me it looks like max_request and fixed latency are derived from
>> default_fragment_size * default_fragments. The probability of underruns
>> only depends on default_fragment_size, the number of fragments
>> is irrelevant. The question is how large the chunks are that are transferred
>> between memblockq and source/sink (or between soundcard and pulse?).
>> So neither max_request nor fixed latency seem to be the right indicators.
>> I remember that it took me quite a while to figure out which values are safe
>> for batch and non-batch cards.
>> My tests have been done with USB, Intel HDA and bluetooth.
> It may be that default_fragments has very little effect in practice,
> because we get an interrupt when the first fragment has been consumed.
> Increasing default_fragments only protects against underruns in case of
> big scheduling delays. But the thing is that you need to code against
> the backend-independent interface, not against the alsa backend
> implementation. In theory an alsa sink running without dynamic latency
> can request up to default_fragment_size * default_fragments amount of
> data. max_request tells what the theoretical upper limit for one
> request is.

I can follow your argument here. Maybe when using max_request
we end up with a formula that is valid for both cases, see above.

> Also, the bluetooth sink sets a fixed latency, and that latency has
> nothing to do with default_fragment_size.

Yes, I know. That is the reason for the additional 20 ms I add to
default_fragment_size. As already said this patch contains a lot
of numbers that are derived from experiments (all the other patches
don't). For bluetooth you have the issue that the fixed latency
shown when the device is not active is far higher than what you
get when the device is running, so it is very difficult to guess a
good value at the startup of the module.

>
> I guess none of this matters, if I'm right in saying that the sink and
> source latencies are irrelevant for calculating the required safety
> margin. Hmm... now it occurred to me that the sink latency has some
> (smallish?) relevance after all, because when the sink input rate
> changes, the sink will have some data buffered that is resampled using
> the old rate. If the old rate was too low, the buffered data will be
> played too fast. Therefore, the rate error safety margin formula that I
> presented earlier has to be modified to include the sink latency. So
> the rate error safety margin should be at least
> "max_expected_rate_error_in_hz * (number_of_cycles * adjust_time +
> sink_latency)" samples.
>
Some additional remarks about the importance of buffer_latency and
those safeguards:

1) As long as you are not at the lower latency limit both are more
or less irrelevant.

2) Even if those safeguards are too small, the module will fix up
buffer_latency during runtime until there are no longer any underruns.

3) From my experiments the values I set are rather conservative,
meaning that the probability of underruns is very low when you are
within the borders (I'm not saying they never happen). Your
recommendations so far lead to even larger values.

4) For those who know what they are doing or like to experiment,
patch 13 implements a way to circumvent all limits (although this
patch should be implemented differently from today's perspective).