[pulseaudio-discuss] [PATCH 07/13] loopback: Refactor latency initialization

Georg Chini georg at chini.tk
Wed Nov 25 00:00:43 PST 2015

On 25.11.2015 01:14, Tanu Kaskinen wrote:
> On Tue, 2015-11-24 at 08:21 +0100, Georg Chini wrote:
>> On 24.11.2015 03:50, Tanu Kaskinen wrote:
>>> On Sun, 2015-11-22 at 13:21 +0100, Georg Chini wrote:
>>>> On 22.11.2015 00:27, Tanu Kaskinen wrote:
>>>>> On Sat, 2015-11-21 at 19:42 +0100, Georg Chini wrote:
>>>>>> The point is, that the assumption that source_output and sink_input
>>>>>> rate are the same is not valid. As long as they are, you will not hit a
>>>>>> problem.
>>>>> I did cover also the case where there is clock drift and the rate
>>>>> hasn't yet been stabilized. (It's the paragraph starting with "In the
>>>>> cases where...") I argued that the clock drift won't cause big enough
>>>>> data shortage to warrant a 75% safety margin relative to the sink
>>>>> latency.
>>>>> I now believe that my initial intuition about what the safety margin
>>>>> should be to cover for rate errors was wrong, though. I now think that
>>>>> if the sink input is consuming data too fast (i.e. the configured rate
>>>>> is too low), the error margin has to be big enough to cover for all
>>>>> excess samples consumed before the rate controller slows down the sink
>>>>> input data consumption rate to be at or below the rate at which the
>>>>> source produces data. For example, if it takes one adjustment cycle to
>>>>> change a too-fast-consuming sink input to not-too-fast-consuming, the
>>>>> error margin needs to be "rate_error_in_hz * adjust_time" samples. The
>>>>> sink and source latencies are irrelevant. If it takes more than one
>>>>> adjustment cycle, it's more complicated, but an upper bound for the
>>>>> minimum safety margin is "max_expected_rate_error_in_hz *
>>>>> number_of_cycles * adjust_time" samples.
>>>> This can't be true. To transform it to the time you have to divide
>>>> by the sample rate, so your formula (for the one step case) is
>>>> basically
>>>> safety_time = relative_rate_error * adjust_time
>>>> The module keeps the relative rate error for a single step below
>>>> 0.002, so you end up with 0.002 * adjust_time, which means for
>>>> 10 s adjust time you would need 20 msec safety margin regardless
>>>> of the sink latency. This is far more than the minimum possible
>>>> latency so it does not make any sense to me. If you have a
>>>> large initial latency error which would require multiple steps your
>>>> estimate gets even worse.
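In numbers, the one-step bound quoted above works out like this (a back-of-envelope sketch; the function name is mine, not from the module):

```python
def safety_time(relative_rate_error, adjust_time_s):
    """One-step upper bound, expressed as time, on the excess audio a
    too-fast sink input consumes during one adjustment cycle."""
    return relative_rate_error * adjust_time_s

# 0.002 relative rate error over a 10 s adjust time, as in the mail:
print(safety_time(0.002, 10.0) * 1000)  # 20.0 ms

# the same error with the 2 s adjust time used in my overnight tests:
print(safety_time(0.002, 2.0) * 1000)   # 4.0 ms
```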
>>> Well, if the sink input consumes 20 ms more audio during 10 seconds
>>> than what the source produces, then that's how much you need to buffer.
>>> I don't see any way around that. Underrun-proof 4 ms latency is just
>>> impossible with those parameters. Using smaller adjust_time is an easy
>>> way to mitigate this. Maybe it would make sense to use frequent
>>> adjustments in the beginning, and once the controller stabilizes,
>>> increase the adjustment interval?
>> Stable 4 ms operation is possible, that is a fact. To be on the safe side,
>> let's say 5 ms. Using those short latencies with 10 sec adjust time is
>> however asking for trouble, that is true. I kept 5 ms running over night
>> with 2 sec adjust time more than once to test stability when I wrote the
>> patch, so I can say for sure that this works. According to your formula
>> I would need to keep at least 4 ms in the buffer for that, which is not the
>> case.
> You hit the 4 ms limit only if the initial error is sufficient to eat
> the 4 ms margin. It's not obvious that this was the case in your
> experiments. The length of the experiment is not important, because
> it's expected that once the correct rate is found, the memblockq only
> needs to be big enough to cover for scheduling delays.

The initial error has nothing to do with hitting underruns,
provided it is not so large that you hit them immediately. As
already said further down, the rate is adapted to match the error.
Your assumption that once you are in a steady state everything
is fine is wrong, because you still have to account for jitter.

Normally those underruns happen when the error is already quite
small and you are near the desired latency. So they are not caused
by a problem in the regulation but by the steady state itself: they
happen when the module cannot handle the configured latency.

I'll show you an example debug output.
I requested 25 ms latency using my USB device (default_fragment_size = 5,
default_fragments = 4, adjust_time = 1) specifying 20 ms for the sink and
5 ms for the buffer latency. So I already know it will not work. This
is the starting point:

Loopback overall latency is 19,43 ms + 18,62 ms + 1,45 ms = 39,43 ms

Now six normal regulation cycles follow, and then you are here:

Loopback overall latency is 20,89 ms + 6,04 ms + 0,65 ms = 27,49 ms
rate difference: 0 Hz

At that point the rate is already back at the base rate and the
buffer latency is near the 5 ms target.
Now there is another cycle, also at the base rate:

Loopback overall latency is 20,48 ms + 7,02 ms + 0,07 ms = 27,46 ms

After this cycle the rate is raised by 3 Hz, which means requesting a
latency change of only 62.5 usec within the next second - and you hit
an underrun.

Those long-running tests basically show whether the buffer is large
enough to cope with the jitter.
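The arithmetic behind that last step can be checked directly. This sketch assumes a 48000 Hz base rate (the mail does not name the device's rate); the helper name is mine:

```python
def latency_change_per_second(rate_step_hz, base_rate_hz):
    """Latency change (in seconds) accumulated per second of playback
    when the sink input rate deviates by rate_step_hz from base rate."""
    return rate_step_hz / base_rate_hz

# A 3 Hz rate step at an assumed 48000 Hz base rate requests a latency
# change of about 62.5 usec over the next second, as in the debug log:
print(latency_change_per_second(3, 48000) * 1e6)
```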

> By the way, what do you count as an underrun? That the memblockq gets
> empty, or that the sink buffer gets empty? When I speak of underruns, I
> mean the former, but only the latter causes actual glitches in the
> audio, if the sink supports rewinding. If the source pushes out data
> before the sink buffer gets empty, the sink buffer should get rewritten
> with the new data, and a glitch is avoided.

There are two situations in the module that I count as underruns,
because they can potentially affect the current latency value.

a) A rewind is requested in sink_input_process_msg_cb()
b) pa_memblockq_peek() fails in sink_input_pop_cb()
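The two cases above can be summarized in a small sketch. The names here (Underrun, classify) are invented for illustration; the real logic lives in the C callbacks named above:

```python
from enum import Enum

class Underrun(Enum):
    NONE = 0
    REWIND_REQUESTED = 1  # case a): rewind requested in process_msg
    PEEK_FAILED = 2       # case b): memblockq had no data in pop()

def classify(rewind_requested, memblockq_has_data):
    """Classify the two conditions the module treats as underruns."""
    if rewind_requested:
        return Underrun.REWIND_REQUESTED
    if not memblockq_has_data:
        return Underrun.PEEK_FAILED
    return Underrun.NONE
```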

>> I think 10 sec adjust time is too long anyway if you want to have a
>> quick convergence to the desired latency and a good stability in the
>> steady state.
>> I tested shorter adjust times down to 100 msec and I cannot see
>> much improvement compared to 1 or 2 seconds. I would recommend
>> to lower the default to 2 sec.
> That sounds like a good recommendation.
>>>> The other big problem is that you cannot determine the number
>>>> of cycles you will need to correct the initial latency error because
>>>> this error is unknown before the first adjustment cycle.
>>>> When you calculate that safety margin you also have to consider
>>>> that the controller might overshoot, so you temporarily could
>>>> get less latency than you requested.
>>> Can the overshoot be greater than the initial error? Getting less
>>> latency than requested is exactly the problem that the safety margin is
>>> supposed to solve. If the overshoot is less than the initial error, the
>>> safety margin calculation doesn't need to take the overshoot into
>>> account.
>> No, it cannot. It is just a little bit (I explained it in my e-mail to
>> Alexander
>> who said the controller could not overshoot). But the initial error is
>> not really relevant here at all if it is not so large as to produce underruns
>> immediately, because the rate will be adapted to match that error.
>> I think it is more the error of the adjustment itself which - for reasons
>> still completely unclear to me - can be quite high during the first cycles
>> when the deviation from the base rate is high.
> The initial error is relevant. This disagreement might be just a matter
> of difference between what we mean by "error". See the discussion later
> in this mail.
> I'll try to explain how the initial error is important: if the loopback
> starts in a state where the data production rate is lower than the data
> consumption rate, a deficit will accumulate as long as this rate
> mismatch continues to exist. The larger the error is, the faster the

This is the simple case I mean when I say that the initial error
should not be so high as to cause underruns immediately.
The module always starts in a state where both rates are equal
apart from clock skew. The initial latency is always set to some value
near the requested latency, so the initial error should not be too large.
Be aware that when the actual latency is lower than the requested
value, the rate will be lowered after one third of a second, so the
situation where the data production rate is lower than the consumption
rate should be remedied by then. (This is one of the reasons why I do
not wait a whole adjust_time before doing the first correction.)
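The direction of that correction can be sketched as follows. This is an invented helper illustrating only the sign convention, not the module's actual controller:

```python
def corrected_rate(base_rate_hz, latency_error_s, adjust_time_s):
    """latency_error_s = measured minus requested latency.
    A negative error (too little buffered) lowers the rate, so the
    sink input consumes data more slowly and the buffer refills."""
    return base_rate_hz * (1.0 + latency_error_s / adjust_time_s)

# 5 ms short of the target, to be corrected over one second:
print(corrected_rate(48000, -0.005, 1.0))  # below 48000: consume slower

# 5 ms of excess latency raises the rate instead:
print(corrected_rate(48000, 0.005, 1.0))   # above 48000: consume faster
```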

> deficit will accumulate, and the longer the error continues to exist,
> the larger the deficit will grow. As soon as the two rates match, the
> deficit stops accumulating. If the controller overshoots, the buffer

When the rates match, there is no deficit anymore; that is what
the regulation takes care of.

> will start to accumulate extra data, and that will protect against
> future deficits, in case the controller overshoots again in the other
> direction. So, assuming that the overshoots are always smaller than the
> initial error, the rate mismatch can cause underruns only until the
> first zero-crossing, and that only if the initial error is in the
> direction where we run a deficit. If the initial error is to the other

See the debug output above for a case where the controller does
not overshoot and the underrun happens after the first zero-crossing.

> direction, the memblockq will start accumulating extra data in the
> beginning, which is of course entirely safe.
>>>> It is however true, that the sink latency in itself is not relevant,
>>>> but it controls when chunks of audio are transferred and how
>>>> big those chunks are. So the connection is indirect, maybe
>>>> max_request is a better indicator than the latency. I'll do some
>>>> experiments the next days to find out.
>>> I don't think max_request is any better. In practice it's almost the
>>> same thing as the configured sink latency.
>> But you were arguing that it is a good indicator for batch cards, or did
>> I read you wrong? I would assume that the arguments you used there
>> apply here as well.
> Sorry for being confusing. I did originally think that max_request was
> a good indicator, but that was before I realized that buffer_latency
> should not be proportional to the sink or source latency.

I am still convinced that it is the right indicator and buffer_latency
should be proportional to the sink latency. I really did lots of experiments
and there is a very clear connection between the two. It may not be
obvious, but it is there. With the values I am using you practically never
see any underruns at all.
Once again, let me emphasize that buffer_latency is only proportional
to the sink latency in the border case where you are asking for a latency
that is too low.
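The proportionality I am defending, in numbers (the 0.75 factor is from the patch; the helper name is mine):

```python
def min_overall_latency(sink_latency_s, factor=0.75):
    """buffer_latency = factor * sink_latency is kept inside the
    module (mostly in the memblockq); the sink latency itself comes
    on top, giving 1.75 * sink_latency overall for factor = 0.75."""
    buffer_latency = factor * sink_latency_s
    return sink_latency_s + buffer_latency

# a 4 ms sink latency yields about 7 ms overall minimum latency:
print(min_overall_latency(0.004) * 1000)
```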

>>>>>> Once you are in a steady state you only have to care about jitter.
>>>>>> I cannot clearly remember how I derived that value, probably
>>>>>> experiments, but I still believe that 0.75 is a "good" estimate. If
>>>>>> you look at the 4 msec case, the buffer_latency is slightly lower
>>>>>> than 3/4 of the sink latency (1.667 versus 1.75 ms) but this is
>>>>>> also already slightly unstable.
>>>>>> In a more general case the resulting latency will be
>>>>>> 1.75 * minimum_sink_latency, which I would consider small enough.
>>>>> I don't understand where that "1.75 * minimum_sink_latency" comes from.
>>>>> I'd understand if you said "0.75 * maximum_sink_latency", because
>>>>> that's what the code seems to do.
>>>> The 0.75 * sink_latency is just the part that is stored within the
>>>> module (mostly in the memblockq), so you have to add the
>>>> sink_latency to it. That's 1.75 * sink_latency then. The source
>>>> latency does not seem to play any role, whatever you configure,
>>>> the reported value is most of the time near 0.
>>> Are you saying that the configured source latency doesn't actually
>>> affect the real source latency (at least as reported)? That sounds like
>>> a bug. (One possible explanation would be that since the latency
>>> queries aren't done with random intervals, the queries might by chance
>>> be "synchronized" with certain source buffer fill level.)
>> Yes, the source almost always reports a latency near zero while the
>> sink reports the configured latency. That has already been the case
>> with the old module. I do not think that there is some synchronization,
>> pacmd list-sources shows the same. Occasionally you will see higher
>> values on the source side but it is rather rare.
>> I hope it is no bug, otherwise I would have to redo the logic of the
>> module.
> It's probably not a bug. Things happen in the alsa IO thread loops in
> this order:
> 1: fill buffer (sink) / empty buffer (source)
> 2: process events, such as "get latency" messages
> 3: sleep
> 4: goto 1
> So when a "get latency" message is sent, alsa sinks refill the buffer
> before processing the message, and sources push out any currently
> buffered audio. There are checks, though, that prevent this from
> happening if the sink buffer is already more than half full, or if the
> source buffer is less than half full.
>>>> All calculations assume that when I configure source and sink
>>>> latency to 1/3 of the requested latency each, I'll end up with
>>>> having about 1/3 of the latency in source and sink together.
>>>> I know this is strange but so far I have not seen a case where
>>>> this assumption fails.
>>> It doesn't sound strange to me, because if you randomly sample the
>>> buffer fill level of a sink or a source, on average it will be 50% full
>>> (assuming that refills happen when the buffer gets empty, which is
>>> approximately true when using timer-based scheduling). On average,
>>> then, the sum of the sink and source latencies will be half of the sum
>>> of the configured latencies.
>> Should then the reported latency not be half of the configured?
>> This is not the case, at least on the sink side.
> See above. In about 50% of cases the measured sink latency will be
> about 100% of the configured latency, and in the other 50% cases the
> average will be 75%. For sources, the measured latency will be 0% in
> half of the cases and on average 25% for the other half of the cases.
> At lower latencies the "do nothing" check will trigger less often due
> to a constant-sized safety margin, so the sink latencies will be even
> more skewed towards 100% and source latencies towards 0%.

OK, understood. It is strange, though, that you are talking about
75% and 25% average buffer fills. Doesn't that hint at the connection
between sink latency and buffer_latency?
I believe I found something in the sink or alsa code back in February
which at least supported my choice of the 0.75, but I have to admit
that I can't find it anymore.
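Your averaging argument can be written out as an expected value. This is just the back-of-envelope computation from your paragraph, not module code:

```python
def expected_fraction(refilled_fraction, untouched_fraction):
    """Expected measured latency as a fraction of the configured
    latency, when half of the 'get latency' queries hit the
    'do nothing' check and half see a refilled/flushed buffer."""
    return 0.5 * refilled_fraction + 0.5 * untouched_fraction

sink = expected_fraction(1.00, 0.75)    # sink refills before answering
source = expected_fraction(0.00, 0.25)  # source flushes before answering
print(sink, source)  # 0.875 0.125 of the configured latencies
```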

>>>>> Anyway, any formula involving the sink or source latency seems bogus to
>>>>> me. adjust_time and the rate error (or an estimate of the maximum rate
>>>>> error, if the real error isn't known) are what matter. Plus of course
>>>>> some margin for cpu overhead/jitter, which should be constant (I
>>>>> think). The jitter margin might contribute more than the margin for
>>>>> covering for rate errors.
>>>> In the end adjust_time and rate_error don't matter because they are
>>>> inversely proportional to each other, so that the product is roughly
>>>> constant.
>>> Can you elaborate? I don't see why they would be inversely proportional
>>> to each other.
>> Mh, I am no longer sure if I understood you correctly. The rate error
>> I am talking about is the deviation from the base rate that is set by
>> the controller. This rate deviation is inversely proportional to the
>> adjust time, because if you have twice the time to correct the same
>> latency difference, you will only need half of the rate deviation.
>> If you are talking about the precision of the sample rate, this should
>> not be relevant because the error should be below 0.1 percent.
>> (Alexander said that was the worst he could find when he measured it.)
> The rate error I mean is neither of what you mention above. By rate
> error I mean the difference between the current configured sample rate
> and the final sample rate that will eventually be reached. Or actually

That is exactly what I mean when I say "deviation from the base
rate set by the controller", and that is inversely proportional to
adjust_time as explained above. There is nothing else controlling
the rate, and the initial state is that both rates match.
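The inverse proportionality stated above, in numbers (invented helper, assumed 48000 Hz base rate): correcting the same latency difference over twice the adjust time needs half the rate deviation.

```python
def rate_deviation(base_rate_hz, latency_diff_s, adjust_time_s):
    """Rate deviation from the base rate needed to remove a latency
    difference of latency_diff_s within adjust_time_s."""
    return base_rate_hz * latency_diff_s / adjust_time_s

d1 = rate_deviation(48000, 0.002, 1.0)  # 2 ms over 1 s
d2 = rate_deviation(48000, 0.002, 2.0)  # same 2 ms over 2 s
print(d1, d2)  # doubling adjust_time halves the deviation
```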

> I prefer to think about the "data consumption rate" and the "data
> production rate", which are pretty much the same thing, but fit my
> brain better. When talking about sample rates, there's the problem that
> if the sink input consumes data too slowly, its sample rate has to be
> *lowered* to make the sink input "go faster", and that's against my
> intuition.
