[pulseaudio-discuss] [PATCH 07/13] loopback: Refactor latency initialization

Sat Nov 21 10:42:49 PST 2015

On 20.11.2015 16:18, Tanu Kaskinen wrote:
> On Fri, 2015-11-20 at 08:03 +0100, Georg Chini wrote:
>> On 20.11.2015 01:03, Tanu Kaskinen wrote:
>>> On Wed, 2015-02-25 at 19:43 +0100, Georg Chini wrote:
>>>> as well as the way of reacting to sink input or source output moving.
>>>>
>>>> The goal is to make sure that the initial latency of the system matches
>>>> the configured one.
>>>>
>>>> While at it, allow to set adjust_time to 0, which means "no adjustments
>>>> at all".
>>>> ---
>>>>    src/modules/module-loopback.c | 329 +++++++++++++++++++++++++++++++++++-------
>>>>    1 file changed, 275 insertions(+), 54 deletions(-)
>>>>
>>>> diff --git a/src/modules/module-loopback.c b/src/modules/module-loopback.c
>>>> index 79bd106..09f2f58 100644
>>>> --- a/src/modules/module-loopback.c
>>>> +++ b/src/modules/module-loopback.c
>>>> @@ -59,7 +59,9 @@ PA_MODULE_USAGE(
>>>>    
>>>>    #define DEFAULT_LATENCY_MSEC 200
>>>>    
>>>> -#define MEMBLOCKQ_MAXLENGTH (1024*1024*16)
>>>> +#define MEMBLOCKQ_MAXLENGTH (1024*1024*32)
>>> Why is this change done? Whatever the reason is, shouldn't this be done
>>> in a separate patch?
>> The queue is just too small if you want to run long delays (30s) together
>> with high sample rates (190000 Hz). I can put it in a separate patch.
>>
>>>> +
>>>> +#define DEFAULT_BUFFER_MARGIN_MSEC 20
>>>>    
>>>>    #define DEFAULT_ADJUST_TIME_USEC (10*PA_USEC_PER_SEC)
>>>>    
>>>> @@ -80,11 +82,21 @@ struct userdata {
>>>>    
>>>>        int64_t recv_counter;
>>>>        int64_t send_counter;
>>>> +    uint32_t sink_adjust_counter;
>>>> +    uint32_t source_adjust_counter;
>>>>    
>>>> -    size_t skip;
>>>>        pa_usec_t latency;
>>>> +    pa_usec_t buffer_latency;
>>>> +    pa_usec_t initial_buffer_latency;
>>>> +    pa_usec_t configured_sink_latency;
>>>> +    pa_usec_t configured_source_latency;
>>> It would be nice to have comments about what each of these different
>>> latency variables mean.
>>>
>>> I'm trying to understand what buffer_latency is... I suppose it's used
>>> to configure the buffering so that the sink, source and buffer_latency
>>> together equal u->latency (i.e. the user-configured loopback latency).
>> Yes, the buffer latency is the part of the latency that is under direct
>> control of the module (mainly what is in the memblockq). It is the
>> parameter that is adjusted during the regulation process.
> Correct me if I'm wrong, but the parameter that is adjusted is the sink
> input rate. The regulation process doesn't touch the buffer_latency
> variable. The sink input rate affects the amount that is buffered in
> the memblockq, but the buffer_latency variable stays the same (except
> when moving the streams).

You are right, I was confusing the variable buffer_latency with "the
latency that is caused by the current number of samples in the buffer".
It's too long ago that I wrote this patch ...
buffer_latency is the minimum amount of audio that must be kept
in the buffer to avoid underruns.

>
>>> It's 1/4 of u->latency by default, so I guess you try to configure the
>>> sink and source latencies to be 3/8 of u->latency each? buffer_latency
>>> can't be less than 1.667 ms, however. (Where does that number come
>>> from?)
>>>
>>> Hmm... When configuring the sink and source latencies, you divide u-
>>>> latency by 3, which is consistent with the old code, so maybe it was a
>>> typo to divide u->latency by 4 when setting the default buffer_latency?
>>> I'll assume it was a typo, and you meant all three latency components
>>> to be one third of u->latency by default.
>> No, it is no typo. The intention was to set the initial buffer latency
>> to something
>> smaller than one third. The value is fixed up later.
> So does the initial value have any effect on anything? It seems very
> weird that buffer_latency defaults to 1/4 of the full latency and sink
> and source latencies are 1/3.

Taking a closer look today I cannot find any reason why I set it
to a quarter of the latency. It should be set to something like
"requested latency - expected source/sink latency - some msec"
because I use it later in the regulation if there are too many
underruns. This is currently only done correctly if you request a
latency that cannot be satisfied. I will fix it in the next version.

>
>>> buffer_latency can't be less than 3/4 of the sink or source latency
>>> (where does that 3/4 come from?), so if the sink or source latency
>>> exceeds that limit, buffer_latency is raised accordingly. That's with
>> Like many other values in this patch the 3/4 is the result of lots of
>> experiments
>> with the module. If it is less than 3/4 of sink or source latency you
>> will start
>> seeing underruns. You can go below that value in certain situations, it's a
>> safety margin.
> I think it should be possible to reason about the minimum safe value.
> We want to avoid underruns, but we also want to minimize the latency.
> I'm afraid experimental values will fail in cases that you haven't
> tested.
>
> I can't intuitively figure out what the minimum safe value is, but
> maybe some examples will show.
>
> Let's start with zero buffer_latency and 1 second sink and source
> latencies. Let's assume that both device buffers are full in the
> beginning, and that the sink requests for more data when its buffer is
> empty, and the source pushes out data when its buffer becomes full.
> Let's also assume that all operations are instantaneous. The sink and
> source run at the same rate.
>
> time = 0 s
>
> buffered:
> source: 1 s    memblockq: 0 s    sink: 1 s
>
> The source is full, so the audio moves to the memblockq.
>
> source: 0 s    memblockq: 1 s    sink: 1 s
>
> ----
>
> time = 1 s
>
> The source is again full, so it will push data to the memblockq. The
> sink is empty, so it will pull data from the memblockq. These two
> things can be serialized in two ways. Either the source pushes first:
>
> source: 0 s    memblockq: 2 s    sink: 0 s
> source: 0 s    memblockq: 1 s    sink: 1 s
>
> or the sink pulls first:
>
> source: 1 s    memblockq: 0 s    sink: 1 s
> source: 0 s    memblockq: 1 s    sink: 1 s
>
> In neither case underruns happened.
>
> We're back in the same situation as in the beginning, so in this
> example buffer_latency can safely be zero.
>
>
> What if we have different latencies in the sink and source? Let's set
> source latency to 2 s and sink latency to 3 s. Let's keep other
> parameters and assumptions the same. In the beginning both devices have
> full buffers again:
>
> time = 0 s
>
> source: 2 s    memblockq: 0 s    sink: 3 s
>
> The source is full, so it pushes.
>
> source: 0 s    memblockq: 2 s    sink: 3 s
>
> ----
>
> time = 2 s
>
> The source is full, so it pushes.
>
> source: 0 s    memblockq: 4 s    sink: 1 s
>
> ----
>
> time = 3 s
>
> The sink is empty, so it pulls.
>
> source: 1 s    memblockq: 1 s    sink: 3 s
>
> ----
>
> time = 4 s
>
> The source is full, so it pushes.
>
> source: 0 s    memblockq: 3 s    sink: 2 s
>
> ----
>
> time = 6 s
>
> The source is full and the sink is empty, so they push and pull.
>
> source: 0 s    memblockq: 5 s    sink: 0 s
> source: 0 s    memblockq: 2 s    sink: 3 s
>
> or
>
> source: 2 s    memblockq: 0 s    sink: 3 s
> source: 0 s    memblockq: 2 s    sink: 3 s
>
> We're now back to where we started. Again, no underruns. To me it seems
> that buffer_latency doesn't need to be big. Underruns happen when the
> memblockq doesn't have enough data to satisfy the pull from the sink.
> In these two examples the memblockq got empty in some cases, but those
> were cases where the sink had just been satisfied and the source was
> just about to push more data to the queue, so in some sense an underrun
> was not even close.
>
> In the cases where the memblockq got empty, a rate mismatch could have
> caused an underrun: if the source is just about to get full, and the
> sink consumes a bit too much data from the memblockq, a small underrun
> will occur. To mitigate that, a small margin is needed, which I suppose
> should be proportional to the rate error and the sink buffer size (or
> max_request to be more accurate). Does the source buffer size play any
> role? I'm not sure. Anyway, 3/4 of the sink buffer seems excessive,
> because the rate error won't be anywhere close to 75%.
>
> Then there are scheduling delays and other overhead that break the
> assumption of instantaneous operations. Those have very little effect
> with big latencies, but with very low latencies they become more
> important, and your 3/4 rule isn't necessary excessive.
>
> If buffer_latency only needs to cover the rate errors and cpu time,
> then its proportion of the whole latency should be big with low
> latencies and small with high latencies. Does this theory match with
> your experiments?

The point is, that the assumption that source_output and sink_input
rate are the same is not valid. As long as they are, you will not hit a
problem.
Once you are in a steady state you only have to care about jitter.
I cannot clearly remember how I derived that value, probably
experiments, but I still believe that 0.75 is a "good" estimate. If
you look at the 4 msec case, the buffer_latency is slightly lower
than 3/4 of the sink latency (1.667 versus 1.75 ms) but this is
also already slightly unstable.
In a more general case the resulting latency will be
1.75 * minimum_sink_latency, which I would consider small enough.

Regarding your concern that we want to keep latency down: The old
module loopback starts to throw errors in the log when you go down
to 7 msec latency on Intel HDA. The new module will run happily at
4 msec, so it is still an improvement.
With the controller I am currently developing together with Alexander
it might even be possible to go further down because it is better at
keeping the latency constant.

>>> dynamic latency support. If the sink or source doesn't support dynamic
>>> latency, buffer_latency is raised to default_fragment_size_msec + 20
>>> ms. I don't think it makes sense to use default_fragment_size_msec.
>>> That variable is not guaranteed to have any relation to the sink/source
>>> behaviour. Something derived from max_request for sinks would probably
>>> be appropriate. I'm not sure about sources, maybe the fixed_latency
>>> variable could be used.
>> Mh, that is also a value derived from experiments and a safety margin to
>> avoid underruns. I can say that for USB cards there is a clear connection
>> between default_fragment_size_msec and the necessary buffer_latency
>> to avoid underruns. Don't know if max_request is somehow connected
>> to default_fragment_size.
> Yes, max_request is connected to default_fragment_size. For the alsa
> sink, max_request is the same as the configured latency, which
> default_fragment_size affects.

To me it looks like max_request and fixed latency are derived from
default_fragment_size * default_fragments. The probability of underruns
only depends on default_fragment_size, the number of fragments
is irrelevant. The question is how large the chunks are that are transferred
between memblockq and source/sink (or between soundcard and pulse?).
So neither max_request nor fixed latency seem to be the right indicators.
I remember that it took me quite a while to figure out which values are safe
for batch and non-batch cards.
My tests have been done with USB, Intel HDA and bluetooth.

>>> I'm going to sleep now. I can continue reviewing this patch tomorrow,
>>> or I can do it when you send v2, if you split this into more manageable
>>> pieces (I'd prefer the latter). Which one do you think is better?
>>>
>> I'll split it in the next version, but I would appreciate if you can at
>> least check
>> the rest of the patch if you find something that is obviously wrong on first
>> inspection.
> Ok, I'll continue with this patch still. I said I'd do it today, but it
> will have to wait until tomorrow, sorry.
>
> -- 
> Tanu