[pulseaudio-discuss] [PATCH 07/13] loopback: Refactor latency initialization

Fri Nov 20 07:18:20 PST 2015

On Fri, 2015-11-20 at 08:03 +0100, Georg Chini wrote:
> On 20.11.2015 01:03, Tanu Kaskinen wrote:
> > On Wed, 2015-02-25 at 19:43 +0100, Georg Chini wrote:
> > > as well as the way of reacting to sink input or source output moving.
> > > 
> > > The goal is to make sure that the initial latency of the system matches
> > > the configured one.
> > > 
> > > While at it, allow to set adjust_time to 0, which means "no adjustments
> > > at all".
> > > ---
> > >   src/modules/module-loopback.c | 329 +++++++++++++++++++++++++++++++++++-------
> > >   1 file changed, 275 insertions(+), 54 deletions(-)
> > > 
> > > diff --git a/src/modules/module-loopback.c b/src/modules/module-loopback.c
> > > index 79bd106..09f2f58 100644
> > > --- a/src/modules/module-loopback.c
> > > +++ b/src/modules/module-loopback.c
> > > @@ -59,7 +59,9 @@ PA_MODULE_USAGE(
> > >   
> > >   #define DEFAULT_LATENCY_MSEC 200
> > >   
> > > -#define MEMBLOCKQ_MAXLENGTH (1024*1024*16)
> > > +#define MEMBLOCKQ_MAXLENGTH (1024*1024*32)
> > Why is this change done? Whatever the reason is, shouldn't this be done
> > in a separate patch?
> 
> The queue is just too small if you want to run long delays (30s) together
> with high sample rates (190000 Hz). I can put it in a separate patch.
> 
> > 
> > > +
> > > +#define DEFAULT_BUFFER_MARGIN_MSEC 20
> > >   
> > >   #define DEFAULT_ADJUST_TIME_USEC (10*PA_USEC_PER_SEC)
> > >   
> > > @@ -80,11 +82,21 @@ struct userdata {
> > >   
> > >       int64_t recv_counter;
> > >       int64_t send_counter;
> > > +    uint32_t sink_adjust_counter;
> > > +    uint32_t source_adjust_counter;
> > >   
> > > -    size_t skip;
> > >       pa_usec_t latency;
> > > +    pa_usec_t buffer_latency;
> > > +    pa_usec_t initial_buffer_latency;
> > > +    pa_usec_t configured_sink_latency;
> > > +    pa_usec_t configured_source_latency;
> > It would be nice to have comments about what each of these different
> > latency variables mean.
> > 
> > I'm trying to understand what buffer_latency is... I suppose it's used
> > to configure the buffering so that the sink, source and buffer_latency
> > together equal u->latency (i.e. the user-configured loopback latency).
> 
> Yes, the buffer latency is the part of the latency that is under direct
> control of the module (mainly what is in the memblockq). It is the
> parameter that is adjusted during the regulation process.

Correct me if I'm wrong, but the parameter that is adjusted is the sink
input rate. The regulation process doesn't touch the buffer_latency
variable. The sink input rate affects the amount that is buffered in
the memblockq, but the buffer_latency variable stays the same (except
when moving the streams).

> > It's 1/4 of u->latency by default, so I guess you try to configure the
> > sink and source latencies to be 3/8 of u->latency each? buffer_latency
> > can't be less than 1.667 ms, however. (Where does that number come
> > from?)
> > 
> > Hmm... When configuring the sink and source latencies, you divide
> > u->latency by 3, which is consistent with the old code, so maybe it was a
> > typo to divide u->latency by 4 when setting the default buffer_latency?
> > I'll assume it was a typo, and you meant all three latency components
> > to be one third of u->latency by default.
> 
> No, it is no typo. The intention was to set the initial buffer latency
> to something smaller than one third. The value is fixed up later.

So does the initial value have any effect on anything? It seems very
weird that buffer_latency defaults to 1/4 of the full latency and sink
and source latencies are 1/3.
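
To put numbers on that, here is a minimal sketch of the initial split as
I read it from the patch, using the 200 ms DEFAULT_LATENCY_MSEC. The
struct and function names are made up for illustration and do not exist
in module-loopback.c, and the later fix-up you mention is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of module-loopback.c: the initial latency
 * split discussed above, in microseconds. */
struct latency_split {
    uint64_t source_usec;
    uint64_t sink_usec;
    uint64_t buffer_usec;
};

struct latency_split initial_split(uint64_t latency_usec) {
    struct latency_split s;

    assert(latency_usec > 0);

    s.source_usec = latency_usec / 3; /* source gets 1/3 */
    s.sink_usec = latency_usec / 3;   /* sink gets 1/3 */
    s.buffer_usec = latency_usec / 4; /* buffer_latency starts at 1/4 */
    return s;
}

/* Sum of the three components, to compare against the configured latency. */
uint64_t split_total(struct latency_split s) {
    return s.source_usec + s.sink_usec + s.buffer_usec;
}
```

For 200000 usec this gives 66666 + 66666 + 50000 = 183332 usec, i.e.
about 11/12 of the configured latency before any fix-up happens.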

> > 
> > buffer_latency can't be less than 3/4 of the sink or source latency
> > (where does that 3/4 come from?), so if the sink or source latency
> > exceeds that limit, buffer_latency is raised accordingly. That's with
> 
> Like many other values in this patch the 3/4 is the result of lots of
> experiments with the module. If it is less than 3/4 of sink or source
> latency you will start seeing underruns. You can go below that value in
> certain situations, it's a safety margin.

I think it should be possible to reason about the minimum safe value.
We want to avoid underruns, but we also want to minimize the latency.
I'm afraid experimental values will fail in cases that you haven't
tested.

I can't intuitively figure out what the minimum safe value is, but
maybe some examples will show.

Let's start with zero buffer_latency and 1 second sink and source
latencies. Let's assume that both device buffers are full in the
beginning, and that the sink requests more data when its buffer is
empty, and the source pushes out data when its buffer becomes full.
Let's also assume that all operations are instantaneous. The sink and
source run at the same rate.

time = 0 s

buffered:
source: 1 s    memblockq: 0 s    sink: 1 s

The source is full, so the audio moves to the memblockq.

source: 0 s    memblockq: 1 s    sink: 1 s

----

time = 1 s

The source is again full, so it will push data to the memblockq. The
sink is empty, so it will pull data from the memblockq. These two
things can be serialized in two ways. Either the source pushes first:

source: 0 s    memblockq: 2 s    sink: 0 s
source: 0 s    memblockq: 1 s    sink: 1 s

or the sink pulls first:

source: 1 s    memblockq: 0 s    sink: 1 s
source: 0 s    memblockq: 1 s    sink: 1 s

In neither case did an underrun happen.

We're back in the same situation as in the beginning, so in this
example buffer_latency can safely be zero.

What if we have different latencies in the sink and source? Let's set
source latency to 2 s and sink latency to 3 s. Let's keep other
parameters and assumptions the same. In the beginning both devices have
full buffers again:

time = 0 s

source: 2 s    memblockq: 0 s    sink: 3 s

The source is full, so it pushes.

source: 0 s    memblockq: 2 s    sink: 3 s

----

time = 2 s

The source is full, so it pushes.

source: 0 s    memblockq: 4 s    sink: 1 s

----

time = 3 s

The sink is empty, so it pulls.

source: 1 s    memblockq: 1 s    sink: 3 s

----

time = 4 s

The source is full, so it pushes.

source: 0 s    memblockq: 3 s    sink: 2 s

----

time = 6 s

The source is full and the sink is empty, so they push and pull.

source: 0 s    memblockq: 5 s    sink: 0 s
source: 0 s    memblockq: 2 s    sink: 3 s

or

source: 2 s    memblockq: 0 s    sink: 3 s
source: 0 s    memblockq: 2 s    sink: 3 s

We're now back to where we started. Again, no underruns. To me it seems
that buffer_latency doesn't need to be big. Underruns happen when the
memblockq doesn't have enough data to satisfy the pull from the sink.
In these two examples the memblockq got empty in some cases, but those
were cases where the sink had just been satisfied and the source was
just about to push more data to the queue, so in some sense an underrun
was not even close.
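
The two walkthroughs above can be replayed with a toy simulation. This
is only a sketch of my idealized model (whole seconds, instantaneous
operations, equal rates, both device buffers full at the start, empty
memblockq), not PulseAudio code; count_underruns() and its parameters
are hypothetical names.

```c
#include <assert.h>

/* Toy model, not PulseAudio code. All values are whole seconds. The source
 * pushes its whole buffer when it is full, the sink pulls a whole buffer
 * when it is empty. pull_first selects which of the two runs first when
 * both fall on the same tick. Returns the number of pulls that the
 * memblockq could not fully satisfy. */
unsigned count_underruns(unsigned source_len, unsigned sink_len,
                         unsigned duration, int pull_first) {
    unsigned source = source_len; /* source buffer starts full */
    unsigned sink = sink_len;     /* sink buffer starts full */
    unsigned queue = 0;           /* memblockq starts empty */
    unsigned underruns = 0;
    unsigned t, step;

    assert(source_len > 0 && sink_len > 0);

    for (t = 0; t <= duration; t++) {
        for (step = 0; step < 2; step++) {
            int pull_turn = pull_first ? (step == 0) : (step == 1);

            if (pull_turn) {
                if (sink == 0) {
                    if (queue < sink_len) {
                        underruns++; /* not enough data for the sink */
                        queue = 0;
                    } else {
                        queue -= sink_len;
                    }
                    sink = sink_len; /* simplification: refilled either way */
                }
            } else if (source == source_len) {
                queue += source; /* source is full, push everything */
                source = 0;
            }
        }
        /* One second passes: the source records, the sink plays. */
        source++;
        sink--;
    }
    return underruns;
}
```

count_underruns(1, 1, 10, 0) and count_underruns(2, 3, 12, 1) correspond
to the first example with the source pushing first and the second
example with the sink pulling first. Rate mismatch and scheduling delays
are deliberately left out of this model.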

In the cases where the memblockq got empty, a rate mismatch could have
caused an underrun: if the source is just about to get full, and the
sink consumes a bit too much data from the memblockq, a small underrun
will occur. To mitigate that, a small margin is needed, which I suppose
should be proportional to the rate error and the sink buffer size (or
max_request to be more accurate). Does the source buffer size play any
role? I'm not sure. Anyway, 3/4 of the sink buffer seems excessive,
because the rate error won't be anywhere close to 75%.

Then there are scheduling delays and other overhead that break the
assumption of instantaneous operations. Those have very little effect
with big latencies, but with very low latencies they become more
important, and your 3/4 rule isn't necessarily excessive.

If buffer_latency only needs to cover the rate errors and cpu time,
then its proportion of the whole latency should be big with low
latencies and small with high latencies. Does this theory match your
experiments?
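
A rough sketch of that theory, under my own assumptions: the margin is a
fixed overhead plus the drift that the rate error accumulates over one
sink request. The function names, the ppm unit and the linear formula
are hypothetical, nothing here is taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model, not from the patch: the margin covers a fixed
 * scheduling/CPU overhead plus the drift that a relative rate error
 * (in parts per million) accumulates over one sink request. */
uint64_t buffer_margin_usec(uint64_t max_request_usec, uint64_t rate_error_ppm,
                            uint64_t overhead_usec) {
    return overhead_usec + max_request_usec * rate_error_ppm / 1000000;
}

/* The margin as a share of the whole loopback latency, in percent. */
double margin_percent(uint64_t margin_usec, uint64_t total_latency_usec) {
    assert(total_latency_usec > 0);
    return 100.0 * (double) margin_usec / (double) total_latency_usec;
}
```

With made-up inputs (1 ms overhead, 1% rate error, sink request equal to
one third of the total latency), a 6 ms loopback needs a 1020 usec
margin, which is 17% of the total, while a 3 s loopback needs 11000
usec, which is well under 1%. The inputs are illustrative only, not
measurements.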

> > dynamic latency support. If the sink or source doesn't support dynamic
> > latency, buffer_latency is raised to default_fragment_size_msec + 20
> > ms. I don't think it makes sense to use default_fragment_size_msec.
> > That variable is not guaranteed to have any relation to the sink/source
> > behaviour. Something derived from max_request for sinks would probably
> > be appropriate. I'm not sure about sources, maybe the fixed_latency
> > variable could be used.
> 
> Mh, that is also a value derived from experiments and a safety margin to
> avoid underruns. I can say that for USB cards there is a clear connection
> between default_fragment_size_msec and the necessary buffer_latency
> to avoid underruns. Don't know if max_request is somehow connected
> to default_fragment_size.

Yes, max_request is connected to default_fragment_size. For the alsa
sink, max_request is the same as the configured latency, which
default_fragment_size affects.

> > I'm going to sleep now. I can continue reviewing this patch tomorrow,
> > or I can do it when you send v2, if you split this into more manageable
> > pieces (I'd prefer the latter). Which one do you think is better?
> > 
> I'll split it in the next version, but I would appreciate if you can at
> least check the rest of the patch if you find something that is obviously
> wrong on first inspection.

Ok, I'll continue with this patch still. I said I'd do it today, but it
will have to wait until tomorrow, sorry.

-- 
Tanu