[pulseaudio-discuss] [PATCH 07/13] loopback: Refactor latency initialization

Tanu Kaskinen tanuk at iki.fi
Sat Nov 21 15:27:37 PST 2015


On Sat, 2015-11-21 at 19:42 +0100, Georg Chini wrote:
> On 20.11.2015 16:18, Tanu Kaskinen wrote:
> > On Fri, 2015-11-20 at 08:03 +0100, Georg Chini wrote:
> > > On 20.11.2015 01:03, Tanu Kaskinen wrote:
> > > > On Wed, 2015-02-25 at 19:43 +0100, Georg Chini wrote:
> > > > > 
> > > > > +
> > > > > +#define DEFAULT_BUFFER_MARGIN_MSEC 20
> > > > >    
> > > > >    #define DEFAULT_ADJUST_TIME_USEC (10*PA_USEC_PER_SEC)
> > > > >    
> > > > > @@ -80,11 +82,21 @@ struct userdata {
> > > > >    
> > > > >        int64_t recv_counter;
> > > > >        int64_t send_counter;
> > > > > +    uint32_t sink_adjust_counter;
> > > > > +    uint32_t source_adjust_counter;
> > > > >    
> > > > > -    size_t skip;
> > > > >        pa_usec_t latency;
> > > > > +    pa_usec_t buffer_latency;
> > > > > +    pa_usec_t initial_buffer_latency;
> > > > > +    pa_usec_t configured_sink_latency;
> > > > > +    pa_usec_t configured_source_latency;
> > > > It would be nice to have comments about what each of these different
> > > > latency variables mean.
> > > > 
> > > > I'm trying to understand what buffer_latency is... I suppose it's used
> > > > to configure the buffering so that the sink, source and buffer_latency
> > > > together equal u->latency (i.e. the user-configured loopback latency).
> > > Yes, the buffer latency is the part of the latency that is under direct
> > > control of the module (mainly what is in the memblockq). It is the
> > > parameter that is adjusted during the regulation process.
> > Correct me if I'm wrong, but the parameter that is adjusted is the sink
> > input rate. The regulation process doesn't touch the buffer_latency
> > variable. The sink input rate affects the amount that is buffered in
> > the memblockq, but the buffer_latency variable stays the same (except
> > when moving the streams).
> 
> You are right; I was confusing the variable buffer_latency with "the
> latency that is caused by the current number of samples in the buffer".
> It's been too long since I wrote this patch ...
> buffer_latency is the minimum amount of audio that must be kept
> in the buffer to avoid underruns.
> 
> > 
> > > > It's 1/4 of u->latency by default, so I guess you try to configure the
> > > > sink and source latencies to be 3/8 of u->latency each? buffer_latency
> > > > can't be less than 1.667 ms, however. (Where does that number come
> > > > from?)
> > > > 
> > > > Hmm... When configuring the sink and source latencies, you divide
> > > > u->latency by 3, which is consistent with the old code, so maybe it was a
> > > > typo to divide u->latency by 4 when setting the default buffer_latency?
> > > > I'll assume it was a typo, and you meant all three latency components
> > > > to be one third of u->latency by default.
> > > No, it is not a typo. The intention was to set the initial buffer
> > > latency to something smaller than one third. The value is fixed up later.
> > So does the initial value have any effect on anything? It seems very
> > weird that buffer_latency defaults to 1/4 of the full latency and sink
> > and source latencies are 1/3.
> 
> Taking a closer look today I cannot find any reason why I set it
> to a quarter of the latency. It should be set to something like
> "requested latency - expected source/sink latency - some msec"
> because I use it later in the regulation if there are too many
> underruns. This is currently only done correctly if you request a
> latency that cannot be satisfied. I will fix it in the next version.
> 
> > > > buffer_latency can't be less than 3/4 of the sink or source latency
> > > > (where does that 3/4 come from?), so if the sink or source latency
> > > > exceeds that limit, buffer_latency is raised accordingly. That's with
> > > Like many other values in this patch, the 3/4 is the result of lots
> > > of experiments with the module. If it is less than 3/4 of the sink or
> > > source latency, you will start seeing underruns. You can go below that
> > > value in certain situations; it's a safety margin.
> > I think it should be possible to reason about the minimum safe value.
> > We want to avoid underruns, but we also want to minimize the latency.
> > I'm afraid experimental values will fail in cases that you haven't
> > tested.
> > 
> > I can't intuitively figure out what the minimum safe value is, but
> > maybe some examples will show.
> > 
> > Let's start with zero buffer_latency and 1 second sink and source
> > latencies. Let's assume that both device buffers are full in the
> > beginning, and that the sink requests for more data when its buffer is
> > empty, and the source pushes out data when its buffer becomes full.
> > Let's also assume that all operations are instantaneous. The sink and
> > source run at the same rate.
> > 
> > time = 0 s
> > 
> > buffered:
> > source: 1 s    memblockq: 0 s    sink: 1 s
> > 
> > The source is full, so the audio moves to the memblockq.
> > 
> > source: 0 s    memblockq: 1 s    sink: 1 s
> > 
> > ----
> > 
> > time = 1 s
> > 
> > The source is again full, so it will push data to the memblockq. The
> > sink is empty, so it will pull data from the memblockq. These two
> > things can be serialized in two ways. Either the source pushes first:
> > 
> > source: 0 s    memblockq: 2 s    sink: 0 s
> > source: 0 s    memblockq: 1 s    sink: 1 s
> > 
> > or the sink pulls first:
> > 
> > source: 1 s    memblockq: 0 s    sink: 1 s
> > source: 0 s    memblockq: 1 s    sink: 1 s
> > 
> > In neither case underruns happened.
> > 
> > We're back in the same situation as in the beginning, so in this
> > example buffer_latency can safely be zero.
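
Here's the same bookkeeping as a standalone toy loop, in case someone
wants to try other latency combinations (this is just my illustration,
not module code; all names are made up):

/* Toy model of the push/pull cycle above. Units are seconds, operations
 * are assumed instantaneous, and the source pushes before the sink
 * pulls. */
#include <stdio.h>

int main(void) {
    const double source_latency = 1.0, sink_latency = 1.0; /* try 2.0/3.0 too */
    double source = source_latency, sink = sink_latency;   /* buffers start full */
    double memblockq = 0.0;

    for (int t = 0; t <= 6; t++) {
        if (source >= source_latency) { /* source full: push to the memblockq */
            memblockq += source;
            source = 0.0;
        }
        if (sink <= 0.0) {              /* sink empty: pull one buffer's worth */
            if (memblockq < sink_latency)
                printf("t=%d s: underrun, %g s missing\n",
                       t, sink_latency - memblockq);
            sink = memblockq < sink_latency ? memblockq : sink_latency;
            memblockq -= sink;
        }
        printf("t=%d s: source=%g  memblockq=%g  sink=%g\n",
               t, source, memblockq, sink);
        source += 1.0;                  /* one second of audio passes */
        sink -= 1.0;
    }
    return 0;
}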
> > 
> > 
> > What if we have different latencies in the sink and source? Let's set
> > source latency to 2 s and sink latency to 3 s. Let's keep other
> > parameters and assumptions the same. In the beginning both devices have
> > full buffers again:
> > 
> > time = 0 s
> > 
> > source: 2 s    memblockq: 0 s    sink: 3 s
> > 
> > The source is full, so it pushes.
> > 
> > source: 0 s    memblockq: 2 s    sink: 3 s
> > 
> > ----
> > 
> > time = 2 s
> > 
> > The source is full, so it pushes.
> > 
> > source: 0 s    memblockq: 4 s    sink: 1 s
> > 
> > ----
> > 
> > time = 3 s
> > 
> > The sink is empty, so it pulls.
> > 
> > source: 1 s    memblockq: 1 s    sink: 3 s
> > 
> > ----
> > 
> > time = 4 s
> > 
> > The source is full, so it pushes.
> > 
> > source: 0 s    memblockq: 3 s    sink: 2 s
> > 
> > ----
> > 
> > time = 6 s
> > 
> > The source is full and the sink is empty, so they push and pull.
> > 
> > source: 0 s    memblockq: 5 s    sink: 0 s
> > source: 0 s    memblockq: 2 s    sink: 3 s
> > 
> > or
> > 
> > source: 2 s    memblockq: 0 s    sink: 3 s
> > source: 0 s    memblockq: 2 s    sink: 3 s
> > 
> > We're now back to where we started. Again, no underruns. To me it seems
> > that buffer_latency doesn't need to be big. Underruns happen when the
> > memblockq doesn't have enough data to satisfy the pull from the sink.
> > In these two examples the memblockq got empty in some cases, but those
> > were cases where the sink had just been satisfied and the source was
> > just about to push more data to the queue, so in some sense an underrun
> > was not even close.
> > 
> > In the cases where the memblockq got empty, a rate mismatch could have
> > caused an underrun: if the source is just about to get full, and the
> > sink consumes a bit too much data from the memblockq, a small underrun
> > will occur. To mitigate that, a small margin is needed, which I suppose
> > should be proportional to the rate error and the sink buffer size (or
> > max_request to be more accurate). Does the source buffer size play any
> > role? I'm not sure. Anyway, 3/4 of the sink buffer seems excessive,
> > because the rate error won't be anywhere close to 75%.
> > 
> > Then there are scheduling delays and other overhead that break the
> > assumption of instantaneous operations. Those have very little effect
> > with big latencies, but with very low latencies they become more
> > important, and your 3/4 rule isn't necessarily excessive.
> > 
> > If buffer_latency only needs to cover the rate errors and cpu time,
> > then its proportion of the whole latency should be big with low
> > latencies and small with high latencies. Does this theory match with
> > your experiments?
> 
> The point is that the assumption that the source_output and sink_input
> rates are the same is not valid. As long as they are the same, you will
> not hit a problem.

I did also cover the case where there is clock drift and the rate
hasn't yet stabilized. (It's the paragraph starting with "In the
cases where...") I argued that the clock drift won't cause big enough
data shortage to warrant a 75% safety margin relative to the sink
latency.

I now believe that my initial intuition about the safety margin needed
to cover rate errors was wrong, though. I now think that if the sink
input is consuming data too fast (i.e. the configured rate is too low),
the error margin has to be big enough to cover all the
excess samples consumed before the rate controller slows down the sink
input data consumption rate to be at or below the rate at which the
source produces data. For example, if it takes one adjustment cycle to
change a too-fast-consuming sink input to not-too-fast-consuming, the
error margin needs to be "rate_error_in_hz * adjust_time" samples. The
sink and source latencies are irrelevant. If it takes more than one
adjustment cycle, it's more complicated, but an upper bound for the
minimum safety margin is "max_expected_rate_error_in_hz *
number_of_cycles * adjust_time" samples.
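
As a back-of-the-envelope sketch of that bound (again my own
illustration with made-up names; adjust_time defaults to 10 s in the
module):

#include <stdio.h>

/* Upper bound for the rate-error safety margin, in samples:
 * max_expected_rate_error_in_hz * number_of_cycles * adjust_time */
static double rate_error_margin_samples(double max_rate_error_hz,
                                        unsigned number_of_cycles,
                                        double adjust_time_sec) {
    return max_rate_error_hz * number_of_cycles * adjust_time_sec;
}

int main(void) {
    /* e.g. a 50 Hz worst-case rate error and one 10 s adjustment cycle */
    double margin = rate_error_margin_samples(50.0, 1, 10.0);
    printf("margin: %.0f samples (%.1f ms at 44100 Hz)\n",
           margin, margin / 44100.0 * 1000.0);
    return 0;
}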

> Once you are in a steady state you only have to care about jitter.
> I cannot clearly remember how I derived that value, probably
> experiments, but I still believe that 0.75 is a "good" estimate. If
> you look at the 4 msec case, the buffer_latency is slightly lower
> than 3/4 of the sink latency (1.667 versus 1.75 ms) but this is
> also already slightly unstable.
> In a more general case the resulting latency will be
> 1.75 * minimum_sink_latency, which I would consider small enough.

I don't understand where that "1.75 * minimum_sink_latency" comes from.
I'd understand if you said "0.75 * maximum_sink_latency", because
that's what the code seems to do.

Anyway, any formula involving the sink or source latency seems bogus to
me. adjust_time and the rate error (or an estimate of the maximum rate
error, if the real error isn't known) are what matter. Plus of course
some margin for cpu overhead/jitter, which should be constant (I
think). The jitter margin might contribute more than the margin for
covering for rate errors.

> Regarding your concern that we want to keep latency down: The old
> module loopback starts to throw errors in the log when you go down
> to 7 msec latency on Intel HDA. The new module will run happily at
> 4 msec, so it is still an improvement.
> With the controller I am currently developing together with Alexander
> it might even be possible to go further down because it is better at
> keeping the latency constant.
> 
> > > > dynamic latency support. If the sink or source doesn't support dynamic
> > > > latency, buffer_latency is raised to default_fragment_size_msec + 20
> > > > ms. I don't think it makes sense to use default_fragment_size_msec.
> > > > That variable is not guaranteed to have any relation to the sink/source
> > > > behaviour. Something derived from max_request for sinks would probably
> > > > be appropriate. I'm not sure about sources, maybe the fixed_latency
> > > > variable could be used.
> > > Mh, that is also a value derived from experiments and a safety margin to
> > > avoid underruns. I can say that for USB cards there is a clear connection
> > > between default_fragment_size_msec and the necessary buffer_latency
> > > to avoid underruns. Don't know if max_request is somehow connected
> > > to default_fragment_size.
> > Yes, max_request is connected to default_fragment_size. For the alsa
> > sink, max_request is the same as the configured latency, which
> > default_fragment_size affects.
> 
> To me it looks like max_request and fixed latency are derived from
> default_fragment_size * default_fragments. The probability of underruns
> only depends on default_fragment_size; the number of fragments
> is irrelevant. The question is how large the chunks are that are transferred
> between the memblockq and the source/sink (or between the sound card and pulse?).
> So neither max_request nor fixed latency seems to be the right indicator.
> I remember that it took me quite a while to figure out which values are safe
> for batch and non-batch cards.
> My tests have been done with USB, Intel HDA and bluetooth.

It may be that default_fragments has very little effect in practice,
because we get an interrupt when the first fragment has been consumed.
Increasing default_fragments only protects against underruns in case of
big scheduling delays. But the thing is that you need to code against
the backend-independent interface, not against the alsa backend
implementation. In theory an alsa sink running without dynamic latency
can request up to default_fragment_size * default_fragments worth of
data. max_request tells what the theoretical upper limit for one
request is.
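
For the sink side, something along these lines is what I have in mind
(a sketch only, written against the module's userdata; the 20 ms jitter
margin just reuses the patch's DEFAULT_BUFFER_MARGIN_MSEC for
illustration):

#include <pulse/sample.h>
#include <pulsecore/sink-input.h>

/* Sketch, not code from the patch: derive the non-dynamic-latency
 * buffer_latency from max_request instead of default_fragment_size_msec.
 * u->sink_input is the module's sink input as in module-loopback. */
static pa_usec_t min_buffer_latency(struct userdata *u) {
    /* the largest single request the sink can make, as playback time */
    size_t max_request = pa_sink_input_get_max_request(u->sink_input);
    pa_usec_t request_usec =
        pa_bytes_to_usec(max_request, &u->sink_input->sample_spec);

    /* plus a fixed margin for scheduling jitter */
    return request_usec + DEFAULT_BUFFER_MARGIN_MSEC * PA_USEC_PER_MSEC;
}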

Also, the bluetooth sink sets a fixed latency, and that latency has
nothing to do with default_fragment_size.

I guess none of this matters, if I'm right in saying that the sink and
source latencies are irrelevant for calculating the required safety
margin. Hmm... now it occurs to me that the sink latency has some
(smallish?) relevance after all, because when the sink input rate
changes, the sink will have some data buffered that was resampled using
the old rate. If the old rate was too low, the buffered data will be
played too fast. Therefore, the rate error safety margin formula that I
presented earlier has to be modified to include the sink latency. So
the rate error safety margin should be at least
"max_expected_rate_error_in_hz * (number_of_cycles * adjust_time +
sink_latency)" samples.

-- 
Tanu
