[pulseaudio-discuss] [PATCH 07/13] loopback: Refactor latency initialization

Wed Nov 25 10:49:00 PST 2015

On Wed, 2015-11-25 at 16:05 +0100, Georg Chini wrote:
> On 25.11.2015 09:00, Georg Chini wrote:
> > On 25.11.2015 01:14, Tanu Kaskinen wrote:
> > > On Tue, 2015-11-24 at 08:21 +0100, Georg Chini wrote:
> > > > On 24.11.2015 03:50, Tanu Kaskinen wrote:
> > > > > On Sun, 2015-11-22 at 13:21 +0100, Georg Chini wrote:
> > > > > > On 22.11.2015 00:27, Tanu Kaskinen wrote:
> > > > > > > > On Sat, 2015-11-21 at 19:42 +0100, Georg Chini wrote:
> > > > > > > > It's probably not a bug. Things happen in the alsa IO thread loops in
> > > > > > > > this order:
> > > > > > > 
> > > > > > > 1: fill buffer (sink) / empty buffer (source)
> > > > > > > 2: process events, such as "get latency" messages
> > > > > > > 3: sleep
> > > > > > > 4: goto 1
> > > > > > > 
> > > > > > > > So when a "get latency" message is sent, alsa sinks refill the buffer
> > > > > > > > before processing the message, and sources push out any currently
> > > > > > > > buffered audio. There are checks, though, that prevent this from
> > > > > > > > happening if the sink buffer is already more than half full, or if the
> > > > > > > > source buffer is less than half full.
> > > > > > > 
> > > > > > > > > > All calculations assume that when I configure source and sink
> > > > > > > > > > latency to 1/3 of the requested latency each, I'll end up with
> > > > > > > > > > having about 1/3 of the latency in source and sink together.
> > > > > > > > > > I know this is strange but so far I have not seen a case where
> > > > > > > > > > this assumption fails.
> > > > > > > > > > It doesn't sound strange to me, because if you randomly sample the
> > > > > > > > > > buffer fill level of a sink or a source, on average it will be 50% full
> > > > > > > > > > (assuming that refills happen when the buffer gets empty, which is
> > > > > > > > > > approximately true when using timer-based scheduling). On average,
> > > > > > > > > > then, the sum of the sink and source latencies will be half of the sum
> > > > > > > > > > of the configured latencies.
> > > > > > > > Should then the reported latency not be half of the configured?
> > > > > > > > This is not the case, at least on the sink side.
> > > > > > > > See above. In about 50% of cases the measured sink latency will be
> > > > > > > > about 100% of the configured latency, and in the other 50% of cases the
> > > > > > > > average will be 75%. For sources, the measured latency will be 0% in
> > > > > > > > half of the cases and on average 25% for the other half of the cases.
> > > > > > > > At lower latencies the "do nothing" check will trigger less often due
> > > > > > > > to a constant-sized safety margin, so the sink latencies will be even
> > > > > > > > more skewed towards 100% and source latencies towards 0%.
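
The percentages above can be turned into expected values with a few
lines of Python. This is only a sketch: the function name is made up
for illustration, and the 25 ms figure is just an example value for the
configured latency, not something measured here.

```python
def expected_fraction(p_first, frac_first, frac_second):
    """Expected fraction of the configured latency over two cases.

    p_first is the probability of the first case; the second case
    gets the remaining probability.
    """
    return p_first * frac_first + (1.0 - p_first) * frac_second

# Sink: 100% of the configured latency in half of the cases,
# 75% on average in the other half.
sink_fraction = expected_fraction(0.5, 1.0, 0.75)
# Source: 0% in half of the cases, 25% on average in the other half.
source_fraction = expected_fraction(0.5, 0.0, 0.25)

configured_ms = 25.0  # assumed example value
print(sink_fraction * configured_ms)    # 21.875
print(source_fraction * configured_ms)  # 3.125
```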
> > 
> > OK, understood. Strange that you are talking of 75% and 25%
> > average buffer fills. Doesn't that give a hint towards the connection
> > between sink latency and buffer_latency?
> > I believe I found something in the sink or alsa code back in February
> > which at least supported my choice of the 0.75, but I have to admit
> > that I can't find it anymore.
> Let's take the case I mentioned in my last mail. I have requested
> 20 ms for the sink/source latency and 5 ms for the memblockq.

What does it mean that you request 20 ms "sink/source latency"? There
is the sink latency and the source latency. Does 20 ms "sink/source
latency" mean that you want to give 10 ms to the sink and 10 ms to the
source? Or 20 ms to both?

> The
> 20 ms cannot be satisfied, I get 25 ms as sink/source latency when
> I try to configure it (USB device).

I don't understand how you get 25 ms. default_fragment_size was 5 ms
and default_fragments was 4, multiply those and you get 20 ms.

> For the loopback code it means that the target latency is not what
> I specified on the command line but the average sum of source and
> sink latency + buffer_latency.

The target latency should be "configured source latency +
buffer_latency + configured sink latency". The average latencies of the
sink and source don't matter, because you need to be prepared for the
worst case scenario, in which the source buffer is full and the sink
wants to refill its buffer before the source pushes its buffered audio
to the memblockq.
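
As a rough sketch of that sum (the function name and the example values
below are hypothetical, they are not taken from module-loopback):

```python
def worst_case_target_latency_ms(source_ms, buffer_ms, sink_ms):
    """Target latency as the plain sum of the configured latencies.

    Uses the configured (worst case) source and sink latencies,
    not their averages.
    """
    return source_ms + buffer_ms + sink_ms

# Hypothetical values: 10 ms source, 5 ms memblockq, 10 ms sink.
print(worst_case_target_latency_ms(10.0, 5.0, 10.0))  # 25.0
```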

When talking specifically about the alsa source in interrupt-based
scheduling mode, then it's highly likely that the source buffer fill
level will never go much beyond one fragment, which allows us to reduce
the target latency greatly, since the "configured source latency" part
of the target latency can be "fixed latency / fragments + some margin"
instead of the full fixed latency. That optimization should only be
done after checking that the source really is an alsa source, and that
it's running in the interrupt-driven mode. Or maybe we should add an
API to query sources about their buffering model in more detail:
module-loopback could then ask the source if it pushes out audio as
soon as a certain buffer fill level is reached (a level that is
considerably lower than the full source buffer size). That would avoid
having backend-specific code in module-loopback.
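
To make the proposed reduction concrete, here is a small sketch. The
function name, the boolean flag and the 1 ms margin are assumptions for
illustration only; no such query exists in the source API today.

```python
def source_part_of_target_ms(fixed_latency_ms, fragments, margin_ms,
                             interrupt_driven_alsa):
    """Source contribution to the loopback target latency.

    For an alsa source in interrupt-driven mode the fill level is
    assumed to stay near one fragment, so one fragment plus a margin
    is used. Otherwise the full fixed latency is kept.
    """
    if interrupt_driven_alsa:
        return fixed_latency_ms / fragments + margin_ms
    return fixed_latency_ms

# 20 ms fixed latency split into 4 fragments, hypothetical 1 ms margin.
print(source_part_of_target_ms(20.0, 4, 1.0, True))   # 6.0
print(source_part_of_target_ms(20.0, 4, 1.0, False))  # 20.0
```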

> In this case it went for 27.5 ms, so
> the average reported source/sink latency was 22.5 ms. This is consistent
> with what you say above, expected average would be 21.875. Our starting
> point is this one:
> 
> Loopback overall latency is 20,48 ms + 7,02 ms + 0,07 ms = 27,46 ms
> 
> At some point, we will have 25 ms in the sink and the buffer is down
> to 2.5 ms. If we now assume that audio is transferred in chunks of 5 ms
> (the default-fragment-size) it might be possible that at the point in
> time when the sink requests the next chunk, the source has not delivered
> it yet. So you will see an underrun because the 5 ms request from the sink
> cannot be satisfied.
> So how much audio do we need in the buffer, so that this never happens?
> At least one default-fragment-size plus some safety margin - which is what
> I propose for batch devices.
> And indeed, if I try the same with buffer_latency=7 ms instead of 5 ms, it
> works without underruns. 6 ms is not yet enough; I assume there is some
> time that the data spends in transit. I believe the reasoning will be similar
> for timer-based scheduling.
> Please correct me, if my thinking is completely wrong.
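
The rule of thumb in the quoted text (at least one fragment plus a
safety margin in the buffer) can be written down as a simple check.
This is a sketch only: the function name is hypothetical, and the 2 ms
margin is an assumption picked so that the result matches the reported
observation that 7 ms works while 5 ms and 6 ms do not.

```python
def buffer_is_sufficient(buffer_ms, fragment_ms, margin_ms):
    """True if the buffer holds one fragment plus the safety margin."""
    return buffer_ms >= fragment_ms + margin_ms

fragment_ms = 5.0  # default-fragment-size from the quoted text
margin_ms = 2.0    # assumed value, not measured

for buffer_ms in (5.0, 6.0, 7.0):
    print(buffer_ms, buffer_is_sufficient(buffer_ms, fragment_ms, margin_ms))
```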

-- 
Tanu